Setting Up Automatic Transaction Categorization Rules in Beancount

Hey everyone! I have been using Beancount for about 3 months now and I am starting to get really frustrated with manually categorizing every single transaction. I am a DevOps engineer by day, so I am totally comfortable writing scripts, but I want to make sure I am doing this the “right” Beancount way before I build something from scratch.

My Current Pain Point

Every week I download CSVs from my bank, run bean-extract, and then manually go through 40-60 transactions to assign the correct expense accounts. Some of these are SO obvious:

  • “SPOTIFY” is always Expenses:Subscriptions:Music
  • “SHELL OIL” is always Expenses:Transport:Gas
  • “WHOLE FOODS” is always Expenses:Food:Groceries
  • “COMCAST” is always Expenses:Housing:Internet

But I am still typing these out (or copy-pasting) every single time. There has to be a better way, right?

What I Have Tried So Far

Approach 1: Hardcoded Rules in My Importer

I added a simple mapping dictionary to my importer:

CATEGORIZATION_RULES = {
    "SPOTIFY": "Expenses:Subscriptions:Music",
    "NETFLIX": "Expenses:Subscriptions:Streaming",
    "SHELL OIL": "Expenses:Transport:Gas",
    "CHEVRON": "Expenses:Transport:Gas",
    "WHOLE FOODS": "Expenses:Food:Groceries",
    "TRADER JOE": "Expenses:Food:Groceries",
    "COMCAST": "Expenses:Housing:Internet",
    "PG&E": "Expenses:Housing:Utilities",
}

def categorize(description):
    desc_upper = description.upper()
    for keyword, account in CATEGORIZATION_RULES.items():
        if keyword in desc_upper:
            return account
    return "Expenses:Uncategorized"

This works for the obvious stuff, but it has problems:

  1. Substring matching is fragile. A keyword like “SHELL” matches “SHELL OIL” but also “SHELLY” (my friend who Venmos me).
  2. The list keeps growing. I am at 35 rules and counting. It feels like it is going to become unmaintainable.
  3. Some merchants use different names. The same restaurant might show up as “SQ *BURGERPLACE”, “SQUARE BURGERPLACE”, or “BURGERPLACE INC”.
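To show how bad problem 1 gets, here is the over-match in isolation (rules trimmed to a single entry; the restaurant name is made up):

```python
# Plain substring matching over-matches: a restaurant gets tagged as gas.
# "SEASHELL COVE RESTAURANT" is an invented merchant for illustration.
rules = {"SHELL": "Expenses:Transport:Gas"}
desc = "SEASHELL COVE RESTAURANT"
matches = [account for keyword, account in rules.items() if keyword in desc]
print(matches)  # ['Expenses:Transport:Gas']
```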

Approach 2: I Read About smart_importer

I saw the smart_importer package mentioned in a few posts here. It uses machine learning to predict categories. But I only have 3 months of data — is that enough for ML to work? And honestly, the idea of a black-box ML model making financial decisions makes me a little nervous.

What I Want

Ideally, I would love a system that:

  1. Has a rules engine for the obvious stuff (exact/regex matches)
  2. Has a learning component that gets smarter over time
  3. Flags uncertain categorizations for manual review instead of silently miscategorizing
  4. Works with the standard Beancount importer framework

Has anyone built something like this? What is the recommended approach in the community?

My Environment

  • Beancount v2 (Python)
  • Three bank accounts (checking, savings, credit card)
  • About 150-200 transactions per month
  • Running on macOS, comfortable with Python, familiar with basic ML concepts
  • Using Fava for viewing but not for data entry

Any guidance would be hugely appreciated! I know I could hack something together in a weekend, but I would rather learn from people who have been doing this for years and know the pitfalls.

Thanks in advance!

Sarah, you are asking exactly the right questions at exactly the right time. Three months in is when most people hit this wall, and your instinct to not just hack something together is good.

Here is my recommended three-tier approach, which I have been refining for four years:

Tier 1: Regex rules for high-confidence matches

Instead of simple substring matching, use regex with word boundaries:

import re

RULES = [
    (re.compile(r"\bSPOTIFY\b", re.I), "Expenses:Subscriptions:Music"),
    (re.compile(r"\bNETFLIX\b", re.I), "Expenses:Subscriptions:Streaming"),
    (re.compile(r"\b(SHELL|CHEVRON|BP|EXXON)\s*(OIL|GAS)?\b", re.I), "Expenses:Transport:Gas"),
    (re.compile(r"\b(WHOLE FOODS|TRADER JOE|SAFEWAY|KROGER)\b", re.I), "Expenses:Food:Groceries"),
    (re.compile(r"\bSQ\s*\*?\s*", re.I), None),  # Strip Square prefix, re-match
]

def categorize(description):
    for pattern, account in RULES:
        if pattern.search(description):
            if account:
                return account
            # account=None means: strip the matched prefix, then re-run the rules
            return categorize(pattern.sub("", description, count=1))
    return None  # Fall through to Tier 2

This solves your SHELL vs SHELLY problem. The word boundary anchor \b only matches at the edge of a word, so “SHELL” followed immediately by another letter no longer counts as a hit.
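Quick demonstration of the boundary behavior (both merchant strings are invented):

```python
import re

# Tier 1 style rule: word boundaries on both sides of the keyword
pattern = re.compile(r"\bSHELL\b", re.I)

print(bool(pattern.search("SHELL OIL 57444801")))    # True: standalone word
print(bool(pattern.search("BOMBSHELL BEAUTY BAR")))  # False: no boundary before SHELL
```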

Tier 2: smart_importer for learned predictions

Three months of data IS enough to start with smart_importer. It trains a scikit-learn text classifier on the transaction descriptions already in your ledger, so even with limited history it picks up your repeat merchants. The key is to make sure it never overwrites anything you have already categorized by hand:

from smart_importer import PredictPostings, apply_hooks

CONFIG = [
    apply_hooks(
        your_importer,
        [PredictPostings(overwrite=False)],  # never overwrite existing postings
    ),
]

The model improves as you add more data. After 6 months, you will see a real jump in accuracy.

Tier 3: Flag everything uncertain

For transactions that neither tier catches confidently, route them to Expenses:TODO:

2026-02-10 ! "Unknown Merchant" "Uncategorized transaction"
  Expenses:TODO    45.99 USD
  Assets:Checking

The ! flag marks it as pending review. Fava highlights these prominently so you do not forget. You can also run bean-query to find all TODO items:

bean-query main.beancount "SELECT date, narration, position WHERE account = 'Expenses:TODO'"

This three-tier approach gives you deterministic rules where possible, ML learning where appropriate, and explicit flagging for everything else. No silent miscategorizations.

Mike’s three-tier system is solid, but I want to push back slightly on using smart_importer at 3 months.

Unpopular opinion: you do not need ML for transaction categorization unless you have 500+ unique merchants. For most personal finance use cases, a well-maintained rules file covers 95%+ of transactions. ML adds complexity (scikit-learn dependency, model training time, debugging prediction errors) for marginal gains.

What I do instead is maintain a TSV file that maps merchant patterns to accounts:

# categorization_rules.tsv
PATTERN	ACCOUNT	CONFIDENCE
^SPOTIFY	Expenses:Subscriptions:Music	high
^SQ \*	Expenses:Food:Restaurants	medium
AMAZON	Expenses:Shopping:Online	low
^TRANSFER	SKIP	high

Then my importer reads this file and applies rules in order. The CONFIDENCE column determines whether to auto-assign or flag for review. “high” means auto-assign, “medium” means assign but mark with !, “low” means route to Expenses:TODO.

The advantage of a TSV file over Python code: your non-programmer partner or accountant can edit it. Try asking someone to modify a regex in a Python class versus editing a spreadsheet.
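Here is roughly what my loader looks like (a sketch; the real one has error handling, and reads from a file rather than an in-memory string):

```python
import csv
import io
import re

def load_rules(lines):
    """Parse PATTERN/ACCOUNT/CONFIDENCE rows, skipping # comment lines."""
    reader = csv.DictReader(
        (line for line in lines if not line.startswith("#")), delimiter="\t"
    )
    return [(re.compile(row["PATTERN"]), row["ACCOUNT"], row["CONFIDENCE"])
            for row in reader]

def apply_rules(rules, description):
    """Return (account, confidence) for the first matching rule."""
    for pattern, account, confidence in rules:
        if pattern.search(description):
            return account, confidence
    return "Expenses:TODO", "low"

sample = io.StringIO(
    "PATTERN\tACCOUNT\tCONFIDENCE\n"
    "^SPOTIFY\tExpenses:Subscriptions:Music\thigh\n"
    "AMAZON\tExpenses:Shopping:Online\tlow\n"
)
rules = load_rules(sample)
print(apply_rules(rules, "SPOTIFY USA"))  # ('Expenses:Subscriptions:Music', 'high')
print(apply_rules(rules, "MYSTERY LLC"))  # ('Expenses:TODO', 'low')
```

In the real version you call load_rules(open(path)), and the caller treats an ACCOUNT of SKIP as "drop this row entirely".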

One thing we all agree on: never let any automation silently miscategorize a transaction. The cost of reviewing a flagged transaction is 5 seconds. The cost of discovering a miscategorization during tax prep is much higher.

@newbie_accountant — with 150-200 transactions a month and three accounts, I bet you have maybe 60-70 unique merchants. A rules file of 70 entries is perfectly maintainable. Start there, and only graduate to ML if it becomes insufficient.

Adding a practical perspective from someone who does this professionally for multiple clients.

The biggest categorization challenge is not the matching logic — it is the edge cases. Amazon is the classic example. Is this Amazon purchase groceries (Amazon Fresh), household supplies, electronics, or a business expense? The merchant name is just “AMAZON” or “AMZN” regardless.

My solution for clients: use the transaction amount as a secondary signal.

AMOUNT_RULES = {
    "AMAZON": [
        (lambda amt: 10 <= amt <= 30, "Expenses:Food:Groceries"),      # Likely Amazon Fresh
        (lambda amt: amt < 10, "Expenses:Shopping:Misc"),              # Small purchases
        (lambda amt: amt > 200, "Expenses:Shopping:Electronics"),      # Big ticket items
        (lambda amt: True, "Expenses:Shopping:Online"),                # Default
    ],
}
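The dispatcher for a table in this shape is only a few lines (the function name and the trimmed demo table below are mine):

```python
def categorize_by_amount(rules, merchant, amount):
    """Return the account of the first predicate that accepts the amount."""
    for predicate, account in rules.get(merchant, []):
        if predicate(amount):
            return account
    return None  # merchant has no amount rules

# Two-entry demo table in the same shape as AMOUNT_RULES above
demo_rules = {
    "AMAZON": [
        (lambda amt: amt > 200, "Expenses:Shopping:Electronics"),
        (lambda amt: True, "Expenses:Shopping:Online"),  # default bucket
    ],
}

print(categorize_by_amount(demo_rules, "AMAZON", 349.00))  # Expenses:Shopping:Electronics
print(categorize_by_amount(demo_rules, "AMAZON", 19.99))   # Expenses:Shopping:Online
```

Because the predicates are checked in order, the catch-all lambda must stay last.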

This is not perfect, but it is better than routing all Amazon transactions to one bucket. The important thing is to still flag these as medium-confidence so you review them.

Another tip for your importer setup: normalize merchant names BEFORE applying rules. Banks format merchant names inconsistently. Build a normalization function:

import re

def normalize_merchant(raw_name):
    name = raw_name.upper().strip()
    # Remove common payment-processor prefixes
    for prefix in ["SQ *", "TST*", "PP*", "PAYPAL *", "VENMO *"]:
        if name.startswith(prefix):
            name = name[len(prefix):].strip()
    # Remove trailing store numbers like "#1234"
    name = re.sub(r"\s*#?\d+$", "", name)
    return name

This way “SQ *BURGERPLACE”, “BURGERPLACE #1234”, and “BURGERPLACE” all normalize to “BURGERPLACE” and hit the same rule.

I process about 3,000 transactions per month across all my clients. Without this normalization step, the rules file would be 3x larger.