Hey everyone! I have been using Beancount for about 3 months now and I am starting to get really frustrated with manually categorizing every single transaction. I am a DevOps engineer by day, so I am totally comfortable writing scripts, but I want to make sure I am doing this the “right” Beancount way before I build something from scratch.
My Current Pain Point
Every week I download CSVs from my bank, run bean-extract, and then manually go through 40-60 transactions to assign the correct expense accounts. Some of these are SO obvious:
- “SPOTIFY” is always Expenses:Subscriptions:Music
- “SHELL OIL” is always Expenses:Transport:Gas
- “WHOLE FOODS” is always Expenses:Food:Groceries
- “COMCAST” is always Expenses:Housing:Internet
But I am still typing these out (or copy-pasting) every single time. There has to be a better way, right?
What I Have Tried So Far
Approach 1: Hardcoded Rules in My Importer
I added a simple mapping dictionary to my importer:
# Maps an upper-case keyword to the expense account it should be booked to.
CATEGORIZATION_RULES = {
    "SPOTIFY": "Expenses:Subscriptions:Music",
    "NETFLIX": "Expenses:Subscriptions:Streaming",
    "SHELL OIL": "Expenses:Transport:Gas",
    "CHEVRON": "Expenses:Transport:Gas",
    "WHOLE FOODS": "Expenses:Food:Groceries",
    "TRADER JOE": "Expenses:Food:Groceries",
    "COMCAST": "Expenses:Housing:Internet",
    "PG&E": "Expenses:Housing:Utilities",
}


def categorize(description):
    """Return the account of the first keyword found in the description."""
    desc_upper = description.upper()
    for keyword, account in CATEGORIZATION_RULES.items():
        # Plain substring test: matches anywhere, even inside another word.
        if keyword in desc_upper:
            return account
    return "Expenses:Uncategorized"
This works for the obvious stuff, but it has problems:
- Substring matching is fragile. A shorter keyword like “SHELL” matches “SHELL OIL”, but it also matches any description that merely contains those letters, such as “SEASHELL”.
- The list keeps growing. I am at 35 rules and counting. It feels like it is going to become unmaintainable.
- Some merchants use different names. The same restaurant might show up as “SQ *BURGERPLACE”, “SQUARE BURGERPLACE”, or “BURGERPLACE INC”.
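The first and third problems above can be reduced with word-boundary regexes and a normalization pass that runs before matching. The following is a minimal sketch, not an established Beancount recipe: the cleanup patterns, the `normalize` helper, and the `Expenses:Food:Restaurants` account are assumptions made for illustration, and real bank exports will likely need different patterns.

```python
import re

# Hypothetical cleanup patterns for payment-processor prefixes and
# corporate suffixes; extend these to match your own bank's exports.
_NOISE = [
    re.compile(r"^SQ\s*\*\s*"),   # "SQ *BURGERPLACE"
    re.compile(r"^SQUARE\s+"),    # "SQUARE BURGERPLACE"
    re.compile(r"\s+INC\.?$"),    # "BURGERPLACE INC"
]


def normalize(description):
    """Upper-case, collapse whitespace, and strip known prefixes/suffixes."""
    text = " ".join(description.upper().split())
    for pattern in _NOISE:
        text = pattern.sub("", text)
    return text


# Ordered (pattern, account) pairs. \b anchors each keyword at a word
# boundary, so "SHELL" no longer matches inside a longer word.
RULES = [
    (re.compile(r"\bSHELL\b"), "Expenses:Transport:Gas"),
    (re.compile(r"\bSPOTIFY\b"), "Expenses:Subscriptions:Music"),
    (re.compile(r"\bBURGERPLACE\b"), "Expenses:Food:Restaurants"),  # assumed account
]


def categorize(description, default="Expenses:Uncategorized"):
    """Return the account of the first rule matching the normalized text."""
    text = normalize(description)
    for pattern, account in RULES:
        if pattern.search(text):
            return account
    return default
```

Keeping the rules as an ordered list rather than a dict makes precedence explicit: the first matching pattern wins, so more specific rules can be placed ahead of broader ones.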
Approach 2: I Read About smart_importer
I saw the smart_importer package mentioned in a few posts here. It uses machine learning to predict categories. But I only have 3 months of data. Is that enough for ML to work? And honestly, the idea of a black-box ML model deciding how my transactions get categorized makes me a little nervous.
What I Want
Ideally, I would love a system that:
- Has a rules engine for the obvious stuff (exact/regex matches)
- Has a learning component that gets smarter over time
- Flags uncertain categorizations for manual review instead of silently miscategorizing
- Works with the standard Beancount importer framework
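To make the first three items of this wish list concrete, here is a rough sketch of how they could fit together in plain Python. It is not smart_importer and not part of Beancount: `HistoryModel`, the token-voting scheme, and the `threshold` parameter are all hypothetical, and the "confidence" is only a share of votes, not a calibrated probability. The one Beancount convention it relies on is the transaction flag, where `*` marks a normal entry and `!` marks one that needs review.

```python
import re
from collections import Counter, defaultdict

# Deterministic rules are checked first (accounts taken from the list above).
RULES = [
    (re.compile(r"\bSPOTIFY\b"), "Expenses:Subscriptions:Music"),
    (re.compile(r"\bCOMCAST\b"), "Expenses:Housing:Internet"),
]


class HistoryModel:
    """Hypothetical learner: counts which account each token was booked to."""

    def __init__(self):
        self.votes = defaultdict(Counter)

    @staticmethod
    def tokens(description):
        # Alphabetic runs of 3+ letters; store numbers and symbols are dropped.
        return re.findall(r"[A-Z]{3,}", description.upper())

    def learn(self, description, account):
        """Record one already-categorized transaction."""
        for token in self.tokens(description):
            self.votes[token][account] += 1

    def predict(self, description):
        """Return (account, share of votes), or (None, 0.0) if nothing is known."""
        tally = Counter()
        for token in self.tokens(description):
            tally.update(self.votes.get(token, {}))
        if not tally:
            return None, 0.0
        account, count = tally.most_common(1)[0]
        return account, count / sum(tally.values())


def categorize(description, model, threshold=0.8):
    """Return (flag, account): '*' when confident, '!' when review is needed."""
    text = description.upper()
    for pattern, account in RULES:
        if pattern.search(text):
            return "*", account
    account, confidence = model.predict(description)
    if account is not None and confidence >= threshold:
        return "*", account
    # Uncertain or unknown: keep the best guess but flag it for manual review.
    return "!", account or "Expenses:Uncategorized"
```

In use, the model would be fed every transaction already in the ledger via `learn`, and each new transaction would get the flag returned by `categorize`. A single past transaction is enough to produce a vote share of 1.0 here, so a real version would probably also require a minimum number of votes before trusting a prediction.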
Has anyone built something like this? What is the recommended approach in the community?
My Environment
- Beancount v2 (Python)
- Three bank accounts (checking, savings, credit card)
- About 150-200 transactions per month
- Running on macOS, comfortable with Python, familiar with basic ML concepts
- Using Fava for viewing but not for data entry
Any guidance would be hugely appreciated! I know I could hack something together in a weekend, but I would rather learn from people who have been doing this for years and know the pitfalls.
Thanks in advance!