After two years of refining my Beancount importers, I’ve finally achieved what I call “90/10 automation” - 90% of my transactions auto-categorize correctly, and I only manually review 10%. Here’s how I built a system that learns from my historical decisions.
The Problem: Starting from Zero
When I first started with Beancount, every transaction needed manual categorization:
2025-01-15 * "AMAZON" ""
  Assets:Bank:Checking    -47.23 USD
  Expenses:Uncategorized  ; What category?
I was spending 30+ minutes per month just categorizing transactions. With 100+ transactions monthly, that’s a lot of tedious work.
The Solution: Layered Categorization
I now use a three-layer approach that progressively handles transactions:
Layer 1: Deterministic Rules (The Foundation)
For payees whose category I always know, I use explicit rules:
# In my importer config
DETERMINISTIC_RULES = {
    # Fixed payees - these never change
    "SPOTIFY": "Expenses:Subscriptions:Music",
    "NETFLIX": "Expenses:Subscriptions:Streaming",
    "VERIZON WIRELESS": "Expenses:Phone",
    "COMCAST": "Expenses:Internet",
    "GEICO": "Expenses:Insurance:Auto",
    # Rent and mortgage - critical to get right
    "AVALON APARTMENTS": "Expenses:Housing:Rent",
    # Payroll - always the same
    "ACME CORP PAYROLL": "Income:Salary",
}

def categorize_deterministic(payee):
    """Apply the fixed rules first (case-insensitive substring match)."""
    payee_upper = payee.upper()
    for pattern, account in DETERMINISTIC_RULES.items():
        if pattern in payee_upper:
            return account
    return None
These rules are “set in stone” - they override everything else because I know they’re 100% correct.
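A quick sanity check of the lookup, as a minimal sketch: the rule table is a trimmed copy of the one above, and the sample bank strings are made up for illustration. Because the match is a substring test on the upper-cased payee, it tolerates the extra text banks append, but it also means rule keys should be specific enough not to appear inside unrelated payees.

```python
# Trimmed copy of the rules above; the sample bank strings below are made up.
DETERMINISTIC_RULES = {
    "SPOTIFY": "Expenses:Subscriptions:Music",
    "VERIZON WIRELESS": "Expenses:Phone",
    "ACME CORP PAYROLL": "Income:Salary",
}

def categorize_deterministic(payee):
    """Return the account of the first rule whose key occurs in the payee."""
    payee_upper = payee.upper()
    for pattern, account in DETERMINISTIC_RULES.items():
        if pattern in payee_upper:
            return account
    return None

samples = ["Spotify USA P1A2B3", "VERIZON WIRELESS PAYMENTS", "CORNER STORE #12"]
results = {payee: categorize_deterministic(payee) for payee in samples}
for payee, account in results.items():
    # Unmatched payees print None and fall through to the next layer
    print(f"{payee!r:32} -> {account}")
```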
Layer 2: Pattern Matching (The Workhorse)
For categories with variation, I use regex patterns:
import re

PATTERN_RULES = [
    # Groceries - multiple stores
    (r'WHOLE\s*FOODS|TRADER\s*JOE|SAFEWAY|KROGER|PUBLIX',
     'Expenses:Food:Groceries'),
    # Gas stations (\b keeps BP and MOBIL from matching inside
    # longer words such as T-MOBILE)
    (r'SHELL|CHEVRON|EXXON|\bBP\b|\bMOBIL\b|COSTCO\s*GAS',
     'Expenses:Auto:Gas'),
    # Food delivery services
    (r'DOORDASH|UBER\s*EATS|GRUBHUB|POSTMATES',
     'Expenses:Food:Delivery'),
    # General restaurant pattern - harder to enumerate
    (r'CAFE|BISTRO|GRILL|PIZZA|SUSHI|BURGER|TACO|BBQ',
     'Expenses:Food:Restaurant'),
    # Amazon is tricky - could be anything
    # Leave for ML layer
]

def categorize_pattern(payee):
    """Apply regex pattern matching; the first matching rule wins."""
    for pattern, account in PATTERN_RULES:
        if re.search(pattern, payee, re.IGNORECASE):
            return account
    return None
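Short alternatives such as MOBIL or BP are prone to substring hits, so it helps to test patterns against payees you know should and should not match. Here’s a small sketch comparing an unanchored pattern with a word-boundary version. The reduced patterns and the sample payees are hypothetical, not taken from my real config.

```python
import re

# Hypothetical reduced gas-station patterns for comparison.
GAS_LOOSE = re.compile(r'SHELL|MOBIL|BP\s', re.IGNORECASE)
# \b requires a word boundary on both sides of the merchant name.
GAS_STRICT = re.compile(r'\b(?:SHELL|MOBIL|BP)\b', re.IGNORECASE)

samples = [
    "T-MOBILE AUTOPAY",        # phone bill, not gas
    "MOBIL 1234 SPRINGFIELD",  # gas
    "BP#8841 OAK ST",          # gas, but no whitespace after BP
    "SHELLFISH SHACK",         # restaurant, not gas
]

comparison = {
    payee: (bool(GAS_LOOSE.search(payee)), bool(GAS_STRICT.search(payee)))
    for payee in samples
}
for payee, (loose, strict) in comparison.items():
    print(f"{payee:26} loose={loose!s:5} strict={strict}")
```

The loose pattern flags the phone bill and the restaurant as gas and misses the BP charge. The strict one gets all four right.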
Layer 3: Machine Learning (The Smart Layer)
For everything else, I use smart_importer to learn from my historical categorizations:
from smart_importer import PredictPostings, PredictPayees

class MyBankImporter:
    # ... importer setup ...

# Wrap with ML decorators
# (newer smart_importer releases document a hook-based setup instead;
# check the README for the form your installed version supports)
@PredictPostings()
@PredictPayees()
class SmartBankImporter(MyBankImporter):
    pass
The smart_importer learns from your existing ledger. Key insight: it only needs 2-3 examples per category to start making decent predictions.
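To make “learns from your existing ledger” concrete, here’s a toy stand-in that votes by shared words. It is not smart_importer’s actual model (the real library builds a scikit-learn pipeline), and the history entries are hypothetical. It only illustrates why a couple of labeled examples are enough to get a first guess.

```python
from collections import Counter, defaultdict

def train(history):
    """Count how often each payee/narration token appeared under each account."""
    token_votes = defaultdict(Counter)
    for text, account in history:
        for token in text.upper().split():
            token_votes[token][account] += 1
    return token_votes

def predict(token_votes, text):
    """Sum the votes of known tokens; return None when nothing is recognized."""
    totals = Counter()
    for token in text.upper().split():
        totals.update(token_votes.get(token, {}))
    return totals.most_common(1)[0][0] if totals else None

# Hypothetical history: (payee + narration, account I chose by hand)
history = [
    ("AMAZON Household supplies", "Expenses:Shopping:Online"),
    ("AMAZON Books", "Expenses:Shopping:Online"),
    ("COSTCO GAS", "Expenses:Auto:Gas"),
    ("COSTCO WHSE groceries", "Expenses:Food:Groceries"),
]
model = train(history)

print(predict(model, "AMAZON phone case"))
print(predict(model, "COSTCO GAS #44"))
print(predict(model, "UNKNOWN VENDOR"))
```

Anything the model has never seen comes back as None, which is exactly the slice that ends up in manual review.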
Training Your ML Model
The Bootstrap Problem
When you’re starting fresh, you have no training data. Here’s my approach:
Weeks 1-4: Manual categorization
- Categorize everything by hand
- Be consistent with your account names
- This builds your training set
Month 2+: Enable ML
- Turn on smart_importer
- It learns from your month of data
- Review predictions and correct mistakes
Month 3+: Refinement
- Add deterministic rules for 100% certain categories
- ML handles the ambiguous cases
- Accuracy improves as data grows
Minimum Viable Training Data
; You need at least 2-3 examples per category
; Example: To categorize "AMAZON" correctly, you need:
2024-01-15 * "AMAZON" "Household supplies"
  Expenses:Shopping:Online   45.00 USD
  Liabilities:CreditCard

2024-02-20 * "AMAZON" "Books"
  Expenses:Shopping:Online   23.99 USD
  Liabilities:CreditCard
; Now smart_importer can predict future AMAZON transactions
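One way to see which categories are still short on examples is to count postings per account. This is a minimal sketch: the function name, the default threshold, and the hard-coded account list are all assumptions for illustration. In a real setup the list would come from the postings in your ledger.

```python
from collections import Counter

def undertrained_accounts(accounts, minimum=3):
    """Return {account: count} for accounts seen fewer than `minimum` times."""
    counts = Counter(accounts)
    return {account: n for account, n in counts.items() if n < minimum}

# Hypothetical category accounts, one per historical transaction
seen = (
    ["Expenses:Shopping:Online"] * 2
    + ["Expenses:Food:Groceries"] * 5
    + ["Expenses:Auto:Gas"]
)

gaps = undertrained_accounts(seen)
for account, n in sorted(gaps.items()):
    print(f"{account}: only {n} example(s)")
```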
My Complete Importer Stack
Here’s how I combine all three layers:
from beancount.ingest import importer
from smart_importer import PredictPostings

class BankImporter(importer.ImporterProtocol):
    def extract(self, file, existing_entries):
        transactions = self.parse_csv(file)
        entries = []
        for txn in transactions:
            # Layer 1: Deterministic rules
            account = categorize_deterministic(txn['payee'])
            # Layer 2: Pattern matching
            if account is None:
                account = categorize_pattern(txn['payee'])
            # Layer 3: Leave blank for ML
            # If account is still None, create_transaction should emit only
            # the bank posting; as far as I can tell, smart_importer only
            # fills in transactions that have a single posting
            entry = self.create_transaction(txn, account)
            entries.append(entry)
        return entries

# Wrap with ML prediction
@PredictPostings()
class SmartBankImporter(BankImporter):
    pass
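The layer order itself can be factored into one function that also reports which layer made the decision, which is handy for the metrics later on. This is a sketch with tiny hypothetical rule tables. The `categorize` name and the layer labels are my own, not part of Beancount or smart_importer.

```python
import re

# Tiny hypothetical rule tables, just enough to exercise each layer
DETERMINISTIC_RULES = {"NETFLIX": "Expenses:Subscriptions:Streaming"}
PATTERN_RULES = [(r'SAFEWAY|KROGER', 'Expenses:Food:Groceries')]

def categorize(payee):
    """Return (account, layer); account is None when both rule layers pass."""
    payee_upper = payee.upper()
    # Layer 1: fixed substring rules win over everything else
    for pattern, account in DETERMINISTIC_RULES.items():
        if pattern in payee_upper:
            return account, "deterministic"
    # Layer 2: regex patterns, first match wins
    for pattern, account in PATTERN_RULES:
        if re.search(pattern, payee, re.IGNORECASE):
            return account, "pattern"
    # Layer 3: nothing matched, so leave it for ML or manual review
    return None, "ml-or-manual"

for payee in ["Netflix.com", "KROGER #512", "AMZN MKTP US"]:
    print(payee, "->", categorize(payee))
```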
Measuring Success
I track my categorization accuracy monthly:
2025-01-31 custom "import-metrics" "January"
  transactions-imported: 127
  auto-categorized-correct: 114  ; 89.8%
  auto-categorized-wrong: 8
  manual-categorization: 5
  accuracy-rate: "89.8%"
My target is 90%+ accuracy. When I dip below that, I review what’s failing and add rules.
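The arithmetic behind those numbers, as a short sketch. The accuracy rate here is correct auto-categorizations divided by all imported transactions. The second figure, which I label `auto-precision`, is a hypothetical extra metric that looks only at the transactions the importer actually categorized.

```python
def import_metrics(correct, wrong, manual):
    """Compute monthly import metrics from three raw counts."""
    total = correct + wrong + manual
    return {
        "transactions-imported": total,
        # Share of all transactions that were auto-categorized correctly
        "accuracy-rate": round(100 * correct / total, 1),
        # Share of auto-categorized transactions that were correct
        "auto-precision": round(100 * correct / (correct + wrong), 1),
    }

metrics = import_metrics(correct=114, wrong=8, manual=5)
for key, value in metrics.items():
    print(f"{key}: {value}")
```

For January that gives 127 transactions, an accuracy rate of 89.8, and an auto-precision of 93.4.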
Common Gotchas
Gotcha 1: Payee Name Variations
Banks show the same merchant differently:
AMAZON.COM*123ABC
AMZN MKTP US*456
AMAZON PRIME*789
Solution: Normalize payees before matching:
import re

def normalize_payee(payee):
    """Standardize payee names."""
    payee = payee.upper()
    # Remove trailing transaction IDs such as *123ABC
    payee = re.sub(r'\*[A-Z0-9]+$', '', payee)
    # Common variations
    payee = re.sub(r'AMZN|AMAZON\.COM', 'AMAZON', payee)
    return payee.strip()
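Running the function over the three variants above shows what it does and does not unify. The snippet repeats the function so it runs on its own.

```python
import re

def normalize_payee(payee):
    """Standardize payee names (same logic as above)."""
    payee = payee.upper()
    payee = re.sub(r'\*[A-Z0-9]+$', '', payee)            # drop trailing *ID
    payee = re.sub(r'AMZN|AMAZON\.COM', 'AMAZON', payee)  # unify spellings
    return payee.strip()

raw = ["AMAZON.COM*123ABC", "AMZN MKTP US*456", "AMAZON PRIME*789"]
normalized = [normalize_payee(p) for p in raw]
print(normalized)
# ['AMAZON', 'AMAZON MKTP US', 'AMAZON PRIME']
```

The results are not identical strings, but all three now contain AMAZON, so a single substring or regex rule can catch them.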
Gotcha 2: Context Matters
“COSTCO” could be:
- Groceries (food purchase)
- Gas (Costco gas station)
- Membership fee
Smart_importer tends to handle this better than payee-only rules because it can learn from context beyond the payee name, such as narration text and timing.
Gotcha 3: One-Time Expenses
ML struggles with one-off transactions. For big purchases, I just accept manual categorization.
The 90/10 Goal
My realistic targets:
| Transaction Type | Handling | % of Total |
|---|---|---|
| Recurring bills | Deterministic rules | 30% |
| Common merchants | Pattern matching | 35% |
| Variable spending | ML prediction | 25% |
| One-off/unusual | Manual review | 10% |
If you’re spending more than 10 minutes per month on categorization, you haven’t trained your importers enough.
Questions for Discussion
- What’s your current categorization accuracy? I’m curious if others are hitting 90%+
- Anyone using LLMs for categorization? I’ve seen Beanborg uses ChatGPT as a fallback - worth the API cost?
- How do you handle splits? When one transaction needs multiple categories (like Costco groceries + household goods), what’s your approach?
Would love to hear other strategies for reducing manual categorization work!