## The Problem: Spreadsheet Fatigue and Manual Categorization
I used to spend roughly 6 hours every month reconciling my finances. Download CSVs from four bank accounts, manually categorize each transaction, then copy everything into a spreadsheet to see if I was on track with my FIRE plan. The worst part? I would make mistakes — transposing digits, miscategorizing a restaurant charge as groceries, forgetting to log a recurring subscription. By the time I discovered errors, weeks had passed.
After switching to Beancount two years ago, I decided to automate the entire pipeline. What used to take 6 hours now takes about 15 minutes of review time. Here is my complete setup.
## Step 1: Automated Bank Import with bean-extract
The foundation is the Beancount importer framework. I wrote custom importers for each of my accounts:
```python
# importers/chase_checking.py
from beancount.ingest import importer
from beancount.core import data, amount
from beancount.core.number import D
from dateutil.parser import parse
import csv


class ChaseCheckingImporter(importer.ImporterProtocol):
    def __init__(self, account, lastfour):
        self.account = account
        self.lastfour = lastfour

    def identify(self, file):
        # Match on file contents and the card's last four digits in the filename.
        return "Chase" in file.head() and self.lastfour in file.name

    def extract(self, file, existing_entries=None):
        entries = []
        with open(file.name) as f:
            for index, row in enumerate(csv.DictReader(f)):
                meta = data.new_metadata(file.name, index)
                txn = data.Transaction(
                    meta,
                    parse(row["Posting Date"]).date(),
                    "*",
                    None,                        # payee left empty
                    row.get("Description", ""),  # description as narration
                    frozenset(),
                    frozenset(),
                    [
                        data.Posting(
                            self.account,
                            amount.Amount(D(row["Amount"]), "USD"),
                            None, None, None, None,
                        ),
                    ],
                )
                entries.append(txn)
        return entries
```
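For reference, here is how a single row in that column layout parses, using only the standard library (`datetime.strptime` standing in for `dateutil` and `Decimal` for `D`; the sample row itself is invented):

```python
import csv
import io
from datetime import datetime
from decimal import Decimal

# A hypothetical export with the column headers the importer expects.
SAMPLE = """Posting Date,Description,Amount
02/03/2026,TRADER JOE'S #123,-54.23
"""

row = next(csv.DictReader(io.StringIO(SAMPLE)))
date = datetime.strptime(row["Posting Date"], "%m/%d/%Y").date()
number = Decimal(row["Amount"])
print(date, row["Description"], number)
```

Using `Decimal` (Beancount's `D` wraps it) rather than `float` matters here: floating-point rounding on currency amounts is exactly the kind of transposed-digit bug the pipeline is meant to eliminate.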
I have similar importers for Schwab, Amex, and my credit union. A cron job runs every night:
```bash
#!/bin/bash
# scripts/nightly_import.sh
BEAN_DIR="$HOME/finance"
DOWNLOADS="$HOME/Downloads/bank_statements"

cd "$BEAN_DIR" || exit 1
bean-extract config.py "$DOWNLOADS"/*.csv >> incoming.beancount
python3 scripts/auto_categorize.py incoming.beancount
```
## Step 2: Machine-Learning Categorization with smart_importer
This is where it gets interesting. The smart_importer plugin uses scikit-learn to predict account categories based on your historical data:
```python
# config.py
from smart_importer import PredictPostings
from importers.chase_checking import ChaseCheckingImporter

CONFIG = [
    PredictPostings(
        ChaseCheckingImporter(
            account="Assets:Chase:Checking",
            lastfour="4821",
        )
    ),
]
```
After training on my first 3 months of data, it now correctly categorizes about 93% of transactions automatically. The remaining 7% are flagged for manual review — mostly unusual one-time purchases.
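To give a feel for what categorization by description amounts to, here is a deliberately dumbed-down keyword matcher. This is not smart_importer's API or its ML model, just a plain substring lookup with made-up keywords; anything unmatched falls through to a review bucket, much like the flagged 7%:

```python
# Hypothetical keyword table; smart_importer learns these associations
# from history instead of hardcoding them.
KEYWORDS = {
    "TRADER JOE": "Expenses:Food:Groceries",
    "SHELL OIL": "Expenses:Transport:Gas",
    "NETFLIX": "Expenses:Subscriptions",
}

def guess_account(description, default="Expenses:Uncategorized"):
    """Return the first matching category, or a default that gets flagged for review."""
    desc = description.upper()
    for keyword, account in KEYWORDS.items():
        if keyword in desc:
            return account
    return default

print(guess_account("TRADER JOE'S #123"))  # Expenses:Food:Groceries
print(guess_account("MYSTERY VENDOR"))     # Expenses:Uncategorized
```

The ML approach earns its keep on the long tail that a keyword table can never cover, which is why consolidating categories (see the lessons at the end) improved its accuracy.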
## Step 3: Budget Enforcement with a Custom Plugin
Here is the core of my budget automation — a Beancount plugin that checks spending against monthly targets:
```python
# plugins/budget_check.py
from collections import defaultdict, namedtuple
from decimal import Decimal

from beancount.core import data

__plugins__ = ["budget_check"]

# Beancount plugins report errors with their own (source, message, entry) type.
BudgetError = namedtuple("BudgetError", "source message entry")

MONTHLY_BUDGET = {
    "Expenses:Food:Groceries": Decimal("600"),
    "Expenses:Food:Restaurants": Decimal("200"),
    "Expenses:Housing:Utilities": Decimal("250"),
    "Expenses:Transport:Gas": Decimal("150"),
    "Expenses:Entertainment": Decimal("100"),
    "Expenses:Subscriptions": Decimal("50"),
}


def budget_check(entries, options_map):
    errors = []
    monthly_totals = defaultdict(lambda: defaultdict(Decimal))
    for entry in entries:
        if isinstance(entry, data.Transaction):
            month_key = entry.date.strftime("%Y-%m")
            for posting in entry.postings:
                if posting.account in MONTHLY_BUDGET:
                    monthly_totals[month_key][posting.account] += posting.units.number
    for month, accounts in monthly_totals.items():
        for account, spent in accounts.items():
            budget = MONTHLY_BUDGET[account]
            if spent > budget:
                overage = spent - budget
                errors.append(BudgetError(
                    data.new_metadata("<budget>", 0),
                    f"Over budget in {month}: {account} spent {spent} "
                    f"vs budget {budget} (over by {overage})",
                    None,
                ))
    return entries, errors
```
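Stripped of the Beancount types, the plugin's core is a month-keyed rollup followed by a threshold check. The same logic, self-contained, with plain tuples standing in for postings (the sample amounts are invented):

```python
from collections import defaultdict
from decimal import Decimal

BUDGET = {"Expenses:Food:Groceries": Decimal("600")}

# (ISO date, account, amount) tuples standing in for transaction postings.
postings = [
    ("2026-02-03", "Expenses:Food:Groceries", Decimal("410")),
    ("2026-02-17", "Expenses:Food:Groceries", Decimal("250")),
]

totals = defaultdict(lambda: defaultdict(Decimal))
for date, account, number in postings:
    totals[date[:7]][account] += number  # date[:7] is the "YYYY-MM" month key

overages = {
    (month, account): spent - BUDGET[account]
    for month, accounts in totals.items()
    for account, spent in accounts.items()
    if spent > BUDGET.get(account, Decimal("0"))
}
print(overages)  # {('2026-02', 'Expenses:Food:Groceries'): Decimal('60')}
```

Because expense postings are positive by Beancount's sign convention, a simple sum per account per month is all the accumulation the check needs.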
## Step 4: Weekly Email Reports
A Python script runs every Sunday and generates a spending summary:
```python
# scripts/weekly_report.py
from beancount import loader
from beancount.query import query

entries, errors, options = loader.load_file("main.beancount")

# The month start is hardcoded in this simplified example.
result_types, result_rows = query.run_query(
    entries, options,
    """
    SELECT account, sum(position) as total
    WHERE date >= 2026-02-01 AND account ~ "Expenses"
    GROUP BY account
    ORDER BY sum(position) DESC
    """
)

# Format and send via email/Slack
```
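The formatting step elided above is just tabulating rows. A sketch of how the query output might be rendered as plain text for an email body (the rows here are invented; the real ones come from `run_query`):

```python
from decimal import Decimal

# Hypothetical (account, total) rows as the BQL query might return them.
rows = [
    ("Expenses:Food:Groceries", Decimal("612.40")),
    ("Expenses:Food:Restaurants", Decimal("183.75")),
]

width = max(len(account) for account, _ in rows)
lines = [f"{account:<{width}}  {total:>10.2f} USD" for account, total in rows]
report = "\n".join(lines)
print(report)
```

A fixed-width layout like this survives both plaintext email and Slack code blocks without any extra dependencies.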
## Results After 18 Months
| Metric | Before | After |
|---|---|---|
| Monthly reconciliation time | 6 hours | 15 minutes |
| Categorization accuracy | ~85% (manual) | 93% (automated) |
| Budget overruns per quarter | 4-5 | 0-1 |
| Savings rate | 42% | 58% |
The savings rate improvement is the most meaningful. Not because automation saved me money directly, but because having real-time visibility into my spending made me naturally more intentional. When you see that you have spent 80% of your restaurant budget by the 15th, you recalibrate.
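That "80% by the 15th" signal is simple pacing arithmetic: compare the fraction of the budget spent against the fraction of the month elapsed. A minimal sketch (the `pace` helper and its numbers are illustrative, not part of the pipeline above):

```python
from decimal import Decimal

def pace(spent, budget, day, days_in_month):
    """Return (fraction of budget used, fraction of month elapsed)."""
    return spent / budget, Decimal(day) / Decimal(days_in_month)

# E.g. $160 of a $200 restaurant budget spent by the 15th of a 30-day month:
used, elapsed = pace(Decimal("160"), Decimal("200"), 15, 30)
print(f"{used:.0%} of budget used, {elapsed:.0%} of month elapsed")
```

Whenever the first number runs well ahead of the second, that category is on track to blow its budget, and seeing it mid-month is what drives the recalibration.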
## What I Would Do Differently
- Start with fewer categories — I initially had 47 expense categories. I consolidated to 22, and the ML model improved significantly.
- Use bean-check in CI — I now run `bean-check` as a pre-commit hook so my ledger is always balanced.
- Version control everything — My entire finance directory is a Git repo (private, obviously). The diff history is invaluable for auditing.
Happy to answer questions about any part of this pipeline. The code examples are simplified but represent the actual structure I use daily.