How I Automated My Entire Budget Workflow in Beancount with Python Scripts

The Problem: Spreadsheet Fatigue and Manual Categorization

I used to spend roughly 6 hours every month reconciling my finances. Download CSVs from four bank accounts, manually categorize each transaction, then copy everything into a spreadsheet to see if I was on track with my FIRE plan. The worst part? I would make mistakes — transposing digits, miscategorizing a restaurant charge as groceries, forgetting to log a recurring subscription. By the time I discovered errors, weeks had passed.

After switching to Beancount two years ago, I decided to automate the entire pipeline. What used to take 6 hours now takes about 15 minutes of review time. Here is my complete setup.

Step 1: Automated Bank Import with bean-extract

The foundation is the Beancount importer framework. I wrote custom importers for each of my accounts:

# importers/chase_checking.py
from beancount.ingest import importer
from beancount.core import data, amount
from beancount.core.number import D
from dateutil.parser import parse
import csv

class ChaseCheckingImporter(importer.ImporterProtocol):
    def __init__(self, account, lastfour):
        self.account = account
        self.lastfour = lastfour

    def identify(self, file):
        return "Chase" in file.head() and self.lastfour in file.name

    def extract(self, file, existing_entries=None):
        entries = []
        with open(file.name) as f:
            for index, row in enumerate(csv.DictReader(f), start=1):
                # Use the CSV row number so errors point at the right line
                meta = data.new_metadata(file.name, index)
                txn = data.Transaction(
                    meta,
                    parse(row["Posting Date"]).date(),
                    "*",                         # flag: cleared
                    None,                        # payee
                    row.get("Description", ""),  # narration
                    data.EMPTY_SET,              # tags
                    data.EMPTY_SET,              # links
                    [
                        data.Posting(self.account,
                                     amount.Amount(D(row["Amount"]), "USD"),
                                     None, None, None, None),
                    ],
                )
                entries.append(txn)
        return entries

I have similar importers for Schwab, Amex, and my credit union. A cron job runs every night:

#!/bin/bash
# scripts/nightly_import.sh
BEAN_DIR="$HOME/finance"
DOWNLOADS="$HOME/Downloads/bank_statements"

cd "$BEAN_DIR"
bean-extract config.py "$DOWNLOADS"/*.csv >> incoming.beancount
python3 scripts/auto_categorize.py incoming.beancount
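
For reference, the cron entry behind this is a single line (the 2:30 AM time is arbitrary; the log redirect is just so failures surface the next morning):

```shell
# crontab -e
30 2 * * * $HOME/finance/scripts/nightly_import.sh >> $HOME/finance/import.log 2>&1
```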

Step 2: Machine-Learning Categorization with smart_importer

This is where it gets interesting. The smart_importer plugin uses scikit-learn to predict account categories based on your historical data:

# config.py
from smart_importer import PredictPostings
from importers.chase_checking import ChaseCheckingImporter

CONFIG = [
    PredictPostings(
        ChaseCheckingImporter(
            account="Assets:Chase:Checking",
            lastfour="4821"
        )
    ),
]

After training on my first 3 months of data, it now correctly categorizes about 93% of transactions automatically. The remaining 7% are flagged for manual review — mostly unusual one-time purchases.

Step 3: Budget Enforcement with Custom Plugin

Here is the core of my budget automation — a Beancount plugin that checks spending against monthly targets:

# plugins/budget_check.py
import collections
from collections import defaultdict
from decimal import Decimal

from beancount.core import data

# Beancount plugins must export __plugins__ and report problems via their
# own error type with (source, message, entry) fields.
__plugins__ = ["budget_check"]

BudgetError = collections.namedtuple("BudgetError", "source message entry")

MONTHLY_BUDGET = {
    "Expenses:Food:Groceries": Decimal("600"),
    "Expenses:Food:Restaurants": Decimal("200"),
    "Expenses:Housing:Utilities": Decimal("250"),
    "Expenses:Transport:Gas": Decimal("150"),
    "Expenses:Entertainment": Decimal("100"),
    "Expenses:Subscriptions": Decimal("50"),
}

def budget_check(entries, options_map):
    errors = []
    monthly_totals = defaultdict(lambda: defaultdict(Decimal))

    for entry in entries:
        if isinstance(entry, data.Transaction):
            month_key = entry.date.strftime("%Y-%m")
            for posting in entry.postings:
                if posting.account in MONTHLY_BUDGET:
                    monthly_totals[month_key][posting.account] += posting.units.number

    for month, accounts in monthly_totals.items():
        for account, spent in accounts.items():
            budget = MONTHLY_BUDGET[account]
            if spent > budget:
                overage = spent - budget
                errors.append(BudgetError(
                    data.new_metadata("<budget_check>", 0),
                    f"Over budget in {month}: {account} spent {spent} vs budget {budget} (over by {overage})",
                    None,
                ))
    return entries, errors
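
Activating it is one line in the ledger (this assumes the plugins/ directory is importable from wherever bean-check runs, e.g. via PYTHONPATH):

```
plugin "plugins.budget_check"
```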

Step 4: Weekly Email Reports

Every Sunday, a Python script generates a spending summary:

# scripts/weekly_report.py
from beancount import loader
from beancount.query import query

entries, errors, options = loader.load_file("main.beancount")

result = query.run_query(
    entries, options,
    """
    SELECT account, sum(position) as total
    WHERE date >= 2026-02-01 AND account ~ "Expenses"
    GROUP BY account
    ORDER BY sum(position) DESC
    """
)
# Format and send via email/Slack
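
The "format and send" step can be sketched like this. The rows here are plain (account, total) pairs rather than run_query's actual result rows, and the SMTP host and addresses are placeholders, not my real setup:

```python
# scripts/send_report.py -- minimal sketch of formatting and emailing the summary
import smtplib
from email.message import EmailMessage

def format_report(rows):
    """Render (account, total) pairs as an aligned plain-text table."""
    width = max(len(account) for account, _ in rows)
    lines = [f"{account:<{width}}  {total}" for account, total in rows]
    return "Weekly spending by account:\n" + "\n".join(lines)

def send_report(body, host="localhost",
                sender="ledger@example.com", recipient="me@example.com"):
    """Send the formatted report as a plain-text email via local SMTP."""
    msg = EmailMessage()
    msg["Subject"] = "Weekly spending report"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(body)
    with smtplib.SMTP(host) as smtp:
        smtp.send_message(msg)
```

Swapping the SMTP bit for a Slack webhook POST is a one-function change.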

Results After 18 Months

Metric                         Before           After
Monthly reconciliation time    6 hours          15 minutes
Categorization accuracy        ~85% (manual)    93% (automated)
Budget overruns per quarter    4-5              0-1
Savings rate                   42%              58%

The savings rate improvement is the most meaningful. Not because automation saved me money directly, but because having real-time visibility into my spending made me naturally more intentional. When you see that you have spent 80% of your restaurant budget by the 15th, you recalibrate.

What I Would Do Differently

  1. Start with fewer categories — I initially had 47 expense categories. I consolidated to 22, and the ML model improved significantly.
  2. Use bean-check in CI — I now run bean-check as a pre-commit hook so my ledger is always balanced.
  3. Version control everything — My entire finance directory is a Git repo (private, obviously). The diff history is invaluable for auditing.
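
For anyone wiring up point 2, the hook can be as small as this (assuming hooks live in .git/hooks and bean-check is on PATH):

```shell
#!/bin/sh
# .git/hooks/pre-commit -- refuse the commit if the ledger fails validation
exec bean-check main.beancount
```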

Happy to answer questions about any part of this pipeline. The code examples are simplified but represent the actual structure I use daily.

This is incredibly inspiring, Fred! I have been putting off automating my imports because the whole importer framework felt intimidating. But seeing your ChaseCheckingImporter code, it is way more approachable than I expected.

A couple of questions:

  1. How do you handle duplicate transactions? My bank sometimes posts a pending transaction and then the final one shows up a day or two later with a slightly different amount (looking at you, gas station pre-authorizations). Does your nightly cron job just append duplicates, or do you have deduplication logic?

  2. The smart_importer accuracy — 93% is great, but how do you handle the corrections? When you manually fix a miscategorized transaction, does the model learn from that correction automatically on the next run?

  3. How long did it take to set up from scratch? I am a DevOps engineer so Python is my comfort zone, but I have only been using Beancount for about 3 months. Would love a realistic time estimate before I commit a weekend to this.

Also, that savings rate jump from 42% to 58% is wild. I am sitting at around 25% and feeling stuck — sounds like visibility really is the secret sauce.

One thing I would add from the DevOps world: have you considered running bean-check in a GitHub Actions workflow instead of just pre-commit? That way you catch issues even if someone (future you after a long day) bypasses the hook with --no-verify. I do this for my infrastructure-as-code repos and it has saved me multiple times.

Great writeup, Fred. Your pipeline mirrors what I have been running for about four years now, with a few differences worth mentioning.

On deduplication (answering Sarah’s question too): I use a combination of Beancount’s built-in DuplicateDetector and a custom hash based on date + amount + first 20 characters of the description. The key insight is that you want fuzzy matching, not exact matching, because banks love to slightly reformat descriptions between pending and posted transactions.
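The hash side of that can be sketched in a few lines. Field values here follow the Chase CSV format from the post; the 20-character prefix is the fuzziness knob, so pending/posted description suffixes don't break the match:

```python
import hashlib

def txn_fingerprint(date_str, amount_str, description):
    """Dedup key: exact on date and amount, prefix-only on the description."""
    key = f"{date_str}|{amount_str}|{description[:20].lower()}"
    return hashlib.sha256(key.encode()).hexdigest()
```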

Here is the core of my dedup logic:

from beancount.ingest import similar

def is_duplicate(entry, existing):
    # find_similar_entries returns a list of (new_entry, duplicates) pairs;
    # an empty list means nothing matched within the date window.
    return bool(similar.find_similar_entries(
        [entry], existing,
        window_days=3,
        comparator=lambda e1, e2: (
            e1.postings[0].units == e2.postings[0].units
        ),
    ))

On the budget plugin approach: I want to gently push back on hardcoding budget amounts in the plugin source. What happens when your grocery budget changes? You have to edit Python code and restart.

I keep my budgets in the Beancount file itself using custom directives:

2026-01-01 custom "budget" Expenses:Food:Groceries "monthly" 600 USD
2026-01-01 custom "budget" Expenses:Food:Restaurants "monthly" 200 USD

Then the plugin reads those directives dynamically. This way, changing a budget is just editing a single line in your ledger file — same workflow as everything else. No code changes needed.

One more optimization: Instead of a weekly email report, I use a Fava extension that shows a budget dashboard in real time. The feedback loop is much tighter when you can check your budget status anytime with a browser bookmark rather than waiting for Sunday’s report.

@newbie_accountant — to answer your setup time question from experience: expect about 2 solid weekends for the basic pipeline (importers + auto-categorization), then another weekend to tune the ML model once you have enough history. Totally worth the investment.

The automation is impressive, Frederick, but I want to flag something from the tax side that people often overlook when building these pipelines.

Category accuracy matters for taxes, not just budgets. When your ML model miscategorizes a business meal as personal dining, or classifies a home office supply purchase under general shopping, you are potentially losing deductions. At 93% accuracy, that remaining 7% could include transactions with real tax implications.

My recommendation: add a separate validation layer specifically for tax-sensitive categories:

# Assumes your importer stamps predicted transactions with an
# auto_categorized metadata flag.
from beancount.core import data

TAX_SENSITIVE_ACCOUNTS = [
    "Expenses:Business:",
    "Expenses:Medical:",
    "Expenses:Education:",
    "Expenses:Charity:",
    "Expenses:HomeOffice:",
]

def flag_tax_sensitive(entries, options_map):
    """Flag auto-categorized transactions in tax-sensitive accounts."""
    errors = []
    for entry in entries:
        if (isinstance(entry, data.Transaction)
                and entry.meta.get("auto_categorized", False)):
            for posting in entry.postings:
                if any(posting.account.startswith(prefix)
                       for prefix in TAX_SENSITIVE_ACCOUNTS):
                    entry.meta["review_needed"] = True
                    break
    return entries, errors

Also, for anyone using this for a side business or freelance income: the IRS requires that you maintain contemporaneous records. Having a Git history of when each transaction was recorded is actually excellent documentation if you ever face an audit. Just make sure your commit timestamps are reasonably close to the transaction dates.

One more thing: if you are claiming the home office deduction, your budget categories should separate business utilities from personal ones. A single Expenses:Housing:Utilities account will make your Schedule C a nightmare to prepare.