The automation is impressive, Frederick, but I want to flag something from the tax side that people often overlook when building these pipelines.
Category accuracy matters for taxes, not just budgets. When your ML model miscategorizes a business meal as personal dining, or classifies a home office supply purchase under general shopping, you are potentially losing deductions. At 93% accuracy, that remaining 7% could include transactions with real tax implications.
My recommendation: add a separate validation layer specifically for tax-sensitive categories:
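    from beancount.core import data

    # Register the function so Beancount runs it as a plugin.
    __plugins__ = ("flag_tax_sensitive",)

    # A tuple, so it can be passed directly to str.startswith().
    TAX_SENSITIVE_ACCOUNTS = (
        "Expenses:Business:",
        "Expenses:Medical:",
        "Expenses:Education:",
        "Expenses:Charity:",
        "Expenses:HomeOffice:",
    )

    def flag_tax_sensitive(entries, options_map):
        """Flag auto-categorized transactions in tax-sensitive accounts."""
        errors = []
        for entry in entries:
            if not isinstance(entry, data.Transaction):
                continue
            # Only transactions the importer tagged as auto-categorized.
            if not entry.meta.get("auto_categorized", False):
                continue
            # One matching posting is enough to mark the whole transaction.
            if any(posting.account.startswith(TAX_SENSITIVE_ACCOUNTS)
                   for posting in entry.postings):
                entry.meta["review_needed"] = True
        return entries, errors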
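To try the matching logic without a full ledger, here is a minimal sketch that builds a review queue from stand-in objects. The `Txn` and `Posting` tuples are simplified placeholders I made up for illustration, not Beancount's real classes, and `needs_review` / `review_queue` are hypothetical helper names.

```python
from collections import namedtuple

# Simplified stand-ins for ledger entries (hypothetical, not Beancount types).
Posting = namedtuple("Posting", "account")
Txn = namedtuple("Txn", "meta date narration postings")

TAX_SENSITIVE_PREFIXES = (
    "Expenses:Business:",
    "Expenses:Medical:",
    "Expenses:Education:",
    "Expenses:Charity:",
    "Expenses:HomeOffice:",
)


def needs_review(txn):
    """True if an auto-categorized transaction posts to a tax-sensitive account."""
    if not txn.meta.get("auto_categorized", False):
        return False
    # str.startswith accepts a tuple and matches any of the prefixes.
    return any(p.account.startswith(TAX_SENSITIVE_PREFIXES) for p in txn.postings)


def review_queue(txns):
    """Collect the transactions a human should look at before tax time."""
    return [t for t in txns if needs_review(t)]
```

One caveat: a check like this only sees transactions the model placed into a tax-sensitive account. A business meal that landed in a personal dining account would pass through unflagged, so the reverse case probably still needs a periodic manual pass.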
Also, for anyone using this for a side business or freelance income: the IRS expects you to keep records that substantiate your income and deductions, and records made at or near the time of a transaction carry more weight than ones reconstructed later. A Git history showing when each transaction was recorded is useful supporting documentation if you ever face an audit. Just make sure your commit timestamps are reasonably close to the transaction dates.
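As a rough way to check that, here is a small sketch that compares transaction dates against the dates they were committed. It assumes you have already extracted (transaction date, commit date) pairs from your repository by whatever means you prefer. The function name `late_records` and the 30-day default are my own placeholders, not an IRS threshold.

```python
from datetime import date


def late_records(records, max_gap_days=30):
    """Return (txn_date, commit_date, gap_days) for entries recorded late.

    `records` is an iterable of (txn_date, commit_date) pairs.
    `max_gap_days` is an arbitrary illustrative cutoff, not a legal rule.
    """
    late = []
    for txn_date, commit_date in records:
        gap = (commit_date - txn_date).days
        if gap > max_gap_days:
            late.append((txn_date, commit_date, gap))
    return late


if __name__ == "__main__":
    sample = [
        (date(2024, 1, 5), date(2024, 1, 7)),   # recorded two days later
        (date(2024, 1, 10), date(2024, 4, 1)),  # recorded months later
    ]
    for txn_date, commit_date, gap in late_records(sample):
        print(f"{txn_date} committed {commit_date} ({gap} days later)")
```

Anything this turns up is not necessarily a problem, but it tells you which entries are worth backing with a receipt or statement.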
One more thing: if you are claiming the home office deduction, your budget categories should separate business utilities from personal ones. A single Expenses:Housing:Utilities account will make your Schedule C a nightmare to prepare.