One of the most time-consuming parts of bank transaction imports is categorization. After helping numerous clients set up Beancount with automated imports, I’ve developed a systematic approach to building categorization rules that handle most transactions automatically.
The Categorization Challenge
When you pull transactions from Plaid (or any bank feed), you get:
- A payee name (often cryptic: “CHECKCARD 0127 WHOLEFDS MKT #10847”)
- An amount
- A date
- Plaid’s AI-suggested category (helpful but not always accurate)
Your goal is to map these to proper Beancount accounts like Expenses:Food:Groceries.
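Matching usually gets easier if you first strip the bank boilerplate around the merchant name. The sketch below is a hypothetical pre-processing step, not part of the rule files that follow. The prefixes it removes (`CHECKCARD` plus four digits, an optional `POS DEBIT` / `VISA`, `ELECTRONIC/ACH DEBIT`) and the trailing `#store-number` pattern are assumptions based only on the example payees in this post, so adjust them to what your bank actually sends.

```python
import re

# Assumed bank prefixes: optional "POS DEBIT" / "VISA", then either
# "CHECKCARD" + 4 digits (e.g. "CHECKCARD 0127") or "ELECTRONIC/ACH DEBIT".
_PREFIX = re.compile(
    r"^(?:POS DEBIT\s+)?(?:VISA\s+)?(?:CHECKCARD\s+\d{4}\s+|ELECTRONIC/ACH DEBIT\s+)",
    re.IGNORECASE,
)
# Assumed trailing store number, e.g. " #10847" or " #10847 CA".
_STORE_NO = re.compile(r"\s+#\d+(?:\s+[A-Z]{2})?$", re.IGNORECASE)


def normalize_payee(raw: str) -> str:
    """Strip bank boilerplate so rules can match on the merchant name alone."""
    name = " ".join(raw.split())  # collapse runs of whitespace
    name = _PREFIX.sub("", name)
    name = _STORE_NO.sub("", name)
    return name.strip()


for raw in [
    "CHECKCARD 0127 WHOLEFDS MKT #10847 CA",
    "POS DEBIT VISA CHECKCARD 0814 SHELL OIL 57442",
    "ELECTRONIC/ACH DEBIT XCEL ENERGY",
]:
    print(f"{raw!r} -> {normalize_payee(raw)!r}")
```

This prints `WHOLEFDS MKT`, `SHELL OIL 57442`, and `XCEL ENERGY`. The Shell location number survives because it has no `#`; whether to strip bare trailing digits too is a judgment call, since some merchant names legitimately end in numbers.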
My Categorization Strategy
Level 1: Exact Match Rules (Highest Priority)
For recurring, predictable transactions:
```
# Monthly subscriptions - stable payee names, usually billed the same amount each month
NETFLIX.COM|Expenses:Entertainment:Streaming
SPOTIFY.*|Expenses:Entertainment:Streaming
GITHUB.*|Expenses:Software:Development
# Utilities with predictable names
XCEL ENERGY.*|Expenses:Utilities:Electric
COMCAST.*|Expenses:Utilities:Internet
```
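Rules in this `PATTERN|Account` form are simple to load yourself. The sketch below assumes rules are tried top to bottom, the first regex found anywhere in the payee wins (`re.search`, case-insensitive), and unmatched payees fall back to `Expenses:Uncategorized`. Your importer may behave differently; in particular, a tool that uses `re.match` anchors at the start of the string, so `WHOLEFDS.*` would miss `CHECKCARD 0127 WHOLEFDS MKT #10847`.

```python
from __future__ import annotations

import re


def load_rules(text: str) -> list[tuple[re.Pattern[str], str]]:
    """Parse 'PATTERN|Account' lines, skipping blank lines and '#' comments."""
    rules = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the LAST '|' so alternations like 'A|B' stay in the pattern.
        pattern, sep, account = line.rpartition("|")
        pattern, account = pattern.strip(), account.strip()
        if not sep or not pattern or not account:
            raise ValueError(f"line {lineno}: expected PATTERN|Account, got {line!r}")
        rules.append((re.compile(pattern, re.IGNORECASE), account))
    return rules


def categorize(payee: str, rules, default: str = "Expenses:Uncategorized") -> str:
    """Return the account of the first rule whose pattern occurs anywhere in payee."""
    for regex, account in rules:
        if regex.search(payee):
            return account
    return default


RULES = load_rules("""
# Utilities with predictable names
XCEL ENERGY.*|Expenses:Utilities:Electric
COMCAST.*|Expenses:Utilities:Internet
""")
print(categorize("ELECTRONIC/ACH DEBIT XCEL ENERGY", RULES))  # Expenses:Utilities:Electric
```

Splitting on the last `|` with `rpartition` means a rule like `WHOLEFDS|WHOLE FOODS|Expenses:Food:Groceries` still works, since Beancount account names cannot contain `|`.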
Level 2: Pattern-Based Rules
For common vendors with variable transaction descriptions:
```
# Groceries - multiple store formats
WHOLE FOODS.*|Expenses:Food:Groceries
WHOLEFDS.*|Expenses:Food:Groceries
TRADER JOE.*|Expenses:Food:Groceries
KROGER.*|Expenses:Food:Groceries
SAFEWAY.*|Expenses:Food:Groceries
# Gas stations
SHELL.*|Expenses:Transportation:Fuel
CHEVRON.*|Expenses:Transportation:Fuel
EXXON.*|Expenses:Transportation:Fuel
BP.*GAS.*|Expenses:Transportation:Fuel
# Amazon - tricky because it could be anything
# Use subcategories based on common patterns
AMAZON.COM.*AMZN.*|Expenses:Shopping:Online
AMAZON PRIME.*|Expenses:Entertainment:Streaming
AMZN MKTP.*|Expenses:Shopping:Online
```
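Because the levels only work if specific rules are tried before broad ones, it helps to check how a payee behaves under different orderings. In this sketch the broad `AMAZON.*` rule is hypothetical (it is not in the list above) and the payee string is made up; the point is that whichever matching rule comes first wins, and `overlapping` lists every rule that claims a payee.

```python
import re


def first_match(payee, rules, default="Expenses:Uncategorized"):
    """Return the account of the first rule (in list order) whose pattern matches."""
    for pattern, account in rules:
        if re.search(pattern, payee, re.IGNORECASE):
            return account
    return default


def overlapping(payee, rules):
    """List every account whose rule matches, to spot payees claimed by several rules."""
    return [account for pattern, account in rules if re.search(pattern, payee, re.IGNORECASE)]


# Hypothetical catch-all for any other Amazon charge, plus the Prime rule from above.
BROAD = (r"AMAZON.*", "Expenses:Shopping:Online")
PRIME = (r"AMAZON PRIME.*", "Expenses:Entertainment:Streaming")

payee = "AMAZON PRIME*2K4AB1234"  # made-up sample payee
print(first_match(payee, [BROAD, PRIME]))  # broad rule listed first shadows Prime
print(first_match(payee, [PRIME, BROAD]))  # specific rule first: Prime lands in Streaming
print(overlapping(payee, [PRIME, BROAD]))  # both rules match this payee
```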
Level 3: Category Fallbacks
For less common transactions, use broader patterns:
```
# Restaurants - catch-all after specific favorites
DOORDASH.*|Expenses:Food:Delivery
UBER EATS.*|Expenses:Food:Delivery
GRUBHUB.*|Expenses:Food:Delivery
.*RESTAURANT.*|Expenses:Food:DiningOut
.*CAFE.*|Expenses:Food:DiningOut
.*PIZZA.*|Expenses:Food:DiningOut
# Generic retail
.*PHARMACY.*|Expenses:Health:Pharmacy
.*HARDWARE.*|Expenses:Home:Maintenance
```
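One way to keep the three levels explicit is to store them as separate lists and report which level produced each match, which also shows how much work the broad fallbacks are doing. This is a sketch with a few patterns condensed from the rules above (alternations instead of one line per vendor); `Match`, `LEVELS`, and `classify` are illustrative names, not part of any importer.

```python
import re
from collections import Counter
from typing import NamedTuple, Optional


class Match(NamedTuple):
    account: str
    level: Optional[int]  # 1-3 for a rule hit, None if nothing matched


# Condensed from the rules above; order matters within each level too.
LEVELS = [
    [  # Level 1: exact vendor names
        (r"NETFLIX\.COM", "Expenses:Entertainment:Streaming"),
        (r"XCEL ENERGY", "Expenses:Utilities:Electric"),
    ],
    [  # Level 2: vendor patterns
        (r"WHOLEFDS|WHOLE FOODS", "Expenses:Food:Groceries"),
        (r"SHELL|CHEVRON", "Expenses:Transportation:Fuel"),
    ],
    [  # Level 3: broad fallbacks (delivery deliberately before PIZZA)
        (r"DOORDASH|UBER EATS|GRUBHUB", "Expenses:Food:Delivery"),
        (r"RESTAURANT|CAFE|PIZZA", "Expenses:Food:DiningOut"),
    ],
]


def classify(payee: str) -> Match:
    """Try each level in priority order; return the first hit and its level."""
    for level, rules in enumerate(LEVELS, start=1):
        for pattern, account in rules:
            if re.search(pattern, payee, re.IGNORECASE):
                return Match(account, level)
    return Match("Expenses:Uncategorized", None)


payees = [
    "NETFLIX.COM",
    "CHECKCARD 0127 WHOLEFDS MKT #10847 CA",
    "DOORDASH*JOES PIZZA",
    "RANDOM LLC",
]
print(Counter(classify(p).level for p in payees))  # hits per level
```

Within Level 3 the delivery rule has to come before `PIZZA`; otherwise a payee like the made-up `DOORDASH*JOES PIZZA` would land in DiningOut instead of Delivery.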
Advanced Techniques
Using Plaid Categories as Hints
Plaid provides categories like “Food and Drink > Restaurants”. You can record them as transaction metadata in your import template:
```
{date} * "{payee}"
  plaid_category: "{category}"
  {account}  {amount} {currency}
  {posting_account}
```
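To see what the template produces, here is a sketch that fills it with Python's `str.format`. It assumes your importer substitutes the placeholders in a similar way; the card account, amount, and payee are made-up values. Swapping double quotes for single quotes in the payee and category is one cautious way to keep the Beancount string literals valid.

```python
TEMPLATE = (
    '{date} * "{payee}"\n'
    '  plaid_category: "{category}"\n'
    "  {account}  {amount} {currency}\n"
    "  {posting_account}\n"
)


def render(txn: dict) -> str:
    """Fill the template; double quotes in text fields become single quotes."""
    safe = dict(txn)
    for key in ("payee", "category"):
        safe[key] = str(safe[key]).replace('"', "'")
    return TEMPLATE.format(**safe)


entry = render({
    "date": "2024-01-27",
    "payee": "LUIGIS PIZZA",
    "category": "Food and Drink > Restaurants",
    "account": "Liabilities:CreditCard",         # hypothetical card account
    "amount": "-42.50",
    "currency": "USD",
    "posting_account": "Expenses:Uncategorized",  # no rule matched this payee
})
print(entry)
```

The second posting has no amount, so Beancount fills it in to balance the transaction.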
Then write a post-processing script that uses the Plaid hint to suggest an account for each transaction no rule categorized:
```python
def suggest_account_from_plaid_category(plaid_cat):
    mapping = {
        "Food and Drink > Restaurants": "Expenses:Food:DiningOut",
        "Food and Drink > Groceries": "Expenses:Food:Groceries",
        "Travel > Airlines": "Expenses:Travel:Flights",
        "Transfer > Payroll": "Income:Salary",
    }
    return mapping.get(plaid_cat, "Expenses:Uncategorized")
```
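The post-processing pass itself can be sketched as a loop over transactions still sitting in `Expenses:Uncategorized`, collecting a suggestion wherever the Plaid hint maps to a real account. This assumes the transactions are already available as dicts with `payee`, `plaid_category`, and `account` keys (a hypothetical shape; with a real ledger you would read them from your Beancount file), and it only proposes accounts rather than rewriting anything.

```python
from __future__ import annotations

from typing import Iterable

UNCATEGORIZED = "Expenses:Uncategorized"

# Same hint table as suggest_account_from_plaid_category above.
PLAID_HINTS = {
    "Food and Drink > Restaurants": "Expenses:Food:DiningOut",
    "Food and Drink > Groceries": "Expenses:Food:Groceries",
    "Travel > Airlines": "Expenses:Travel:Flights",
    "Transfer > Payroll": "Income:Salary",
}


def suggest_account_from_plaid_category(plaid_cat: str) -> str:
    return PLAID_HINTS.get(plaid_cat, UNCATEGORIZED)


def review_suggestions(transactions: Iterable[dict]) -> list[tuple[str, str, str]]:
    """Return (payee, plaid_category, suggested_account) for each uncategorized
    transaction whose Plaid hint maps to a real account."""
    suggestions = []
    for txn in transactions:
        if txn.get("account") != UNCATEGORIZED:
            continue  # a rule already categorized this one
        hint = txn.get("plaid_category", "")
        account = suggest_account_from_plaid_category(hint)
        if account != UNCATEGORIZED:
            suggestions.append((txn["payee"], hint, account))
    return suggestions


sample = [  # made-up transactions
    {"payee": "LUIGIS PIZZA", "plaid_category": "Food and Drink > Restaurants", "account": UNCATEGORIZED},
    {"payee": "MYSTERY CO", "plaid_category": "Service > Other", "account": UNCATEGORIZED},
    {"payee": "WHOLEFDS MKT", "plaid_category": "Food and Drink > Groceries", "account": "Expenses:Food:Groceries"},
]
for payee, hint, account in review_suggestions(sample):
    print(f"{payee}: {hint} -> {account}")
```

Payees that keep showing up in this list are good candidates for a permanent rule.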
Handling Regex Edge Cases
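Some payee names are tricky because banks wrap the merchant name in prefixes, card digits, and store numbers. Here’s how I debug them, by running candidate rules against sample payees: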
```python
import re

# Sample raw payee strings as they arrive from the bank feed
test_payees = [
    "CHECKCARD 0127 WHOLEFDS MKT #10847 CA",
    "POS DEBIT VISA CHECKCARD 0814 SHELL OIL 57442",
    "ELECTRONIC/ACH DEBIT XCEL ENERGY",
]

# (pattern, account) pairs, tried in order; the first match wins
rules = [
    (r"WHOLEFDS|WHOLE FOODS", "Expenses:Food:Groceries"),
    (r"SHELL.*|CHEVRON.*", "Expenses:Transportation:Fuel"),
    (r"XCEL ENERGY", "Expenses:Utilities:Electric"),
]

for payee in test_payees:
    matched = False
    for pattern, account in rules:
        if re.search(pattern, payee, re.IGNORECASE):
            print(f"{payee[:40]:40} -> {account}")
            matched = True
            break
    if not matched:
        print(f"{payee[:40]:40} -> UNCATEGORIZED")
```
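The same idea also works as a regression check: keep a hypothetical table of payees you have already categorized by hand, with the account you expect, and rerun it after every rule change. A sketch reusing the three rules from the debugging script:

```python
import re

RULES = [
    (r"WHOLEFDS|WHOLE FOODS", "Expenses:Food:Groceries"),
    (r"SHELL|CHEVRON", "Expenses:Transportation:Fuel"),
    (r"XCEL ENERGY", "Expenses:Utilities:Electric"),
]

# Hypothetical hand-checked payees and the account each should land in.
EXPECTED = {
    "CHECKCARD 0127 WHOLEFDS MKT #10847 CA": "Expenses:Food:Groceries",
    "POS DEBIT VISA CHECKCARD 0814 SHELL OIL 57442": "Expenses:Transportation:Fuel",
    "ELECTRONIC/ACH DEBIT XCEL ENERGY": "Expenses:Utilities:Electric",
}


def categorize(payee):
    """First matching rule wins; anything else is uncategorized."""
    for pattern, account in RULES:
        if re.search(pattern, payee, re.IGNORECASE):
            return account
    return "Expenses:Uncategorized"


def check_rules(expected):
    """Return (payee, expected, actual) for every payee the rules get wrong."""
    failures = []
    for payee, want in expected.items():
        got = categorize(payee)
        if got != want:
            failures.append((payee, want, got))
    return failures


if __name__ == "__main__":
    failures = check_rules(EXPECTED)
    for payee, want, got in failures:
        print(f"MISMATCH {payee[:40]:40} expected {want}, got {got}")
    print(f"{len(EXPECTED) - len(failures)}/{len(EXPECTED)} payees categorized as expected")
```

Adding payees that should *not* match a rule (say, a made-up restaurant with SHELL in its name) is a cheap way to catch overly broad patterns.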
Split Transaction Handling
Some transactions need manual splitting (e.g., a Target run where you bought groceries AND household items). I flag these for review:
```
# Big box stores - flag for manual review
TARGET.*|Expenses:Shopping:BigBox:NeedsReview
WALMART.*|Expenses:Shopping:BigBox:NeedsReview
COSTCO.*|Expenses:Shopping:BigBox:NeedsReview
```
Then, as a weekly task, I review everything in the :NeedsReview accounts and split each transaction into the proper accounts.
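When splitting one of these by hand it is easy to mistype an amount. A small helper can refuse any split that does not add back up to the original charge; this sketch uses `Decimal` rather than floats so cents compare exactly. The receipt amounts and the `Expenses:Home:Supplies` account are hypothetical.

```python
from __future__ import annotations

from decimal import Decimal


def split_postings(total: str, parts: dict[str, str]) -> list[tuple[str, Decimal]]:
    """Turn {account: amount} into postings, refusing splits that don't sum to total."""
    total_d = Decimal(total)
    postings = [(account, Decimal(amount)) for account, amount in parts.items()]
    allocated = sum((amount for _, amount in postings), Decimal("0"))
    if allocated != total_d:
        raise ValueError(f"split allocates {allocated}, but the charge was {total_d}")
    return postings


def format_postings(postings, currency="USD"):
    """Render postings as indented Beancount lines."""
    return "\n".join(f"  {account}  {amount} {currency}" for account, amount in postings)


split = split_postings("87.43", {
    "Expenses:Food:Groceries": "52.10",
    "Expenses:Home:Supplies": "35.33",  # hypothetical household account
})
print(format_postings(split))
```

These lines would replace the single `:NeedsReview` posting; the card posting for the full charge stays as imported.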
My Categorization Accuracy Over Time
After building rules for 6 months:
- Month 1: ~60% auto-categorized correctly
- Month 3: ~80% auto-categorized correctly
- Month 6: ~92% auto-categorized correctly
The key is reviewing uncategorized transactions weekly and adding new rules as patterns emerge.
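To track a trend like this yourself, one rough proxy is the share of each month's transactions that landed somewhere other than `Expenses:Uncategorized` or a `:NeedsReview` account. This measures coverage, not correctness: a transaction a rule sent to the wrong account still counts as categorized, so measuring *correct* categorization still takes a manual check. The `(month, account)` input shape is an assumption.

```python
from collections import defaultdict

UNCATEGORIZED = "Expenses:Uncategorized"


def is_auto_categorized(account: str) -> bool:
    """True if the transaction ended up in a real account, not a review bucket."""
    return account != UNCATEGORIZED and not account.endswith(":NeedsReview")


def monthly_coverage(rows):
    """rows: iterable of (month, account) pairs, e.g. ("2024-01", "Expenses:Food:Groceries").
    Returns {month: fraction auto-categorized}, ordered by month."""
    totals = defaultdict(int)
    auto = defaultdict(int)
    for month, account in rows:
        totals[month] += 1
        if is_auto_categorized(account):
            auto[month] += 1
    return {month: auto[month] / totals[month] for month in sorted(totals)}


rows = [  # made-up data
    ("2024-01", "Expenses:Food:Groceries"),
    ("2024-01", UNCATEGORIZED),
    ("2024-02", "Expenses:Shopping:BigBox:NeedsReview"),
    ("2024-02", "Expenses:Utilities:Electric"),
    ("2024-02", "Expenses:Transportation:Fuel"),
    ("2024-02", "Expenses:Food:DiningOut"),
]
for month, share in monthly_coverage(rows).items():
    print(f"{month}: {share:.0%} auto-categorized")
```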
Questions for Discussion
- How do others handle the “Amazon could be anything” problem?
- Anyone using machine learning for categorization (like smart_importer)?
- What’s your strategy for handling split transactions from big box stores?
Would love to hear how others approach this challenge!