I’ve been using Beancount for 3 years now to track my journey to FIRE, and one of the biggest initial hurdles was getting my financial data into Beancount in the first place. Not every bank plays nice with standard formats.
The Problem with “Weird” Data Sources
When I first started, my local credit union provided CSV exports, but they were… special. Columns in random order, date formats that changed between exports, transaction descriptions split across multiple fields, and my favorite: negative numbers for deposits (yes, really).
The standard importers couldn’t handle it. I tried tweaking CSV configurations, but eventually realized I needed to write a custom importer.
Why Standard Importers Fail
Most institutions provide data in one of these formats:
- OFX/QFX: The gold standard - structured, includes balances, rarely breaks
- CSV: The wild west - every bank does it differently
- PDF statements: The nightmare - presentation format, not data format
- Proprietary exports: Good luck with that
When you hit a “weird” data source, you’re usually dealing with CSV quirks or PDF-only statements. Here’s what I’ve encountered:
Credit Union CSV Quirks
- Non-standard column names (“Trans Date” vs “Date” vs “Posted Date”)
- Multiple date formats in the same file
- Missing or inconsistent transaction IDs
- Description fields that span multiple columns
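For the split-description quirk, the fix that has worked for me is joining the fragments into one field before anything else sees the row. A minimal sketch; the column names (`Description 1`, `Description 2`, `Memo`) are hypothetical examples, not any real bank's schema:

```python
def merge_description(row, desc_cols=("Description 1", "Description 2", "Memo")):
    """Join description fragments split across multiple CSV columns.

    Skips empty or missing cells so the result has no stray spaces.
    """
    parts = [row.get(col, "").strip() for col in desc_cols]
    return " ".join(p for p in parts if p)

row = {"Description 1": "ACME", "Description 2": "GROCERY", "Memo": ""}
print(merge_description(row))  # ACME GROCERY
```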
Investment Platform PDF Statements
- No CSV export at all
- Tables with varying column layouts
- Multi-line transactions
- Summary sections mixed with transaction data
International Banks
- Date formats (DD/MM/YYYY vs MM/DD/YYYY)
- Currency symbols embedded in amounts
- Unicode characters in descriptions
- Time zones that affect posting dates
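For the international quirks above, I normalize amounts and dates into a canonical form before parsing. A rough sketch; the "last separator wins" heuristic for European-style `1.234,56` amounts is an assumption, so verify it against your bank's actual exports, and always set day-first vs month-first explicitly per institution rather than guessing:

```python
import re
from datetime import datetime
from decimal import Decimal

def parse_intl_amount(raw):
    """Strip embedded currency symbols and normalize separators.

    Heuristic (assumption): if a comma appears after the last period,
    treat it as a European decimal comma ('1.234,56' -> 1234.56).
    """
    cleaned = re.sub(r"[^\d.,\-]", "", raw)
    if "," in cleaned and cleaned.rfind(",") > cleaned.rfind("."):
        cleaned = cleaned.replace(".", "").replace(",", ".")
    else:
        cleaned = cleaned.replace(",", "")
    return Decimal(cleaned)

def parse_intl_date(raw, dayfirst=True):
    """Disambiguate DD/MM/YYYY vs MM/DD/YYYY explicitly, per institution."""
    fmt = "%d/%m/%Y" if dayfirst else "%m/%d/%Y"
    return datetime.strptime(raw, fmt).date()

print(parse_intl_amount("€1.234,56"))  # 1234.56
print(parse_intl_date("03/04/2024"))   # 2024-04-03
```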
Building a Custom Importer: The Framework Approach
The good news: You don’t have to start from scratch. The beancount-reds-importers framework makes this much easier.
Here’s the basic structure:
```python
from datetime import datetime
from decimal import Decimal

from beancount_reds_importers.libreader import csvreader


class WeirdCreditUnionImporter(csvreader.Importer):
    def initialize_reader(self, file):
        self.reader = csvreader.Reader(file.name)
        self.reader.set_header_map({
            'Trans Date': 'date',
            'Description': 'payee',
            'Amount': 'amount',
        })

    def parse_date(self, date_str):
        # Handle their weird date format
        return datetime.strptime(date_str, '%m-%d-%y').date()

    def parse_amount(self, amount_str):
        # Fix their backwards negative amounts
        clean = amount_str.replace('$', '').replace(',', '')
        value = Decimal(clean)
        # They mark deposits as negative, so flip the sign
        return -value if 'DEP' in self.current_row['Description'] else value
```
This isn’t a complete importer, but it shows the key parts: header mapping, date parsing, and amount handling with institution-specific quirks.
PDF Statements: The Nuclear Option
For institutions that only provide PDF statements, you need to extract text first. I’ve had success with pdfplumber:
```python
import pdfplumber

def extract_transactions_from_pdf(pdf_path):
    transactions = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            # Extract tables from the page layout
            tables = page.extract_tables()
            for table in tables:
                for row in table:
                    if looks_like_transaction(row):
                        transactions.append(parse_row(row))
    return transactions
```
Fair warning: PDFs are fragile. When the layout changes, your importer breaks. This is why we beg institutions for OFX or at least clean CSV.
Testing Your Importer
The bean-identify and bean-extract commands are your best friends:
```sh
# Test if your importer recognizes the file
bean-identify importers.py statement.csv

# Extract transactions
bean-extract importers.py statement.csv > output.beancount

# Review before importing
cat output.beancount
```
I always do a manual review of extracted transactions before merging them into my main ledger. Trust, but verify.
Pro Tips
- Start simple: Get one statement working before adding features
- Version control your importers: They’re code, treat them like code
- Document quirks: Future you will forget why you added that regex
- Build test cases: Save sample statements for regression testing
- Consider the maintenance burden: Sometimes manual entry is faster than maintaining a brittle importer
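On the "build test cases" tip: even a tiny test file over saved sample data pays off the first time a bank changes its export format. A sketch, using a simplified stand-in for the importer's amount logic rather than the real importer class; values and descriptions are illustrative:

```python
# test_importers.py -- run with: python -m pytest test_importers.py
from decimal import Decimal

def parse_amount(amount_str, description):
    """Simplified stand-in for the credit-union importer's amount logic:
    strip formatting, flip the sign on deposits marked 'DEP'."""
    value = Decimal(amount_str.replace("$", "").replace(",", ""))
    return -value if "DEP" in description else value

def test_deposit_sign_is_flipped():
    # The bank exports deposits as negative; the importer flips them
    assert parse_amount("-120.00", "DEP PAYROLL") == Decimal("120.00")

def test_regular_charge_is_unchanged():
    assert parse_amount("$42.17", "COFFEE SHOP") == Decimal("42.17")
```

Keep the sample statements that feed these tests alongside the importer in version control (scrubbed of anything sensitive), and every fix you make gets locked in as a regression test.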
The Community Needs You
If you’ve built custom importers, please share them on GitHub. I’ve learned so much from others’ importers for institutions I don’t even use - they taught me patterns and approaches.
I’m putting my importers (anonymized) on GitHub this week. If there’s interest, I can do a follow-up post on testing strategies and handling edge cases.
What’s the weirdest data source you’ve had to import from? Anyone dealt with POS systems like Square or international banks with multi-currency statements?