Just finished my 2024 taxes using a workflow that combines Beancount with Claude for analysis and report generation. Sharing what worked in case others want to try it.
The Problem
Every tax season I’d spend 8-10 hours on:
- Categorizing gray-area expenses
- Calculating home office deduction
- Pulling together Schedule C numbers
- Double-checking everything for audit triggers
This year I tried using LLMs to speed things up.
My Workflow
Step 1: Export Beancount Data
I run BQL queries to extract everything the LLM needs, for example:
SELECT date, narration, account, position
WHERE account ~ "Expenses" AND year = 2024
I then export the results to JSON for easier LLM parsing.
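A minimal sketch of the JSON step, assuming the query result was first saved as CSV (for example with bean-query's `-f csv` option in Beancount v2) and that the header row matches the column names in the query. The function name `csv_to_records` and the sample rows are made up for illustration.

```python
import csv
import io
import json


def csv_to_records(csv_text):
    """Parse CSV text (header row plus data rows) into a list of dicts."""
    reader = csv.DictReader(io.StringIO(csv_text))
    records = []
    for row in reader:
        # Strip padding around header names and values; skip overflow fields.
        records.append(
            {
                key.strip(): (value or "").strip()
                for key, value in row.items()
                if key is not None
            }
        )
    return records


# Made-up sample rows standing in for real query output.
sample_csv = (
    "date,narration,account,position\n"
    "2024-01-15,Editor subscription,Expenses:Office:Supplies,12.99 USD\n"
    "2024-02-03,Client dinner,Expenses:Meals,86.40 USD\n"
)

records = csv_to_records(sample_csv)
print(json.dumps(records, indent=2))
```

Keeping every value as a string avoids float rounding on amounts; the LLM only needs to read them, not do arithmetic on them.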
Step 2: Claude for Analysis
I use Claude 3.5 Sonnet for:
Expense categorization review
- “Review these transactions and flag any that might be miscategorized for tax purposes”
- Found 12 expenses I had wrong (e.g., a software subscription I had marked as Office Supplies should be Computer & Internet)
Deduction optimization
- “Given these business expenses, identify any potential deductions I might be missing”
- Suggested I could deduct a portion of my phone bill (hadn’t thought of it)
Audit risk assessment
- “Flag any expenses that might trigger IRS scrutiny”
- Identified a $3,000 client dinner that needs better documentation
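The three review prompts above all follow the same shape: an instruction followed by transaction data. Here is a rough sketch of how such prompts could be assembled in batches, assuming the transactions are already a list of dicts. The helper name `build_review_prompts` and the batch size are hypothetical, and actually sending the prompt to Claude is deliberately left out.

```python
import json


def build_review_prompts(instruction, records, batch_size=50):
    """Return one prompt string per batch of transactions."""
    prompts = []
    for start in range(0, len(records), batch_size):
        batch = records[start:start + batch_size]
        prompts.append(
            f"{instruction}\n\n"
            "Transactions (JSON):\n"
            f"{json.dumps(batch, indent=2)}"
        )
    return prompts


# Made-up transactions for illustration only.
transactions = [
    {"date": "2024-01-15", "narration": "Editor subscription",
     "account": "Expenses:Office:Supplies", "position": "12.99 USD"},
    {"date": "2024-02-03", "narration": "Client dinner",
     "account": "Expenses:Meals", "position": "86.40 USD"},
    {"date": "2024-03-09", "narration": "Domain renewal",
     "account": "Expenses:Office:Supplies", "position": "18.00 USD"},
]

review_instruction = (
    "Review these transactions and flag any that might be "
    "miscategorized for tax purposes"
)
review_prompts = build_review_prompts(review_instruction, transactions, batch_size=2)
```

Batching keeps each prompt small enough to read back and check by hand, which matters when every suggestion still has to be verified.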
Step 3: Generate Reports
I ask Claude to summarize the expenses by Schedule C category, using a prompt like:
Based on this expense data, generate a summary for Schedule C including:
- Line 8: Advertising
- Line 17: Legal and professional services
- Line 18: Office expense
...
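Since the totals end up on a tax form, it may be worth computing the same summary locally and comparing it with what Claude returns. A sketch under stated assumptions: the account-to-line mapping below is hypothetical (your account names will differ), and each `position` is assumed to be a string like `12.99 USD` in a single currency.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical mapping from account prefixes to Schedule C lines.
SCHEDULE_C_LINES = {
    "Expenses:Advertising": "Line 8: Advertising",
    "Expenses:Professional": "Line 17: Legal and professional services",
    "Expenses:Office": "Line 18: Office expense",
}


def summarize_by_line(records):
    """Sum amounts per Schedule C line; unknown accounts go to 'Unmapped'."""
    totals = defaultdict(Decimal)
    for record in records:
        # Assumes "<number> <currency>", e.g. "12.99 USD".
        amount = Decimal(record["position"].split()[0])
        line = "Unmapped"
        for prefix, label in SCHEDULE_C_LINES.items():
            account = record["account"]
            if account == prefix or account.startswith(prefix + ":"):
                line = label
                break
        totals[line] += amount
    return dict(totals)


# Made-up data for illustration only.
expenses = [
    {"account": "Expenses:Advertising", "position": "100.00 USD"},
    {"account": "Expenses:Office:Supplies", "position": "12.99 USD"},
    {"account": "Expenses:Office:Software", "position": "20.01 USD"},
    {"account": "Expenses:Meals", "position": "86.40 USD"},
]

line_totals = summarize_by_line(expenses)
for line, total in sorted(line_totals.items()):
    print(f"{line}: {total}")
```

Anything that lands in "Unmapped" is a prompt to extend the mapping or to look at that account by hand before trusting either set of totals.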
Results
- Time spent: ~3 hours (down from 8-10)
- Found ~$2,000 in deductions I would have missed
- Much more confident in categorizations
Tools Used
- TaxGPT - I also tested this. It uses GPT-4o and Claude 3.5 and is described as specifically trained on tax law, which should mean less hallucination risk than raw ChatGPT.
- Claude 3 Opus - Per some Medium articles I found, it performs best for tax return review tasks.
Concerns & Caveats
Hallucination risk is real. One study from Stanford Law found GPT-4 does well on tax questions but isn’t perfect. I verify everything Claude suggests against IRS publications.
This is NOT tax advice. I still have a CPA review my return. The LLM is for initial analysis and organization, not final decisions.
Privacy - Yes, I’m sending financial data to Anthropic. I’m comfortable with their privacy policy, but you might not be. Local LLMs are an alternative.
Questions for the Community
- Anyone using the beancount.io IRS audit preparation guide? How does it compare?
- Better ways to structure the BQL → LLM pipeline?
- Any Beancount plugins specifically for tax reporting?
Would love to hear other approaches.