I’ve been experimenting with building a personal finance AI agent that uses my Beancount ledger as its data source. Wanted to share my architecture and get feedback.
The Goal
Natural language queries against my financial data:
- “What did I spend on restaurants in Q3?”
- “Am I on track for my annual savings goal?”
- “Compare my utility bills year-over-year”
- “Flag any unusual transactions this month”
Current Architecture
```
┌─────────────┐     ┌──────────┐     ┌─────────┐
│  Telegram   │────▶│  Agent   │────▶│ Claude  │
│     Bot     │◀────│  Layer   │◀────│  API    │
└─────────────┘     └────┬─────┘     └─────────┘
                         │
                    ┌────▼─────┐
                    │ Beancount│
                    │  + BQL   │
                    └──────────┘
```
Components:
- Telegram Bot - Interface (inspired by the Beancount Telegram Bot project)
- Agent Layer - Translates NL → BQL, handles context
- Claude API - Powers the NL understanding and response generation
- Beancount - Source of truth, queried via BQL
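In rough Python, the agent layer's happy path looks something like this. It's a simplified sketch, not my exact code: it assumes the `anthropic` SDK and Beancount's `beancount.query` module, and the prompt wording, model alias, and `answer` helper are just illustrative.

```python
import anthropic
from beancount import loader
from beancount.query import query

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
entries, _, options = loader.load_file("ledger.beancount")

def answer(question: str) -> str:
    # Step 1: ask Claude to translate the natural-language question into BQL.
    translation = client.messages.create(
        model="claude-3-5-sonnet-latest",  # illustrative model alias
        max_tokens=300,
        messages=[{
            "role": "user",
            "content": "Translate this question into a single Beancount Query "
                       f"Language (BQL) statement. Return only the BQL.\n\n{question}",
        }],
    )
    bql = translation.content[0].text.strip()

    # Step 2: run the generated BQL against the ledger.
    _, rows = query.run_query(entries, options, bql)

    # Step 3: hand the raw rows back to Claude to phrase a readable answer.
    reply = client.messages.create(
        model="claude-3-5-sonnet-latest",
        max_tokens=500,
        messages=[{
            "role": "user",
            "content": f"Question: {question}\nBQL: {bql}\nRows: {rows}\n\n"
                       "Answer the question in one or two sentences.",
        }],
    )
    return reply.content[0].text
```

The Telegram side is then just a thin handler that forwards incoming messages into something like `answer()` and sends the string back.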
What’s Working
Simple queries work great:
- “Total expenses last month” → translates to BQL, returns an accurate number (example of the generated BQL below)
- “List all Amazon purchases” → finds transactions, formats nicely
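For a sense of what the translation step produces, "total expenses last month" (taking November 2024 as the example month) maps to a single BQL statement along these lines; the dates and ledger path are placeholders:

```python
from beancount import loader
from beancount.query import query

entries, _, options = loader.load_file("ledger.beancount")

# "Total expenses last month", with November 2024 as the example month.
bql = """
    SELECT sum(position)
    WHERE account ~ '^Expenses:'
      AND date >= 2024-11-01 AND date < 2024-12-01
"""
_, rows = query.run_query(entries, options, bql)
print(rows)  # a single row holding the aggregated total
```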
Proactive insights are the real win:
- “Your utility bills increased 30% vs last quarter - PG&E specifically went from $85/mo to $120/mo. Want to investigate?”
- “You’ve spent 80% of your dining budget with 10 days left in the month” (rough sketch of this check below)
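That second kind of alert doesn't need the LLM at all; it's arithmetic on top of a BQL aggregate. A minimal sketch, assuming a hard-coded budget table, a USD-only ledger, and an 80% threshold (all of which are illustrative):

```python
import calendar
import datetime
from beancount import loader
from beancount.query import query

# Illustrative hard-coded budgets keyed by account; real budgets could live
# in the ledger itself (e.g. as custom directives).
BUDGETS = {"Expenses:Food:Restaurants": 400.0}

def budget_alerts(ledger_path: str) -> list[str]:
    entries, _, options = loader.load_file(ledger_path)
    today = datetime.date.today()
    start = today.replace(day=1)
    days_left = calendar.monthrange(today.year, today.month)[1] - today.day
    alerts = []
    for account, budget in BUDGETS.items():
        # Month-to-date spending for this account (single-currency assumption).
        _, rows = query.run_query(
            entries, options,
            f"SELECT sum(number) WHERE account = '{account}' "
            f"AND currency = 'USD' "
            f"AND date >= {start:%Y-%m-%d} AND date <= {today:%Y-%m-%d}",
        )
        spent = float(rows[0][0]) if rows and rows[0][0] else 0.0
        if spent / budget >= 0.8:
            alerts.append(
                f"You've spent {spent / budget:.0%} of your {account} budget "
                f"with {days_left} days left in the month."
            )
    return alerts
```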
I’ve seen research claiming ML-integrated personal finance systems can hit ~97% recommendation accuracy. I’m not there yet, but the potential is clear.
Challenges
- Complex queries - “What percentage of my income goes to fixed vs variable expenses?” requires multiple BQL queries and reasoning
- Time context - “Last month” vs “past 30 days” vs “November” - the LLM sometimes picks the wrong interpretation (one mitigation sketched after this list)
- Account hierarchy - Need to teach the model my specific chart of accounts
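For the time-context issue, one option is to resolve the common relative phrases deterministically before the model ever writes BQL, so prompts only contain explicit date literals. A small sketch; the phrase list and helper name are made up:

```python
import datetime

def explicit_range(phrase: str, today: datetime.date | None = None):
    """Map a few relative-time phrases to explicit [start, end) date ranges
    so the LLM never has to guess; anything unrecognized falls through to it."""
    today = today or datetime.date.today()
    first_of_month = today.replace(day=1)
    if phrase == "last month":
        end = first_of_month
        start = (end - datetime.timedelta(days=1)).replace(day=1)
        return start, end
    if phrase == "past 30 days":
        return today - datetime.timedelta(days=30), today + datetime.timedelta(days=1)
    raise ValueError(f"no deterministic rule for {phrase!r}")

# The resolved dates are then injected into the BQL-generation prompt verbatim,
# e.g. "date >= 2024-11-01 AND date < 2024-12-01".
```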
Open Questions
- Local vs cloud LLM - Currently using Claude API, but sending financial data to the cloud feels uncomfortable. Has anyone tried local models (Llama 3, Mistral) for this?
- Caching strategy - Re-running BQL for every query is slow. Thinking about pre-computing common aggregations.
- Multi-step reasoning - For complex questions, should I use tool-calling to let the LLM iterate, or pre-define query patterns? (Rough sketch of the tool-calling option below.)
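On that last question: the tool-calling route with the Anthropic Messages API would look roughly like this: expose a single `run_bql` tool and let the model loop until it has enough rows to answer. A sketch only; the tool name, model alias, and lack of error handling are all placeholders.

```python
import anthropic
from beancount import loader
from beancount.query import query

client = anthropic.Anthropic()
entries, _, options = loader.load_file("ledger.beancount")

# Expose BQL as a single tool and let Claude decide how many queries it needs.
TOOLS = [{
    "name": "run_bql",
    "description": "Run a Beancount Query Language (BQL) query against the "
                   "ledger and return the result rows.",
    "input_schema": {
        "type": "object",
        "properties": {"bql": {"type": "string", "description": "A BQL query."}},
        "required": ["bql"],
    },
}]

def ask(question: str) -> str:
    messages = [{"role": "user", "content": question}]
    while True:
        msg = client.messages.create(
            model="claude-3-5-sonnet-latest",  # illustrative alias
            max_tokens=1024,
            tools=TOOLS,
            messages=messages,
        )
        if msg.stop_reason != "tool_use":
            # No more queries requested; return the model's final text.
            return "".join(b.text for b in msg.content if b.type == "text")
        # Run each requested BQL query and feed the rows back as tool results.
        messages.append({"role": "assistant", "content": msg.content})
        results = []
        for block in msg.content:
            if block.type == "tool_use":
                _, rows = query.run_query(entries, options, block.input["bql"])
                results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": str(rows),
                })
        messages.append({"role": "user", "content": results})
```

The trade-off against pre-defined query patterns is the usual one: the loop can handle arbitrary questions, but it costs more tokens and is harder to make deterministic.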
Anyone else building something similar? Would love to compare approaches.