Dext Just Launched an AI Agent That 'Learns How You Bookkeep'—Could We Build Something Like This for Beancount, But Where YOU Own the Model?

Last month (March 23), Dext launched AI Assist—an AI agent that watches how you categorize transactions, apply VAT/tax treatment, and handle specific suppliers, then learns from your decisions to automate those patterns across your entire client base. In January 2026 alone, Dext processed 31.4 million receipts globally—work that would have taken 2 million+ manual hours. Their AI Assist reduces that further by turning your professional judgment into reusable rules.

Here’s what caught my attention as someone who manages 20+ small business clients on Beancount:

What Dext AI Assist Actually Does

  • Learns from your edits: When you consistently recategorize a certain type of expense, it picks up the pattern
  • Goes beyond supplier name matching: It reads document content to decide categorization (not just “oh, this is from Staples, must be Office Supplies”)
  • Suggests automations you can review: Every suggestion is surfaced for approval before being applied—human stays in the loop
  • Applies patterns across clients: What it learns from your work on Client A gets suggested for Client B

Sounds impressive. But here’s what bugs me: the model is Dext’s, not mine. My professional judgment—years of learning which categories work for specific industries, which edge cases trip up automation, which clients have unusual chart structures—all of that feeds their proprietary system. If I leave Dext, I lose the accumulated intelligence.

The Beancount Parallel

In our world, we already have something similar but more primitive:

  • Importer rules: I write Python scripts that categorize based on description patterns, amounts, and payees
  • smart_importer plugin: Uses machine learning to predict categories based on historical transactions
  • beanhub-import: Declarative, rule-based import engine

But these are my rules, in my Git repo, under my control. The intelligence doesn’t live in someone else’s cloud. The question is: could we build something with Dext AI Assist’s sophistication (learning from corrections, reading document content, cross-client pattern transfer) while keeping the Beancount philosophy of transparency and ownership?

What I Think We’d Need

  1. A local ML model (fine-tuned on your ledger history) that predicts categories—not cloud-based, not vendor-locked
  2. Correction feedback loop: When you fix a misclassification, the model retrains (like how Dext learns from your edits)
  3. Cross-file pattern transfer: Learn patterns from one client’s ledger, suggest for another
  4. Explainable suggestions: Don’t just say “this is Office Supplies”—show why (“matched description pattern from 47 previous transactions in Client B’s ledger, confidence 94%”)
  5. Git-native workflow: Suggestions saved as pending transactions in a staging file, approved via Git diff review
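To make items 4 and 5 concrete, here is a minimal sketch of how a suggestion could be written to a staging file as a pending transaction. Everything in it is an assumption for illustration: the `Suggestion` structure, the `render_pending` helper, and the metadata keys (`suggested-by`, `confidence`, `reason`) are hypothetical, not part of any existing plugin. It relies only on Beancount's `!` flag for pending transactions and on transaction-level metadata.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass
class Suggestion:
    """One predicted categorization awaiting human review (hypothetical structure)."""
    day: date
    payee: str
    narration: str
    amount: Decimal
    currency: str
    source_account: str      # account the money left, known from the import
    predicted_account: str   # the model's guess
    confidence: float        # 0.0 to 1.0
    reason: str              # human-readable explanation for the reviewer


def render_pending(s: Suggestion) -> str:
    """Render a suggestion as a '!'-flagged Beancount transaction with metadata."""
    reason = s.reason.replace('"', "'")  # keep the metadata string well-formed
    return "\n".join([
        f'{s.day.isoformat()} ! "{s.payee}" "{s.narration}"',
        '  suggested-by: "local-model"',
        f'  confidence: "{s.confidence:.2f}"',
        f'  reason: "{reason}"',
        f"  {s.predicted_account}  {s.amount} {s.currency}",
        f"  {s.source_account}",
        "",
    ])
```

Appending the rendered text to a staging file and committing it on a branch would let the reviewer approve a suggestion by changing `!` to `*` in an ordinary Git diff.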

The Economics Question

Dext AI Assist costs £5/month per user (launch offer). If a Beancount version took 40 hours to build and maintain, at /hour that’s ,000 initial investment. Break-even vs Dext: 50 months. But if it handles 5 clients at the Dext equivalent, break-even drops to 10 months. And you own it forever.

Questions for the community:

  1. Has anyone integrated local LLMs (Llama, Mistral, etc.) with Beancount for transaction categorization? How accurate is it compared to cloud AI?
  2. Is smart_importer still the best ML categorization plugin, or has something newer emerged?
  3. For bookkeepers managing multiple clients: do you transfer learned patterns between client ledgers, or does each client have completely separate importer rules?
  4. Would you pay for a managed “Beancount AI Assist” service that runs locally but provides Dext-level sophistication? What would it be worth?

The broader question: Is the future of bookkeeping automation proprietary AI agents (Dext, Pilot, Botkeeper) that own your professional judgment, or open-source alternatives where the intelligence stays with the practitioner?

Bob, this is a critically important question from a professional liability standpoint.

I’ve been watching the Dext AI Assist launch closely, and here’s what concerns me as a CPA:

The Judgment Portability Problem

When Dext’s AI “learns how you bookkeep,” it’s essentially encoding your professional judgment into their proprietary model. Think about what that means:

  • Your malpractice insurance covers your decisions. If Dext’s AI applies a pattern you taught it to a different client where it’s wrong, who’s liable? You “approved” the automation rule, but did you approve it for this specific context?
  • Engagement letter scope: My engagement letters specify that I exercise professional judgment for each client. If I’m delegating pattern-matching to a vendor’s AI, am I still fulfilling that commitment?
  • AICPA professional standards require us to maintain skepticism. An AI that says “94% confidence” is seductive—but that 6% could be a material misstatement.

Where Beancount’s Approach Wins

The transparency argument isn’t just philosophical—it’s a compliance requirement for my practice:

; I can show an auditor EXACTLY why this was categorized this way
; Rule: invoices from Amazon Business with "AWS" → Expenses:Technology:Cloud:AWS
2026-04-01 * "Amazon Web Services" "Monthly compute charges"
  categorization-rule: "aws_cloud_services_v3.py:line_47"
  confidence: "exact-match (payee + description keyword)"
  Expenses:Technology:Cloud:AWS    1,247.83 USD
  Liabilities:CreditCard:Chase

Compare this to Dext saying “AI categorized this based on learned patterns.” If the IRS asks why, what documentation do you have?

My Practical Setup

For my 15 CPA clients on Beancount, I use a tiered approach:

  1. Exact rules (70% of transactions): Payee + amount range → deterministic categorization. Zero ambiguity.
  2. Pattern rules (20%): Description regex matching with confidence thresholds. Anything below 85% goes to manual review queue.
  3. Manual review (10%): New vendors, unusual amounts, cross-category edge cases.
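The three tiers above could be sketched roughly like this. It is an illustration under assumptions, not my production code: the rule tables, account names other than the AWS one, and the `categorize` function are all hypothetical, and the 0.85 cutoff is the threshold mentioned in tier 2.

```python
import re
from typing import Optional

# Tier 1: exact payee plus amount range maps to one account (hypothetical data).
EXACT_RULES = [
    # (payee, min_amount, max_amount, account)
    ("Amazon Web Services", 0.0, 5000.0, "Expenses:Technology:Cloud:AWS"),
]

# Tier 2: description regex with a hand-assigned confidence (hypothetical data).
PATTERN_RULES = [
    # (compiled regex, account, confidence)
    (re.compile(r"\bUBER\b", re.IGNORECASE), "Expenses:Travel:Rideshare", 0.90),
    (re.compile(r"\bAMZN\b", re.IGNORECASE), "Expenses:Office:Supplies", 0.60),
]

REVIEW_THRESHOLD = 0.85  # below this, the transaction goes to the manual queue


def categorize(payee: str, description: str, amount: float) -> tuple[Optional[str], str]:
    """Return (account, tier); account is None when manual review is needed."""
    # Tier 1: deterministic match on payee and amount range.
    for rule_payee, low, high, account in EXACT_RULES:
        if payee == rule_payee and low <= amount <= high:
            return account, "exact"
    # Tier 2: take the most confident matching pattern, if any.
    best = max(
        (rule for rule in PATTERN_RULES if rule[0].search(description)),
        key=lambda rule: rule[2],
        default=None,
    )
    if best is not None and best[2] >= REVIEW_THRESHOLD:
        return best[1], "pattern"
    # Tier 3: nothing trustworthy matched.
    return None, "manual"
```

Returning the tier alongside the account is what makes the result auditable: the ledger can record which kind of rule fired, not just the outcome.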

I’d love a local ML model for that middle 20%, but only if it provides auditable reasoning, not a black box confidence score. The smart_importer plugin is a good start but its explanations are basically “nearest neighbor in training data”—not something I’d show an auditor.

On the Economics

Bob, your break-even math is right, but you’re missing the risk-adjusted cost. If Dext’s AI miscategorizes something that triggers an audit finding, the remediation cost dwarfs £5/month. With your own rules in Git, you can run git blame to see exactly when a rule was introduced and, via the commit message, why. That’s worth more than the automation itself.

This is exactly the kind of project I’ve been tinkering with for my personal FIRE tracking. Let me share what’s actually working and what isn’t.

My Current Setup: Local LLM + Beancount

I’ve been running a fine-tuned Mistral 7B model locally (on my M3 MacBook Pro) for transaction categorization since January. Here’s the honest report card:

What works:

  • Recurring transactions: 97%+ accuracy on subscriptions, utilities, regular vendors. But honestly, regex rules already handle these at 100%.
  • Restaurant/food categorization: ~89% accuracy distinguishing groceries vs dining out vs coffee shops. This is where ML actually adds value over rules—natural language in transaction descriptions is messy.
  • Cross-account learning: Trained on both my personal ledger and my wife’s—it correctly applies patterns from one to the other for shared vendors.

What doesn’t:

  • New vendor cold start: First time seeing a vendor, accuracy drops to ~60%. Basically random guessing with extra steps.
  • Context-dependent categorization: Same vendor, different purposes. I buy from Home Depot for both rental property maintenance (Expenses:RealEstate:Maintenance) and personal projects (Expenses:Home:Improvement). The model can’t distinguish without looking at the amount and timing—and even then it’s shaky.
  • Inference speed: ~2 seconds per transaction. For my monthly import of ~400 transactions, that’s 13 minutes of churning. Not terrible, but not instant either.

The Technical Stack

For anyone wanting to try this:

# Simplified version of my categorization pipeline
# Full version: ~300 lines including feedback loop

from smart_importer import PredictPostings
from beancount import loader

# 1. Load historical ledger for training
entries, errors, options = loader.load_file('main.beancount')

# 2. PredictPostings uses sklearn under the hood
# I've monkey-patched it to also call my local Mistral
# for transactions where sklearn confidence < 0.8

# 3. Feedback loop: corrections saved to corrections.json
# Model retrains weekly via cron job

The smart_importer plugin is still the best starting point. As far as I can tell from its source, it uses a scikit-learn pipeline with a linear SVM classifier under the hood. But it hasn’t been updated much recently, and the feature extraction is basic (bag-of-words on payee/narration). A local LLM adds genuine value for the long-tail transactions.
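The fallback described in the pipeline comments (call the slower model only when the fast one is unsure) can be sketched independently of either library. This is a simplified illustration, not the monkey-patch itself: `predict_with_fallback` and the `Predictor` type are hypothetical names, and both predictors are plain callables standing in for the scikit-learn classifier and the local LLM.

```python
from typing import Callable

# Both predictors return (account, confidence in [0, 1]). They are placeholders:
# "fast" stands in for the scikit-learn classifier, "slow" for a local LLM call.
Predictor = Callable[[str], tuple[str, float]]

FALLBACK_THRESHOLD = 0.8  # the cutoff mentioned in the pipeline comments


def predict_with_fallback(
    description: str, fast: Predictor, slow: Predictor
) -> tuple[str, float, str]:
    """Use the fast model when it is confident enough, otherwise ask the slow one."""
    account, confidence = fast(description)
    if confidence >= FALLBACK_THRESHOLD:
        return account, confidence, "fast"
    # Only low-confidence transactions pay the slow model's inference cost.
    account, confidence = slow(description)
    return account, confidence, "slow"
```

Since recurring transactions rarely fall below the threshold, this keeps the ~2 seconds per transaction cost confined to the long tail.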

On Dext’s Cross-Client Learning

This is where the proprietary vs open-source gap is real. Dext processed 31.4 million receipts in January 2026 alone, so it has vastly more data to train on. My local model has data from my ~15,000 transactions. The statistical advantage is enormous.

But here’s the counterargument: generic models make generic mistakes. Dext’s model knows that “AWS” is usually cloud computing. My model knows that in my ledger, there are three separate AWS accounts billed to three different cost centers, and the amount range determines which one. Generic training data can’t capture that.

What I’d Actually Pay For

Bob, to answer your question #4: I’d pay up to $20/month for a managed service that:

  • Runs locally (my data never leaves my machine)
  • Pre-trained on a large corpus of Beancount ledgers (with permission) for cold-start accuracy
  • Provides a feedback loop that improves with my corrections
  • Outputs Beancount transactions with metadata explaining the reasoning
  • Integrates with Git workflow (staging file → review → commit)

Basically Dext AI Assist’s UX, but with Beancount’s philosophy. That’s the sweet spot nobody’s built yet.

Great discussion. I want to push back slightly on the framing, because I think there’s a hidden assumption we should examine.

The “Build Our Own Dext” Trap

Every time a commercial product launches something impressive, the open-source community’s reflex is: “We should build that!” I’ve been in this community long enough to have watched several of these cycles:

  • Commercial receipt scanners → “Let’s build open-source OCR for Beancount!” (partially happened, mostly abandoned)
  • Plaid bank sync → “Let’s build our own bank API connector!” (beancount.io now offers this, but took years)
  • YNAB’s budgeting UX → “Let’s build envelope budgeting in Fava!” (fava-envelope exists but is niche)

The pattern: the initial enthusiasm is high, but the maintenance burden kills most projects. Dext has a team of engineers dedicated to AI Assist. We have… passionate volunteers who also have day jobs.

What We Should Actually Build

Instead of cloning Dext’s entire product, I think we should focus on what only the Beancount community can do well:

1. Shareable Rule Libraries (Not Models)

Rather than training ML models on ledger data (privacy nightmare), we should share categorization rule templates:

# community-rules/us-small-business/restaurants.rules
# Pattern: common restaurant POS systems
IF description MATCHES /SQ \*|TOAST TAB|CLOVER /
AND amount BETWEEN -500 AND -5
THEN account = Expenses:Food:DiningOut
CONFIDENCE = high

These are transparent (you can read them), modifiable (edit for your needs), shareable (contribute back without exposing financial data), and composable (stack multiple rule libraries).
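The rule format above is made up for discussion, so there is no parser for it. As a rough sketch of how such a rule might be represented and applied in Python, assuming a hypothetical `Rule` dataclass and `first_match` helper:

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    """One shareable categorization rule (hypothetical schema)."""
    pattern: str        # regex applied to the transaction description
    min_amount: float   # inclusive lower bound
    max_amount: float   # inclusive upper bound
    account: str
    confidence: str

    def matches(self, description: str, amount: float) -> bool:
        in_range = self.min_amount <= amount <= self.max_amount
        return in_range and re.search(self.pattern, description) is not None


# The restaurant POS rule from the example above, expressed as data.
RESTAURANT_POS = Rule(
    pattern=r"SQ \*|TOAST TAB|CLOVER ",
    min_amount=-500.0,
    max_amount=-5.0,
    account="Expenses:Food:DiningOut",
    confidence="high",
)


def first_match(rules: list[Rule], description: str, amount: float) -> Optional[Rule]:
    """Return the first matching rule, so earlier libraries take precedence."""
    for rule in rules:
        if rule.matches(description, amount):
            return rule
    return None
```

Because a rule is just a pattern, a range, and an account, a library of them contains no financial data and can be shared freely. Stacking libraries is then list concatenation, with your own rules placed first.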

2. Confidence-Based Review Queues

Fred’s point about context-dependent categorization is spot-on. The smart approach isn’t better AI—it’s better triage:

  • High confidence (>95%): Auto-categorize, log the rule that matched
  • Medium confidence (70-95%): Categorize but flag for batch review
  • Low confidence (<70%): Leave uncategorized, add to manual review queue

This is how I run my own ledgers (2 personal + 1 rental property). My monthly review takes about 20 minutes because I only look at the low-confidence bucket (~30-40 transactions out of ~500).
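A minimal sketch of that triage, assuming each prediction arrives as a (description, account, confidence) tuple. The function names and bucket labels are hypothetical; the thresholds are the ones listed above.

```python
from collections import defaultdict

AUTO_THRESHOLD = 0.95    # strictly above this: auto-categorize
REVIEW_THRESHOLD = 0.70  # from here up to AUTO_THRESHOLD: batch review


def bucket_for(confidence: float) -> str:
    """Map a confidence score to one of the three review buckets."""
    if confidence > AUTO_THRESHOLD:
        return "auto"
    if confidence >= REVIEW_THRESHOLD:
        return "batch_review"
    return "manual"


def triage(predictions):
    """Group (description, account, confidence) tuples by review bucket."""
    queues = defaultdict(list)
    for description, account, confidence in predictions:
        queues[bucket_for(confidence)].append((description, account))
    return dict(queues)
```

Only the "manual" queue needs line-by-line attention; the "batch_review" queue can be skimmed as a group.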

3. Migration Path Documentation

@bookkeeper_bob, your clients who are currently on Dext or similar platforms—what would it take for them to switch to a Beancount workflow? I suspect the answer isn’t “better AI” but “better onboarding.” The first month is brutal. After that, it’s actually faster.

The Philosophical Point

Alice nailed the accountability angle, but let me add one more layer: professional growth.

When Dext’s AI handles categorization for you, your judgment atrophies. You’re no longer practicing the pattern recognition that makes you a good bookkeeper. When you write and maintain your own rules, you’re continuously learning and encoding your knowledge explicitly.

I’ve been doing this for 4+ years now. My categorization rules are basically a documented expert system of my financial knowledge. That’s more valuable than any AI model—because I understand every line of it, and I can teach it to someone else.

The question isn’t “can we build a better Dext?” It’s “do we want to, or is our approach fundamentally different and that’s the point?”

Okay, I’m coming at this from a completely different angle as a software engineer who’s been using Beancount for about 6 months now.

This Is Just MLOps. We Already Know How to Do This.

Reading Bob’s wishlist—local ML model, correction feedback loop, cross-file pattern transfer, explainable suggestions, Git-native workflow—I’m thinking: this is a standard MLOps pipeline. The ML/AI industry solved these problems years ago. We just need to apply them to Beancount’s domain.

Here’s what the architecture would look like in engineering terms:

┌─────────────┐     ┌──────────────┐     ┌─────────────┐
│ Bank CSVs   │────▶│  Feature     │────▶│ Local Model │
│ (raw data)  │     │  Extraction  │     │ (Inference) │
└─────────────┘     └──────────────┘     └──────┬──────┘
                                                │
                    ┌──────────────┐     ┌──────▼──────┐
                    │  Corrections │◀────│  Staging    │
                    │  (feedback)  │     │  File (.bc) │
                    └──────┬───────┘     └─────────────┘
                           │
                    ┌──────▼───────┐
                    │  Retrain     │
                    │  (weekly)    │
                    └──────────────┘

This is literally a CI/CD pipeline for financial data. As someone who builds CI/CD systems for a living, the tooling already exists:

  • Feature extraction: scikit-learn’s TfidfVectorizer on transaction descriptions
  • Model: Start with logistic regression (fast, interpretable), graduate to a small transformer if needed
  • Serving: ONNX runtime for local inference (no cloud needed)
  • Feedback loop: Save corrections as labeled training data, retrain on schedule
  • Explainability: LIME or SHAP for feature importance on each prediction
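To show how small the core can be, here is a dependency-free sketch of the train, predict, explain, and correct loop. Note the substitution: instead of TfidfVectorizer, logistic regression, and LIME/SHAP, it uses a hand-rolled multinomial naive Bayes over description tokens, and the "explanation" is simply how often each token was seen for the chosen account. The class and method names are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase alphabetic tokens; digits such as store numbers are dropped."""
    return re.findall(r"[a-z]+", text.lower())


class TinyCategorizer:
    """Multinomial naive Bayes over description tokens, with add-one smoothing."""

    def __init__(self) -> None:
        self.token_counts = defaultdict(Counter)  # account -> token -> count
        self.doc_counts = Counter()               # account -> number of examples
        self.vocab = set()

    def add_example(self, description: str, account: str) -> None:
        """Used for initial training and for later corrections alike."""
        tokens = tokenize(description)
        self.token_counts[account].update(tokens)
        self.doc_counts[account] += 1
        self.vocab.update(tokens)

    def _token_logp(self, token: str, account: str) -> float:
        counts = self.token_counts[account]
        return math.log((counts[token] + 1) / (sum(counts.values()) + len(self.vocab)))

    def predict(self, description: str):
        """Return (account, probability, {token: times seen for that account})."""
        if not self.doc_counts:
            raise ValueError("no training examples yet")
        tokens = [t for t in tokenize(description) if t in self.vocab]
        total = sum(self.doc_counts.values())
        scores = {}
        for account in self.doc_counts:
            prior = math.log(self.doc_counts[account] / total)
            scores[account] = prior + sum(self._token_logp(t, account) for t in tokens)
        best = max(scores, key=scores.get)
        # Normalize the log scores into a probability for the winning account.
        norm = sum(math.exp(s - scores[best]) for s in scores.values())
        why = {t: self.token_counts[best][t] for t in tokens}
        return best, 1.0 / norm, why
```

The feedback loop falls out for free: a correction is just another `add_example` call with the account the reviewer chose, so no separate retraining step is needed at this scale.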

@helpful_veteran: Respectful Pushback

I hear you on the “build our own Dext” trap, and you’re right that maintenance kills projects. But the reason past efforts died isn’t because the idea was wrong—it’s because the tooling wasn’t ready. In 2024, running a local LLM required a gaming GPU. In 2026, my MacBook Air runs Mistral 7B at 30 tokens/sec. The feasibility equation has fundamentally changed.

Also, your shareable rule library idea is great but it has a scaling problem: rules require someone to write them for every new pattern. ML models generate rules from data. The two approaches are complementary:

  1. Rules for known patterns (high confidence, deterministic)
  2. ML for unknown patterns (lower confidence, probabilistic)
  3. Rules generated FROM ML (when a pattern stabilizes, extract it as a rule)

That third step is the key insight. The ML model is a rule discovery engine, not a replacement for rules.
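One way that third step might look, as a sketch under assumptions: the input is the list of (payee, account) pairs a reviewer has approved, and a payee is promoted to a deterministic rule once it has enough approvals that all agree. The `extract_rules` name and both thresholds are hypothetical choices, not tuned values.

```python
from collections import Counter, defaultdict

MIN_OBSERVATIONS = 5  # hypothetical: approvals required before promotion
MIN_AGREEMENT = 1.0   # hypothetical: fraction that must share one account


def extract_rules(approved):
    """Turn stable approved predictions into payee -> account rules.

    approved: iterable of (payee, account) pairs the reviewer accepted.
    """
    by_payee = defaultdict(Counter)
    for payee, account in approved:
        by_payee[payee][account] += 1

    rules = {}
    for payee, accounts in by_payee.items():
        total = sum(accounts.values())
        account, count = accounts.most_common(1)[0]
        # Promote only payees with enough history and no disagreement.
        if total >= MIN_OBSERVATIONS and count / total >= MIN_AGREEMENT:
            rules[payee] = account
    return rules
```

A vendor like Home Depot, which splits between two accounts, never stabilizes and so stays with the model and the review queue, which is the behavior you want.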

What I’d Actually Contribute

I’m a Python/Go developer with MLOps experience. If someone started a beancount-ai-categorizer project, I’d contribute:

  • ONNX model packaging for cross-platform local inference
  • A feedback loop that saves corrections and retrains
  • Integration with smart_importer as a fallback

Not promising to build the whole thing—I learned from @helpful_veteran’s warning about maintenance burden—but I’d happily contribute focused components.

Who else would contribute? And what’s the minimum viable version that’s actually useful? I think it’s simpler than people assume: a categorization model that trains on your existing ledger + a staging file workflow. Everything else is optimization.