AI Fluency is Now Table Stakes: How I'm Teaching My Team to Prompt, Review, and Govern AI

By 2026, I thought I’d be worrying about teaching my team the latest tax code changes. Instead, I’m teaching them how to write better prompts.

Last month, I hired a bright accounting grad—top of her class, passed the CPA exam on the first try. On day three, she asked me: “Do we have a prompt library for transaction categorization?” Not “Do we have a categorization guide?”—a prompt library. That’s when it hit me: AI fluency is no longer a nice-to-have. It’s table stakes.

The 2026 Reality: AI Fluency = Core Competency

The profession has fundamentally shifted. In 2020, knowing GAAP made you qualified. In 2026, knowing GAAP and how to prompt, review, and govern AI systems makes you qualified. Without both, you’re preparing for a career that no longer exists.

Here’s what changed: AI doesn’t just automate data entry anymore. It suggests journal entries, flags anomalies, drafts audit notes, and categorizes transactions with startling accuracy. But here’s the catch—it’s only as good as the humans who govern it.

Three Pillars of AI Competency for Accountants

After two years of integrating AI into my practice (and making plenty of mistakes), I’ve identified three non-negotiable skills:

1. Prompting: Teaching AI What You Need

Effective prompting isn’t just typing questions into ChatGPT. It’s understanding how to structure requests for repeatable, reliable outputs.

Example from my practice:

  • Bad prompt: “Categorize this transaction”
  • Good prompt: “Categorize this $127.50 charge from ‘AWS’ as either ‘Cloud Services:Production’ or ‘Cloud Services:Development’. Consider: production charges are typically >$100/month and occur on the 1st. Return: category, confidence score (0-1), reasoning.”

We maintain a shared prompt library in our practice management system. Common use cases: transaction categorization, document summarization, reconciliation anomaly detection, client communication drafting.
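A prompt library doesn't need to be fancy. As an illustrative sketch (the structure and names here are hypothetical, not our actual system), an entry can be a template plus the fields it expects, so everyone fills in the same blanks:

```python
# Hypothetical prompt-library entry: a named template plus its required fields.
PROMPT_LIBRARY = {
    "transaction-categorization": {
        "fields": ["amount", "merchant", "categories", "rules"],
        "template": (
            "Categorize this {amount} charge from '{merchant}' as one of: "
            "{categories}. Consider: {rules}. "
            "Return: category, confidence score (0-1), reasoning."
        ),
    },
}

def render_prompt(name, **values):
    """Fill a library template, failing loudly if a field is missing."""
    entry = PROMPT_LIBRARY[name]
    missing = [f for f in entry["fields"] if f not in values]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return entry["template"].format(**values)

prompt = render_prompt(
    "transaction-categorization",
    amount="$127.50",
    merchant="AWS",
    categories="'Cloud Services:Production' or 'Cloud Services:Development'",
    rules="production charges are typically >$100/month and occur on the 1st",
)
```

The point of the missing-field check is that a junior staffer can't accidentally send a half-filled prompt and get a plausible-but-ungrounded answer back.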

2. Reviewing: Professional Skepticism Applied to AI

This is where accountants have a natural advantage. We’re trained to question, verify, cross-check. Now we apply that skepticism to AI outputs.

My review framework:

  • High confidence (>0.9) + routine transaction (<$100): Auto-accept with spot-check audits
  • Medium confidence (0.7-0.9) OR significant amount (>$100): Manual review required
  • Low confidence (<0.7): Full investigation, treat as exception
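Translated into code, the framework above is just a pair of thresholds. A minimal sketch (the function name and tier labels are mine):

```python
def review_tier(confidence, amount):
    """Map an AI suggestion to a review tier per the thresholds above."""
    if confidence < 0.7:
        return "exception"       # low confidence: full investigation
    if confidence > 0.9 and amount < 100:
        return "auto-accept"     # still subject to spot-check audits
    return "manual-review"       # medium confidence OR significant amount
```

Checking the low-confidence branch first matters: a $40 charge at 0.5 confidence should be an exception, not an auto-accept.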

The key insight: AI is pattern recognition, not understanding. It categorizes your daughter’s college bookstore charge as “office supplies” because it matches the pattern—technically correct, wrong for tax purposes.

3. Governing: Audit Trails for AI Decisions

Here’s where Beancount users have a massive advantage over cloud accounting software users.

In proprietary platforms: AI suggestion → click “accept” → it’s recorded, but you can’t see the decision trail.

In Beancount + AI workflow:

2026-03-10 * "AWS" "Cloud hosting - AI suggested: Cloud Services:Production (confidence: 0.92)" #ai-categorized
  Expenses:Cloud-Services:Production    127.50 USD
  Liabilities:Credit-Card:Amex         -127.50 USD

Every AI suggestion is documented. Git commits show exactly what AI recommended versus what you approved. If an auditor asks “How did you classify this?” you can point to the transaction metadata, the AI’s reasoning, and your review decision—all in plain text.

The Beancount + AI Sweet Spot

Plain text accounting is uniquely suited for AI governance:

  1. Transparency: Every AI decision is visible in the ledger file
  2. Auditability: Git history shows who approved what and when
  3. Ownership: Your data, your AI workflow, your rules—no vendor lock-in
  4. Integration: Build custom importers that incorporate AI suggestions with review flags
  5. Documentation: Transaction comments store AI confidence scores, reasoning, human overrides

I’ve built a workflow where our bank importer uses an LLM API to suggest categories, stores the confidence score as metadata, and flags low-confidence transactions for review in Fava. The entire decision trail is preserved in plain text.

How I’m Building AI Fluency in My Practice

1. Prompt Library Development
Every time someone writes an effective prompt, it goes in the shared library. We now have 30+ tested prompts for common accounting tasks.

2. Review Protocol Training
New hires spend week one learning why AI makes the suggestions it does—and where it fails. We use historical transactions with known errors as training data.

3. Governance Documentation
Every client has an “AI Workflow” section in their file documenting:

  • Which tasks use AI assistance
  • Review thresholds and approval authority
  • How AI decisions are recorded
  • Exception handling procedures

4. Continuous Learning
Monthly team meeting: “AI Wins and Fails.” We share what worked, what didn’t, and update our protocols.

The Uncomfortable Truth

Here’s what I tell every new hire: If you can’t explain how you validated an AI-generated output, you’re not doing accounting—you’re just clicking buttons.

The accountants who will thrive in 2026 and beyond aren’t the ones who resist AI or blindly embrace it. They’re the ones who understand how to harness AI’s speed while applying human judgment, professional skepticism, and ethical reasoning.

AI fluency isn’t replacing accounting skills. It’s the lens through which all accounting skills are now applied.

Your Turn

I’m sure I’m not alone in this journey. What AI + Beancount workflows are you using? How do you teach prompt engineering vs. traditional accounting concepts? Where have you seen AI fail in ways that surprised you?

I’d love to hear how others are building AI competency in their practices—and what governance frameworks you’ve found effective.

Alice, this resonates deeply. As a former IRS auditor, I keep thinking about what happens when the audit letter arrives and the taxpayer has to explain AI-driven decisions.

The Audit Question: “How Did You Arrive at This?”

In an audit, the IRS doesn’t care who made the decision—human or AI. They care about documentation and substantiation. When you claim a $5,000 home office deduction and AI suggested the categorization, the auditor will ask: “Walk me through how you determined this was legitimate.”

If your answer is “My accounting software AI said so,” you’re in trouble.

If your answer is “AI flagged this based on pattern X, I reviewed it against criteria Y and Z, confirmed it met IRS Publication 587 requirements, and documented my review here [shows Beancount transaction comment],” you’re in a much better position.

The Risk of Plausible-But-Wrong AI Suggestions

Your college bookstore example is perfect. AI is excellent at pattern matching but terrible at context and intent—the exact things that matter for tax compliance.

Real examples I’ve seen:

  • AI categorizing Costco purchases as “business supplies” because the client has a business account—but half the cart was groceries
  • AI suggesting meal deductions at 100% instead of 50% because it didn’t understand the entertainment rules
  • AI recommending aggressive depreciation schedules that technically comply with GAAP but trigger audit flags

The pattern was right. The tax treatment was wrong.

Documentation: Your Audit Defense

Here’s where Beancount’s plain text approach is genuinely superior for tax purposes.

When I work with clients using proprietary platforms, their audit trail looks like:

  • Date, amount, category
  • Maybe a receipt attachment
  • No visibility into why that category was chosen

When I work with Beancount users, the audit trail includes:

2026-02-15 * "Costco" "Business supplies - AI suggested: Office:Supplies (conf 0.88). Reviewed receipt: $127 office supplies, $43 personal groceries. Allocated proportionally." #ai-assisted #reviewed
  Expenses:Office:Supplies    127.00 USD
  Expenses:Personal:Groceries  43.00 USD  ; Not deductible
  Liabilities:Credit-Card    -170.00 USD

That transaction tells the whole story: AI suggestion, human review, allocation rationale, tax treatment. Git commits add another layer showing when the review happened and who approved it.

My Review Framework for Tax Compliance

I’ve adapted Alice’s confidence-based framework with tax-specific thresholds:

Auto-accept (with quarterly spot-checks):

  • High confidence (>0.9)
  • Routine/recurring transaction
  • Low tax risk (<$100, clear business purpose)

Manual review required:

  • ANY deduction-related transaction (meals, travel, home office, vehicle)
  • Mixed personal/business potential (Costco, Amazon, phone bills)
  • Transactions >$500
  • Any confidence <0.9

Full documentation + consultation:

  • Large capital expenses (depreciation implications)
  • International transactions (currency, tax treaty issues)
  • Crypto transactions (cost basis, character of income)
  • Any transaction that could trigger an audit flag

Question for You, Alice

You mentioned building an importer with AI suggestions and confidence scores. What happens when AI gives a plausible-but-wrong suggestion at high confidence?

For example, AI categorizes a $2,000 business coaching program as “Training:Professional Development” with 0.95 confidence—which seems right. But it’s actually a multi-level marketing scheme, making it non-deductible.

How do you build in safeguards against AI being confidently wrong about things that require domain expertise (tax law, GAAP interpretation, regulatory nuances)?

I’m wondering if the confidence score is the wrong metric entirely. Maybe we need a “tax risk score” or “requires CPA review” flag for certain transaction types, regardless of AI confidence.

What’s your experience been with false positives—where AI is confident but incorrect?

This hits home. I’ll be honest—when AI categorization tools started appearing, my first reaction was fear. “Great, now software is coming for the one thing I’m good at.”

But after reluctantly trying it (my wife basically forced me after watching me manually categorize 500 transactions one Saturday), I realized something important: AI doesn’t replace judgment. It accelerates the boring parts so you can focus on the judgment.

My AI Evolution: From Skeptic to Strategic User

2024: Manually categorized every transaction. Took 3-4 hours per month. Felt virtuous but exhausted.

Early 2025: Started using AI-assisted importer. Let it suggest, I review. Time dropped to 45 minutes. Felt guilty, like I was cheating.

Mid 2025: Built confidence thresholds. AI handles routine stuff, flags the weird stuff for me. Down to 20 minutes. Stopped feeling guilty.

Now (2026): AI learns from my corrections. I spend time on what actually matters—analyzing spending patterns, planning for big expenses, optimizing tax strategy. The categorization happens in the background.

The transformation wasn’t about the time savings (though that’s real). It was about mental energy. I’m no longer decision-fatigued from 500 micro-decisions. I save that energy for the decisions that actually matter.

Where AI is Brilliant (and Where It’s Useless)

AI is great at:

  • Recurring transactions (Netflix = Entertainment, every month, easy)
  • Clear merchant categories (Safeway = Groceries, obvious)
  • Consistent patterns (AWS charges on the 1st = Cloud Services)

AI is terrible at:

  • Context and intent (your college bookstore example, Alice)
  • One-time unusual charges (what IS that $247.83 charge from “Tech Services LLC”?)
  • Personal vs business allocation (my phone bill is 60% business, 40% personal—AI can’t know that)
  • Tax nuance (Tina’s MLM example is perfect)

The key insight: AI sees patterns. Humans understand purpose.

The Real Skill: Building Your “Trust But Verify” Muscle

Using Beancount already trains you for AI oversight, even if you don’t realize it.

When you manually enter transactions in Beancount:

  • You write the category → instant feedback if it doesn’t exist
  • Balance assertions catch errors → you learn to spot inconsistencies
  • You see the full account tree → you understand the taxonomy

When AI suggests categories:

  • Same process, just faster
  • You still review → same verification muscle
  • Balance assertions still catch errors → AI doesn’t bypass the safety net

If you’re already comfortable with Beancount’s “text file → validate → commit” workflow, you’re more prepared for AI oversight than QuickBooks users who just click “Yes” in a UI and hope for the best.

My Current Workflow: AI + Human Review

Here’s what works for me:

1. Import transactions with AI suggestions

python importers/chase.py --with-ai-categorization > new_transactions.bean

2. Review in Fava with confidence flags

  • Green (>0.9): Quick visual scan, auto-accept
  • Yellow (0.7-0.9): Read the transaction, verify category makes sense
  • Red (<0.7): Full investigation, often requires looking up the charge

3. Add comments for future context

2026-03-01 * "Tech Services LLC" "Annual SaaS renewal - AI unsure (0.62), verified via email receipt" #ai-low-confidence
  Expenses:Software:Subscriptions    247.83 USD
  Liabilities:Credit-Card           -247.83 USD

4. Commit with review metadata

git commit -m "March transactions - AI categorized 127/135 (94%), manual review 8 transactions"

Over time, AI learns from my corrections. That “Tech Services LLC” charge? Next year, AI will remember it’s a subscription and suggest the right category at 0.95 confidence.
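That "learning" doesn't require fine-tuning a model. The simplest version, sketched below with illustrative names, is feeding past corrections back into the prompt as few-shot examples:

```python
def build_context(corrections, merchant):
    """Turn past human corrections into few-shot lines for the prompt.

    corrections: dict mapping a merchant name to the category the
    human ultimately chose after review.
    """
    lines = ["Previously corrected categorizations:"]
    for m, category in corrections.items():
        lines.append(f"- '{m}' -> {category}")
    if merchant in corrections:
        lines.append(
            f"Note: '{merchant}' was corrected before; "
            f"prefer {corrections[merchant]}."
        )
    return "\n".join(lines)

corrections = {"Tech Services LLC": "Expenses:Software:Subscriptions"}
context = build_context(corrections, "Tech Services LLC")
```

Prepending that context to the categorization prompt is why the same unusual charge comes back at high confidence the second year.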

The Surprise: What AI Gets Wrong Teaches You

This is what I didn’t expect: AI’s mistakes make you a better accountant.

Example: AI kept categorizing my daughter’s college bookstore charges as “Office Supplies” (pattern match: bookstore = supplies). After the third time correcting it, I realized:

  1. I needed a better category structure (“Education:Textbooks” vs “Office:Supplies”)
  2. I should document the why in my chart of accounts (“Education = tuition, fees, textbooks for dependents claimed on taxes”)
  3. I now have a rule: AI can suggest, but education-related charges always get manual review

AI’s confident-but-wrong suggestions forced me to clarify my own taxonomy and decision-making rules. Now I’m more consistent—and so is the AI.

For the Skeptics: Start Small

If you’re nervous about AI (I was!), here’s my recommendation:

Month 1: Just watch what AI suggests. Don’t auto-accept anything. Compare AI suggestion vs what you would have chosen manually. Build trust (or distrust) based on accuracy.

Month 2: Auto-accept only high-confidence (>0.95) recurring transactions you recognize. Review everything else manually.

Month 3: Expand auto-accept to >0.9 confidence for categories you trust AI on (groceries, gas, utilities). Still manually review complex stuff (business expenses, deductions).

Month 4+: Adjust thresholds based on your comfort and AI accuracy. Some people end up at 0.85, some stay at 0.95. It’s personal.

The goal isn’t to blindly trust AI. The goal is to calibrate your trust based on evidence.

Alice, Here’s My Question

You mentioned teaching your team prompt engineering. How do you teach someone to recognize when a prompt needs refinement?

Like, if I write a prompt and AI returns garbage, how do I know if the problem is:

  • My prompt was too vague
  • My prompt included conflicting instructions
  • The AI doesn’t have enough context
  • This task just isn’t suitable for AI

I can usually figure it out through trial and error, but I’m wondering if there’s a systematic way to diagnose prompt failures. Especially for less technical team members who might just think “AI doesn’t work” and give up.

What’s your framework for teaching prompt iteration?


Final thought: If you’re already using Beancount, you’re ahead of the curve. The “review every transaction, verify with balance assertions, commit with documentation” mindset is exactly what AI governance requires. You just didn’t know you were training for it.

The FIRE community has been obsessed with this exact topic—how to integrate AI automation while maintaining control and auditability. I’ve been running AI-assisted Beancount workflows for 18 months and have actual data on what works (and what doesn’t).

The Quantified AI Impact

Let me share real numbers from my Beancount ledger:

Pre-AI (2024):

  • Time spent on monthly categorization: 3.2 hours
  • Categorization errors caught during quarterly review: 8-12
  • Mental load: High (decision fatigue from 400+ micro-decisions)

With AI (2026):

  • Time spent on monthly categorization: 22 minutes
  • Categorization errors caught during quarterly review: 2-3 (and most are AI-flagged for review)
  • Mental load: Low (review ~30 flagged transactions, auto-accept the rest)

Time savings: nearly 90% (3.2 hours down to 22 minutes). But here’s what surprised me: the quality improvement. AI + review is more accurate than my tired manual categorization at 10 PM on Sunday.

My AI + Beancount Technical Workflow

Since this community loves the technical details, here’s my actual implementation:

1. LLM-Enhanced Transaction Importer

import json

from openai import OpenAI

client = OpenAI()  # API key read from the OPENAI_API_KEY environment variable

def categorize_with_ai(description, amount, merchant):
    prompt = f"""Categorize this transaction for personal finance tracking:

Merchant: {merchant}
Description: {description}
Amount: ${amount}

Return JSON with:
- category (use hierarchy like Expenses:Food:Groceries)
- confidence (0-1)
- reasoning (one sentence why)
- tax_relevant (boolean - could this matter for taxes?)

Context: This is a personal ledger tracking FIRE journey.
Standard categories: Groceries, Dining, Transportation, Housing, Healthcare, etc.
"""

    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.1,  # low temperature for consistent categorization
    )

    return json.loads(response.choices[0].message.content)

2. Confidence-Based Workflow

def generate_transaction(txn_data, ai_suggestion):
    confidence = ai_suggestion['confidence']
    category = ai_suggestion['category']
    
    if confidence > 0.9 and not ai_suggestion['tax_relevant']:
        # Auto-accept, but document
        comment = f"AI cat: {category} (conf {confidence:.2f})"
        tag = "#ai-auto"
    elif confidence > 0.7:
        # Suggest, require review
        comment = f"AI suggests: {category} (conf {confidence:.2f}). {ai_suggestion['reasoning']} [REVIEW REQUIRED]"
        tag = "#ai-review"
    else:
        # Low confidence, flag for manual
        comment = f"AI unsure (conf {confidence:.2f}). Manual categorization needed."
        tag = "#ai-manual"
    
    return format_beancount_transaction(txn_data, comment, tag)
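format_beancount_transaction isn’t shown above. A minimal stand-in (my sketch, not Fred’s actual helper, with a hypothetical txn_data shape) could render the two-posting entry like this:

```python
def format_beancount_transaction(txn_data, comment, tag):
    """Render one Beancount transaction with the AI comment and review tag.

    txn_data is assumed to be a dict with date, payee, category,
    amount, and the funding account.
    """
    return (
        f'{txn_data["date"]} * "{txn_data["payee"]}" "{comment}" {tag}\n'
        f'  {txn_data["category"]}    {txn_data["amount"]:.2f} USD\n'
        f'  {txn_data["account"]}    {-txn_data["amount"]:.2f} USD\n'
    )

txn = {
    "date": "2026-03-10",
    "payee": "AWS",
    "category": "Expenses:Cloud-Services:Production",
    "amount": 127.50,
    "account": "Liabilities:Credit-Card:Amex",
}
entry = format_beancount_transaction(
    txn, "AI cat: Cloud Services:Production (conf 0.92)", "#ai-auto"
)
```

Because both postings are written explicitly, bean-check still validates the entry like any hand-written one; the AI metadata rides along in the narration and tag.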

3. Git-Based Governance

Every AI decision gets its own commit with metadata:

git commit -m "AI categorization batch 2026-03-14

Auto-accepted: 127 transactions (conf >0.9)
Flagged for review: 23 transactions (0.7-0.9)
Manual categorization: 8 transactions (conf <0.7)

AI model: gpt-4 (temp: 0.1)
Total processing time: 3.2 seconds
Human review time: 18 minutes
"

The git log becomes an audit trail of every AI decision and human override.

What I’ve Learned About False Positives

Alice and Tina, your question about AI being “confidently wrong” is crucial.

My false positive data (from 18 months of tracking):

  • High confidence (>0.9) errors: 2.1% - Rare but dangerous
  • Medium confidence (0.7-0.9) errors: 12.4% - Manageable with review
  • Low confidence (<0.7) errors: 31.8% - Expected, that’s why it’s flagged

The scary category is high-confidence errors. Examples:

  1. Amazon purchases: AI confidently categorized a $47 dog bed as “Expenses:Household:Furniture” (0.94 conf). Technically correct, but I track pet expenses separately for budgeting.

  2. Venmo transfers: AI categorized a $200 Venmo to my brother as “Expenses:Gifts” (0.91 conf). It was actually splitting a dinner bill—should be “Expenses:Dining:Restaurants”.

  3. Subscription renewals: AI miscategorized a $19/month charge from “Setapp” as “Expenses:Entertainment:Streaming” (0.88 conf). It’s actually productivity software.

The pattern: AI is pattern-matching on merchant names and amounts, but lacks context about my category structure and my financial goals.

The Solution: Category-Specific Rules

I’ve added a layer on top of AI confidence scores: category risk assessment.

Some categories get automatic review regardless of AI confidence:

HIGH_RISK_CATEGORIES = [
    "Expenses:Medical",      # Tax deductible
    "Expenses:Charitable",   # Tax deductible
    "Expenses:Business",     # Tax deductible
    "Income:Salary",         # Tax relevant
    "Income:Investment",     # Tax treatment varies
]

def requires_review(category, confidence, amount):
    # Prefix match so sub-accounts like Expenses:Medical:Dental are caught too
    if any(category == c or category.startswith(c + ":")
           for c in HIGH_RISK_CATEGORIES):
        return True
    if amount > 500:
        return True
    if confidence < 0.9:
        return True
    return False

This catches Tina’s MLM example: Even if AI suggests “Training:Professional” at 0.95 confidence, the $2,000 amount triggers manual review.

The Pricing Question (Great Question, Alice!)

You asked how to price AI-assisted services. Here’s my framework:

Don’t pass the savings to clients as lower prices. Pass them as better service.

Traditional bookkeeper:

  • Charges $500/month
  • Spends 10 hours manually categorizing
  • Delivers monthly P&L report
  • No time for proactive insights

AI-assisted bookkeeper:

  • Still charges $500/month (or more!)
  • Spends 2 hours reviewing AI categorization
  • Spends 8 hours on advisory: cash flow forecasting, tax optimization, scenario planning
  • Delivers monthly P&L + strategic recommendations

The client gets more value. You get better margins. AI isn’t about competing on price—it’s about competing on quality.

The Philosophical Shift: From Categorizer to Validator

Mike’s point about mental energy is spot-on. The real transformation isn’t time—it’s identity.

Pre-AI: “I am the person who categorizes transactions.”
With AI: “I am the person who validates financial data and provides strategic insights.”

The second role is higher value, more satisfying, and harder to automate.

For the FIRE community, this matters because our time is our most valuable asset. AI gives me back 3 hours a month. That’s 36 hours a year. At my consulting rate, that’s $7,200 in value—or 36 hours to spend with my kids.

Questions for the Group

  1. Alice: How do you handle category standardization across clients? If each client has a custom chart of accounts, how do you build prompt libraries that work universally?

  2. Tina: For tax-sensitive transactions, do you ever use AI for tax research (e.g., “Is this deductible?”), or is that too risky without human CPA oversight?

  3. Mike: Your “Month 1-4” rollout plan is great. Did you ever encounter a situation where AI accuracy decreased over time (e.g., as your spending patterns changed)?

Final Data Point

I track my “AI trust calibration” in my Beancount ledger itself:

2026-03-14 note Assets:ML-Model "Monthly AI accuracy review"
  ai_accuracy_high_conf: 0.979 ; 97.9% of >0.9 conf suggestions were correct
  ai_accuracy_medium_conf: 0.876
  ai_accuracy_low_conf: 0.682
  human_override_rate: 0.034 ; I override AI 3.4% of the time
  false_negative_rate: 0.012 ; AI flagged for review but was actually correct
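Generating those numbers is a small fold over the review log. Assuming each log entry records the AI’s confidence and whether the suggestion survived human review (a shape I’m inventing for illustration), a sketch:

```python
def calibration(review_log):
    """Per-confidence-band accuracy from (confidence, was_correct) pairs."""
    bands = {"high": [], "medium": [], "low": []}
    for conf, correct in review_log:
        if conf > 0.9:
            bands["high"].append(correct)
        elif conf >= 0.7:
            bands["medium"].append(correct)
        else:
            bands["low"].append(correct)
    return {
        name: (sum(hits) / len(hits) if hits else None)  # None: no samples
        for name, hits in bands.items()
    }

log = [(0.95, True), (0.92, True), (0.91, False), (0.8, True), (0.5, False)]
stats = calibration(log)
```

Running it monthly and writing the result back into the ledger as a note directive is what closes the loop: the trust calibration is itself version-controlled.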

This metadata lives in my ledger forever. If I’m ever audited or questioned, I can show exactly how I validated my categorization process.

Plain text + AI + Git = the most auditable financial system I’ve ever seen.

This conversation is both inspiring and terrifying. Inspiring because I see the potential. Terrifying because my clients are asking the same question every week: “Can’t AI just do your job?”

The Client Conversation Nobody Wants to Have

Last month, a long-time client (5 years, $400/month retainer) sent me an article about “AI bookkeeping services for $99/month.” The email subject line was literally: “Should we try this?”

We had the conversation. It was uncomfortable. But it forced me to articulate something I hadn’t fully thought through: The value I provide isn’t categorizing transactions. It’s knowing when the categories are wrong—and why that matters.

Here’s what I told them (and what actually worked):

The $12K Mistake Story

"Remember when you almost switched to that AI-only service? Let me tell you about a client who did.

They were thrilled. $99/month instead of my $400/month. AI categorized everything. Gave them pretty dashboards. Felt modern and sophisticated.

Tax season arrived. The AI had been categorizing their Costco trips as 100% business expenses. Every trip. For 18 months. The pattern made sense to AI: business credit card + warehouse store = business supplies.

Except half of every trip was groceries for their home. The IRS didn’t care that AI made the mistake. $12,000 in disallowed deductions. Plus penalties.

They’re back with me now, paying $450/month (I raised my rates). Because they learned: AI without oversight is expensive."

That conversation saved the client relationship. But it exposed something I hadn’t fully confronted: If I can’t explain why human judgment matters, I’m competing on price with $99/month robots.

How I’m Repositioning: From “Bookkeeper” to “Financial Data Validator”

I’ve changed how I talk about my services. Not because the work changed—but because the framing matters.

Old positioning (sounds like commodity work):

  • “I categorize your transactions”
  • “I reconcile your accounts”
  • “I prepare monthly financial statements”

New positioning (sounds like professional oversight):

  • “I validate your financial data for tax compliance and business decisions”
  • “I ensure your books will survive an IRS audit”
  • “I translate financial data into strategic insights”

Same work. Different value proposition.

Using AI to Demonstrate Value (Not Replace It)

Here’s what I’ve started doing: Show clients the AI suggestions I override, and explain why.

Monthly report now includes a section:

“AI Oversight This Month”

  • Transactions reviewed: 247
  • AI suggestions accepted: 214 (87%)
  • AI suggestions corrected: 33 (13%)
  • High-risk corrections (tax implications): 8

Then I list examples:

Example 1: AI categorized $3,500 equipment purchase as “Expenses:Office Equipment” (immediate expense). I reviewed depreciation rules and reclassified as “Assets:Equipment” with proper depreciation schedule. Tax impact: Deferred $2,800 deduction over 5 years, avoiding audit flag for large immediate expense.

Example 2: AI categorized Zoom subscription as “Expenses:Software:Business” at 100%. Client uses Zoom for both business calls and personal family calls. I allocated 70% business, 30% personal based on usage discussion. Audit protection: Documented mixed-use allocation with rationale.

Clients read this and think: “Oh. This is why I pay for a human.”
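Technical aside: once the percentage is agreed with the client, the mixed-use split in Example 2 is purely mechanical. A small helper (hypothetical, mine) keeps the cents reconciling:

```python
def split_mixed_use(amount, business_pct):
    """Split a mixed-use charge into (business, personal), preserving cents.

    The personal share is computed as the remainder, so the two
    postings always sum back to the original charge.
    """
    business = round(amount * business_pct, 2)
    personal = round(amount - business, 2)
    return business, personal

# The $19 Zoom subscription at the agreed 70% business use:
biz, personal = split_mixed_use(19.00, 0.70)
```

Computing the personal share as the remainder (rather than rounding both sides independently) is what guarantees the Beancount entry still balances to the penny.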

The Race-to-the-Bottom Fear

But I’ll be honest: I’m worried about competitors who undercut on price by using AI without disclosure or oversight.

Scenario I’m seeing:

  • New “bookkeeper” charges $150/month (half my rate)
  • Uses AI for 95% of the work
  • Spot-checks maybe 20 transactions
  • Delivers reports that look fine
  • Client is happy… until tax season or an audit

How do I compete with that? The client gets a year of “good enough” service at half price. By the time the problems surface, the budget bookkeeper is long gone and I’m cleaning up the mess (for even higher fees, because cleanup is harder than doing it right the first time).

My question for this group: How do you educate clients about the risks of AI-only bookkeeping without sounding like a Luddite who’s scared of technology?

What I’m Learning About AI + Beancount

I started using Beancount for my own business finances last year. The AI workflow Mike and Fred described is exactly what I’m building.

What surprised me: Beancount makes it easier to show my work.

With QuickBooks, if a client asks “Why did you categorize this as X?” my answer is: “That’s what the dropdown said.”

With Beancount + AI workflow, my answer is:

2026-03-01 * "AWS" "Cloud hosting - AI suggested: Office:Tech (0.89). Reviewed: this is production hosting for client-facing app, not internal tool. Correct category: Cost-of-Goods-Sold:Hosting" #ai-corrected
  Expenses:COGS:Hosting    247.83 USD
  Liabilities:Credit-Card  -247.83 USD

I can show:

  • What AI suggested
  • Why I disagreed
  • The correct category and reasoning
  • When the decision was made (git commit timestamp)

This is professional-grade documentation. It positions me as a thoughtful validator, not just a transaction processor.

The Skills I’m Teaching (To Myself and Clients)

Alice, you asked about AI fluency. Here’s what I’m learning:

1. Prompt refinement for client-specific rules

"Categorize this transaction: $127 Amazon purchase.
Client rules:

  • Amazon purchases <$50 = likely personal
  • Amazon purchases >$100 = likely business supplies
  • $50-$100 range = flag for review
  • Client has separate personal Amazon account, but sometimes uses business account accidentally

Return category, confidence, and reasoning."

2. Teaching clients to think in “AI review triggers”

Instead of: “Just expense everything from Costco.”

Now: “Costco trips get flagged for review. Send me a quick text after each trip: rough split between business and personal. I’ll document the allocation.”

Clients understand this. It’s collaborative, not adversarial.

3. Building prompt libraries for common client questions

I now have 20+ saved prompts for:

  • “Is this deductible?” → AI gives initial answer, I verify against IRS pubs
  • “How should I categorize X?” → AI suggests, I confirm
  • “What’s the tax treatment?” → AI provides research, I make final call

I’m faster. But clients see me using AI with professional judgment, not instead of professional judgment.

My Big Fear: The Liability Question

Here’s what keeps me up at night: When AI makes a mistake, who’s liable?

If I manually miscategorize something, it’s my error. Professional liability insurance covers it (and honestly, it’s a learning moment).

If AI miscategorizes something and I don’t catch it:

  • Is it still my error?
  • Does my insurance cover “failure to review AI output”?
  • Will clients understand the difference between “my mistake” and “I trusted the AI”?

For those of you using AI in client work: Have you talked to your insurance provider? Do they have AI-specific coverage or exclusions?

What Would Help Me Sleep Better

  1. Industry standards for AI oversight - Something like “AI suggestions must be reviewed by licensed professional for tax-relevant transactions”

  2. AI confidence score disclosure - If AI was involved, disclose the confidence level in the financial records

  3. Client education frameworks - How to explain AI + human collaboration without making it sound like I’m offloading work to robots

  4. Professional liability clarity - What does my insurance actually cover when AI is part of my workflow?

Hope Amidst the Fear

Despite all this anxiety, I do see the opportunity. Fred’s framework resonates: Don’t compete on price. Compete on quality.

The bookkeepers who will thrive in 2026 aren’t the ones who resist AI (that’s me in 2024—didn’t work). And it’s not the ones who replace themselves with AI (the $99/month services will burn clients eventually).

It’s the ones who use AI to level up from “transaction processor” to “financial data strategist.”

That’s the person clients will pay $450/month for. Because when the IRS comes knocking, they don’t want the cheapest bookkeeper. They want the one who can explain every decision and show their work.

And with Beancount + AI + Git, I can finally do that at scale.


Alice, one more question: How do you handle client confidentiality when using LLM APIs? Do you self-host models? Use enterprise agreements with data protection? I worry about sending client financial data to OpenAI, even if it’s just transaction descriptions.