When AI Miscategorizes $30K in Deductions: Who's Liable—You, the Software, or the Client?

Picture this: You’re using AI-powered accounting software to streamline your workflow. The AI confidently categorizes your client’s transactions. Your client approves them. You review the high-level numbers; everything looks reasonable. You file the return.

Six months later: IRS audit. Turns out the AI miscategorized $30,000 in personal expenses as legitimate business deductions. The IRS disallows them. Your client owes back taxes plus penalties—maybe $12,000-$15,000 total.

Who’s liable?

  • The CPA who reviewed and signed the return?
  • The software vendor whose AI made the error?
  • The client who approved the transactions?

I’ve been thinking about this a lot lately as AI becomes standard in our practice. Here’s what’s keeping me up at night:

The Liability Landscape in 2026

After researching this and talking to my E&O insurance broker, here’s the uncomfortable truth: professional standards haven’t changed just because we’re using AI tools.

When I sign a tax return under penalty of perjury, the IRS holds me accountable for accuracy regardless of what tools I used. “The AI said so” isn’t a defense. The software vendors explicitly disclaim liability for tax penalties in their terms of service. And while the client approved transactions, they relied on our professional judgment.

From a regulatory perspective, we’re in the same position we’ve always been: fully responsible for the work product. The Journal of Accountancy’s February 2026 article on AI risks made this crystal clear—CPAs remain accountable under existing professional standards, and regulators won’t accept “the AI miscategorized it” as an excuse.

The E&O Insurance Problem

Here’s where it gets worse: I just renewed my professional liability insurance, and the carrier added new language about AI exclusions. They’re trying to limit coverage for claims “in any way related, directly or indirectly” to AI usage.

So we might be in this bizarre situation where:

  1. We’re professionally required to stay current with technology
  2. AI is becoming industry standard
  3. But our liability insurance may not cover AI-related errors

My broker and I are still negotiating this, but it’s concerning.

What Constitutes “Reasonable Review”?

This is the question I’m wrestling with: What does “reasonable professional review” mean when AI is doing the categorization?

Is it reasonable to spot-check 10% of transactions? 25%? Do I need to manually review every single one, defeating the purpose of automation?

For context, I handle about 80 small business clients during tax season. AI categorization could save me 200+ hours. But if I need to review every transaction anyway, I’m not actually saving time—I’m just adding an AI step to my existing manual process.

Beancount’s Audit Trail Advantage

One reason I’m exploring Beancount more seriously: the plain text format creates an inherent audit trail. When AI categorizes transactions, I can:

  • See exactly what the AI did (not a black box)
  • Write queries to spot anomalies (unusually large deductions in new categories)
  • Track changes over time with version control
  • Document my review process with comments directly in the ledger

This feels more defensible than clicking “approve” in proprietary software where I can’t prove what I reviewed.
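To make the audit-trail point concrete, here is a minimal sketch of what a reviewed transaction might look like in a Beancount ledger. The account names, metadata keys, and dates are illustrative, not a prescribed convention; Beancount supports both `;` comments and arbitrary `key: "value"` metadata on transactions.

```
2026-02-14 * "Staples" "Printer ink and paper"
  ; AI-suggested category, verified against receipt 2026-03-01
  ai-source: "categorizer-v2"
  reviewed-by: "alice"
  Expenses:Office:Supplies    84.17 USD
  Liabilities:CreditCard     -84.17 USD
```

Because this lives in plain text under version control, the review metadata travels with the transaction and shows up in history, which is exactly the defensibility argument above.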

Questions for the Community

I’m curious how others are thinking about this:

  1. Have you modified your engagement letters to address AI usage and where liability sits?
  2. What’s your review workflow when using AI categorization? How much do you verify?
  3. Has your E&O insurance carrier asked about AI tools or changed your coverage?
  4. Do you see Beancount’s transparency as an advantage for professional liability?
  5. How do you explain this to clients who think AI is magic and don’t understand the risk?

I’m not anti-AI—I think it’s transformative. But I’m trying to use it responsibly while protecting my license and my clients. The liability framework feels unclear right now, and I’d love to hear how others are navigating this.

What am I missing? What’s your approach?

Alice, this is such an important conversation. As a former IRS auditor, I can tell you exactly how the IRS views this: they hold the preparer accountable, period.

“The Software Told Me” Doesn’t Work

During my years at the IRS, I saw countless preparers try to deflect responsibility with variations of “the software calculated it” or “TurboTax said this was deductible.” It never worked. When you sign a return as the preparer, you’re certifying that you’ve reviewed it and it’s accurate to the best of your knowledge.

With AI categorization, nothing changes from the IRS perspective. The examiner doesn’t care whether a human or an algorithm made the categorization decision. They care whether it’s correct according to the tax code.

What “Reasonable Review” Means (From an IRS Perspective)

The IRS expects preparers to exercise professional judgment. What that means practically:

  • Red flag review: You should have procedures to catch anomalies. A $30K deduction in a new category should trigger manual review.
  • Client context: You need to understand your client’s business well enough to spot miscategorizations. If your client runs a nail salon, a sudden heavy-equipment deduction should raise questions.
  • Documentation: You need to be able to demonstrate what you reviewed. “I trusted the AI” won’t satisfy an examiner.

This is where Beancount’s transaction-level documentation becomes valuable. Being able to show the examiner exactly what you reviewed, with comments explaining your reasoning, is powerful.

The Penalty Risk

Here’s what keeps me up at night as a practitioner: under IRC §6694, preparers can face penalties for unreasonable positions. If the IRS determines you didn’t exercise due diligence in reviewing AI categorizations, you could face penalties even if the client pays the tax.

The preparer penalty for understatement due to unreasonable positions is the greater of $1,000 or 50% of the income derived from preparing that return. For willful or reckless conduct, it’s the greater of $5,000 or 75% of income.
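The greater-of arithmetic above is easy to get wrong in the moment, so here is a tiny sketch of it in Python. The $3,000 fee is a made-up example; the thresholds and percentages are the ones stated above for IRC §6694(a) and (b).

```python
def preparer_penalty(fee_from_return: float, willful_or_reckless: bool = False) -> float:
    """Greater-of penalty amounts as described above for IRC §6694."""
    if willful_or_reckless:
        # §6694(b): greater of $5,000 or 75% of income derived from the return
        return max(5_000.0, 0.75 * fee_from_return)
    # §6694(a): greater of $1,000 or 50% of income derived from the return
    return max(1_000.0, 0.50 * fee_from_return)

# Hypothetical $3,000 prep fee:
print(preparer_penalty(3_000))        # 1500.0  (unreasonable position)
print(preparer_penalty(3_000, True))  # 5000.0  (willful/reckless)
```

Note that on a typical small-business prep fee, the willful/reckless floor of $5,000 dominates, which is why examiners distinguish sloppiness from recklessness.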

My Current Workflow

I use AI categorization, but with these safeguards:

  1. New client baseline: First year, I manually categorize everything to build understanding
  2. Anomaly queries: I wrote Beancount queries that flag transactions over certain thresholds or in unusual categories
  3. Client verification: I send clients category summaries to verify, with specific examples
  4. Document everything: Every review decision gets a comment in the Beancount file

Is this perfect? No. Am I confident I could defend it in an audit? Yes.
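In practice the anomaly step would be a bean-query/BQL run against the ledger, but the underlying logic is simple enough to sketch in plain Python. Everything here is illustrative: the threshold, the accounts, and the amounts are invented for the example.

```python
# Threshold-based anomaly flagging over categorized transactions:
# total each expense account and flag any single posting over a limit.
from collections import defaultdict

THRESHOLD = 2_500.00  # flag any single posting above this (illustrative)

transactions = [
    ("2026-01-04", "Expenses:Office:Supplies", 84.17),
    ("2026-01-12", "Expenses:Travel", 3_900.00),
    ("2026-01-20", "Expenses:Meals", 62.50),
]

totals = defaultdict(float)  # per-account running totals
flags = []                   # postings that need manual review
for date, account, amount in transactions:
    totals[account] += amount
    if amount > THRESHOLD:
        flags.append((date, account, amount))

for f in flags:
    print("REVIEW:", f)  # the $3,900 travel posting gets flagged
```

The same idea scales to “new category this month” or “total jumped versus last month” checks; the point is that the rules are explicit, repeatable, and documentable.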

What engagement letter language are you using, Alice? I’d love to see what others are doing to document the AI assistance while maintaining professional responsibility.

This conversation is hitting me at exactly the right time. I’ve been wrestling with how to talk to clients about AI tools without scaring them off or setting myself up for liability.

The Client Education Challenge

Here’s what I’m finding: clients hear “AI” and think it’s magic. They assume it’s more accurate than humans, infallible, cutting-edge. When I explain that I still need to review everything, they push back: “Then why am I paying for AI software?”

I had this exact conversation with a client last week. He wanted to know why my monthly fee wasn’t going down if AI is “doing the work.” I tried to explain that AI does the initial categorization but I’m still responsible for accuracy, and he said, “So you’re charging me to check the AI’s homework?”

He wasn’t wrong. That’s kind of exactly what I’m doing.

Does AI Actually Save Time?

I’m genuinely questioning this. Before AI:

  • I manually categorized ~800 transactions/month across 20 clients
  • Time: ~15-20 hours/month

With AI:

  • AI categorizes, I review
  • Time: ~12-15 hours/month (because now I’m reviewing AND fixing AI errors)

So I’m saving maybe 5 hours a month. But I’m paying $80/month for the AI tool, dealing with the client education headache, and worrying about liability.

Is the juice worth the squeeze?

Why I’m Looking at Beancount

Part of why I’m here: I’m wondering if Beancount’s plain text approach is actually the answer to the AI liability question.

Instead of using black-box AI categorization software, what if I:

  1. Use simple rule-based importers (“Target” → Expenses:Groceries)
  2. Have AI suggest categories for ambiguous transactions (via API, not embedded)
  3. Review and approve in plain text where I can see everything
  4. Version control the whole process
  5. Show clients exactly what I reviewed (transparency!)

This feels more defensible than clicking “looks good” in QuickBooks AI and hoping for the best.

The Engagement Letter Question

Alice, I’d really like to see what engagement letter language you’re drafting. Right now my engagement letters say nothing about AI, which probably isn’t sustainable.

I’m thinking about adding something like:

“Bookkeeping services may utilize AI-assisted categorization tools to improve efficiency. All AI-generated categorizations are subject to professional review before finalization. Client remains responsible for providing accurate transaction descriptions and supporting documentation. Bookkeeper maintains final responsibility for accuracy of financial records.”

But I honestly don’t know if that protects me, protects the client, or protects nobody.

What are others using?

Thank you all for these thoughtful responses. This is exactly the kind of wisdom I was hoping for.

Key Themes I’m Hearing

1. Professional responsibility hasn’t changed (Tina’s point)
The IRS doesn’t care about our tools—they care about accuracy. “The AI said so” is not a defense, and preparer penalties apply regardless.

2. Transparency is crucial (Mike’s workflow)
Being able to see, query, and version control AI decisions makes them auditable. Beancount’s plain text format isn’t just technically elegant—it’s professionally defensible.

3. Time savings are real but modest (Bob’s question)
AI categorization isn’t eliminating review work—it’s changing it from manual entry to validation. The time savings exist, but they’re not revolutionary.

Engagement Letter Language (First Draft)

Based on this discussion and my broker conversations, here’s what I’m drafting:

Use of Technology Tools: Our firm may utilize artificial intelligence and automated categorization tools to improve efficiency in processing financial transactions. All technology-assisted work products are subject to professional review and supervision before finalization. Use of AI tools does not diminish our professional responsibility for accuracy, nor does it transfer liability to software vendors.

Client Responsibilities: Client agrees to (1) provide accurate and complete transaction information, (2) review categorization summaries and report discrepancies promptly, and (3) maintain supporting documentation for all transactions. Client acknowledges that AI categorization requires sufficient transaction detail to be effective.

Limitation on AI Liability: Client understands that AI tools are assistive technologies, not autonomous decision-makers. Final professional judgment rests with the CPA/preparer. Our professional liability insurance covers our work product, which includes AI-assisted services subject to our review and supervision.

Still working on this with my attorney, but wanted to share the draft.

Next Steps for Me

  1. Insurance clarity: Getting explicit confirmation from E&O carrier that AI-assisted work (with documented review procedures) is covered
  2. Review procedures documentation: Writing down exactly what constitutes “reasonable review” for different transaction types
  3. Beancount validation queries: Building the anomaly detection queries Mike mentioned
  4. Client communication: Creating a one-pager explaining AI in bookkeeping without the hype

Community Resource Idea

Would there be interest in creating a shared resource on AI validation workflows? Things like:

  • Sample Beancount queries for anomaly detection
  • Review procedure checklists
  • Engagement letter templates
  • Documentation best practices

I’m happy to start a wiki page or shared document if others want to contribute.

Tina, your point about preparer penalties really drove this home. We’re in a profession where the stakes are high—our licenses, our clients’ financial health, potential penalties. AI is a powerful tool, but it doesn’t change the fundamentals of professional responsibility.

Thanks for the reality check, everyone. This community is invaluable.

This hits close to home. I had a near-miss with AI categorization last year that scared me straight.

My “Oh Crap” Moment

I was using a popular AI tool to categorize my rental property expenses. The AI confidently categorized a $4,200 personal vacation charge (hotel in Hawaii) as “property maintenance” because the description included “property stay.”

I almost missed it. I was doing my usual quick review, scanning for anything obviously wrong, and it just… looked fine. Line item: “Waikiki Property - 7 nights” categorized as maintenance. The dollar amount wasn’t crazy for property work.

I only caught it because I happened to remember that specific vacation and did a double-take. If it had been a smaller amount or something less memorable, it would’ve sailed through.

That was my wake-up call: AI doesn’t understand context, and “quick review” isn’t good enough.

My Current Validation Workflow

Here’s what I do now, using Beancount:

Step 1: AI categorizes
I let the AI do its thing with my bank imports. This is still valuable—it gets me 90% of the way there.

Step 2: Anomaly queries
I run Beancount queries every month to find unusual patterns. This shows my top expense accounts and flags anything unusual.

Step 3: Category sanity checks
I maintain a list of “normal” expense ranges for my accounts. If “Office Supplies” suddenly jumps from $200/month to $2,000, I review every transaction in that category.

Step 4: Git diff review
Because Beancount is plain text, I can use git diff to see exactly what the AI changed versus last month. New categories or unusual patterns stand out.

Why This Works Better Than Proprietary Tools

With closed-source accounting software, I was clicking “approve” on AI categorizations in a black box. I couldn’t easily:

  • See patterns across time
  • Write custom anomaly detection
  • Prove what I reviewed
  • Version control my review process

Beancount’s transparency makes the AI’s work auditable. When the AI miscategorizes something, I can see it, fix it, and document why I changed it.

The Time Trade-off

Is this more work than blind trust in AI? Yes. Is it more work than manual categorization? No—still saving probably 5-6 hours per month.

The key insight: AI should accelerate my review, not replace it. The AI does the grunt work, I do the judgment calls.

Alice, to your question about what constitutes “reasonable review”—I think it’s risk-based. High-dollar transactions, new categories, unusual patterns: manual review. Routine recurring expenses under $100: spot checks are probably fine.

But I’m just a hobbyist tracking rental properties, not a CPA with 80 clients. I’m curious how this scales for professionals.