The Anti-Bot Approach: Why Manual Review Still Matters in RPA-Automated Bookkeeping

I’ve been running Martinez Bookkeeping Services for a decade now, and over the past year, I’ve watched RPA (Robotic Process Automation) tools sweep through the accounting world like wildfire. Every week, at least one client asks me: “Bob, can’t we just let the bots handle everything?”

I get it. The promise is seductive. Invoice processing bots that read PDFs, extract data, categorize transactions, and populate your books—all while you sleep. No more manual data entry. No more late nights during month-end close. Just set it and forget it, right?

Well, not quite. And I learned this the hard way.

The Wake-Up Call

Three months ago, I set up an RPA workflow for one of my restaurant clients using a popular automation platform. The bot would scan incoming invoices from suppliers, extract vendor names, amounts, dates, and line items, then auto-categorize everything into our Beancount ledger. For the first few weeks, I was in heaven. What used to take me 4-5 hours of manual entry every week dropped to maybe 30 minutes of spot-checking.

Then came the month-end reconciliation. I ran my standard balance assertion checks in Beancount—something I religiously do at the end of every month—and boom. The assertion failed. Off by $2,847.

I dug in and discovered the bot had been miscategorizing split invoices. When a supplier charged for both food inventory AND kitchen equipment on the same invoice, the bot would lump everything into “Cost of Goods Sold” instead of properly splitting between COGS and Capital Equipment. For three months, this restaurant’s books showed inflated food costs and missing asset purchases.

If I hadn’t had those balance assertions in place, this could have created a nightmare during tax season or, worse, during a potential audit.
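A sketch of one way to guard against that exact failure mode: refuse to auto-post any invoice whose line items map to more than one category, and route it to a human instead. The keyword map, category names, and invoice data below are purely illustrative, not the actual rules I run in production.

```python
from decimal import Decimal

# Hypothetical keyword -> category rules; real rules would be far richer.
CATEGORY_RULES = {
    "beef": "Expenses:COGS:Food",
    "produce": "Expenses:COGS:Food",
    "oven": "Assets:Equipment:Kitchen",
    "mixer": "Assets:Equipment:Kitchen",
}

def categorize_line(description):
    """Return the first matching category, or None when no rule applies."""
    desc = description.lower()
    for keyword, category in CATEGORY_RULES.items():
        if keyword in desc:
            return category
    return None

def route_invoice(line_items):
    """Auto-post only when every line item maps to the SAME category;
    mixed or unrecognized invoices go to a human instead."""
    categories = {categorize_line(desc) for desc, _amount in line_items}
    if None in categories or len(categories) > 1:
        return "manual_review", categories
    return "auto_post", categories

# A split invoice: food inventory AND kitchen equipment on one bill.
invoice = [("Ground beef, 40 lbs", Decimal("312.50")),
           ("Convection oven", Decimal("2534.50"))]
status, cats = route_invoice(invoice)
```

The point isn't the keyword matching (any real extractor is fancier); it's the routing decision. Mixed invoices are where the bot's confidence and its accuracy part ways.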

RPA is Powerful, But Not Infallible

Here’s what I’ve learned: RPA tools are phenomenal at handling repetitive, rule-based tasks. They’re tireless, fast, and incredibly efficient. But they’re also literal—they do exactly what you tell them to do, even when that’s not what you actually want.

The problem isn’t the technology itself. The problem is treating automation as a replacement for judgment rather than a tool that extends your capabilities.

My Current Workflow: Automation + Validation

After that wake-up call, I redesigned my approach. Now, here’s what happens:

  1. RPA handles the grunt work: Bots extract data from invoices, bank statements, and receipts
  2. Beancount balance assertions act as checkpoints: At every month-end, I have assertions for key accounts
  3. Manual review of flagged items: When assertions fail or when the bot marks something as “uncertain,” I review it personally
  4. Quarterly deep dives: Every quarter, I manually audit a random sample of transactions to catch patterns the bot might be missing

Think of it like the aviation industry: autopilot flies the plane 95% of the time, but pilots are always there, monitoring systems, ready to intervene when something goes sideways.
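In code terms, steps 1 through 3 boil down to a single routing decision. Here's a minimal sketch, assuming the extraction tool reports a per-transaction confidence score; the field names and the 0.9 threshold are illustrative, not any particular vendor's API.

```python
def route_transactions(extracted, confidence_threshold=0.9):
    """Split bot-extracted transactions into an auto-post pile and a
    manual-review pile using the extractor's own confidence score."""
    auto, review = [], []
    for txn in extracted:
        confident = txn.get("confidence", 0.0) >= confidence_threshold
        if confident and txn.get("category"):
            auto.append(txn)
        else:
            review.append(txn)  # uncertain or uncategorized: human eyes
    return auto, review

batch = [
    {"payee": "Sysco", "amount": "412.80",
     "category": "Expenses:COGS:Food", "confidence": 0.97},
    {"payee": "Unknown Vendor LLC", "amount": "1250.00",
     "category": None, "confidence": 0.41},
]
auto, review = route_transactions(batch)
```

The design choice that matters: the default path is review, not auto-post. The bot has to earn its way into the books.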

The Balance Assertion Safety Net

If you’re using Beancount with any level of automation—whether it’s full RPA or just custom import scripts—balance assertions are your best friend. They force your ledger to match reality at specific points in time.

For example:

2026-02-28 balance Assets:Bank:Checking  15234.67 USD
2026-02-28 balance Liabilities:CreditCard:Amex  -3421.18 USD

If your automated imports have been quietly making mistakes—duplicating transactions, miscategorizing, or missing entries—these assertions will catch it before it compounds. It’s like having a smoke detector in your house. Most of the time, nothing’s wrong. But when something IS wrong, you want to know immediately.
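One of those quiet mistakes, duplicate imports, is cheap to screen for before the assertions even run. A minimal sketch, assuming transactions come out of an importer as plain dicts (the field names and data are made up):

```python
from collections import Counter

def find_duplicates(transactions):
    """Flag transactions sharing (date, payee, amount): the classic
    symptom of an importer running twice over the same statement."""
    keys = Counter((t["date"], t["payee"], t["amount"]) for t in transactions)
    return [key for key, count in keys.items() if count > 1]

txns = [
    {"date": "2026-02-14", "payee": "Sysco", "amount": "412.80"},
    {"date": "2026-02-14", "payee": "Sysco", "amount": "412.80"},  # re-import
    {"date": "2026-02-15", "payee": "PG&E", "amount": "210.33"},
]
dupes = find_duplicates(txns)
```

Legitimate same-day, same-amount charges do exist (two identical coffees), so treat hits as review candidates, not automatic deletions.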

The Question I’m Wrestling With

So here’s what I’m curious about from this community: How are you balancing automation with oversight in your Beancount workflows?

  • Are you using RPA tools, or sticking with simpler import scripts?
  • What validation mechanisms do you rely on beyond balance assertions?
  • Have you ever had automation fail you in a significant way?
  • Where do you draw the line between “automate this” and “I need human eyes on this”?

I’m not anti-automation. Far from it. RPA has saved me and my clients hundreds of hours this year alone. But I think we’re at a critical moment where the hype around “full automation” needs to be tempered with the reality that financial data is too important to trust blindly to bots.

The goal isn’t to eliminate human involvement—it’s to free humans to focus on the high-judgment work that actually matters.

What’s your take?

Bob, this hits home in a way that’s honestly a little scary. As a CPA, I can’t tell you how many times I’ve heard the siren song of “full automation” at industry conferences. The vendors all promise the same thing: accuracy rates above 95%, seamless integrations, audit-ready outputs. And you know what? Sometimes it’s true. But that other 5%? That’s where professional liability lives.

The Citigroup Lesson We Should All Remember

You mentioned your $2,847 discrepancy. Let me share something that should terrify anyone considering blind automation: In 2020, Citigroup accidentally wired $900 million to Revlon’s lenders due to a combination of automated system design flaws and human error. The bank’s automated payment system had confusing UI/UX, and when a staffer tried to make a small interest payment, the system sent the entire loan principal instead.

The kicker? A federal district court initially ruled that the recipients could keep the money because it looked like a legitimate loan repayment. An appeals court eventually reversed that ruling, but only after roughly two years of litigation with about half a billion dollars in limbo. Citigroup nearly lost a fortune because they over-trusted their automation and didn’t have adequate human validation checkpoints.

Now, I know we’re not dealing with $900 million wire transfers in our Beancount ledgers. But the principle is identical: automated systems can fail in catastrophic ways when we don’t understand their limitations.

Why CPAs Can’t Outsource Judgment to Bots

Here’s my reality as a CPA: I carry professional liability insurance because I’m expected to apply professional judgment to financial matters. When I sign off on a client’s financials or prepare their tax return, I’m certifying that I’ve reviewed the information and believe it to be accurate.

I can use RPA to extract invoice data. I can use AI to suggest categorizations. But I cannot—legally, ethically, or practically—outsource my professional judgment to a bot. If the IRS audits my client and asks, “Why did you categorize this $15,000 payment as a business expense?” the answer cannot be “Because the AI said so.”

The standard the industry is converging on in 2026 is “Audit-Ready AI”: systems that are auditable, explainable, and secure. If you can’t explain how your automation made a decision, it’s not audit-ready. And if it’s not audit-ready, you’re taking on risk you probably don’t fully understand.

My Workflow: RPA as Assistant, Not Autopilot

I use RPA tools extensively in my practice—primarily for data extraction from invoices, bank statements, and receipts. But here’s my non-negotiable workflow:

  1. RPA extracts and categorizes based on rules and historical patterns
  2. Beancount balance assertions checkpoint every month-end for all material accounts (checking, savings, credit cards, key A/R and A/P accounts)
  3. Manual review of exceptions: Anything the bot flags as uncertain or that fails a balance assertion gets human eyes
  4. Month-end reconciliation ritual: I spend 1-2 hours every month-end reviewing transaction patterns, looking for anomalies the bot might miss
  5. Quarterly deep audit: Random sampling of 50-100 transactions to verify categorization accuracy
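For step 5, the sampling itself is nearly a one-liner in Python; the only subtlety worth building in is a seed, so a second reviewer can re-derive the exact same sample during a workpaper review. A sketch with made-up data:

```python
import random

def sample_for_audit(transactions, n=50, seed=None):
    """Draw a random sample for the quarterly manual audit. A fixed seed
    makes the sample reproducible, so a reviewer can re-derive it."""
    rng = random.Random(seed)
    return rng.sample(transactions, min(n, len(transactions)))

# Made-up quarter of imported transactions.
quarter = [{"id": i, "payee": f"Vendor {i % 7}"} for i in range(1200)]
audit_batch = sample_for_audit(quarter, n=50, seed=20260331)
```

I record the seed alongside the audit notes; that's what makes the sample defensible rather than "whatever I happened to click on."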

The balance assertions are absolutely critical. They’re like the bulkheads in a ship—if one section springs a leak (bad automation), the assertions prevent the entire ship from sinking (your books becoming unreliable).

The Balance Assertion Pattern I Swear By

For anyone wondering what this looks like in practice, here’s my standard pattern for a small business client:

; Month-end balance assertions
2026-02-28 balance Assets:Bank:BusinessChecking     45678.23 USD
2026-02-28 balance Assets:Bank:Savings              20000.00 USD
2026-02-28 balance Liabilities:CreditCard:Chase    -5432.10 USD
2026-02-28 balance Assets:AccountsReceivable       12500.00 USD
2026-02-28 balance Liabilities:AccountsPayable     -8900.00 USD

These take me literally 5 minutes to add each month (I pull the real balances from online banking and accounting systems), but they’ve caught automation errors at least a dozen times in the past year. The bot might miss a duplicate transaction, miscategorize a refund, or fail to import a bank fee. The balance assertion catches it immediately.
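If you'd rather script the comparison than eyeball it, the check itself is trivial. A sketch, with the statement balance typed in by hand; the figures are illustrative, and the second case mirrors Bob's $2,847 drift:

```python
from decimal import Decimal

def check_balance(ledger_total, statement_balance, tolerance=Decimal("0.00")):
    """Compare the ledger's computed balance with the bank statement.
    Returns (ok, difference) so a failing check shows the exact drift."""
    diff = Decimal(ledger_total) - Decimal(statement_balance)
    return abs(diff) <= tolerance, diff

ok, drift = check_balance("45678.23", "45678.23")    # books match
ok2, drift2 = check_balance("45678.23", "48525.23")  # bot missed entries
```

Using Decimal instead of float matters here: binary floats will happily report a few stray cents of phantom drift, which is exactly the noise you don't want in a reconciliation check.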

Trust, But Verify—Always

I love RPA. It’s saved my practice hundreds of hours and allowed me to serve more clients without hiring additional staff. But I will never, ever trust it blindly.

Your aviation analogy is perfect, Bob. Autopilot is incredible technology that makes flying safer and more efficient. But you don’t see pilots leaving the cockpit during flight. They’re monitoring systems, reviewing instrumentation, and ready to take manual control the instant something looks wrong.

That’s exactly how we should approach RPA in accounting: as a powerful assistant that handles the repetitive work, freeing us to focus on the high-judgment tasks that actually require professional expertise.

The moment we stop validating our automation is the moment we start taking on risk we can’t quantify.

This is exactly the conversation we need to be having right now. I’m someone who lives and breathes automation—I’ve built custom importers, written Python scripts to categorize transactions, and even experimented with AI categorization tools. But I learned a very expensive lesson about blind trust in automation.

My $1,200 Tax Reporting Mistake

About 18 months ago, I got excited about a new AI-powered transaction categorization tool that integrated with Beancount. It promised to learn from my historical patterns and auto-categorize everything with “industry-leading accuracy.” I fed it two years of transaction history, and it started categorizing my bank and credit card imports automatically.

For three months, I barely looked at the actual transactions. I trusted the AI. I glanced at my Fava dashboard occasionally, saw that everything looked reasonable, and moved on with my life.

Then tax season arrived.

When I generated my Schedule C for my side consulting work, I noticed something weird: my business vehicle and parking deduction was about 40% lower than I expected. I dug into the transactions and discovered the AI had been categorizing my monthly parking garage fee (which I use exclusively for client meetings) as “Personal:Transportation” instead of “Business:Travel:Parking.”

For three months, it miscategorized $400/month. That’s $1,200 in legitimate business expenses that almost didn’t make it onto my tax return. At my marginal tax rate, that would have cost me roughly $350 in unnecessary taxes.

Not catastrophic, but definitely a wake-up call.

The False Sense of Security

Here’s what’s insidious about automation: when it works 95% of the time, you start to assume it works 100% of the time. Your brain gets lazy. You stop checking. You trust the patterns.

Bob’s restaurant client example is perfect—split invoices are exactly the kind of edge case that trips up automation. The bot doesn’t “understand” context. It doesn’t know that a single invoice might represent two fundamentally different types of expenses. It just applies pattern matching.

And when the pattern doesn’t perfectly match the reality? Mistakes happen silently.

My Current Workflow: Automate 95%, Review the Critical 5%

After my tax mistake, I completely redesigned my approach:

  1. Automation handles the obvious stuff: RPA imports transactions, AI suggests categorizations for routine expenses (groceries, utilities, subscriptions)
  2. Monthly BQL anomaly detection: I run custom Beancount queries to find outliers:
    • Unusually large transactions
    • New merchants I haven’t seen before
    • Categories with unusual spending patterns (e.g., “Why did I spend 50% more on ‘Dining’ this month?”)
  3. Balance assertions at every month-end: Checking, savings, credit cards, investment accounts
  4. Manual review of business expenses: I never, ever let automation categorize business expenses without my review. Ever.
  5. Quarterly reconciliation against external sources: I pull official statements and verify my Beancount balances match reality

The key insight: automation should handle the boring, high-volume, low-stakes transactions. Humans should handle anything that requires judgment or has tax/financial implications.
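The anomaly checks in step 2 can also be done in plain Python once you've aggregated spending per category. A minimal sketch; the 50% threshold and all the category totals below are made up for illustration:

```python
def spending_anomalies(current, previous, threshold=0.5):
    """Flag expense categories whose spend moved more than `threshold`
    (as a fraction) versus last month, plus brand-new categories."""
    flagged = {}
    for category, amount in current.items():
        prior = previous.get(category)
        if prior is None:
            flagged[category] = "new category"
        elif prior > 0 and abs(amount - prior) / prior > threshold:
            flagged[category] = f"changed {100 * (amount - prior) / prior:+.0f}%"
    return flagged

feb = {"Expenses:Dining": 450.0, "Expenses:Groceries": 610.0,
       "Expenses:Hobby:Drones": 899.0}
jan = {"Expenses:Dining": 280.0, "Expenses:Groceries": 590.0}
alerts = spending_anomalies(feb, jan)
```

A flat percentage threshold is crude (small categories swing wildly), but as a "show me what changed" prompt for the monthly review, crude is plenty.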

The BQL Query That Saves Me

For anyone curious, here’s a Beancount query I run monthly to catch anomalies:

SELECT
  date,
  payee,
  account,
  cost(position) AS amount
WHERE
  account ~ 'Expenses'
  AND number(cost(position)) > 200
  AND date >= 2026-03-01
ORDER BY amount DESC;

This shows me all expense transactions over $200 for the current month. Any large expense gets a manual review to ensure it’s categorized correctly. Takes me maybe 10 minutes a month, but it’s caught multiple automation errors.

Trust Your Data, Not Your Tools

Alice’s point about “audit-ready AI” is spot-on. If I can’t explain to the IRS why I categorized something a certain way, it’s not good enough. “My AI told me to” isn’t a defense.

The beauty of Beancount—and plain text accounting in general—is that it gives you full transparency. I can git log my transaction history. I can see exactly when a categorization changed. I can trace every decision.

But that transparency only helps if you actually look at it.

RPA is incredible. Automation is the future. But the goal isn’t to replace human judgment—it’s to free humans from drudgery so they can focus on the work that actually requires thought.

Automate the boring. Validate the critical. Always.