I’ve been thinking a lot about the future of plain text accounting lately, and I wanted to start a discussion about something that’s been bugging me: format fragmentation and AI tool compatibility.
The Problem I Ran Into
A few months ago, I got excited about building an AI-powered transaction categorization tool for the PTA community. You know, something that could learn from your past transactions and automatically suggest categories for new ones. Seemed like a perfect project to give back to this community that’s taught me so much over the past 4 years.
But then I hit a wall I didn’t expect: a parsing nightmare.
See, Beancount, hledger, and Ledger—our three main plain text accounting systems—each use different syntax for the exact same concepts:
- Date formats: Beancount uses `YYYY-MM-DD`, but some Ledger files use `YYYY/MM/DD`
- Account declarations: Beancount requires explicit `open` directives, while hledger infers accounts from the postings that use them
- Metadata: Beancount uses `key: value` metadata lines plus `#tags`, while hledger puts tags inside semicolon comments
- Transaction syntax: subtle differences in spacing, indentation, and posting format
I ended up building three separate parsers. That’s triple the development work, triple the testing, triple the maintenance. For a side project built by one person, it was simply unsustainable.
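To give a sense of the work involved, here is a minimal sketch of just one small piece: reading the first line of a transaction in either style. It is not the code from my tool. The names `HEADER` and `parse_header` are made up for this example, and it deliberately ignores things like Beancount’s `txn` keyword, trailing comments, and transaction codes.

```python
import re
from datetime import date

# Matches the first line of a transaction in the two common styles:
#   Beancount:      2024-01-15 * "Grocery Store" "Weekly shopping"
#   Ledger/hledger: 2024/01/15 * Grocery Store
# Simplified on purpose: no "txn" keyword, no comments, no codes.
HEADER = re.compile(
    r"^(?P<y>\d{4})[-/](?P<m>\d{1,2})[-/](?P<d>\d{1,2})"  # date with - or /
    r"(?:\s+(?P<flag>[*!]))?"                              # optional status flag
    r"\s*(?P<rest>.*)$"                                    # payee / narration
)


def parse_header(line: str):
    """Return (date, flag, description), or None if the line is not a header."""
    match = HEADER.match(line.strip())
    if match is None:
        return None
    when = date(int(match["y"]), int(match["m"]), int(match["d"]))
    rest = match["rest"].strip()
    # Beancount quotes its strings; Ledger/hledger use a bare description.
    quoted = re.findall(r'"([^"]*)"', rest)
    description = " | ".join(quoted) if quoted else rest
    return when, match["flag"], description
```

Calling `parse_header` on a Beancount header and on a Ledger header for the same purchase gives the same date and flag, which is the easy part. Postings, metadata, and account declarations each need the same kind of per-format handling, and that is where the effort multiplies.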
The Bigger Picture
This isn’t just about my tool. It affects the entire ecosystem:
- AI training data fragments: When AI models train on Beancount syntax, they struggle with hledger files (and vice versa). We could have much better AI tools if training data could aggregate across all PTA users instead of being siloed by format.
- Tool developers must choose: Most small tools can’t afford to support all three formats, so they pick one, which leaves users of the other two formats out.
- Newcomer confusion: Someone investigating PTA has to choose a tool before they even understand accounting. Format incompatibility makes switching later very costly.
A Question for the Community
Should we consider standardizing on a common plain text accounting format?
I’m not saying “everyone must use Beancount syntax” or “hledger’s format is the one true way.” I’m asking: Is there value in the community coming together to define a shared base format that all tools could support?
Potential Benefits
- AI tools work across all PTA systems
- Easier to switch tools if your needs change
- Shared documentation, tutorials, and examples
- Lower barrier to entry for new users
- Stronger ecosystem overall (rising tide lifts all boats)
Real Challenges
- Existing users have years of ledger files in their current format
- Migration could introduce errors (scary for financial data)
- Each format has philosophical reasons for its design choices
- Who would decide what the standard looks like?
- How do we maintain tool diversity while standardizing format?
Possible Approaches?
I’ve been thinking about a few paths forward:
- PTA Standard Format + Transpilers: Define a core standard format, create converters to/from each tool’s native format. Tools could keep their unique features but share a common base.
- Gradual Convergence: Tools could slowly adopt each other’s best practices over time without forced migration.
- Format Extensions: Keep core syntax shared, allow tool-specific extensions for advanced features.
- Do Nothing: Maybe format diversity is actually a feature, not a bug? Different approaches serve different needs.
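To make the transpiler idea a bit more concrete, here is a rough sketch of what a shared in-memory representation with per-format writers could look like. Everything in it is hypothetical: `Posting`, `Transaction`, `to_beancount`, and `to_ledger` are names I invented for illustration, not an existing library or a proposed standard. It only covers the simplest case (date, payee, postings with explicit amounts) and does no escaping, so a payee containing a double quote would produce invalid Beancount.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass
class Posting:
    account: str
    amount: Decimal
    currency: str


@dataclass
class Transaction:
    when: date
    payee: str
    postings: list[Posting]


def to_beancount(txn: Transaction) -> str:
    """Render with ISO dates and a quoted string, Beancount style."""
    lines = [f'{txn.when.isoformat()} * "{txn.payee}"']
    for p in txn.postings:
        lines.append(f"  {p.account}  {p.amount} {p.currency}")
    return "\n".join(lines)


def to_ledger(txn: Transaction) -> str:
    """Render with slash dates and a bare description, Ledger/hledger style."""
    lines = [f"{txn.when.strftime('%Y/%m/%d')} * {txn.payee}"]
    for p in txn.postings:
        # Ledger and hledger need at least two spaces between account and amount.
        lines.append(f"    {p.account}  {p.amount} {p.currency}")
    return "\n".join(lines)


groceries = Transaction(
    when=date(2024, 1, 15),
    payee="Grocery Store",
    postings=[
        Posting("Expenses:Food", Decimal("54.20"), "USD"),
        Posting("Assets:Checking", Decimal("-54.20"), "USD"),
    ],
)
print(to_beancount(groceries))
print(to_ledger(groceries))
```

Using `Decimal` rather than `float` keeps amounts exact, which matters for the "migration could introduce errors" worry above. The hard part a real standard would have to settle is everything this sketch leaves out: metadata, balance assertions, price annotations, and tool-specific directives.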
What Do You Think?
I’m genuinely curious what this community thinks. Those of you who’ve been using plain text accounting for years—is compatibility with other tools important to you? Or is the specific format you’ve chosen essential to your workflow?
For those building tools—would a common format help or hurt innovation?
And for newcomers—did format incompatibility affect your tool choice?
Looking forward to hearing everyone’s thoughts!