TAPAS (Google Research, ACL 2020) answers table questions by selecting cells and applying scalar aggregations — no SQL generated. This post analyzes the architecture, its 12-point SQA accuracy gain, and why the cell-selection paradigm fits small Beancount ledger queries but breaks down at scale.
DIN-SQL (NeurIPS 2023) decomposes text-to-SQL into schema linking, complexity classification, and SQL generation stages, lifting GPT-4 from 67.4% to 85.3% execution accuracy on Spider without fine-tuning — and the same decomposition strategy maps directly onto natural language interfaces for Beancount's BQL query language.
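The staged decomposition can be sketched as a tiny pipeline. This is a minimal illustration of the three stages applied to a hypothetical BQL target: the keyword heuristics and function names are stand-ins for the paper's prompted LLM stages, not DIN-SQL's actual prompts.

```python
def link_schema(question: str, columns: list[str]) -> list[str]:
    """Stage 1 (schema linking): keep only columns the question mentions."""
    q = question.lower()
    return [c for c in columns if c.lower() in q]

def classify(question: str) -> str:
    """Stage 2 (complexity classification): route to a prompt template."""
    q = question.lower()
    if " per " in q or "each" in q:
        return "nested"          # needs GROUP BY / subquery reasoning
    if any(w in q for w in ("total", "sum", "average")):
        return "non-nested"      # single aggregation
    return "easy"                # plain selection

def generate_bql(question: str, columns: list[str]) -> str:
    """Stage 3 (generation): emit a query from the linked schema + class."""
    linked = link_schema(question, columns) or ["account"]
    if classify(question) == "easy":
        return f"SELECT {', '.join(linked)}"
    return f"SELECT {', '.join(linked)}, sum(position) GROUP BY {linked[0]}"

print(generate_bql("total spent per account", ["date", "account", "payee"]))
# → SELECT account, sum(position) GROUP BY account
```

The point of the decomposition is that each stage fails independently and can be inspected independently, which is exactly what a BQL interface needs when a generated query silently returns the wrong aggregate.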
The BIRD benchmark (NeurIPS 2023) tests LLMs on 95 real databases — GPT-4 reaches only 54.89% execution accuracy with domain hints and 34.88% without, a 20-point gap that directly shapes what a natural-language BQL interface for Beancount would need to solve.
Microsoft's GraphRAG builds a Leiden-partitioned entity graph over a text corpus and precomputes community summaries to answer global sensemaking questions that standard vector RAG cannot handle — but a 2025 bias audit shows its 72–83% win rates collapse after correcting for position and length artifacts in LLM-as-judge evaluation.
StructRAG (ICLR 2025) routes each query to a task-appropriate structure type — table, graph, catalogue, algorithm, or chunk — before reasoning, scoring 28 points higher than GraphRAG on the Loong benchmark while running 22× faster, with the DPO-trained router alone accounting for a 15-point accuracy gain.
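The routing step can be pictured as a classifier that runs before any retrieval. The cue lists below are illustrative keyword heuristics standing in for StructRAG's DPO-trained router; only the five structure-type labels come from the paper.

```python
# Map each structure type to query cues that suggest it (illustrative).
ROUTES = {
    "table":     ("compare", "total", "per month", "rate"),
    "graph":     ("relationship", "depends", "between"),
    "catalogue": ("list", "enumerate", "which accounts"),
    "algorithm": ("step", "procedure", "how do i"),
}

def route(query: str) -> str:
    """Pick the structure type to build before reasoning over the corpus."""
    q = query.lower()
    for structure, cues in ROUTES.items():
        if any(cue in q for cue in cues):
            return structure
    return "chunk"   # default: plain passage retrieval

print(route("Compare grocery totals per month"))  # → table
```

A ledger question like the one above would be restructured into a table before the model reasons, while a narrative question falls through to ordinary chunk retrieval.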
Izacard and Grave's FiD architecture independently encodes retrieved passages then fuses them in the decoder, outperforming RAG-Sequence by 4–11 points on NQ and TriviaQA. This post examines the design and its implications for Beancount ledger QA, where multi-entry synthesis across transactions is the norm.
IRCoT interleaves BM25 retrieval with each step of a chain-of-thought reasoning loop, achieving +11.3 retrieval recall and +7.1 F1 on HotpotQA over one-step RAG — and shows a 3B model can beat GPT-3 175B when retrieval strategy is right.
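The interleaving loop is simple to sketch: each reasoning step's output becomes the next retrieval query. The word-overlap retriever and the three-document corpus below are toy stand-ins for BM25 and a real index, and appending the retrieved passage to the query stands in for an LLM-written chain-of-thought sentence.

```python
CORPUS = {
    "d1": "Beancount stores transactions in plain-text ledgers",
    "d2": "BQL queries aggregate postings across accounts",
    "d3": "plain-text ledgers are easy to version with Git",
}

def retrieve(query: str, exclude=()) -> str:
    """Stand-in for BM25: rank unseen docs by word overlap with the query."""
    scores = {
        doc: len(set(query.lower().split()) & set(text.lower().split()))
        for doc, text in CORPUS.items() if doc not in exclude
    }
    return max(scores, key=scores.get)

def ircot(question: str, steps: int = 2) -> list[str]:
    evidence, query = [], question
    for _ in range(steps):
        doc = retrieve(query, exclude=evidence)
        evidence.append(doc)
        # Interleave: the new passage extends the next step's query,
        # standing in for the CoT sentence an LLM would write here.
        query = question + " " + CORPUS[doc]
    return evidence

print(ircot("How are Beancount ledgers versioned with Git"))  # → ['d3', 'd1']
```

The second hop retrieves d1 only because the first hop's passage added "plain-text" to the query, which is the multi-hop effect the paper measures on HotpotQA.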
Lewis et al.'s NeurIPS 2020 paper introduced the hybrid RAG architecture—a BART-large generator paired with a FAISS-indexed retriever over 21 million Wikipedia passages—achieving 44.5 EM on Natural Questions and establishing the parametric/non-parametric split that now underlies most production AI systems. This review covers RAG-Sequence vs. RAG-Token trade-offs, the retrieval collapse failure mode, and what stale indexes mean for financial AI built on append-only Beancount ledgers.
LATS (Language Agent Tree Search, ICML 2024) unifies ReAct, Tree of Thoughts, and Reflexion into a single MCTS framework, reaching 92.7% pass@1 on HumanEval with GPT-4. A Git-backed Beancount ledger easily meets the state-restoration requirement that constrains LATS in production.
Self-RAG (ICLR 2024 Oral) trains a language model to decide when to retrieve and then grade its own results using four reflection tokens — reaching 55.8% on PopQA and 80.2 FactScore on biographies while outperforming ChatGPT on five benchmarks. Analysis covers the mechanism, ablation results, reproducibility limits, and implications for finance AI agents over Beancount ledgers.
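Self-RAG's control flow — retrieve-on-demand, then self-grade — can be sketched in a few lines. The gating and scoring functions below are illustrative heuristics standing in for the paper's trained reflection tokens ([Retrieve], [IsRel], [IsSup], [IsUse]); only the overall decide-then-critique structure comes from the paper.

```python
def predict_retrieve(question: str) -> bool:
    """Stand-in for the [Retrieve] token: fetch for fact-seeking questions."""
    return question.lower().startswith(("who", "when", "where", "what", "how many"))

def critique(answer: str, passage: str) -> float:
    """Stand-in for [IsSup]: fraction of answer tokens the passage supports."""
    a = set(answer.lower().split())
    return len(a & set(passage.lower().split())) / max(len(a), 1)

def generate(question: str, candidates: list[str], passage: str) -> str:
    if not predict_retrieve(question):
        return candidates[0]          # answer from parametric memory alone
    # Rescore candidate segments and keep the best-supported one.
    return max(candidates, key=lambda c: critique(c, passage))

print(generate(
    "What currency does the ledger use?",
    ["The ledger uses USD", "The ledger uses EUR"],
    "operating_currency: USD appears in the ledger options",
))  # → The ledger uses USD
```

For a finance agent, this gating matters in both directions: skipping retrieval avoids polluting simple arithmetic with irrelevant ledger entries, while the critique step catches answers the retrieved postings do not actually support.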