Machine Learning

Everything About Machine Learning

85 articles
Machine learning techniques for financial data analysis and automation

Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

Lewis et al.'s NeurIPS 2020 paper introduced the hybrid RAG architecture—a BART-large generator paired with a FAISS-indexed retriever over 21 million Wikipedia passages—achieving 44.5 EM on Natural Questions and establishing the parametric/non-parametric split that now underlies most production AI systems. This review covers RAG-Sequence vs. RAG-Token trade-offs, the retrieval collapse failure mode, and what stale indexes mean for financial AI built on append-only Beancount ledgers.
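The retrieve-then-generate split described above can be sketched in a few lines. This is a toy illustration only: it substitutes bag-of-words cosine similarity for RAG's dense DPR retriever and FAISS index, and the passages are hypothetical stand-ins for the 21-million-passage Wikipedia store.

```python
# Toy retrieve-then-generate sketch (hypothetical passages, bag-of-words
# similarity standing in for DPR embeddings + a FAISS index).
from collections import Counter
from math import sqrt

PASSAGES = [
    "the ledger was opened in 2015 and uses beancount",
    "faiss indexes dense vectors for fast nearest-neighbour search",
    "bart is a sequence-to-sequence transformer used as the generator",
]

def bow(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, k=2):
    # RAG-Sequence conditions the entire answer on one fixed set of top-k
    # passages; RAG-Token re-marginalizes over the passages at every
    # generated token. Both start from a top-k retrieval like this one.
    q = bow(query)
    return sorted(PASSAGES, key=lambda p: cosine(q, bow(p)), reverse=True)[:k]
```

The "stale index" failure mode the article discusses lives in `PASSAGES`: if the store is not re-indexed when the underlying ledger changes, retrieval confidently returns outdated passages.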

ConvFinQA: Multi-Turn Financial QA and the 21-Point Gap Between Models and Human Experts

ConvFinQA (EMNLP 2022) extends FinQA into multi-turn conversation over S&P 500 earnings reports, finding that the best fine-tuned model achieves 68.9% execution accuracy versus 89.4% for human experts—and drops to 52.4% on hybrid multi-aspect conversations where models must carry numerical context across different financial topics.

FinQA: The Benchmark Measuring AI Numerical Reasoning on Financial Reports

FinQA (EMNLP 2021) built 8,281 QA pairs from S&P 500 earnings reports requiring multi-step arithmetic programs. Neural models scored 61% at release versus 91% for human experts; accuracy collapses to 22% on three-or-more-step programs. The failure modes — domain constants, cross-modality grounding, chain length — map directly to the challenges Beancount agents face today.
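FinQA's "multi-step arithmetic programs" are flat sequences of operations whose later steps reference earlier results. A minimal sketch of an executor for programs in that style (the step/reference format here is simplified for illustration, not the paper's exact DSL):

```python
# Minimal evaluator for FinQA-style arithmetic programs: each step is
# (op, arg1, arg2), and "#i" refers back to the result of step i.
OPS = {
    "add": lambda a, b: a + b,
    "subtract": lambda a, b: a - b,
    "multiply": lambda a, b: a * b,
    "divide": lambda a, b: a / b,
}

def run_program(steps):
    results = []
    def resolve(x):
        # A string like "#0" is a reference to a prior step's result;
        # anything else is a literal operand pulled from the report.
        if isinstance(x, str) and x.startswith("#"):
            return results[int(x[1:])]
        return float(x)
    for op, a, b in steps:
        results.append(OPS[op](resolve(a), resolve(b)))
    return results[-1]

# Year-over-year growth from revenue of 100 to 120:
# subtract(120, 100) -> 20, then divide(#0, 100) -> 0.2
growth = run_program([("subtract", 120, 100), ("divide", "#0", 100)])
```

The chain-length failure the benchmark exposes is visible in this structure: every extra step compounds the chance of picking the wrong operand or mis-resolving a `#i` reference, which is why accuracy collapses on three-or-more-step programs.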

FinanceBench: Why Vector-Store RAG Fails on Real Financial Documents

FinanceBench evaluates 16 AI configurations against 10,231 questions from real SEC filings; shared-vector-store RAG answers correctly only 19% of the time, and even GPT-4-Turbo with the oracle passage reaches just 85% accuracy — showing that numerical reasoning, not retrieval, is the binding constraint for enterprise finance AI.
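One mechanism behind the shared-store failure can be shown with a toy example (hypothetical chunks, word-overlap scoring standing in for embeddings): when chunks from many filings share one index, a lexically similar passage from the wrong company can outrank the right one, which per-document metadata filtering avoids.

```python
# Toy illustration (hypothetical data) of shared-vector-store retrieval
# pulling a passage from the wrong filing, versus metadata-filtered retrieval.
CHUNKS = [
    {"company": "AcmeCo", "year": 2022, "text": "total revenue was 4.1 billion"},
    {"company": "Globex", "year": 2022,
     "text": "total revenue for fiscal 2022 was 9.8 billion"},
]

def score(query, text):
    # Crude word-overlap score standing in for embedding similarity.
    return len(set(query.lower().split()) & set(text.lower().split()))

def retrieve(query, company=None):
    pool = [c for c in CHUNKS if company is None or c["company"] == company]
    return max(pool, key=lambda c: score(query, c["text"]))
```

Unfiltered, `retrieve("AcmeCo total revenue 2022")` returns the Globex chunk, because its boilerplate overlaps the query more; scoping the pool with `company="AcmeCo"` fixes it. Note, though, that FinanceBench's sharper finding is the oracle-passage result: even with retrieval solved, numerical reasoning caps accuracy at 85%.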