FinQA (EMNLP 2021) built 8,281 QA pairs from S&P 500 earnings reports requiring multi-step arithmetic programs. Neural models scored 61% at release versus 91% for human experts; accuracy collapses to 22% on three-or-more-step programs. The failure modes — domain constants, cross-modality grounding, chain length — map directly to the challenges Beancount agents face today.
DSPy replaces hand-crafted prompt strings with declarative signatures and a metric-driven compiler—boosting Llama2-13b from 9.4% to 46.9% on GSM8K math reasoning and offering a more maintainable path for production finance AI pipelines.
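The declarative idea can be sketched without the library: a signature names typed input/output fields instead of a prompt string, and a compiler-like optimizer selects few-shot demonstrations by a metric. A minimal pure-Python sketch of that loop (names like `Signature`, `render_prompt`, and `compile_prompt` are illustrative, not the DSPy API):

```python
from dataclasses import dataclass

@dataclass
class Signature:
    """Declarative task spec: named input/output fields, not a prompt string."""
    inputs: tuple
    outputs: tuple
    instruction: str

def render_prompt(sig, demos, **kwargs):
    """Turn a signature plus selected demos into a concrete prompt."""
    lines = [sig.instruction]
    for demo in demos:                       # few-shot examples
        for field in sig.inputs + sig.outputs:
            lines.append(f"{field}: {demo[field]}")
    for field in sig.inputs:                 # the live query
        lines.append(f"{field}: {kwargs[field]}")
    lines.append(f"{sig.outputs[0]}:")       # cue the model to answer
    return "\n".join(lines)

def compile_prompt(sig, candidate_demo_sets, trainset, metric, lm):
    """'Compile': keep whichever demo set maximizes the metric on a small trainset."""
    def score(demos):
        return sum(metric(lm(render_prompt(sig, demos, **ex)), ex) for ex in trainset)
    return max(candidate_demo_sets, key=score)
```

The point of the pattern is that prompt text becomes a compiled artifact: swapping models or metrics means recompiling, not hand-editing strings.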
Self-RAG (ICLR 2024 Oral) trains a language model to decide when to retrieve and then grade its own results using four reflection tokens — reaching 55.8% on PopQA and 80.2 FactScore on biographies while outperforming ChatGPT on five benchmarks. Analysis covers the mechanism, ablation results, reproducibility limits, and implications for finance AI agents over Beancount ledgers.
HippoRAG (NeurIPS 2024) builds a knowledge graph from OpenIE triples and applies Personalized PageRank at query time, reaching 89.1% Recall@5 on 2WikiMultiHopQA versus 68.2% for ColBERTv2—with direct implications for querying complex financial ledgers across multi-year transaction histories.
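The retrieval step can be approximated with a small power-iteration Personalized PageRank over an entity graph, restarting at the entities mentioned in the query; passages are then ranked by the scores of the entities they contain. A pure-Python sketch (HippoRAG itself builds the graph from OpenIE triples plus synonym edges; entity names here are made up):

```python
def personalized_pagerank(edges, seeds, alpha=0.85, iters=50):
    """Power iteration with restart mass concentrated on the query's seed entities."""
    nodes = sorted({n for edge in edges for n in edge})
    neighbors = {n: [] for n in nodes}
    for a, b in edges:                      # undirected entity graph
        neighbors[a].append(b)
        neighbors[b].append(a)
    restart = {n: (1 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(restart)
    for _ in range(iters):
        nxt = {n: (1 - alpha) * restart[n] for n in nodes}
        for n in nodes:
            share = alpha * rank[n] / len(neighbors[n])
            for m in neighbors[n]:          # spread rank mass along edges
                nxt[m] += share
        rank = nxt
    return rank
```

Multi-hop behavior falls out of the restart vector: a node bridging two seed entities accumulates mass from both, which is why entities linking, say, a company to a line item across years outrank unrelated neighbors.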
Bloomberg trained a 50B-parameter LLM on 569B tokens of financial data and beat general models on sentiment and table-reasoning benchmarks — then GPT-4 matched it without any finance-specific pretraining. What the $10M experiment reveals about domain pretraining trade-offs, tokenization of numbers, and why tool-use is more reliable than model internals for accounting agents.
AutoGen (Wu et al., 2023) introduces a multi-agent conversation framework where LLM-backed agents pass messages to complete tasks; a two-agent setup lifts MATH benchmark accuracy from 55% to 69%, and a dedicated SafeGuard agent improves unsafe-code detection by up to 35 F1 points — findings directly applicable to building safe, modular Beancount automation pipelines.
MemGPT applies OS-style virtual memory paging to LLMs, using three-tier storage — working memory, recall, and archival — to give agents persistent recall across sessions; on multi-session chat benchmarks, MemGPT with GPT-4 achieves 92.5% accuracy versus a 32.1% fixed-context baseline.
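Stripped to its core, the paging idea is evicting overflow from a bounded working context into an unbounded archival store and fetching it back on demand. A toy sketch (class and method names are illustrative; real recall would use embedding search rather than substring match):

```python
from collections import deque

class TieredMemory:
    """Bounded working context; evicted messages page out to archival storage."""
    def __init__(self, capacity):
        self.working = deque()   # must fit in the LLM context window
        self.archival = []       # unbounded external store
        self.capacity = capacity

    def append(self, message):
        self.working.append(message)
        while len(self.working) > self.capacity:
            self.archival.append(self.working.popleft())  # page out the oldest

    def recall(self, query):
        """Page matching archival entries back into scope (naive substring search)."""
        return [m for m in self.archival if query in m]
```

The agent-facing trick in MemGPT is that these page-in/page-out operations are exposed to the model as tool calls, so the LLM itself decides what to archive and when to recall.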
Huang et al. (ICLR 2024) show that prompting LLMs to review their own reasoning without external feedback consistently degrades accuracy — GPT-4 drops from 95.5% to 91.5% on GSM8K — with direct implications for designing reliable Beancount journal-entry agents.
CRITIC (ICLR 2024) achieves 7.7 F1 gains on open-domain QA and a 79.2% toxicity reduction by grounding LLM revision in external tool signals — a verify-then-correct loop that maps directly onto write-back safety for Beancount finance agents.
Self-consistency replaces greedy chain-of-thought decoding with majority voting over N sampled reasoning paths, raising GPT-3's GSM8K accuracy by 17.9 percentage points with no additional training, and applies directly to multi-step financial calculations where single-sample decoding is unreliable.
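The voting step is a few lines once the model can be sampled at nonzero temperature; a sketch assuming a `sample` callable that runs one chain-of-thought rollout and returns only its final answer:

```python
from collections import Counter

def self_consistency(sample, n=20):
    """Sample n reasoning paths and return the most common final answer.

    The vote share doubles as a rough confidence proxy: for a ledger agent,
    a low share is a cue to flag the calculation for human review.
    """
    answers = [sample() for _ in range(n)]
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes / n
```

Note that only the final answers are aggregated; the sampled reasoning chains themselves may disagree, which is precisely what makes the marginalization work.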