EnterpriseArena runs 11 LLMs through a 132-month CFO simulation tracking survival, terminal valuation, and book-closing rates. Only Qwen3.5-9B survives in 80% of runs; GPT-5.4 and DeepSeek-V3.1 hit 0%. Human experts achieve 100% survival at 5× the terminal value. The critical bottleneck: LLMs skip ledger reconciliation 80% of the time, acting on stale financial state.
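The reconciliation step the models skip is mechanically simple: recompute each account's balance from the postings and compare it against the asserted balance before acting. A minimal sketch, using a hypothetical `(account, amount)` data model rather than Beancount's actual API:

```python
from decimal import Decimal

def reconcile(postings, assertions):
    """Check each balance assertion against the running ledger total.

    `postings` is a list of (account, Decimal amount); `assertions` maps
    account -> expected Decimal balance. Returns the accounts whose
    computed balance disagrees -- an agent should refuse to act on those.
    (Hypothetical data model, not Beancount's real loader output.)
    """
    balances = {}
    for account, amount in postings:
        balances[account] = balances.get(account, Decimal("0")) + amount
    return {
        acct: (balances.get(acct, Decimal("0")), expected)
        for acct, expected in assertions.items()
        if balances.get(acct, Decimal("0")) != expected
    }

postings = [
    ("Assets:Checking", Decimal("1000.00")),
    ("Assets:Checking", Decimal("-250.00")),
]
# Computed 750.00 vs asserted 800.00 -> the cached state is stale.
stale = reconcile(postings, {"Assets:Checking": Decimal("800.00")})
```

An agent that gates every write behind a check like this, instead of trusting a remembered balance, avoids exactly the stale-state failure the benchmark measures.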
InvestorBench (ACL 2025) tests 13 LLM backbones on backtested stock, crypto, and ETF trading using cumulative return and Sharpe ratio — not QA accuracy. Qwen2.5-72B tops the stock leaderboard at 46.15% CR; finance-tuned models backfire on equities. Model size predicts performance more reliably than domain fine-tuning.
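For readers unfamiliar with the two metrics, both fall out of the per-period return series. A minimal sketch, assuming simple (non-log) returns and the conventional 252-trading-day annualization, which may differ from InvestorBench's exact scoring:

```python
import math

def cumulative_return(returns):
    """Compound per-period simple returns into a total return (CR)."""
    total = 1.0
    for r in returns:
        total *= 1.0 + r
    return total - 1.0

def sharpe_ratio(returns, risk_free=0.0, periods_per_year=252):
    """Annualized Sharpe: mean excess return over its sample stdev,
    scaled by sqrt(periods per year)."""
    excess = [r - risk_free for r in returns]
    mean = sum(excess) / len(excess)
    var = sum((r - mean) ** 2 for r in excess) / (len(excess) - 1)
    return mean / math.sqrt(var) * math.sqrt(periods_per_year)

daily = [0.01, -0.005, 0.007, 0.002, -0.003]
cr = cumulative_return(daily)
```

CR rewards raw profit; Sharpe penalizes volatility, so a model can lead one leaderboard and trail the other.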
A NeurIPS 2024 Spotlight paper ablates three LLM-based time series forecasting methods — OneFitsAll, Time-LLM, and CALF — and finds that removing the language model improves accuracy in most cases, with up to a 1,383× training speedup. For finance AI applications like Beancount balance prediction, lightweight purpose-built models consistently beat repurposed LLMs.
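The "lightweight purpose-built model" in that ablation can be as small as a linear trend fit. As an illustrative (not the paper's) baseline for month-end balance prediction, ordinary least squares on the time index:

```python
def linear_trend_forecast(history, horizon):
    """Fit y = intercept + slope * t by ordinary least squares on the
    time index, then extrapolate `horizon` steps past the history --
    the kind of tiny baseline the ablations favor over repurposed LLMs."""
    n = len(history)
    mean_x = (n - 1) / 2
    mean_y = sum(history) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(history))
    var = sum((x - mean_x) ** 2 for x in range(n))
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return [intercept + slope * (n + h) for h in range(horizon)]

# e.g. month-end checking balances trending upward (made-up numbers)
balances = [1200.0, 1350.0, 1480.0, 1620.0, 1760.0]
forecast = linear_trend_forecast(balances, 3)
```

A model like this trains in microseconds, which is where speedups on the order of 1,383× over LLM-based forecasters come from.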
FinBen evaluates 15 LLMs across 36 financial datasets at NeurIPS 2024, finding GPT-4 reaches 0.63 Exact Match on numerical QA and 0.54 on stock movement forecasting — near chance. Here is what those numbers mean for building a reliable accounting agent on a Beancount ledger.
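To make the 0.63 figure concrete: Exact Match scores 1 only when the answer matches the gold value, 0 otherwise, then averages. A sketch with numeric normalization (stripping `$`, commas, `%`) -- a common scoring choice, though FinBen's exact protocol may differ:

```python
def numeric_exact_match(prediction, gold, tolerance=0.0):
    """Exact Match for numerical QA: normalize formatting, compare as
    numbers; fall back to string equality for non-numeric answers.
    (Assumed normalization, not FinBen's published scorer.)"""
    def to_number(text):
        cleaned = text.strip().replace(",", "").replace("$", "").rstrip("%")
        try:
            return float(cleaned)
        except ValueError:
            return None

    p, g = to_number(prediction), to_number(gold)
    if p is None or g is None:
        return prediction.strip() == gold.strip()
    return abs(p - g) <= tolerance

def em_score(pairs):
    """Mean Exact Match over (prediction, gold) pairs."""
    return sum(numeric_exact_match(p, g) for p, g in pairs) / len(pairs)
```

Under this metric, 0.63 means GPT-4 gets the number outright wrong on roughly 37% of questions: close-but-wrong amounts score zero, which is the right severity for an agent posting entries to a ledger.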