Can LLM Agents Be CFOs? EnterpriseArena's 132-Month Simulation Reveals a Wide Gap
EnterpriseArena runs 11 LLMs through a 132-month CFO simulation tracking survival, terminal valuation, and book-closing rates. Only Qwen3.5-9B survives 80% of runs; GPT-5.4 and DeepSeek-V3.1 hit 0%. Human experts achieve 100% survival at 5× the terminal value. The critical bottleneck: LLMs skip ledger reconciliation 80% of the time, acting on stale financial state.
