ReDAct runs a small model by default and escalates to an expensive model only when token-level perplexity signals uncertainty, achieving 64% cost savings over GPT-5.2-only while matching or exceeding its accuracy — a directly applicable pattern for Beancount transaction-categorization agents.
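A minimal sketch of that perplexity-gated cascade, assuming each model is a callable returning its generated text plus per-token log-probabilities; the threshold value and helper names here are illustrative, not ReDAct's published interface:

```python
import math

def perplexity(token_logprobs):
    """Perplexity from per-token log-probs: exp(-mean(log p))."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def cascade(prompt, small_model, large_model, threshold=4.0):
    """Run the cheap model first; escalate to the expensive model
    only when the cheap model's own token-level perplexity suggests
    it is uncertain. Both models are hypothetical callables that
    return (text, per_token_logprobs)."""
    text, logprobs = small_model(prompt)
    if perplexity(logprobs) <= threshold:
        return text, "small"  # confident: keep the cheap answer
    text, _ = large_model(prompt)
    return text, "large"      # uncertain: pay for the big model
```

For a Beancount categorization agent, `prompt` would carry the transaction payee/narration and `text` the predicted account name; most routine transactions stay on the small model.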
InvestorBench (ACL 2025) tests 13 LLM backbones on backtested stock, crypto, and ETF trading using cumulative return and Sharpe ratio — not QA accuracy. Qwen2.5-72B tops the stock leaderboard at 46.15% CR; finance-tuned models backfire on equities. Model size predicts performance more reliably than domain fine-tuning.
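Both leaderboard metrics are straightforward to compute from a backtest's per-period returns; a plain-Python sketch, assuming simple (not log) returns and a 252-trading-day annualization factor:

```python
import math

def cumulative_return(returns):
    """Cumulative return (CR) from a series of per-period simple returns."""
    total = 1.0
    for r in returns:
        total *= 1.0 + r
    return total - 1.0

def sharpe_ratio(returns, risk_free=0.0, periods_per_year=252):
    """Annualized Sharpe ratio: mean excess return over its sample
    standard deviation, scaled by sqrt(periods per year)."""
    excess = [r - risk_free for r in returns]
    mean = sum(excess) / len(excess)
    var = sum((r - mean) ** 2 for r in excess) / (len(excess) - 1)
    return mean / math.sqrt(var) * math.sqrt(periods_per_year)
```
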
LATS (Language Agent Tree Search, ICML 2024) unifies ReAct, Tree of Thoughts, and Reflexion in a single MCTS framework, reaching 92.7% pass@1 on HumanEval with GPT-4. A Git-based Beancount ledger easily satisfies the state-restoration requirement that limits LATS in production.
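A sketch of the snapshot/restore hook that any tree search over ledger edits needs. The ledger here is an in-memory dict for demonstration; with a Git-backed Beancount file, `snapshot()` would plausibly record `git rev-parse HEAD` and `restore()` would run `git checkout <hash> -- ledger.beancount` — an assumed mapping, not LATS's published code:

```python
import copy

class LedgerSearchState:
    """Snapshot/restore support for LATS-style tree search over a ledger.

    Expanding a search branch mutates the ledger; backtracking to a
    sibling node requires restoring the exact prior state, which is
    what Git gives a Beancount file for free."""

    def __init__(self, ledger):
        self.ledger = ledger
        self._snapshots = {}
        self._next_id = 0

    def snapshot(self):
        """Record the current state; returns a handle (commit hash in Git)."""
        sid = self._next_id
        self._next_id += 1
        self._snapshots[sid] = copy.deepcopy(self.ledger)
        return sid

    def restore(self, sid):
        """Roll the ledger back to a previously recorded state."""
        self.ledger = copy.deepcopy(self._snapshots[sid])
```
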
Tree of Thoughts (ToT) achieves 74% on Game of 24 vs 4% for standard GPT-4 CoT by organizing LLM reasoning into a branching search tree with pruning and backtracking — with direct implications for multi-step financial classification and tax optimization in Beancount workflows.
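A toy version of the Game of 24 search tree makes the branching-plus-backtracking structure concrete. This sketch explores every child exhaustively with depth-first search; the actual ToT method instead uses LLM-proposed candidate steps and LLM value scores to prune most branches:

```python
from itertools import combinations

# Each tree node is a multiset of remaining values; a child node
# replaces two values with the result of one arithmetic operation.
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b if b else None,  # skip division by zero
}

def solve24(nums, target=24.0, eps=1e-6):
    """DFS over the Game of 24 tree: dead branches return False and
    the search backtracks to try a sibling operation."""
    if len(nums) == 1:
        return abs(nums[0] - target) < eps
    for i, j in combinations(range(len(nums)), 2):
        rest = [n for k, n in enumerate(nums) if k not in (i, j)]
        for a, b in ((nums[i], nums[j]), (nums[j], nums[i])):
            for fn in OPS.values():
                v = fn(a, b)
                if v is not None and solve24(rest + [v], target, eps):
                    return True
    return False
```

The same skeleton carries over to multi-step financial classification: nodes are partial labelings of a transaction set, children are candidate next assignments, and a value heuristic prunes inconsistent branches.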