DSPy replaces hand-crafted prompt strings with declarative signatures and a metric-driven compiler—boosting Llama2-13b from 9.4% to 46.9% on GSM8K math reasoning and offering a more maintainable path for production finance AI pipelines.
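The core idea, a declarative signature plus a metric that drives demo selection, can be illustrated without the library. This is a toy stand-in, not the real `dspy` API: `Example`, `exact_match`, and `compile_prompt` are hypothetical names, and the "compiler" here just picks the few-shot demo set that maximizes the metric on a small trainset.

```python
# Toy illustration of DSPy's compile loop: instead of hand-tuning a prompt
# string, declare a metric and let a search pick the best few-shot demos.
# (Hypothetical stand-in; the real dspy API differs.)
from dataclasses import dataclass

@dataclass
class Example:
    question: str
    answer: str

def exact_match(pred: str, gold: str) -> float:
    """Metric: 1.0 if the prediction matches the gold answer exactly."""
    return 1.0 if pred.strip() == gold.strip() else 0.0

def compile_prompt(candidate_demo_sets, trainset, run_model, metric):
    """Score each candidate demo set on the trainset; keep the best one."""
    best_demos, best_score = None, -1.0
    for demos in candidate_demo_sets:
        score = sum(metric(run_model(demos, ex.question), ex.answer)
                    for ex in trainset) / len(trainset)
        if score > best_score:
            best_demos, best_score = demos, score
    return best_demos, best_score
```

The point is the inversion of responsibility: the prompt becomes an artifact the optimizer produces, so swapping the underlying model only requires re-running the compile step.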
LATS (Language Agent Tree Search, ICML 2024) unifies ReAct, Tree of Thoughts, and Reflexion in a single MCTS framework, reaching 92.7% pass@1 on HumanEval with GPT-4. A Git-backed Beancount ledger can satisfy, almost for free, the state-restoration requirement that limits LATS in production environments.
Self-RAG (ICLR 2024 Oral) trains a language model to decide when to retrieve and then grade its own results using four reflection tokens — reaching 55.8% on PopQA and 80.2 FactScore on biographies while outperforming ChatGPT on five benchmarks. Analysis covers the mechanism, ablation results, reproducibility limits, and implications for finance AI agents over Beancount ledgers.
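The selection step can be sketched with plain dictionaries. The reflection-token names below follow the paper, but the weights, the `tokens` dict shape, and the function names are illustrative; the real system scores actual token probabilities from the model's output distribution.

```python
# Sketch of Self-RAG-style candidate ranking: each candidate continuation
# carries reflection tokens, and we keep the highest-scoring one.
# Token vocabularies follow the paper; weights here are illustrative.
ISREL = {"[Relevant]": 1.0, "[Irrelevant]": 0.0}
ISSUP = {"[Fully supported]": 1.0, "[Partially supported]": 0.5, "[No support]": 0.0}
ISUSE = {f"[Utility:{i}]": i / 5 for i in range(1, 6)}

def segment_score(tokens, w_rel=1.0, w_sup=1.0, w_use=0.5):
    """Weighted combination of a candidate's reflection tokens."""
    return (w_rel * ISREL.get(tokens.get("isrel"), 0.0)
            + w_sup * ISSUP.get(tokens.get("issup"), 0.0)
            + w_use * ISUSE.get(tokens.get("isuse"), 0.0))

def best_candidate(candidates):
    """Pick the continuation whose reflection tokens score highest."""
    return max(candidates, key=lambda c: segment_score(c["tokens"]))
```

For a ledger agent, the appeal is that support grading ([ISSUP]) gives an explicit hook for rejecting generations not grounded in retrieved transactions.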
Voyager, a GPT-4-powered Minecraft agent from NVIDIA and Caltech, demonstrates that a persistent code skill library enables genuine lifelong learning without fine-tuning — discovering 3.3× more items than prior state-of-the-art. The pattern maps directly onto long-horizon Beancount ledger automation, though financial correctness demands staging layers that game sandboxes never require.
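The skill-library pattern is simple to sketch. Voyager itself stores JavaScript skills and retrieves them by embedding similarity; the class below is a hypothetical minimal version that uses keyword overlap instead, just to show the store-and-retrieve shape.

```python
# Toy version of Voyager's persistent skill library: executable skills are
# stored with a natural-language description and retrieved per task.
# (Voyager uses embedding retrieval over JS code; this sketch uses keyword
# overlap for self-containment.)
class SkillLibrary:
    def __init__(self):
        self.skills = {}  # name -> (description, callable)

    def add(self, name: str, description: str, fn):
        self.skills[name] = (description, fn)

    def retrieve(self, task: str, k: int = 1):
        """Return the k skill names whose descriptions best match the task."""
        words = set(task.lower().split())
        ranked = sorted(
            self.skills.items(),
            key=lambda kv: len(words & set(kv[1][0].lower().split())),
            reverse=True,
        )
        return [name for name, _ in ranked[:k]]
```

Mapped to a ledger, a "skill" would be a reusable categorization or reconciliation routine that survived review, which is exactly where the staging layer the teaser mentions comes in.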
HippoRAG (NeurIPS 2024) builds a knowledge graph from OpenIE triples and applies Personalized PageRank at query time, reaching 89.1% Recall@5 on 2WikiMultiHopQA versus 68.2% for ColBERTv2—with direct implications for querying complex financial ledgers across multi-year transaction histories.
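The query-time step can be reproduced in miniature: build a graph from subject/object pairs of the triples, then run Personalized PageRank seeded on the entities found in the query. The power-iteration implementation and the example triples below are illustrative, not HippoRAG's code.

```python
# Minimal Personalized PageRank over an undirected triple graph, in the
# spirit of HippoRAG's query-time retrieval. (Illustrative implementation.)
def personalized_pagerank(edges, seeds, alpha=0.85, iters=50):
    nodes, adj = set(), {}
    for u, v in edges:
        nodes |= {u, v}
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    total = sum(seeds.values())
    restart = {n: seeds.get(n, 0.0) / total for n in nodes}
    rank = dict(restart)
    for _ in range(iters):
        nxt = {n: (1 - alpha) * restart[n] for n in nodes}
        for u in nodes:
            out = adj.get(u, [])
            if out:
                share = alpha * rank[u] / len(out)
                for v in out:
                    nxt[v] += share
            else:
                # dangling node: return its mass to the seed distribution
                for n in nodes:
                    nxt[n] += alpha * rank[u] * restart[n]
        rank = nxt
    return rank

# Hypothetical ledger triples; seeding on the query entity "Acme Property"
# makes its neighborhood outrank the disconnected Utilities component.
triples = [
    ("Expenses:Rent", "paid_to", "Acme Property"),
    ("Acme Property", "located_in", "Berlin"),
    ("Expenses:Utilities", "paid_to", "CityPower"),
]
rank = personalized_pagerank([(s, o) for s, _, o in triples],
                             {"Acme Property": 1.0})
```

This is why the approach suits multi-hop ledger questions: relevance propagates along explicit relations rather than depending on each passage's embedding matching the query on its own.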
AgentBench (Liu et al., ICLR 2024) benchmarks 27 LLMs across 8 interactive environments; GPT-4 scores 4.01 overall while the best open-source model manages only 0.96. The three dominant failure modes (exceeding task limits in 67.9% of knowledge-graph failures, plus formatting errors in 53.3% of database failures, and invalid actions) map directly onto the risks of deploying a Beancount write-back agent on a real ledger.
Bloomberg trained a 50B-parameter LLM on 569B tokens of financial data and beat general models on sentiment and table-reasoning benchmarks — then GPT-4 matched it without any finance-specific pretraining. What the $10M experiment reveals about domain pretraining trade-offs, tokenization of numbers, and why tool-use is more reliable than model internals for accounting agents.
Gorilla (Patil et al., NeurIPS 2024) fine-tunes a 7B LLaMA model with Retriever-Aware Training on retrieved API documentation, cutting hallucination rates from 78% to 11% versus GPT-4 zero-shot — with direct implications for finance AI write-back agents where wrong account names or inverted signs are correctness failures, not annoyances.
MemGPT applies OS-style virtual memory paging to LLMs, using three-tier storage — working memory, recall, and archival — to give agents persistent recall across sessions; on multi-session chat benchmarks, MemGPT with GPT-4 achieves 92.5% accuracy versus a 32.1% fixed-context baseline.
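The paging mechanic reduces to a bounded in-context queue that evicts into searchable out-of-context storage. The class below is a toy sketch under that framing: real MemGPT drives eviction and recall through LLM function calls and embedding search, and the names here are hypothetical.

```python
# Toy sketch of MemGPT-style memory paging: a bounded working context that
# evicts oldest messages to archival storage, plus naive keyword recall.
# (Illustrative; MemGPT uses LLM function calls and embedding retrieval.)
from collections import deque

class TieredMemory:
    def __init__(self, context_limit: int = 4):
        self.working = deque()   # in-context messages (fits the window)
        self.archival = []       # paged-out, out-of-context storage
        self.context_limit = context_limit

    def append(self, message: str):
        self.working.append(message)
        while len(self.working) > self.context_limit:
            # page the oldest message out of the context window
            self.archival.append(self.working.popleft())

    def recall(self, query: str, k: int = 3):
        """Naive keyword search over archival storage."""
        hits = [m for m in self.archival if query.lower() in m.lower()]
        return hits[:k]
```

For a ledger assistant, the archival tier is where months-old merchant conventions and category decisions would live, retrievable without holding the whole history in context.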
SWE-agent (NeurIPS 2024) introduces Agent-Computer Interfaces (ACIs) — purpose-built layers between LLMs and software environments — showing a 10.7-percentage-point improvement over raw shell access and 12.47% resolution on SWE-bench with GPT-4 Turbo. Interface design, not model capability, is the primary bottleneck for autonomous coding agents.
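One representative ACI element is the windowed file viewer: the agent sees a fixed-size, line-numbered slice of a file instead of a raw dump or a bare `cat`. The class below is a sketch in that spirit, not SWE-agent's actual interface.

```python
# Toy ACI element in the spirit of SWE-agent: present files through a
# fixed-size, line-numbered window so the model never receives an
# unbounded dump. (Illustrative; not the real SWE-agent commands.)
class FileViewer:
    def __init__(self, text: str, window: int = 5):
        self.lines = text.splitlines()
        self.window = window
        self.top = 0  # index of the first visible line

    def view(self) -> str:
        end = min(self.top + self.window, len(self.lines))
        body = "\n".join(f"{i + 1}: {line}"
                         for i, line in enumerate(self.lines[self.top:end],
                                                  start=self.top))
        return f"[file: {self.top + 1}-{end} of {len(self.lines)}]\n{body}"

    def scroll_down(self):
        self.top = min(self.top + self.window,
                       max(0, len(self.lines) - self.window))
```

The header line is the interface doing the agent's bookkeeping for it: the model always knows where it is in the file, which is precisely the kind of guardrail raw shell access lacks.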