
Transaction Validation

Everything about Transaction Validation

4 posts

Validating and verifying financial transactions using language model agents


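LLM, Beancount, Plain-Text Accounting, AI, Machine Learning, Financial Literacy, Double-Entry, Transaction Validation

LLMs Score Just 2.3% on Beancount DSL Generation: The LLMFinLiteracy Benchmark

The LLMFinLiteracy benchmark found that five open-weight models of roughly 7B parameters generated fully correct Beancount transactions only 2.3% of the time. The failures concentrate in accounting reasoning rather than syntax, which suggests that "compiler-in-the-loop" feedback is the key missing piece for building reliable write-back agents.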

AI, LLM, Automation, Security, Machine Learning, Transaction Validation, Trust

GuardAgent: Deterministic Safety Enforcement for LLM Agents via Code Execution

GuardAgent (ICML 2025) places a separate LLM agent between a target agent and its environment, verifying every proposed action by generating and running Python code — achieving 98.7% policy enforcement accuracy while preserving 100% task completion, versus 81% accuracy and 29–71% task failure for prompt-embedded safety rules.

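AI, LLM, Machine Learning, Automation, Beancount, Transaction Validation

Multi-Agent LLM Debate: Real Accuracy Gains, Uncontrolled Compute Overhead, and Collective Hallucination

A close reading of the ICML 2024 multi-agent debate paper by Du et al., which reports a 14.8-percentage-point gain in arithmetic accuracy. The post also draws on 2025 rebuttal research showing that a single agent matches debate under an equal budget, and analyzes why collective hallucination (65% of debate failure cases) poses a specific risk to AI-assisted ledger commits.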

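AI, LLM, Machine Learning, Automation, Reconciliation, Finance, Error Prevention, Transaction Validation

CRITIC: Why LLM Self-Correction Needs External Tool Feedback

CRITIC (ICLR 2024) grounds a large language model's (LLM's) revisions in signals from external tools, achieving a 7.7-point F1 improvement on open-domain question answering and a 79.2% reduction in harmful content. This "verify-then-correct" loop maps directly onto write-back safety for Beancount financial agents.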
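The "verify-then-correct" and "compiler-in-the-loop" ideas summarized in these posts share one shape: a proposer drafts a transaction, a deterministic checker inspects it, and any error message is fed back for another attempt. The sketch below illustrates that shape only. It is not the method of any of the papers above and does not use the Beancount library; `check_balanced`, `verify_then_correct`, and `stub_propose` are hypothetical names, the checker tests only that single-currency postings sum to zero, and the stub stands in for an LLM call.

```python
from decimal import Decimal
from typing import Callable, List, Optional, Tuple

# Hypothetical representation: (account name, signed amount in one currency).
Posting = Tuple[str, Decimal]


def check_balanced(postings: List[Posting]) -> Optional[str]:
    """External check: return an error message, or None if the postings pass."""
    if len(postings) < 2:
        return "a transaction needs at least two postings"
    total = sum((amount for _, amount in postings), Decimal("0"))
    if total != 0:
        return f"postings do not balance: residual {total}"
    return None


def verify_then_correct(
    propose: Callable[[Optional[str]], List[Posting]],
    max_rounds: int = 3,
) -> List[Posting]:
    """Ask `propose` for a draft, verify it, and pass any error back as feedback."""
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        postings = propose(feedback)
        feedback = check_balanced(postings)
        if feedback is None:
            return postings  # only verified drafts are ever returned
    raise ValueError(f"no valid transaction after {max_rounds} rounds: {feedback}")


def stub_propose(feedback: Optional[str]) -> List[Posting]:
    """Stand-in for a model: the first draft is off by 0.50, the retry is fixed."""
    if feedback is None:
        return [("Expenses:Food", Decimal("12.50")), ("Assets:Cash", Decimal("-12.00"))]
    return [("Expenses:Food", Decimal("12.50")), ("Assets:Cash", Decimal("-12.50"))]


result = verify_then_correct(stub_propose)
```

The design point is that the checker, not the proposer, decides whether a draft may be written back; a real setup would presumably replace `check_balanced` with the ledger's own parser and validation.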


© 2019 - 2026 Beancount.io
