A systematic survey of LLM confidence estimation and calibration methods—white-box logit approaches, consistency-based SelfCheckGPT, and semantic entropy—reveals that verbalized confidence scores from GPT-4 achieve only ~62.7% AUROC, barely above chance, with direct implications for deploying uncertainty-aware agents in finance and accounting.
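The AUROC figure can be grounded with a short sketch: treat each answer's verbalized confidence as a score and its correctness as a binary label, then compute the probability that a correct answer receives a higher confidence than an incorrect one. The sample data below is illustrative, not from the survey:

```python
def auroc(scores, labels):
    """AUROC = probability a correct answer (label 1) gets a higher
    confidence score than an incorrect one (label 0); ties count half.
    O(n^2) pairwise form, chosen for clarity over speed."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need both correct and incorrect examples")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Perfectly ranked toy data gives AUROC = 1.0; a model whose confidences
# carry no signal about correctness sits near 0.5 (chance).
print(auroc([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0]))
```

At ~0.627, verbalized GPT-4 confidence separates correct from incorrect answers only slightly better than the 0.5 chance baseline.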
ReDAct runs a small model by default and escalates to an expensive model only when token-level perplexity signals uncertainty, achieving 64% cost savings over GPT-5.2-only while matching or exceeding its accuracy — a directly applicable pattern for Beancount transaction-categorization agents.
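The routing pattern is easy to sketch: compute the cheap model's token-level perplexity from its logprobs and escalate only when it exceeds a threshold. The threshold and function names below are illustrative assumptions, not ReDAct's actual values:

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(mean negative log-probability) over the
    generated tokens; higher means the model was less certain."""
    if not token_logprobs:
        raise ValueError("need at least one token")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def route(token_logprobs, threshold=4.0):
    """Escalate to the expensive model only when the cheap model's
    perplexity crosses the threshold (value here is illustrative)."""
    return "expensive" if perplexity(token_logprobs) > threshold else "cheap"

# Confident tokens (logprobs near 0) stay on the cheap model;
# uncertain tokens (logprobs around -3) trigger escalation.
print(route([-0.05] * 10))  # cheap
print(route([-3.0] * 10))   # expensive
```

For a Beancount categorization agent, the same gate would run before accepting the small model's account assignment for a transaction.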
CMU and NC State researchers propose using System-Theoretic Process Analysis (STPA) and a capability-enhanced Model Context Protocol to derive formal safety specifications for LLM agent tool use, with Alloy-based verification demonstrating the absence of unsafe flows in a calendar scheduling case study.
AGrail (ACL 2025) introduces a two-LLM cooperative guardrail that adapts its safety checks at inference time via test-time adaptation, achieving 0% prompt injection attack success and 95.6% benign action preservation on Safe-OS, whereas baselines GuardAgent and LLaMA-Guard block up to 49.2% of legitimate actions.
ShieldAgent (ICML 2025) replaces LLM-based guardrails with probabilistic rule circuits built on Markov Logic Networks, achieving 90.4% accuracy on agent attacks with 64.7% fewer API calls, with implications for verifiable safety in financial AI systems.
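A toy sketch conveys the spirit of weighted, verifiable safety rules, though it is deliberately far simpler than ShieldAgent's actual Markov Logic Network rule circuits: each rule is a predicate over a proposed action with a weight, and the action is blocked when the summed evidence for "unsafe" is positive. All rule contents and weights below are invented for illustration:

```python
# Toy weighted-rule scorer (NOT ShieldAgent's formalism): each rule is
# a (weight, predicate) pair; positive weight = evidence of unsafety.
RULES = [
    (2.0, lambda a: a.get("tool") == "shell"),
    (1.5, lambda a: "rm" in a.get("args", "")),
    (-1.0, lambda a: a.get("sandboxed", False)),  # sandbox mitigates risk
]

def unsafe_score(action):
    """Sum the weights of all rules whose predicate fires on the action."""
    return sum(w for w, pred in RULES if pred(action))

def verdict(action):
    """Block when the weighted evidence of unsafety is positive."""
    return "block" if unsafe_score(action) > 0 else "allow"

print(verdict({"tool": "shell", "args": "rm -rf /tmp/x"}))  # block
print(verdict({"tool": "search"}))                          # allow
```

Unlike an LLM judge, a rule table like this is inspectable and its verdicts are reproducible, which is the property ShieldAgent's circuits formalize.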
GuardAgent (ICML 2025) places a separate LLM agent between a target agent and its environment, verifying every proposed action by generating and running Python code — achieving 98.7% policy enforcement accuracy while preserving 100% task completion, versus 81% accuracy and 29–71% task failure for prompt-embedded safety rules.
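The interposition pattern can be sketched in a few lines: every action the target agent proposes passes through a guard predicate before execution. The policy check below is hand-written for illustration; in GuardAgent the check code is itself generated by an LLM from natural-language safety rules:

```python
# Minimal sketch of the GuardAgent pattern: a guard sits between the
# target agent and its environment and vets every proposed action.
# The account whitelist is a hypothetical example policy.
ALLOWED_ACCOUNTS = {"Expenses:Office", "Expenses:Travel"}

def guard_check(action: dict) -> bool:
    """Return True only if the proposed action complies with the policy."""
    if action.get("type") != "post_transaction":
        return False  # deny-by-default for unrecognized action types
    return action.get("account") in ALLOWED_ACCOUNTS

def execute_with_guard(action: dict) -> str:
    """Run the action only after the guard approves it."""
    return "executed" if guard_check(action) else "blocked"

print(execute_with_guard(
    {"type": "post_transaction", "account": "Expenses:Office"}))  # executed
print(execute_with_guard(
    {"type": "post_transaction", "account": "Assets:Cash"}))      # blocked
```

Keeping enforcement in executable code rather than in the prompt is what separates the 98.7% figure from the 81% prompt-embedded baseline.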
Huang et al. (ICLR 2024) show that LLMs asked to review their own reasoning without external feedback consistently degrade accuracy, with GPT-4 dropping from 95.5% to 91.5% on GSM8K, with direct lessons for designing reliable Beancount journal entry agents.
PHANTOM (NeurIPS 2025) is the first benchmark to measure LLM hallucination detection on real SEC filings across context lengths up to 30,000 tokens. Qwen3-30B-A3B-Thinking leads with F1=0.882, while 7B models score near chance, with direct implications for autonomous accounting agents.