CMU and NC State researchers propose using System-Theoretic Process Analysis (STPA) and a capability-enhanced Model Context Protocol to derive formal safety specifications for LLM agent tool use, with Alloy-based verification demonstrating absence of unsafe flows in a calendar scheduling case study.
AGrail (ACL 2025) introduces a two-LLM cooperative guardrail that adapts its safety checks at inference time (test-time adaptation), achieving a 0% prompt-injection attack success rate and 95.6% benign-action preservation on Safe-OS — where GuardAgent and LLaMA-Guard block up to 49.2% of legitimate actions.
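The adaptive pattern can be sketched in a few lines. This is a hypothetical illustration, not AGrail's implementation: a check-generation step stands in for the first LLM, a verification step for the second, and both are stubbed with plain functions over a shared memory of checks.

```python
# Sketch of a test-time adaptive guardrail in the spirit of AGrail.
# One component maintains a growing memory of safety checks; the other
# applies the current checks to each proposed action. The LLM calls are
# replaced by simple pattern rules here; all names are illustrative.

checks = {"no_shell_wipe": lambda a: "rm -rf" not in a}  # seed safety memory

def propose_new_check(action: str) -> None:
    # Stand-in for the check-generation LLM: on seeing a suspicious pattern
    # not covered by memory, synthesize a new check and store it.
    if "DROP TABLE" in action and "no_sql_drop" not in checks:
        checks["no_sql_drop"] = lambda a: "DROP TABLE" not in a

def verify(action: str) -> bool:
    propose_new_check(action)                # test-time adaptation step
    return all(check(action) for check in checks.values())

print(verify("ls -la"))                      # True: benign action preserved
print(verify("DROP TABLE users;"))           # False: newly learned check fires
```

The design point the paper's numbers speak to is the second call: an adaptive check set can block the injected action without also blocking benign ones, which is where static guardrails lose up to 49.2% of legitimate actions.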
ShieldAgent (ICML 2025) replaces LLM-based guardrails with probabilistic rule circuits built on Markov Logic Networks, achieving 90.4% accuracy on agent attacks with 64.7% fewer API calls — with implications for verifiable safety in financial AI systems.
GuardAgent (ICML 2025) places a separate LLM agent between a target agent and its environment, verifying every proposed action by generating and running Python code — achieving 98.7% policy enforcement accuracy while preserving 100% task completion, versus 81% accuracy and 29–71% task failure for prompt-embedded safety rules.
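The GuardAgent pattern — verify each proposed action by generating and executing check code before forwarding it — can be sketched as follows. This is a minimal illustration under stated assumptions: in the paper the check code is produced by an LLM from natural-language policies, whereas here `generate_check` returns a hard-coded verifier, and the `Action`/`guarded_execute` names are hypothetical.

```python
# Sketch of an action-verification guardrail between a target agent and its
# environment. Each proposed action is checked by generated Python code; only
# approved actions reach the environment.

from dataclasses import dataclass

@dataclass
class Action:
    tool: str
    args: dict

def generate_check(policy: str) -> str:
    # Stand-in for LLM code generation: a real system would prompt an LLM
    # with the policy text and the action schema. Here the "generated"
    # source enforces one illustrative policy.
    return (
        "def check(action):\n"
        "    # Policy: never email addresses outside the allowed domain\n"
        "    if action.tool == 'send_email':\n"
        "        return action.args['to'].endswith('@example.com')\n"
        "    return True\n"
    )

def guarded_execute(action: Action, policy: str, environment) -> str:
    namespace = {}
    exec(generate_check(policy), namespace)   # run the generated verifier
    if namespace["check"](action):
        return environment(action)            # forward approved action
    return "BLOCKED"

# Toy environment that just echoes the action.
env = lambda a: f"executed {a.tool}"

ok = guarded_execute(Action("send_email", {"to": "bob@example.com"}), "no external email", env)
bad = guarded_execute(Action("send_email", {"to": "eve@evil.org"}), "no external email", env)
print(ok, bad)  # executed send_email BLOCKED
```

Running the policy as code rather than as a prompt instruction is what the reported gap reflects: a deterministic check either passes or blocks, whereas prompt-embedded rules both miss violations and derail legitimate tasks.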