4 posts tagged "Security"

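AI, LLM, Security, Automation, Beancount, Compliance, Trust

Verifiably Safe Tool Use for LLM Agents: When STPA Meets MCP

Researchers at CMU and North Carolina State University propose using System-Theoretic Process Analysis (STPA) and a capability-enhanced Model Context Protocol (MCP) to derive formal safety specifications for LLM agent tool use. In a calendar-scheduling case study, Alloy-based verification shows that no unsafe flows exist.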

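AI, LLM, Security, Automation, Machine Learning, Trust, Compliance

AGrail: An Adaptive Safety Guardrail for LLM Agents That Learns Across Tasks

AGrail (ACL 2025) introduces a dual-LLM cooperative guardrail that adjusts its safety checks at inference time through test-time adaptation (TTA). On Safe-OS it achieves a 0% prompt-injection attack success rate while preserving 95.6% of benign actions; by comparison, GuardAgent and LLaMA-Guard block up to 49.2% of legitimate actions.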

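AI, LLM, Machine Learning, Security, Compliance, Automation, Trust, Developers

ShieldAgent: Verifiable Safety Policy Reasoning for LLM Agents

ShieldAgent (ICML 2025) replaces LLM-based guardrails with probabilistic rule circuits built on Markov logic networks, achieving 90.4% accuracy in defending against attacks on agents while making 64.7% fewer API calls. The post also considers what this means for verifiable safety in financial AI systems.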

AI, LLM, Automation, Security, Machine Learning, Transaction Validation, Trust

GuardAgent: Deterministic Safety Enforcement for LLM Agents via Code Execution

GuardAgent (ICML 2025) places a separate LLM agent between a target agent and its environment, verifying every proposed action by generating and running Python code — achieving 98.7% policy enforcement accuracy while preserving 100% task completion, versus 81% accuracy and 29–71% task failure for prompt-embedded safety rules.

