AI Agents

All About AI Agents

One article

Autonomous AI agent benchmarks and evaluations for real-world task completion

Tags: AI, Machine Learning, Automation, LLM, Technology, Data Science, AI Agents

OSWorld: Desktop AI Agents Succeed on 12% of Tasks Where Humans Succeed on 72%

OSWorld (NeurIPS 2024) benchmarks multimodal AI agents on 369 real desktop tasks across Ubuntu, Windows, and macOS. It finds a 60-percentage-point gap between the best model (12.24%) and human performance (72.36%), with 75% of failures traced to visuomotor grounding errors rather than reasoning failures.
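
The gap quoted in the summary is an absolute difference in percentage points, not a relative one. The arithmetic can be sketched as follows, using only the two success rates from the summary; the function and variable names are illustrative and do not come from OSWorld itself.

```python
# Success rates quoted in the summary, in percent.
BEST_MODEL_SUCCESS = 12.24
HUMAN_SUCCESS = 72.36


def percentage_point_gap(human: float, model: float) -> float:
    """Absolute difference between two rates that are both given in percent."""
    return human - model


def relative_ratio(human: float, model: float) -> float:
    """How many times higher the human rate is than the model rate."""
    return human / model


gap = percentage_point_gap(HUMAN_SUCCESS, BEST_MODEL_SUCCESS)
ratio = relative_ratio(HUMAN_SUCCESS, BEST_MODEL_SUCCESS)

print(f"gap: {gap:.2f} percentage points")
print(f"humans succeed about {ratio:.1f}x as often as the best model")
```

Reporting both figures avoids a common confusion: the roughly 60-point gap describes the absolute distance between the two rates, while the ratio describes how many times more often humans complete a task.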
