What will it take to make a versatile computer use agent that can safely and effectively handle any task?
This is a collection of resources and ideas towards this goal.
The Open Operator project aims to enable AI agents to perform a wide range of computer tasks across several key domains:
- Development: Code generation, project setup, version control
- Data Management: Processing, analysis, and synchronization
- Automation: Workflows, emails, customer support
- Web Interaction: Navigation, form filling, research
- System Operations: File management, software installation, monitoring
For a detailed breakdown of tasks and capabilities, see capabilities.md.
- WebArena is a realistic web environment for building autonomous agents.
- OSWorld is a scalable, real computer environment for multimodal agents that supports task setup, execution-based evaluation, and interactive learning across operating systems.
Benchmark results across the major evaluation frameworks, as of January 2025. For reference, human performance on OSWorld is above 72.36%.
Model | WebArena | OSWorld | Openness | Notes |
---|---|---|---|---|
OpenAI Operator | 58.0% | 38.0% | Closed | Best overall on both benchmarks |
Jace.AI | 57.1% | N/A | Closed | Action description + Screenshots |
ScribeAgent | 53.0% | N/A | Closed | Proprietary training data |
ORCHESTRA | 52.1% | N/A | Closed | By UNC x Ventus |
Learn-by-Interact | 48.0% | N/A | Open | Best open source on WebArena |
AgentOccam-Judge | 45.7% | N/A | Open | |
UI-TARS-72B-DPO | N/A | 24.6% | Open | Best open source on OSWorld |
OSCAR | N/A | 24.5% | Open | Best screenshot-based model |
Aguvis-72B | N/A | 17.04% | Open | Multimodal approach |
Aria-UI | N/A | 15.15% | Closed | By HKU & Rhymes AI |
OS-Atlas | N/A | 14.63% | Open | Multiple model sizes |
SeeClick | N/A | 9.21% | Open | Visual interaction focus |
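
The "best" labels in the Notes column can be checked mechanically. The following is a minimal sketch that copies the scores from the table into a list and looks up the top entry per benchmark, optionally restricted by openness. The names `RESULTS`, `best`, and `open_gap` are illustrative and not part of any project tooling; `None` stands in for an N/A cell.

```python
# Scores copied from the table above; None marks an N/A entry.
# Row layout: (model, WebArena %, OSWorld %, openness)
RESULTS = [
    ("OpenAI Operator", 58.0, 38.0, "Closed"),
    ("Jace.AI", 57.1, None, "Closed"),
    ("ScribeAgent", 53.0, None, "Closed"),
    ("ORCHESTRA", 52.1, None, "Closed"),
    ("Learn-by-Interact", 48.0, None, "Open"),
    ("AgentOccam-Judge", 45.7, None, "Open"),
    ("UI-TARS-72B-DPO", None, 24.6, "Open"),
    ("OSCAR", None, 24.5, "Open"),
    ("Aguvis-72B", None, 17.04, "Open"),
    ("Aria-UI", None, 15.15, "Closed"),
    ("OS-Atlas", None, 14.63, "Open"),
    ("SeeClick", None, 9.21, "Open"),
]

# Hypothetical helper mapping: benchmark name -> column index in a row.
BENCHMARK_COLUMN = {"WebArena": 1, "OSWorld": 2}


def best(benchmark, openness=None):
    """Return (model, score) for the top reported score, or None if no entry qualifies."""
    col = BENCHMARK_COLUMN[benchmark]
    candidates = [
        (row[0], row[col])
        for row in RESULTS
        if row[col] is not None and (openness is None or row[3] == openness)
    ]
    return max(candidates, key=lambda pair: pair[1], default=None)


def open_gap(benchmark):
    """Percentage-point gap between the best overall entry and the best open entry."""
    return round(best(benchmark)[1] - best(benchmark, "Open")[1], 2)


if __name__ == "__main__":
    for name in BENCHMARK_COLUMN:
        print(name, best(name), best(name, "Open"), open_gap(name))
```

Run against the numbers above, the lookup reports OpenAI Operator as the top entry on both benchmarks, with Learn-by-Interact and UI-TARS-72B-DPO as the top open entries on WebArena and OSWorld respectively.
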
For detailed results and analysis, see the individual benchmark pages.
- Anthropic Computer Use: Claude's capability to operate a computer through screenshots and mouse and keyboard actions
- Gumloop: AI-powered automation platform with visual workflow builder
- Lutra: AI-driven workflow automation platform
- OpenAI Operator: Autonomous AI agent for computer tasks (research preview launched January 2025)
- Zapier: No-code automation platform connecting various apps and services