SkillsBench

The first benchmark for AI agent skills, measuring how procedural knowledge transfers to agents.

Week 1 Update
January 2026
Skills boost agent Pass@1 by up to 27% (relative gain)
Benchmark Results (Pass@1)

Agent        Model     Without Skills  With Skills  Relative gain
Codex        GPT-5.2   0.645           0.729        +13%
Claude Code  Opus 4.5  0.395           0.500        +27%
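
The percentage gains reported above appear to be relative improvements in Pass@1 rather than absolute point differences. The page does not say so explicitly, but the arithmetic matches under that assumption. A minimal sketch that reproduces the figures (the names `RESULTS`, `relative_gain`, and `summarize` are hypothetical, introduced here for illustration only):

```python
# Pass@1 scores as reported in the benchmark results above:
# (agent, model) -> (without skills, with skills)
RESULTS = {
    ("Codex", "GPT-5.2"): (0.645, 0.729),
    ("Claude Code", "Opus 4.5"): (0.395, 0.500),
}


def relative_gain(without_skills: float, with_skills: float) -> float:
    """Relative improvement over the no-skills baseline, as a fraction."""
    return (with_skills - without_skills) / without_skills


def summarize(results):
    """Hypothetical helper: map each (agent, model) to its rounded percent gain."""
    return {
        key: round(relative_gain(base, boosted) * 100)
        for key, (base, boosted) in results.items()
    }


if __name__ == "__main__":
    for (agent, model), pct in summarize(RESULTS).items():
        print(f"{agent} ({model}): +{pct}%")
```

Under this reading, the absolute differences are smaller than the headline numbers suggest: roughly 8.4 points for Codex and 10.5 points for Claude Code.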
440+ Community Members
120+ Contributor Signups
8 Tasks Merged
44 Tasks In Pipeline
🎓 ~70% PhD candidates or holders
📈 2 weeks: from 0 to 440+ community members and 52 tasks (8 merged, 44 in pipeline)
✍️ 100% human-written, real-world tasks
Notable Contributors
First authors of ScreenSpot-Pro, MCP-Universe, and BigCodeBench

We're Recruiting Contributors

Join 120+ researchers building the benchmark for agent skills

51K Twitter
14K Rednote