The first gym-style benchmark for AI Agent Skills—measuring how well agents apply procedural knowledge, not just connect to tools.
Open Source
skillsbench.ai
MCP (Tool Provisioning): "What tools can an agent use?"
Skills (Training Materials): "How should an agent work?"
SkillsBench (First Benchmark): Measuring expertise transfer at scale
Timeline
Nov '24: MCP ships
Mar '25: OpenAI adopts MCP
Oct '25: Skills beta
Dec '25: 9 agents adopt Skills
The Gap: No Benchmark for Skills
MCP benchmarks measure tool connectivity. But Skills are training materials—procedural knowledge that teaches agents how to work.
No systematic framework exists for evaluating whether Skills actually transfer expertise effectively.
SkillsBench fills this gap.
Research Question 1: Do Skills help agents perform better?
Comparing agent performance with vs. without Skills on identical tasks
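As a rough illustration of the with-vs-without comparison, the sketch below computes pass rates for the same tasks under both conditions and reports the difference. It is not SkillsBench's actual harness or scoring method: `TaskResult`, `skill_uplift`, the field names, and the sample task IDs are all hypothetical, and a simple pass/fail outcome per task is assumed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskResult:
    """Hypothetical record: one task run twice, without and with Skills."""
    task_id: str
    passed_without_skills: bool
    passed_with_skills: bool


def skill_uplift(results):
    """Compare pass rates on identical tasks with vs. without Skills.

    Assumes binary pass/fail outcomes; a real benchmark may score differently.
    """
    if not results:
        raise ValueError("need at least one task result")
    n = len(results)
    baseline = sum(r.passed_without_skills for r in results) / n
    with_skills = sum(r.passed_with_skills for r in results) / n
    return {
        "baseline_pass_rate": baseline,
        "skills_pass_rate": with_skills,
        "uplift": with_skills - baseline,
        # Tasks that flipped from fail to pass once Skills were available.
        "helped": [r.task_id for r in results
                   if r.passed_with_skills and not r.passed_without_skills],
        # Tasks that flipped from pass to fail (Skills may also hurt).
        "hurt": [r.task_id for r in results
                 if r.passed_without_skills and not r.passed_with_skills],
    }


# Made-up sample data, for illustration only.
sample = [
    TaskResult("task-a", False, True),
    TaskResult("task-b", True, True),
    TaskResult("task-c", False, True),
    TaskResult("task-d", False, False),
]
report = skill_uplift(sample)
print(report)
```

Pairing the two runs by task, rather than comparing two separate task pools, keeps task difficulty constant, so any difference in pass rate can be attributed to the Skills condition.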
Research Question 2: Can agents compose multiple Skills?
Measuring how well agents select and combine Skills for complex workflows