SkillsBench

The first gym-style benchmark for AI Agent Skills, measuring how well agents apply procedural knowledge rather than merely connect to tools.

Open Source
skillsbench.ai
MCP (Tool Provisioning): "What tools can an agent use?"
Skills (Training Materials): "How should an agent work?"
SkillsBench (First Benchmark): Measuring expertise transfer at scale
Timeline
Nov '24: MCP ships
Mar '25: OpenAI adopts MCP
Oct '25: Skills beta
Dec '25: 9 agents adopt Skills
The Gap: No Benchmark for Skills
MCP benchmarks measure tool connectivity. Skills, by contrast, are training materials: procedural knowledge that teaches agents how to work. No systematic framework exists for evaluating whether Skills actually transfer expertise. SkillsBench fills this gap.
Research Questions
1. Do Skills help agents perform better? Comparing agent performance with vs. without Skills on identical tasks.
2. Can agents compose multiple Skills? Measuring how well agents select and combine Skills for complex workflows.
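The first research question implies a paired evaluation: run the same agent on identical tasks with and without a Skill injected, and compare pass rates. The sketch below illustrates that protocol; all names in it (`Task`, `run_paired_eval`, `toy_agent`) are invented for illustration and are not the actual SkillsBench API.

```python
"""Minimal sketch of a paired with/without-Skills evaluation.
Assumes a task is a (prompt, checker) pair and an agent is a callable
from context string to output string -- illustrative names only."""
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Task:
    prompt: str
    check: Callable[[str], bool]  # binary reward: did the output pass?


def run_paired_eval(agent, tasks, skill: Optional[str]) -> float:
    """Return the pass rate over tasks, optionally injecting a Skill
    (procedural instructions) ahead of each task prompt."""
    passed = 0
    for task in tasks:
        context = f"{skill}\n\n{task.prompt}" if skill else task.prompt
        passed += task.check(agent(context))
    return passed / len(tasks)


# Toy agent: follows a date-format convention only if the Skill
# text supplying that convention is present in its context.
def toy_agent(context: str) -> str:
    return "2025-12-01" if "ISO 8601" in context else "12/1/25"


tasks = [Task("Write today's date.", lambda out: out == "2025-12-01")]
skill = "SKILL: Always write dates in ISO 8601 (YYYY-MM-DD) format."

baseline = run_paired_eval(toy_agent, tasks, skill=None)
with_skill = run_paired_eval(toy_agent, tasks, skill=skill)
print(f"without Skill: {baseline:.0%}, with Skill: {with_skill:.0%}")
```

The with-minus-without pass-rate gap on identical tasks is the quantity of interest; the second research question extends the same harness to contexts carrying several Skills at once.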
Ecosystem Adoption (Dec 2025)
Smithery.ai: 28k+
GitHub SKILL.md files: 45k+
Domains
Research · Coding · Writing · Data · Design · Planning · Office · Logistics · Bio · Finance · Security · DevOps · Marketing · Legal
Evaluation: State → Action → Reward
Format: Filesystem-native
Properties: Versionable · Composable
Environment: Docker + Harbor
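To make "filesystem-native" concrete: a Skill is typically a directory whose entry point is a SKILL.md file (YAML frontmatter plus procedural instructions), which is what makes Skills versionable with ordinary git tooling and composable by mounting several directories side by side. The skill name and file paths below are invented for illustration.

```markdown
---
name: quarterly-report
description: Procedure for drafting quarterly financial reports
---

# Quarterly Report Skill

1. Load the raw figures from `data/` and validate that totals reconcile.
2. Follow the house style defined in `reference/style.md`.
3. Render the final report with `scripts/render.py`.
```

Because the Skill is just files, an evaluation harness can toggle it per run by mounting or omitting the directory inside the Docker task environment.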