Tool Calling Accuracy
We define tool calling accuracy as the percentage of tool calling requests that contain no invalid tool choices and no schema problems.
A tool calling request is one that:
- Ends with a "tool_calls" finish reason
- Is sent with at least one available tool option
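To make the definition concrete, here is a minimal sketch of how such a percentage could be computed from logged requests. The `RequestRecord` type and its field names are hypothetical, chosen for illustration; they are not OpenRouter's actual data model or pipeline.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class RequestRecord:
    """Hypothetical log record; field names are assumptions for this sketch."""
    finish_reason: str                 # e.g. "tool_calls" or "stop"
    tools_offered: int                 # number of tools sent with the request
    invalid_tool_choice: bool = False  # model picked a tool it should not have
    schema_problem: bool = False       # arguments did not match the tool's schema


def is_tool_calling_request(record: RequestRecord) -> bool:
    # Both conditions from the definition must hold.
    return record.finish_reason == "tool_calls" and record.tools_offered >= 1


def tool_calling_accuracy(records: Iterable[RequestRecord]) -> Optional[float]:
    """Percentage of tool calling requests with no defects, or None if there are none."""
    eligible = [r for r in records if is_tool_calling_request(r)]
    if not eligible:
        return None
    clean = sum(
        1 for r in eligible
        if not r.invalid_tool_choice and not r.schema_problem
    )
    return 100.0 * clean / len(eligible)


sample = [
    RequestRecord("tool_calls", 2),
    RequestRecord("tool_calls", 1),
    RequestRecord("tool_calls", 3, schema_problem=True),
    RequestRecord("tool_calls", 1),
    RequestRecord("stop", 2),  # not a tool calling request, so it is ignored
]
accuracy = tool_calling_accuracy(sample)
```

In this sample, four requests qualify as tool calling requests and one of them has a schema problem, so the sketch reports 75%. Requests that finish for any other reason do not enter the denominator.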
Don’t know what tool calling is? Learn more about tool calling here
Tool Call Success
- GPT-5 ranked #1 in tool calling accuracy across all models on OpenRouter, with Claude 4.1 Opus close behind at 99.5%.
Most Popular Models for Tool Calling
- Gemini 2.5 Flash is currently handling the largest share of tool calling requests on OpenRouter, with over 5 million requests processed in the past week.
Tool Hallucination Trends
- Tool hallucination is a common problem with open source models, but proprietary models fare well, most with negligible defect rates.
Top Tool Calling Issues Tracked
- We track several kinds of tool calling issues, including schema mismatches and calls to tools that the developer never passed in ("unknown-name").
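As a rough illustration of these two issue types, the sketch below classifies a single tool call against the tools that were sent with the request. It assumes the OpenAI-style request shape (tools with `function.name` and `function.parameters`, tool calls with `function.arguments` as a JSON string) and performs only a shallow check of required keys and top-level types. The `classify_tool_call` helper and the "invalid-json" and "schema-mismatch" labels are hypothetical; only "unknown-name" comes from the text above, and this is not how OpenRouter's tracking is actually implemented.

```python
import json
from typing import Optional

# Map JSON Schema primitive type names to Python types (shallow check only).
_JSON_TYPES = {
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "object": dict,
    "array": list,
}


def classify_tool_call(tool_call: dict, tools: list) -> Optional[str]:
    """Return an issue label for the call, or None if no issue is found."""
    schemas = {
        t["function"]["name"]: t["function"].get("parameters", {})
        for t in tools
    }
    name = tool_call["function"]["name"]
    if name not in schemas:
        return "unknown-name"  # the model chose a tool that was never offered

    try:
        args = json.loads(tool_call["function"]["arguments"])
    except (json.JSONDecodeError, TypeError):
        return "invalid-json"  # hypothetical label
    if not isinstance(args, dict):
        return "schema-mismatch"  # hypothetical label

    schema = schemas[name]
    if any(key not in args for key in schema.get("required", [])):
        return "schema-mismatch"

    properties = schema.get("properties", {})
    for key, value in args.items():
        expected = properties.get(key, {}).get("type")
        py_type = _JSON_TYPES.get(expected)
        if py_type is None:
            continue  # no declared type, or one this sketch does not handle
        # bool is a subclass of int in Python, so reject it for numeric types.
        if isinstance(value, bool) and expected in ("number", "integer"):
            return "schema-mismatch"
        if not isinstance(value, py_type):
            return "schema-mismatch"
    return None


tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]


def make_call(name: str, arguments: str) -> dict:
    return {"type": "function", "function": {"name": name, "arguments": arguments}}
```

For example, `classify_tool_call(make_call("get_time", "{}"), tools)` yields "unknown-name" because `get_time` was never offered, while a `get_weather` call missing `city` is reported as a schema mismatch. A production check would more likely run a full JSON Schema validator than this shallow comparison.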
Why This Matters
The default routing settings on OpenRouter will increasingly factor in these statistics to choose the best-performing model and provider for your requests.
By sharing this data, our goal is to help developers make informed decisions about which model is right for their use case — whether prioritizing accuracy, speed, or cost.
📊 Explore more model performance data in the OpenRouter documentation
Follow us for more insights: @OpenRouterAI on X