Opus 4.7's New Tokenizer: What It Actually Costs

Anthropic announced that Claude Opus 4.7 improves the model's understanding of inputs with a new tokenizer. This means that while the model's price hasn't changed ($5/M input, $25/M output), the same inputs now tokenize into more tokens, and therefore cost more, than they did on previous models. Anthropic disclosed a 1.0–1.35x inflation range depending on content type. On OpenRouter, Opus usage skews heavily toward programming and technology, with agentic coding workflows making up the bulk of token volume.
We wanted to know: what does this actually look like in practice? What are real users seeing? We looked at usage that shifted from Opus 4.6 to 4.7, comparing patterns across both models.
We found that costs increased 12–27%, with the exception of short prompts, which actually became slightly more cost-efficient.
We Used Our Own Tokenizer to Get a Comparable Baseline
OpenRouter records two token counts for every request:
- OpenRouter tokens: Our own consistent tokenizer, called "QuadChars," a lightweight, model-agnostic character-counting method that groups every 4 printable ASCII characters into one token and counts each non-ASCII character (e.g. Unicode, emoji) as a separate token (sketched after this list)
- Native tokens: The provider's reported count, which uses the model's actual tokenizer
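For concreteness, here's a minimal sketch of a QuadChars-style counter. The two rules above are all we've described; how non-printable ASCII and partial 4-character groups are handled is an assumption of this sketch, not a spec:

```python
import math

def quadchars(text: str) -> int:
    """Approximate QuadChars count: 4 printable ASCII chars = 1 token,
    each non-ASCII char (accented letters, CJK, emoji) = 1 token."""
    printable_ascii = sum(1 for ch in text if 32 <= ord(ch) <= 126)
    non_ascii = sum(1 for ch in text if ord(ch) > 127)
    # Assumption: non-printable ASCII (newlines, tabs) joins the 4-per-token
    # grouping, and partial groups round up -- neither is specified above.
    other_ascii = len(text) - printable_ascii - non_ascii
    return math.ceil((printable_ascii + other_ascii) / 4) + non_ascii

print(quadchars("hello world"))  # 11 ASCII chars -> 3 tokens
print(quadchars("héllo 👋"))     # 5 ASCII + 2 non-ASCII -> 2 + 2 = 4 tokens
```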
When a provider changes their tokenizer, the native count shifts while ours stays constant. The ratio between them isolates the tokenizer change from any differences in prompt content.
We identified users whose top model by request count was Opus 4.6 prior to the Opus 4.7 launch, and who then switched to Opus 4.7 as their top model. This "switcher cohort" gives us a controlled before-and-after comparison of the same user base across model versions.
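In pandas terms, the cohort selection looks roughly like this. The column names, table name, and model slugs are illustrative, not our actual schema:

```python
import pandas as pd

def top_model(df: pd.DataFrame) -> pd.Series:
    """Map each user to their most-requested model within a window."""
    counts = df.groupby(["user_id", "model"]).size().reset_index(name="n")
    return (counts.sort_values("n", ascending=False)
                  .drop_duplicates("user_id")
                  .set_index("user_id")["model"])

# `logs` is one row per request; LAUNCH_AT is the Opus 4.7 release timestamp.
pre  = top_model(logs[logs.created_at <  LAUNCH_AT])
post = top_model(logs[logs.created_at >= LAUNCH_AT])

switchers = pre.index[(pre == "anthropic/claude-opus-4.6")
                      & (post.reindex(pre.index) == "anthropic/claude-opus-4.7")]
```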
Opus 4.7 Produces 32–45% More Native Tokens
We computed the median native-to-OpenRouter prompt token ratio for each model, bucketed by prompt size (using OpenRouter tokens as the consistent baseline):
| Prompt Size | Opus 4.6 Ratio | Opus 4.7 Ratio | Tokenizer Inflation |
|---|---|---|---|
| < 2K tokens | ~1.11x | ~1.62x | ~45% |
| 2K – 10K | ~1.00x | ~1.41x | ~42% |
| 10K – 25K | ~1.14x | ~1.52x | ~34% |
| 25K – 50K | ~1.19x | ~1.58x | ~32% |
| 50K – 128K | ~1.25x | ~1.65x | ~32% |
| 128K+ | ~1.30x | ~1.73x | ~33% |
For production-scale prompts (10K+ tokens), the 4.7 tokenizer produces 32–34% more native tokens than 4.6 for equivalent text. Smaller prompts see even higher inflation at 42–45%. We observed the same tokenizer inflation on completion tokens as well, not just prompts.
Why are the absolute ratios above 1.0 for most buckets? OpenRouter's tokenizer generally produces fewer tokens than Anthropic's native tokenizer, so even Opus 4.6 has ratios near or above 1. What matters is the shift between versions, which is attributable to the new tokenizer.
Note: These inflation percentages measure changes in the native-to-OpenRouter ratio, not a direct tokenizer comparison on identical text. For reference, Simon Willison independently measured ~1.46x inflation on system prompts using Anthropic's tokenizer directly.
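The inflation column follows from the two ratios: 4.7's ratio over 4.6's, minus one. A worked check for the < 2K bucket, using the rounded medians from the table:

```python
ratio_46 = 1.11   # median native/OR prompt ratio, Opus 4.6, < 2K bucket
ratio_47 = 1.62   # same bucket, Opus 4.7
inflation = ratio_47 / ratio_46 - 1
print(f"{inflation:.0%}")   # 46% from these rounded inputs; ~45% on the raw medians
```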
Caching Absorbs Most of the Token Inflation
The tokenizer produces 32–45% more native tokens. However, prompt caching absorbs a large share of the inflation: cached tokens are billed at a 90% discount, so extra tokens that land in cache have minimal cost impact.
| Prompt Size | Avg Δ Native Tokens | Avg Δ Cached | Avg Δ Uncached | % Absorbed by Cache |
|---|---|---|---|---|
| < 2K tokens | +266 | -149 | +415 | —* |
| 2K – 10K | +2,768 | +1,561 | +1,207 | 56% |
| 10K – 25K | +6,445 | +577 | +5,868 | 9% |
| 25K – 50K | +13,695 | +8,800 | +4,896 | 64% |
| 50K – 128K | +26,304 | +20,257 | +6,046 | 77% |
| 128K+ | +108,559 | +100,410 | +8,149 | 93% |
*Cache hit rates are extremely low in the < 2K bucket, with less than 10% of requests hitting the cache at all; the cached-token delta is actually negative, so an absorption percentage isn't meaningful here.
For prompts above 25K, the majority of extra tokens from the new tokenizer are captured by the cache. At the longest prompts (128K+), 93% of the extra tokens land in cache.
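To see why cache hits matter so much in dollar terms, here's the marginal cost of the 128K+ bucket's extra tokens under the headline prices, treating the table's averages as per-request deltas and ignoring any cache-write surcharges:

```python
PRICE_IN     = 5.00 / 1e6    # $/token, uncached input
PRICE_CACHED = 0.50 / 1e6    # $/token, cached input read (90% discount)
PRICE_OUT    = 25.00 / 1e6   # $/token, output

def request_cost(uncached_in: int, cached_in: int, out: int) -> float:
    return uncached_in * PRICE_IN + cached_in * PRICE_CACHED + out * PRICE_OUT

# 128K+ bucket: ~108,559 extra native prompt tokens, of which ~100,410 land
# in cache and ~8,149 are billed at the full input rate.
with_cache = request_cost(uncached_in=8_149, cached_in=100_410, out=0)
no_cache   = request_cost(uncached_in=108_559, cached_in=0, out=0)
print(f"${with_cache:.3f} vs ${no_cache:.3f} if nothing were cached")
```

Under these assumptions, caching cuts the marginal dollar cost of the inflation in this bucket by roughly 83% (the table's 93% is the share of tokens absorbed, not dollars).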
Completion Length in Opus 4.7 Diverges Based on Prompt Size
Using OpenRouter's consistent token counts, we also measured how completion lengths changed between models:
| Prompt Size | Median Completion (4.6) | Median Completion (4.7) | Change |
|---|---|---|---|
| < 2K tokens | 302 | 114 | -62% |
| 2K – 10K | 338 | 351 | +4% |
| 10K – 25K | 191 | 248 | +30% |
| 25K – 50K | 119 | 135 | +13% |
| 50K – 128K | 108 | 129 | +19% |
| 128K+ | 113 | 142 | +26% |
Opus 4.7 is significantly more concise with short prompts, generating 62% fewer tokens at the median for simple queries under 2K tokens. For longer-context prompts (10K+), it produces moderately longer responses, with 13–30% more tokens at the median.
Actual Cost Impact
Using billed costs from over one million requests in the switcher cohort, we calculated the average cost per million OpenRouter tokens. This normalizes for prompt length, allowing a direct comparison of cost efficiency.
| Prompt Size | Avg $/M OR Tokens (4.6) | Avg $/M OR Tokens (4.7) | Change |
|---|---|---|---|
| < 2K tokens | $14.60 | $14.37 | -1.6% |
| 2K – 10K | $6.65 | $8.46 | +27.2% |
| 10K – 25K | $3.82 | $4.78 | +25.2% |
| 25K – 50K | $2.25 | $2.73 | +21.3% |
| 50K – 128K | $1.66 | $1.86 | +11.9% |
| 128K+ | $1.29 | $1.49 | +15.3% |
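The metric itself is straightforward to compute. A sketch with hypothetical column names, computing a pooled ratio (total cost over total tokens) and assuming the denominator counts both prompt and completion OR tokens; neither detail is pinned down above:

```python
import pandas as pd

bins = [0, 2_000, 10_000, 25_000, 50_000, 128_000, float("inf")]
labels = ["< 2K", "2K-10K", "10K-25K", "25K-50K", "50K-128K", "128K+"]
df["bucket"] = pd.cut(df.or_prompt_tokens, bins=bins, labels=labels)
df["or_tokens"] = df.or_prompt_tokens + df.or_completion_tokens

per_bucket = (df.groupby(["model", "bucket"], observed=True)
                .agg(cost=("usd_cost", "sum"), tokens=("or_tokens", "sum")))
per_bucket["usd_per_m_or_tokens"] = per_bucket["cost"] / per_bucket["tokens"] * 1e6
```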
Each factor contributes differently to the final cost. Here's how tokenizer inflation, cache absorption, and completion length changes combine:
| Prompt Size | Tokenizer Inflation | Cache Absorption | Completion Δ | Net Cost Δ |
|---|---|---|---|---|
| < 2K tokens | +45% | — | -62% | -1.6% |
| 2K – 10K | +42% | 56% | +4% | +27.2% |
| 10K – 25K | +34% | 9% | +30% | +25.2% |
| 25K – 50K | +32% | 64% | +13% | +21.3% |
| 50K – 128K | +32% | 77% | +19% | +11.9% |
| 128K+ | +33% | 93% | +26% | +15.3% |
Our study of real Opus 4.7 usage shows that actual costs increased 12–27% for prompts above 2K tokens once cache absorption is taken into account. Short prompts under 2K were the exception: significantly shorter completions more than offset the tokenizer overhead.
Methodology
- Source: OpenRouter's request logs
- Cohort: Users whose top model by request count was Opus 4.6, who then switched to Opus 4.7 as their top model.
- Sample size: Over one million requests split across 4.6 and 4.7, text-only, non-cancelled
- Normalization: OpenRouter counts tokens independently from Anthropic's native count. The ratio between native and OR token counts isolates the tokenizer change.
- Cost metric: Average cost per million OpenRouter tokens, bucketed by OR prompt token count. Dividing by OR tokens normalizes for prompt length differences across model versions.
- Controls: Excluded media (images, files, audio, video), cancelled requests, and zero-token requests
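Expressed as a filter over the same hypothetical request log (column names illustrative; we assume "zero-token" means zero on either side):

```python
clean = logs[
    (logs.media_count == 0)                # no images, files, audio, video
    & ~logs.cancelled                      # no cancelled requests
    & (logs.native_prompt_tokens > 0)      # no zero-token requests
    & (logs.native_completion_tokens > 0)
]
```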