Opus 4.7's New Tokenizer: What It Actually Costs

Anthropic announced that Claude Opus 4.7 improves the model's understanding of inputs with a new tokenizer. This means that while the model's price hasn't changed ($5/M input, $25/M output), the same inputs now tokenize into more tokens, and therefore cost more, than they did on previous models. Anthropic disclosed a 1.0–1.35x inflation range depending on content type. On OpenRouter, Opus usage skews heavily toward programming and technology, with agentic coding workflows making up the bulk of token volume.
We wanted to know: what does this actually look like in practice? What are real users seeing? We looked at usage that shifted from Opus 4.6 to 4.7, comparing patterns across both models.
We found that costs increased 12–27%, with the exception of short prompts, which actually became slightly more cost-efficient.
We Used Our Own Tokenizer to Get a Comparable Baseline
OpenRouter records two token counts for every request:
- OpenRouter tokens: Our own consistent tokenizer, called "QuadChars," a lightweight, model-agnostic character-counting method that groups every 4 printable ASCII characters into one token and counts each non-ASCII character (e.g. Unicode, emoji) as a separate token (sketched after this list)
- Native tokens: The provider's reported count, which uses the model's actual tokenizer
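For concreteness, here's a minimal sketch of a QuadChars-style counter. The two rules above are all we've described; how non-printable ASCII and partial 4-character groups are handled is an assumption of this sketch, not a spec:

```python
import math

def quadchars(text: str) -> int:
    """Approximate QuadChars count: 4 printable ASCII chars = 1 token,
    each non-ASCII char (accented letters, CJK, emoji) = 1 token."""
    printable_ascii = sum(1 for ch in text if 32 <= ord(ch) <= 126)
    non_ascii = sum(1 for ch in text if ord(ch) > 127)
    # Assumption: non-printable ASCII (newlines, tabs) joins the 4-per-token
    # grouping, and partial groups round up -- neither is specified above.
    other_ascii = len(text) - printable_ascii - non_ascii
    return math.ceil((printable_ascii + other_ascii) / 4) + non_ascii

print(quadchars("hello world"))  # 11 ASCII chars -> 3 tokens
print(quadchars("héllo 👋"))     # 5 ASCII + 2 non-ASCII -> 2 + 2 = 4 tokens
```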
When a provider changes their tokenizer, the native count shifts while ours stays constant. The ratio between them isolates the tokenizer change from any differences in prompt content.
We identified users whose top model by request count was Opus 4.6 prior to the Opus 4.7 launch, and who then switched to Opus 4.7 as their top model. This "switcher cohort" gives us a controlled before-and-after comparison of the same user base across model versions.
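In pandas terms, the cohort selection looks roughly like this. The column names, table name, and model slugs are illustrative, not our actual schema:

```python
import pandas as pd

def top_model(df: pd.DataFrame) -> pd.Series:
    """Map each user to their most-requested model within a window."""
    counts = df.groupby(["user_id", "model"]).size().reset_index(name="n")
    return (counts.sort_values("n", ascending=False)
                  .drop_duplicates("user_id")
                  .set_index("user_id")["model"])

# `logs` is one row per request; LAUNCH_AT is the Opus 4.7 release timestamp.
pre  = top_model(logs[logs.created_at <  LAUNCH_AT])
post = top_model(logs[logs.created_at >= LAUNCH_AT])

switchers = pre.index[(pre == "anthropic/claude-opus-4.6")
                      & (post.reindex(pre.index) == "anthropic/claude-opus-4.7")]
```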
Opus 4.7 Produces 32–45% More Native Tokens
We computed the median native-to-OpenRouter prompt token ratio for each model, bucketed by prompt size (using OpenRouter tokens as the consistent baseline):
| Prompt Size | Opus 4.6 Ratio | Opus 4.7 Ratio | Tokenizer Inflation |
|---|---|---|---|
| < 2K tokens | ~1.11x | ~1.62x | ~45% |
| 2K – 10K | ~1.00x | ~1.41x | ~42% |
| 10K – 25K | ~1.14x | ~1.52x | ~34% |
| 25K – 50K | ~1.19x | ~1.58x | ~32% |
| 50K – 128K | ~1.25x | ~1.65x | ~32% |
| 128K+ | ~1.30x | ~1.73x | ~33% |
For production-scale prompts (10K+ tokens), the 4.7 tokenizer produces 32–34% more native tokens than 4.6 for equivalent text. Smaller prompts see even higher inflation at 42–45%. We observed the same tokenizer inflation on completion tokens as well, not just prompts.
Why are the absolute ratios above 1.0 for most buckets? OpenRouter's tokenizer generally produces fewer tokens than Anthropic's native tokenizer, so even Opus 4.6 has ratios near or above 1. What matters is the shift between versions, which is attributable to the new tokenizer.
Note: These inflation percentages measure changes in the native-to-OpenRouter ratio, not a direct tokenizer comparison on identical text. For reference, Simon Willison independently measured ~1.46x inflation on system prompts using Anthropic's tokenizer directly.
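The inflation column follows from the two ratios: 4.7's ratio over 4.6's, minus one. A worked check for the < 2K bucket, using the rounded medians from the table:

```python
ratio_46 = 1.11   # median native/OR prompt ratio, Opus 4.6, < 2K bucket
ratio_47 = 1.62   # same bucket, Opus 4.7
inflation = ratio_47 / ratio_46 - 1
print(f"{inflation:.0%}")   # 46% from these rounded inputs; ~45% on the raw medians
```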
Caching Absorbs Most of the Token Inflation
The tokenizer produces 32–45% more native tokens. However, prompt caching absorbs a large share of the inflation: cached tokens are billed at a 90% discount, so extra tokens that land in cache have minimal cost impact.
| Prompt Size | Avg Δ Native Tokens | Avg Δ Cached | Avg Δ Uncached | % Absorbed by Cache |
|---|---|---|---|---|
| < 2K tokens | +266 | -149 | +415 | —* |
| 2K – 10K | +2,768 | +1,561 | +1,207 | 56% |
| 10K – 25K | +6,445 | +577 | +5,868 | 9% |
| 25K – 50K | +13,695 | +8,800 | +4,896 | 64% |
| 50K – 128K | +26,304 | +20,257 | +6,046 | 77% |
| 128K+ | +108,559 | +100,410 | +8,149 | 93% |
*Cache hit rates are extremely low in the < 2K bucket, with less than 10% of requests hitting the cache at all; the cached-token delta is actually negative, so an absorption percentage isn't meaningful here.
For prompts above 25K, the majority of extra tokens from the new tokenizer are captured by the cache. At the longest prompts (128K+), 93% of the extra tokens land in cache.
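To see why cache hits matter so much in dollar terms, here's the marginal cost of the 128K+ bucket's extra tokens under the headline prices, treating the table's averages as per-request deltas and ignoring any cache-write surcharges:

```python
PRICE_IN     = 5.00 / 1e6    # $/token, uncached input
PRICE_CACHED = 0.50 / 1e6    # $/token, cached input read (90% discount)
PRICE_OUT    = 25.00 / 1e6   # $/token, output

def request_cost(uncached_in: int, cached_in: int, out: int) -> float:
    return uncached_in * PRICE_IN + cached_in * PRICE_CACHED + out * PRICE_OUT

# 128K+ bucket: ~108,559 extra native prompt tokens, of which ~100,410 land
# in cache and ~8,149 are billed at the full input rate.
with_cache = request_cost(uncached_in=8_149, cached_in=100_410, out=0)
no_cache   = request_cost(uncached_in=108_559, cached_in=0, out=0)
print(f"${with_cache:.3f} vs ${no_cache:.3f} if nothing were cached")
```

Under these assumptions, caching cuts the marginal dollar cost of the inflation in this bucket by roughly 83% (the table's 93% is the share of tokens absorbed, not dollars).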
Completion Length in Opus 4.7 Diverges Based on Prompt Size
Using OpenRouter's consistent token counts, we also measured how completion lengths changed between models:
| Prompt Size | Median Completion (4.6) | Median Completion (4.7) | Change |
|---|---|---|---|
| < 2K tokens | 302 | 114 | -62% |
| 2K – 10K | 338 | 351 | +4% |
| 10K – 25K | 191 | 248 | +30% |
| 25K – 50K | 119 | 135 | +13% |
| 50K – 128K | 108 | 129 | +19% |
| 128K+ | 113 | 142 | +26% |
Opus 4.7 is significantly more concise with short prompts, generating 62% fewer tokens at the median for simple queries under 2K tokens. For longer-context prompts (10K+), it produces moderately longer responses, with 13–30% more tokens at the median.
Actual Cost Impact
Using billed costs from over one million requests in the switcher cohort, we calculated the average cost per million OpenRouter tokens. This normalizes for prompt length, allowing a direct comparison of cost efficiency.
| Prompt Size | Avg $/M OR Tokens (4.6) | Avg $/M OR Tokens (4.7) | Change |
|---|---|---|---|
| < 2K tokens | $14.60 | $14.37 | -1.6% |
| 2K – 10K | $6.65 | $8.46 | +27.2% |
| 10K – 25K | $3.82 | $4.78 | +25.2% |
| 25K – 50K | $2.25 | $2.73 | +21.3% |
| 50K – 128K | $1.66 | $1.86 | +11.9% |
| 128K+ | $1.29 | $1.49 | +15.3% |
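The metric itself is straightforward to compute. A sketch with hypothetical column names, computing a pooled ratio (total cost over total tokens) and assuming the denominator counts both prompt and completion OR tokens; neither detail is pinned down above:

```python
import pandas as pd

bins = [0, 2_000, 10_000, 25_000, 50_000, 128_000, float("inf")]
labels = ["< 2K", "2K-10K", "10K-25K", "25K-50K", "50K-128K", "128K+"]
df["bucket"] = pd.cut(df.or_prompt_tokens, bins=bins, labels=labels)
df["or_tokens"] = df.or_prompt_tokens + df.or_completion_tokens

per_bucket = (df.groupby(["model", "bucket"], observed=True)
                .agg(cost=("usd_cost", "sum"), tokens=("or_tokens", "sum")))
per_bucket["usd_per_m_or_tokens"] = per_bucket["cost"] / per_bucket["tokens"] * 1e6
```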
Each factor contributes differently to the final cost. Here's how tokenizer inflation, cache absorption, and completion length changes combine:
| Prompt Size | Tokenizer Inflation | Cache Absorption | Completion Δ | Net Cost Δ |
|---|---|---|---|---|
| < 2K tokens | +45% | — | -62% | -1.6% |
| 2K – 10K | +42% | 56% | +4% | +27.2% |
| 10K – 25K | +34% | 9% | +30% | +25.2% |
| 25K – 50K | +32% | 64% | +13% | +21.3% |
| 50K – 128K | +32% | 77% | +19% | +11.9% |
| 128K+ | +33% | 93% | +26% | +15.3% |
Our study of real Opus 4.7 usage shows that actual costs increased 12–27% for prompts above 2K tokens once cache absorption is taken into account. Short prompts under 2K were the exception: significantly shorter completions more than offset the tokenizer overhead.
Methodology
- Source: OpenRouter's request logs
- Cohort: Users whose top model by request count was Opus 4.6, who then switched to Opus 4.7 as their top model.
- Sample size: Over one million requests split across 4.6 and 4.7, text-only, non-cancelled
- Normalization: OpenRouter counts tokens independently from Anthropic's native count. The ratio between native and OR token counts isolates the tokenizer change.
- Cost metric: Average cost per million OpenRouter tokens, bucketed by OR prompt token count. Dividing by OR tokens normalizes for prompt length differences across model versions.
- Controls: Excluded media (images, files, audio, video), cancelled requests, and zero-token requests
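Expressed as a filter over the same hypothetical request log (column names illustrative; we assume "zero-token" means zero on either side):

```python
clean = logs[
    (logs.media_count == 0)                # no images, files, audio, video
    & ~logs.cancelled                      # no cancelled requests
    & (logs.native_prompt_tokens > 0)      # no zero-token requests
    & (logs.native_completion_tokens > 0)
]
```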