OwnLLM

On April 27, 2026, GitHub announced that Copilot is leaving flat pricing for a token metered model called AI Credits, effective June 1, 2026. It looks like an isolated pricing tweak. It is not. Across the three biggest AI vendors, the same story is playing out: heavy users cost far more than they pay, and the providers have run out of patience.

What's happening

GitHub is replacing premium request units with AI Credits, consumed at each model's published API rate based on input, output, and cached tokens. The included pool now equals your monthly fee: $10 of credits on the $10 Pro plan, $19 on Business, $39 on Enterprise. Code completions stay free. Everything agentic burns credits.

GitHub's framing is honest: "a quick chat question and a multi-hour autonomous coding session can cost the user the same amount." That subsidy was the entire flat pricing model.

The same realization hit OpenAI eighteen months earlier. In January 2025, Sam Altman posted that "we are currently losing money on openai pro subscriptions," despite charging $200 a month, because "people use it much more than we expected." Altman picked the price himself and assumed it would be profitable. It was not.

Anthropic showed its hand in April 2026. On April 21, the company quietly removed Claude Code from its $20 Pro plan for roughly 2% of new prosumer signups, framed internally as an A/B test. Developer backlash was immediate, and the change was reverted within 24 hours. Anthropic's head of growth Amol Avasare confirmed the test and explained: "we've made small adjustments along the way (weekly caps, tighter limits at peak), but usage has changed a lot and our current plans weren't built for this." CEO Dario Amodei has publicly called Anthropic compute constrained.

Cursor went through the same transition earlier. In June 2025, the company replaced its "fast requests" cap with usage-based credits indexed to model API costs. The rollout triggered a backlash large enough that Cursor issued public refunds and a "Clarifying our pricing" post taking responsibility for the communication. The destination was the same as GitHub's: per-token billing, with a fixed credit pool included in the seat price.

What this signals

Three vendors, one pattern. Flat pricing assumed humans typing prompts. Agent loops broke that assumption. A single autonomous coding session can consume far more tokens than a chat user generates over an entire month. When that gap widens, no flat price reconciles the median user with the heavy user. You either lose money on power users or overcharge everyone else.

The industry response is converging on three moves: meter by token, cap usage with rate limits, and tier by intensity. GitHub picked metering. Anthropic picked tightening limits. OpenAI has openly considered both. Expect Cursor, Replit, and the rest to follow within twelve months. The era where $20 a month bought unlimited frontier intelligence is closing, because the unit economics never worked.

For buyers, the implication is uncomfortable. The line item that used to be a fixed seat cost is becoming a variable bill, harder to project month over month based on how aggressively your team runs agents. Finance teams have not budgeted for this. Engineering teams will start throttling, which defeats the point of agents in the first place.

What it means for your team

If your team is already running agentic workflows, model your worst month, not your average. Pooled budgets help allocate the pain across the org. They do not change the total.

The other path is to stop buying inference per token at all. Open weight models (Qwen, Llama, DeepSeek, Mistral) now cover much of the work that routine agent runs actually do: refactors, doc generation, codebase Q&A, test scaffolding. Run them on a workstation or rented GPU, expose an OpenAI compatible endpoint, and your existing tools (Cursor, Claude Code, OpenCode) keep working unchanged. That is the kind of setup we build at OwnLLM, for teams who would rather optimize hardware utilization once than monitor a token bill every month.

Frontier models still matter for hard problems, and Copilot's new bill view will help teams that stay. But for routine agent work, paying per token to do what an open weight model handles fine is a tax worth questioning before June 1.

Where to go from here

The flat subscription era ending is not a Copilot story. It is an industry story, and it is moving faster than most teams expect. Map which workloads truly need a frontier model and which do not, then run the math. More on the private deployment side at ownllm.app.