News & analysis · 7 June 2026

Model routing hits enterprise AI budgets: why OpenAI and Anthropic face an IPO pricing trap

For two years, the enterprise AI playbook was simple: pick the most capable model and route every task through it. Summarize a memo? Frontier model. Classify a support ticket? Frontier model. Draft boilerplate code? Still frontier model. That era is ending — not because the models failed, but because the bills arrived. A discipline called model routing — matching each job to the cheapest model that can do it well — is spreading through corporate America just as OpenAI and Anthropic prepare trillion-dollar-class public listings. According to reporting from CNBC on June 5, routing is no longer a niche optimization trick. It is becoming the default response to AI budgets that blew past forecasts. For investors pricing mega-IPOs, that shift threatens the core assumption both companies have sold: enormous, sustained demand at premium per-token prices.

What model routing actually means

Model routing is an orchestration layer that sits between applications and model providers. Instead of hard-coding a single API endpoint, enterprises deploy a router — sometimes a gateway product, sometimes home-grown logic — that classifies incoming work by complexity, latency requirements, and risk tolerance. Hard reasoning, novel code generation, and high-stakes legal review might still go to GPT-4 class or Claude Opus. Summarization, entity extraction, simple classification, and templated responses get steered to smaller, cheaper models — including open-weight alternatives from DeepSeek, Meta, and others.
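In code, the core of such a router can be very small. The following is a minimal sketch, not any vendor's gateway: the tier names, the task labels, and the `Request` and `route` names are all hypothetical, and the rule that unknown work defaults to the frontier tier is an assumption chosen so that routing never silently downgrades quality.

```python
from dataclasses import dataclass

# Hypothetical tier labels; a real deployment would map these to model IDs.
FRONTIER, MID, SMALL = "frontier", "mid", "small"

# Task labels taken from the examples in the text (the names are illustrative).
HARD_TASKS = {"hard_reasoning", "novel_code_generation", "legal_review"}
EASY_TASKS = {"summarization", "entity_extraction", "classification",
              "templated_response"}


@dataclass(frozen=True)
class Request:
    task_type: str
    high_stakes: bool = False        # risk tolerance
    latency_sensitive: bool = False  # latency requirement


def route(req: Request) -> str:
    """Return the cheapest tier this sketch considers acceptable."""
    # Risk and complexity win over cost: these always go to the top tier.
    if req.high_stakes or req.task_type in HARD_TASKS:
        return FRONTIER
    # Routine work goes to the cheapest tier.
    if req.task_type in EASY_TASKS:
        return SMALL
    # Unknown work: assume frontier unless the caller needs a fast answer.
    return MID if req.latency_sensitive else FRONTIER


print(route(Request("summarization")))                    # small
print(route(Request("summarization", high_stakes=True)))  # frontier
```

Production routers typically replace the hard-coded sets with a classifier and add per-tier fallbacks, but the decision shape is the same.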

The economics are not marginal. Scott Wu, co-founder and CEO of Cognition — the company behind the coding agent Devin — told CNBC that routing routine work to appropriately sized models can deliver roughly five to ten times better cost efficiency on boilerplate tasks compared with running everything through a frontier endpoint. That is not a rounding error when your engineering org has tens of thousands of seats.

Cisco chief product officer Jeetu Patel offered a concrete scale example on the same program: at roughly $200 of token usage per employee per week, a company with 90,000 employees faces on the order of $900 million per year in inference spend (90,000 × $200 × 52 weeks comes to about $936 million) if usage stays uncapped at frontier pricing. That math is why boards that approved “AI transformation” pilots in 2024 are now asking FinOps teams for routing architectures in 2026.
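Patel's figure is easy to reproduce. This short sketch uses only the two numbers quoted above and assumes a 52-week year with constant usage; the function name is illustrative.

```python
WEEKS_PER_YEAR = 52  # assumption: usage is constant across a 52-week year


def annual_inference_spend(employees: int, weekly_spend_per_employee: float) -> float:
    """Annual token spend in dollars for a flat per-employee weekly rate."""
    return employees * weekly_spend_per_employee * WEEKS_PER_YEAR


spend = annual_inference_spend(90_000, 200)
print(f"${spend / 1e6:,.0f} million per year")  # $936 million per year
```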

The 95% problem Glean’s CEO described

The waste is structural, not anecdotal. Arvind Jain, CEO of enterprise search company Glean, estimated on CNBC that approximately 95% of enterprise AI usage still defaults to the most expensive frontier models — even for work that cheaper models handle adequately. That figure captures the hangover from the first deployment wave, when teams optimized for capability and speed-to-demo, not unit economics.

The correction looks familiar to anyone who lived through early cloud adoption. Enterprises initially over-provisioned AWS instances; then they right-sized. AI is following the same curve, only faster and at higher nominal dollar amounts. The difference is that cloud bills scaled with infrastructure; AI bills scale with every employee click, every agent loop, and every automated retry.

This is the backdrop for Microsoft’s decision to cancel most internal Claude Code licenses by June 30. Per-engineer Claude Code spend reportedly reached $500 to $2,000 per month inside the Experiences and Devices division. Microsoft did not end its Anthropic partnership; it stopped letting one division run unconstrained agentic coding on a metered third-party API. Routing — in this case, toward GitHub Copilot CLI and Azure-hosted models with internal cost controls — is the operational fix to the same budget pressure.

Why routing threatens the IPO narrative

OpenAI and Anthropic have built valuations — reportedly in the $850 billion to $965 billion range ahead of anticipated 2026 listings — on the premise that enterprises will keep paying top dollar for frontier intelligence at scale. Model routing attacks that premise from two directions.

First, volume shifts downmarket. If 60–80% of queries that previously hit Opus or GPT-4 move to Haiku, Sonnet, or open models, aggregate revenue per seat falls even if total seat count rises. The labs still win the hard problems; they lose the high-frequency easy ones that padded usage charts.
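A rough sketch shows how quickly that effect bites. The price ratio here is an assumption for illustration (a cheaper tier at one tenth of the frontier price, loosely in line with the five-to-ten-times range Wu cited), not a published price list, and `revenue_retained` is a hypothetical helper.

```python
def revenue_retained(moved_fraction: float, cheap_price_ratio: float) -> float:
    """Share of the original frontier revenue a lab keeps after routing.

    moved_fraction: share of former frontier queries sent to a cheaper tier.
    cheap_price_ratio: cheaper tier's price relative to frontier (assumed);
        use 0.0 when the traffic leaves for an open or self-hosted model.
    """
    if not 0.0 <= moved_fraction <= 1.0:
        raise ValueError("moved_fraction must be between 0 and 1")
    stays_on_frontier = 1.0 - moved_fraction
    return stays_on_frontier + moved_fraction * cheap_price_ratio


for moved in (0.6, 0.8):
    same_lab = revenue_retained(moved, cheap_price_ratio=0.1)
    left_lab = revenue_retained(moved, cheap_price_ratio=0.0)
    print(f"{moved:.0%} moved: {same_lab:.0%} kept if the traffic stays "
          f"with the lab, {left_lab:.0%} if it leaves")
```

Under those assumptions, a lab keeps roughly 46% of the original revenue on that traffic when 60% of queries move to its own cheaper tier, and roughly 28% when 80% move. The figures drop to 40% and 20% if the moved traffic goes to another provider entirely.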

Second, routing reduces lock-in. Once an enterprise maintains a routing table across OpenAI, Anthropic, Google, and self-hosted models, switching any single provider becomes a configuration change rather than a platform migration. That commoditizes the API layer the labs hoped would be sticky — similar to how multi-cloud strategies weakened pure-play infrastructure pricing power.

The tension shows up in this week’s headlines. Anthropic projects its first operating profit on surging Q2 revenue, while its largest customers simultaneously cap spend. Both stories can be true: Anthropic may be profitable at the company level while individual buyers push down their own spend per seat through routing and seat governance. Public-market investors will have to separate lab margins from enterprise willingness to pay.

Who wins and who loses in a routed world

Winners: AI gateway and observability vendors (the routing layer itself), companies selling fine-tuned small models for narrow domains, and hyperscalers that bundle inference into existing cloud contracts where marginal token cost is harder for CFOs to isolate. Microsoft’s push toward Copilot CLI inside its own stack is a textbook example: capture the routing layer and the margin, even if you still call Anthropic’s models on the back end.

Losers: Pure-play API businesses whose revenue mix depends on undifferentiated frontier-token volume. Their exposure grows as open-weight model families keep improving: DeepSeek’s well-funded open-weight push, among others, gives routing tables a credible cheap tier. Federal procurement is already pricing the race to zero: the GSA’s June OneGov AI menu lists Perplexity at $0.25 per agency and caps OpenAI and Anthropic at $1 for unlimited seats, symbolic pricing that treats frontier models as distribution loss leaders, not margin engines.

Hybrid architectures extend the pattern beyond the data center. Perplexity’s Computex 2026 demo routes agent subtasks between local NPU/GPU silicon and cloud frontier models in real time — another form of routing that keeps easy inference off the most expensive endpoints. As on-device models improve through 2026, the addressable premium-token market shrinks further.

How labs are responding

Neither OpenAI nor Anthropic is standing still. Both already sell model tiers — Opus vs. Sonnet vs. Haiku; GPT-4 class vs. smaller variants — which is itself a form of official routing if customers use it correctly. The labs’ challenge is that enterprise buyers are now building external routers that arbitrage across vendors, not just across one vendor’s SKU ladder.

Anthropic’s June 4 coordinated essay on slowing recursive self-improvement, covered in our IPO paradox analysis, sits awkwardly beside routing economics. The company simultaneously argues that frontier capabilities are dangerous enough to pause and valuable enough to price at a premium. Enterprise buyers meet the second message in their invoices and the first in their risk committees; routing lets them act on both by using smaller models for most work and frontier models only where justified.

OpenAI’s ChatGPT super-app pivot is a parallel strategic response: if API token margins compress, own the consumer surface and bundle ads, subscriptions, and commerce. Routing pressure on the enterprise API is one reason the consumer distribution war matters more, not less, heading into an IPO window that coincides with SpaceX’s June 12 listing and the broader liquidity drain hitting risk assets this week.

What buyers and investors should watch

If you run enterprise AI procurement, treat routing as infrastructure, not an afterthought. Minimum checklist:

  • Instrument before you route. You cannot optimize what you do not measure. Log model, task type, latency, and cost per request before deploying a router.
  • Define tier boundaries explicitly. Which tasks require frontier models for regulatory, safety, or quality reasons? Put those in writing so routing does not become a silent quality downgrade.
  • Test open-weight fallbacks. Run shadow traffic through cheaper models and compare output quality on your actual workloads, not benchmark leaderboards.
  • Renegotiate contracts around routed volume. Committed spend deals assume monolithic usage. Split pools by tier or shift to pay-as-you-go with internal caps.
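The first item on that checklist is the easiest to start on. Below is a minimal sketch of the per-request record it describes, plus a roll-up by task type and model. The field names and the `RequestLog` and `summarize` helpers are hypothetical, and a real deployment would write to a metrics store, not an in-memory list.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestLog:
    model: str
    task_type: str
    latency_ms: float
    cost_usd: float


def summarize(logs):
    """Aggregate request count, total cost, and mean latency per (task, model)."""
    totals = defaultdict(lambda: {"requests": 0, "cost_usd": 0.0, "latency_ms": 0.0})
    for log in logs:
        bucket = totals[(log.task_type, log.model)]
        bucket["requests"] += 1
        bucket["cost_usd"] += log.cost_usd
        bucket["latency_ms"] += log.latency_ms
    return {
        key: {
            "requests": b["requests"],
            "cost_usd": b["cost_usd"],
            "avg_latency_ms": b["latency_ms"] / b["requests"],
        }
        for key, b in totals.items()
    }


# Illustrative records only; the costs and latencies are made up.
logs = [
    RequestLog("frontier", "summarization", 800, 0.5),
    RequestLog("frontier", "summarization", 1200, 0.25),
    RequestLog("small", "classification", 100, 0.125),
]
for (task, model), stats in summarize(logs).items():
    print(task, model, stats)
```

A table like this, built from real traffic, shows which task types are running on the frontier tier and what they cost, which is the evidence needed to set tier boundaries and choose workloads for shadow testing.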

If you are underwriting AI IPOs, ask a harder question than “what is Q2 revenue?” Ask what percentage of enterprise usage still hits the highest-priced SKU, and what happens when that number falls from 95% toward 50%. The mega-IPO skepticism that hit semiconductors and Mag 7 names this week is partly macro — jobs data, Fed repricing, SpaceX crowding out — but routing is a secular headwind on the revenue side that does not reverse when rates fall.

Bottom line

Model routing is the enterprise AI market growing up. The first phase was about proving capability; the second is about proving return on inference spend. Cisco’s nine-figure annual token math and Glean’s estimate that 95% of usage defaults to frontier models describe the same transition: CFOs are no longer willing to fund a single-model strategy just because the demo was impressive.

For OpenAI and Anthropic, that is an IPO-year problem. Their listings assume premium demand persists at scale. Routing says most demand will persist — but at a blend of price points the labs do not control. The winners of the next phase may not be whoever builds the smartest model; they may be whoever owns the router, the evaluation harness, and the trust layer that decides which model runs which job. Read our guide to AI agents and tool use for the architectural context behind those routing decisions.

Sources: CNBC — model routing threatens OpenAI and Anthropic (5 Jun 2026); CNBC video — Jeetu Patel and Scott Wu on AI budgets; Let’s Data Science — routing revenue analysis. Related on Solana Garden: Microsoft Claude Code cancellation, Anthropic unit economics split, Anthropic IPO paradox, AI agents and tool use explained.