Managing AI Inference Optionality
AT&T cut costs 90% routing AI requests. Airbnb uses open-weight models in customer support. How are you managing inference optionality?
Supply is under pressure.
Anthropic is now paying xAI $1.25 billion a month to rent an entire data center’s output and Google is leasing roughly 110,000 GPUs from SpaceX/x.ai.
JP Morgan estimates 60% of 2027 data center supply is not yet under construction.
Unit prices fall; demand increases.
Bain’s June analysis put it bluntly — model prices fall about 10x per generation, yet the effective cost per task stays flat. Reasoning and agents simply consume more.
Given the demand and supply imbalance, logically, model providers are looking to raise inference prices. Anthropic announced moves to limit subscription token access in May, only to pause it in June.
As enterprises are building AI for critical functionality, in my experience, progressive companies are treating compute like a critical input. They are managing its usage and creating inference optionality — the ability to keep delivering value no matter which model, price, or data center is available next quarter.
Three approaches build it:
Secure supply and design for failover, so an outage or a latency spike doesn’t take your product down with it.
Expand low-cost continuity. Investigate open-weight models for the tasks that don’t need the frontier, and pair them with self-hosting where control matters. Airbnb describes open-weight Qwen as “very good”, “fast”, and “affordable.”
Route by task and role. AT&T, running 8 billion tokens a day, cut costs about 90% by sending routine work to smaller domain models. Zscaler simply doesn’t give its marketing team frontier access — as one exec put it, using a huge model for a simple job is a misuse of resources.
None of this is exotic. It’s the same discipline every operator already applies to any critical, volatile input: manage usage and develop alternatives.
The firms that win the next year won’t be the ones running the best model.
They’ll be the ones who never depended on it.