Skip to main content
AI costagentic AItoken billingenterpriselocal-firstBYOLLM

Even Uber hit the AI coding cost wall

Bodega One9 min read

Quick answer

On June 1, 2026, GitHub flipped Copilot to metered AI Credits billing and developers revolted. That was the loud story. The quieter, bigger one is that the companies with the deepest pockets are hitting the same wall. Uber burned its entire 2026 AI coding budget in four months and capped engineers at $1,500 a month, per tool. Microsoft cancelled most of its own Claude Code licenses. When Uber and Microsoft cannot keep agentic cloud AI inside a budget, the problem is not a bad plan tier. It is the metered model itself. Here are the numbers, why the cost is structurally unbounded, and the two cost structures that actually cap the bill.

The month the bill became the story

For two years the AI coding conversation was about capability: which model writes the best code, which agent ships the most. In June 2026 it became about the bill. GitHub moved every Copilot plan to token-based AI Credits on June 1, and the community thread filled with developers watching a month of credits vanish in a day. We walked through that in the week-one bills post and the rate-card math.

But the Copilot backlash is the symptom, not the disease. The clearest signal that something structural is wrong came from the opposite end of the market: not price-sensitive indie developers, but the reference customers, the companies whose engineering budgets are measured in the hundreds of millions. They are pulling back too.

Uber burned a year of budget in four months

Uber's CTO, Praveen Neppalli Naga, went on record that the company burned through its entire 2026 AI coding tools budget in the first four months of the year (first reported by The Information). Adoption of agentic tools like Claude Code and Cursor surged after their late-2025 rollout, and the bills surged with it. Bloomberg and TechCrunch reported on June 2 that before the company intervened, some engineers were running between $500 and $2,000 a month in token consumption.

Uber's fix was a cap: $1,500 per month, per employee, per tool. Spending on Claude Code does not draw down the budget for Cursor; each tool gets its own ceiling, tracked on an internal dashboard, with exceptions available on request. Sit with that number. A company running one of the largest engineering organizations on the planet looked at agentic AI spend and decided $1,500 per developer, per tool, per month was the line it had to draw.

That is not the shape of a tool being a little overpriced. It is the shape of a cost that does not stay still.

Microsoft cut its own Claude Code

The same week brought a second signal. According to The Verge, Microsoft cancelled most of its direct Claude Code licenses about six months after it first gave engineers access, and pointed them at GitHub Copilot CLI instead.

The nuance matters, so here it is straight: this is an internal-tooling decision, and it does not touch Microsoft's separate Anthropic arrangement, a reported $5B investment and a $30B Azure commitment. Microsoft is not walking away from Anthropic. It is declining to pay per-token for an agentic tool when it owns a cheaper substitute. That is the buyer logic playing out at the very top of the market: when the company that ships Copilot decides metered third-party agents cost too much to hand its own engineers, the meter is the problem, not the vendor.

“For my team, the cost of compute is far beyond the costs of the employees.”

That is Nvidia's Bryan Catanzaro, VP of applied deep learning, speaking to Axios. When inference is your dominant cost line, every metered call is a variable you cannot fully control, and it compounds with every agent you deploy.

Why agentic cloud AI cost is structurally unbounded

Autocomplete was cheap because it was essentially one short model call per suggestion. An agent is not one call. It plans, reads files, runs tools, re-reads the context it just changed, and writes across multiple steps. A single “fix this across the repo” task can move hundreds of thousands of tokens. And the output tokens, the ones the agent generates, are the expensive ones: frontier models run $25 to $30 per million output tokens. The bill scales with the size of the work, not with a flat seat.

The standard reassurance is that token prices keep falling. They do. Gartner projects that inference on a trillion-parameter model will cost roughly 90% less in 2030 than it did in 2025. But that is only one of the two curves. Goldman Sachs projects a 24-fold increase in token consumption by 2030. When consumption climbs faster than price drops, the bill still goes up. Cheaper tokens do not cap a metered bill. They change how fast it fills.

This is why moving from one metered platform to another solves nothing. Swapping Copilot's credit pool for Cursor's, or Cursor's for Devin Desktop's cloud-agent meter, just changes which dashboard you check at the end of the month. The meter is still there, and it still scales with how hard you work.

The two cost structures that actually cap the bill

There are only two ways to put a real ceiling on agentic AI cost, and neither is a better subscription tier.

ApproachWhat it removesThe tradeoff
Platform subscription (Copilot, Cursor, Devin Desktop)Nothing. The meter is the product.Bill scales with usage. You cannot predict it.
BYOK extension (Cline, Continue.dev)The platform markup. You pay the provider at cost.Still per-token, but wholesale and fully visible.
Local model (Ollama, LM Studio, llama.cpp)The inference bill entirely. Zero marginal cost.Needs hardware: roughly 16 to 48 GB of VRAM.
One-time-purchase IDE (Bodega One Code)The subscription renewal. Pay once, own it.None on cost. Pair with local or BYOK for the model.

The bottom two rows are the only ones that bound the number. Bring-your-own-key tools like Cline and Continue.dev strip out the platform credit layer so you pay wholesale and see every dollar. A local model removes the inference bill outright, and in 2026 that is genuinely viable: Qwen3.6, DeepSeek V4, GLM-5.1, MiniMax M3, and Kimi K2.6 all run on your own hardware and post competitive coding-benchmark scores. The tradeoff is hardware, not quality. Our VRAM calculator and local-LLM rankings size it for your machine.

Stack the two, local inference plus a tool you bought once instead of renting monthly, and the cost of an agentic coding session stops being a variable on someone else's dashboard. It becomes a fixed cost you already paid.

Where Bodega One Code fits

This is the problem we built Bodega One Code around, so take this section as the pitch it is. It is a local-first desktop IDE: a full Monaco editor, AI chat, and an autonomous coding agent that all run on your machine. It is BYOLLM, so you point it at a local model for zero inference cost, or at your own API key for any cloud provider at that provider's rate. No platform credit pool, no per-model paywall, no monthly renewal.

Right now it is free for everyone in the open beta, commercial use included. At full release, Personal stays free for personal use and Pro is $39 one-time for commercial use. One payment, not a meter. For teams in regulated environments, air-gap mode blocks all network egress so code cannot leave the machine, which also happens to make a runaway inference bill structurally impossible.

None of this makes the frontier cloud models obsolete. If you need Claude Opus on a genuinely hard problem, use it, and pay Anthropic directly. The point is narrower: the default should not be a meter you cannot predict. Uber and Microsoft just demonstrated, at the top of the market, what happens when it is. Download Bodega One Code free and put the bill back under your control.


Sources

  • Bloomberg, June 2, 2026: Uber Caps Usage of AI Tools Like Claude Code to Cut Costs: bloomberg.com
  • Inc., June 2026: Uber Blew Through Its 2026 AI Budget in 4 Months, Now It's Capping Employee Use (the $500 to $2,000 per-engineer figure and the $1,500 per-tool cap): inc.com
  • Simon Willison, June 3, 2026: Uber Caps Usage of AI Tools Like Claude Code to Manage Costs: simonwillison.net
  • Fortune, May 22, 2026: Microsoft reports are exposing AI's real cost problem (Microsoft cancelling most direct Claude Code licenses per The Verge, the Nvidia Catanzaro quote, and the Goldman Sachs and Gartner projections): fortune.com
  • The Register, June 2, 2026: Angry devs vow to flee GitHub Copilot as metered billing takes hold: theregister.com
  • Bodega One Code, June 8, 2026: GitHub Copilot, one week in (the developer-level version of this story): bodegaone.ai/blog/github-copilot-week-one-real-bills

Common questions

How much did Uber's AI coding bill actually grow?
Uber burned through its entire 2026 AI coding tools budget in the first four months of the year, per CTO Praveen Neppalli Naga (first reported by The Information). Before the company stepped in, some engineers were running between $500 and $2,000 a month in token consumption on tools like Claude Code and Cursor. Uber's response, reported by Bloomberg and TechCrunch on June 2, 2026, was a $1,500 monthly cap per employee, per tool, tracked on an internal dashboard.
Did Microsoft really cancel its own Claude Code licenses?
According to The Verge, Microsoft cancelled most of its direct Claude Code licenses about six months after first granting access, and pointed engineers to GitHub Copilot CLI instead. It does not affect Microsoft's separate Anthropic arrangement (a reported $5B investment and $30B Azure commitment). The internal-tooling decision and the cloud-platform deal are different things.
Why does agentic AI cost so much more than autocomplete?
Autocomplete is one short model call. An agent plans, reads files, runs tools, re-reads the context it just changed, and writes across multiple steps, so a single task can move hundreds of thousands of tokens. Output tokens cost the most (top models run $25 to $30 per million output tokens) and agents generate a lot of output. The bill scales with the size of the work, not with a flat seat price.
Token prices keep falling. Won't this fix itself?
Per-token prices are falling. Gartner projects inference on a trillion-parameter model will cost roughly 90% less by 2030. But consumption is rising faster: Goldman Sachs projects a 24-fold increase in token consumption by 2030. When usage outpaces price cuts, the bill still climbs. Cheaper tokens do not cap a metered bill, they just change how fast it fills.
What actually caps the cost of AI coding?
Two structures, and neither is a better subscription tier. BYOK (bring your own key) removes the platform markup so you pay the model provider directly at cost. A local model removes the inference bill entirely because you run it on your own hardware. Combine local inference with a one-time-purchase tool and the marginal cost of a coding session is effectively zero.
How does Bodega One Code avoid the meter?
Bodega One Code is a local-first desktop IDE with BYOLLM. Point it at a local model (Ollama, LM Studio, llama.cpp) for zero inference cost, or at your own API key for any cloud provider at that provider's rate. There is no platform credit pool and no per-model paywall. It is free for everyone in the open beta; at full release Personal stays free for personal use and Pro is $39 one-time for commercial use. See bodegaone.ai/pricing.

Stay in the loop

Build-in-public updates, model picks, and Copilot/Cursor news as it breaks.

Ready to own your tools?

Beta is free and open to everyone. Download free.