Google caps Meta’s Gemini use as AI compute runs short
TL;DR:
- Google told Meta around March it could not supply all the Gemini capacity Meta wanted, disrupting some of Meta’s internal AI projects.
- The cap is a rare public sign of the compute bottlenecks squeezing even the largest AI providers.
- Meta has urged staff to use AI tokens more sparingly and is leaning on its own Muse Spark model to reduce its dependence on outside providers.
Google has put limits on Meta’s use of its Gemini models after the social media company sought more computing power than its rival could provide, according to the Financial Times. The restriction, which remains in place, delayed some of Meta’s internal AI work and offers an unusually candid look at the infrastructure strain building across the industry.
Demand outruns supply
Several other Google clients have been affected to a lesser degree, but Meta’s exceptionally high demand made it the most exposed. In response, Meta has told staff to be more efficient with AI tokens — the units measuring model usage — as part of a broader push to control costs. The company initially favoured Gemini because it outperformed Meta’s own Llama models, but has recently begun prioritising its newer Muse Spark system to reduce reliance on external providers.
The squeeze is striking given the sums involved. Despite tens of billions of dollars spent on chips, data centres and power, even hyperscalers cannot keep pace with demand. Google itself signed a $920mn-a-month deal to lease capacity from SpaceX, and chief executive Sundar Pichai told investors the company was “compute-constrained in the near term”. That warning came as cloud revenue topped $20bn for the first time and Google's backlog of signed contracts nearly doubled to more than $460bn. Reuters separately confirmed the capacity shortfall, reporting that the same constraints were affecting other customers.
Looking forward
The episode lands as warnings mount over whether AI infrastructure can scale fast enough. For UK firms further down the queue than Meta, the lesson is blunt: access to frontier compute is now a scarce, rationed commodity, and dependence on a single provider carries real delivery risk. As capacity tightens, the firms that plan for token efficiency and multi-model flexibility — rather than assuming unlimited supply — will be better placed to keep their AI projects on schedule.