Google AI breakthrough shows why we don't need more data centers

Make AI work smarter, not harder.
By Chris Taylor
A smartphone with "Google AI" on it, atop a keyboard
Credit: Thomas Fuller/NurPhoto via Getty Images

We have seen the future of AI via Large Language Models. And it's smaller than you think.

That much was clear in 2025, when we first saw China's DeepSeek, a slimmer, lighter LLM that required way less data center energy to do its job and performed surprisingly well on benchmark tests against heftier American AI models. (Ironically, some of its smaller distilled versions were built atop an open source U.S. model, Meta's Llama.)

DeepSeek may have foundered on privacy concerns, but the trend towards smaller and smarter AI isn't going away. The evolution is on display again in TurboQuant, a compression algorithm that Google quietly unveiled this week via a Google Research paper.



The paper itself is pretty impenetrable if you're not an AI nerd who talks tokens and high-dimensional vectors. We'll get into a more detailed explanation below. But here's the TL;DR: The TurboQuant algorithm can make LLMs' memory usage six times smaller.
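To put a rough number on "six times smaller," here is a back-of-the-envelope sketch. The model shape (32 layers, 8 key-value heads, 128 dimensions per head, a 128,000-token context, 16-bit values) is a made-up example, not a figure from Google's paper, and `kv_cache_bytes` is a hypothetical helper. Only the factor of six comes from the claim above.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, tokens, bits_per_value):
    # Keys and values (hence the factor of 2) are stored for every layer,
    # every head, and every token the model has seen so far.
    values = 2 * layers * kv_heads * head_dim * tokens
    return values * bits_per_value / 8

# Hypothetical model shape, chosen only for illustration.
baseline = kv_cache_bytes(
    layers=32, kv_heads=8, head_dim=128, tokens=128_000, bits_per_value=16
)
compressed = baseline / 6  # the six-fold reduction claimed for TurboQuant

GIB = 1024 ** 3
print(f"baseline:   {baseline / GIB:.1f} GiB")
print(f"compressed: {compressed / GIB:.1f} GiB")
```

Under those assumptions, the cache for a single long conversation drops from roughly 15.6 GiB to about 2.6 GiB, which is the difference between needing a data center GPU and fitting in the memory of a high-end consumer device.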

What does that mean? Less energy usage, perhaps to the point where running a powerful AI model on your smartphone becomes possible. Less RAM usage, right on time for the ongoing RAM shortage.

Certainly, algorithms like this can help LLMs make more efficient use of the data centers they're hosted in — either by using the extra space to run more complex models, or, hear me out, by allowing us not to rush into building so many unpopular new data centers in the first place.

And that, paradoxically, could be a problem for the AI economy, at least as it's currently structured.

Why smaller and smarter will mess up NVIDIA

For the past three years, tech stocks have been riding ever higher on the back of one company alone: NVIDIA. And NVIDIA has been riding ever higher on the assumption that we're in the middle of what CEO Jensen Huang called this month "the largest infrastructure buildout in history" — an explosion of data centers, for which NVIDIA will be the chief provider of chips.

But that infrastructure build-out, if you look at data centers actually built versus data centers promised, is already stumbling, as a fresh New York Times investigation just made clear. What's the holdup? Not just opposition from concerned citizens across the U.S., now including the NAACP. It's also permits, applications, inspections, and the other unsexy but often necessary parts of the local government machinery.

Not least of the problems: A dearth of power generation and transmission, which doesn't sit well with the AI industry's unquantifiable ability to soak up electricity and suck up water.

What happens when the desire for more AI runs into a lack of infrastructure? Well, then necessity becomes the mother of invention. We learn to do more with less. And that's exactly what TurboQuant does.

Middle-out compression

Here's that explanation — although since TurboQuant is a compression algorithm, you'd be forgiven for imagining Google had the same NSFW "middle out" compression algorithm inspiration that drove the plot of the HBO comedy Silicon Valley.

So there are a couple of memory "bottlenecks" when AI models reach for something they really want and frequently use. One is called the key-value cache, which is like a really hot library that stores the most-used information so the model doesn't have to recompute it. The other is vector search, which matches things that look alike. TurboQuant effectively lubricates both at once, making memory grabs faster, smoother, and less fraught.
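For the curious, both ideas fit in a few lines of toy Python. This is a minimal sketch of the general concepts, not Google's code: the names `nearest` and `KVCache` are invented for illustration, and real systems work with thousands of dimensions on GPUs rather than two-element lists.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def nearest(query, store):
    # Vector search: return the name of the stored vector most similar to the query.
    return max(store, key=lambda name: cosine(query, store[name]))

class KVCache:
    # Key-value cache: keys and values for earlier tokens are kept around
    # so they are looked up instead of recomputed for every new token.
    def __init__(self):
        self.keys, self.values = [], []

    def append(self, key, value):
        self.keys.append(key)
        self.values.append(value)

    def attend(self, query):
        # Softmax over query-key scores, then a weighted sum of cached values.
        scores = [dot(query, k) for k in self.keys]
        peak = max(scores)
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        dim = len(self.values[0])
        return [
            sum(w * v[i] for w, v in zip(weights, self.values)) / total
            for i in range(dim)
        ]

store = {"cat": [1.0, 0.1], "car": [0.1, 1.0]}
print(nearest([0.9, 0.2], store))  # cat
```

Every entry in `store` and every row in the cache is a list of numbers, and both grow with use. That is why shrinking the numbers themselves pays off twice.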

TurboQuant "helps unclog key-value cache bottlenecks by reducing the size of key-value pairs," Google's paper says, in part by the "clever" move of "randomly rotating the data vectors."
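Why would randomly rotating a vector help? A rough intuition, sketched below: if one number in a vector is huge and the rest are tiny, rounding everything to a handful of levels wrecks the tiny ones. Rotating first spreads the energy out more evenly, so the same handful of levels usually goes further. This toy uses a plain uniform rounding scheme as a stand-in. It is an assumption for illustration, not TurboQuant's actual quantizer, and every function name here is hypothetical.

```python
import math
import random

def random_orthogonal(dim, rng):
    # Gram-Schmidt on Gaussian vectors gives a random orthonormal basis,
    # i.e. a random rotation (possibly combined with a reflection).
    rows = []
    while len(rows) < dim:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for r in rows:
            proj = sum(a * b for a, b in zip(v, r))
            v = [a - proj * b for a, b in zip(v, r)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-9:
            rows.append([a / norm for a in v])
    return rows

def apply(matrix, vec):
    return [sum(m * x for m, x in zip(row, vec)) for row in matrix]

def transpose(matrix):
    return [list(col) for col in zip(*matrix)]

def quantize(vec, bits):
    # Uniform scalar quantizer: snap each coordinate to one of 2**bits
    # evenly spaced levels between -peak and +peak.
    peak = max(abs(x) for x in vec)
    if peak == 0:
        return list(vec)
    step = 2 * peak / (2 ** bits - 1)
    return [round((x + peak) / step) * step - peak for x in vec]

def sq_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

rng = random.Random(0)
dim, bits = 16, 3
spiky = [10.0] + [0.1] * (dim - 1)  # one outlier dominates the range

plain = quantize(spiky, bits)

rot = random_orthogonal(dim, rng)
rotated = apply(rot, spiky)
# Quantize in the rotated space, then undo the rotation with the transpose.
restored = apply(transpose(rot), quantize(rotated, bits))

print("error without rotation:", sq_error(spiky, plain))
print("error with rotation:   ", sq_error(spiky, restored))
```

The rotation itself loses nothing, since it can be undone exactly with the transpose. All the savings come from storing 3 bits per number instead of 16.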

Got that? No? Well, it doesn't really matter. All you need to know is that there's a promising new field of extremely complex computational mathematics, and it works the way compression algorithms have long worked — making new technology faster, lighter, easier to run.

First, it was ZIP file downloads, then the video compression that enabled the streaming revolution, and now it's AI. The result could allow a more powerful LLM to run entirely on your phone, or it could crash the global economy, or both at the same time. Isn't life in 2026 wild?

Chris Taylor

Chris is a veteran tech, entertainment and culture journalist, author of 'How Star Wars Conquered the Universe,' and co-host of the Doctor Who podcast 'Pull to Open.' Hailing from the U.K., Chris got his start as a sub editor on national newspapers. He moved to the U.S. in 1996, and became senior news writer for Time.com a year later. In 2000, he was named San Francisco bureau chief for Time magazine. He has served as senior editor for Business 2.0, and West Coast editor for Fortune Small Business and Fast Company. Chris is a graduate of Merton College, Oxford and the Columbia University Graduate School of Journalism. He is also a long-time volunteer at 826 Valencia, the nationwide after-school program co-founded by author Dave Eggers. His book on the history of Star Wars is an international bestseller and has been translated into 11 languages.
