LLM Inference's Power Lie: 99.8% Wasted on Data Hauling, Not Crunching Numbers
Most of us figured memory bandwidth or VRAM capacity would cap LLM inference. Nope: power is the brick wall, and the overwhelming share of it is burned shuffling weights between memory and compute, not doing the actual math.
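To see why data movement dominates, here is a back-of-envelope sketch. All the constants are illustrative assumptions (a rough ~1 pJ per FLOP for on-chip arithmetic versus ~100 pJ per byte for off-chip DRAM traffic, a hypothetical 7B-parameter fp16 model at batch size 1); the exact 99.8% figure in the headline depends on the specific hardware, not on these placeholder numbers.

```python
# Back-of-envelope: energy to generate one token for a 7B-parameter model
# at batch size 1, where every weight is streamed from DRAM once per token.
# All constants below are illustrative assumptions, not measurements.

PARAMS = 7e9                  # model weights
BYTES_PER_PARAM = 2           # fp16 storage
FLOPS_PER_TOKEN = 2 * PARAMS  # ~2 FLOPs per weight (multiply + add)

PJ_PER_FLOP = 1.0             # assumed on-chip fp16 FMA energy, ~1 pJ/FLOP
PJ_PER_BYTE_DRAM = 100.0      # assumed off-chip DRAM energy, ~100 pJ/byte

compute_pj = FLOPS_PER_TOKEN * PJ_PER_FLOP
movement_pj = PARAMS * BYTES_PER_PARAM * PJ_PER_BYTE_DRAM
total_pj = compute_pj + movement_pj

print(f"compute:  {compute_pj / total_pj:6.1%}")
print(f"movement: {movement_pj / total_pj:6.1%}")
```

With these assumed constants the split comes out around 99% movement to 1% compute. Larger batch sizes amortize each weight fetch over more tokens, which is exactly why batching is the standard lever for inference efficiency.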
Originally reported by Dev.to