Intel NPU's LLM Reality Check: 96-Second Loads and CPU Wins on Core Ultra
You'd think Intel's NPU would crush local LLMs. It doesn't. On a Core Ultra laptop, a model takes 96 seconds to load on the NPU, and generation still trails the CPU.
theAIcatchup
Apr 10, 2026
3 min read
⚡ Key Takeaways
- NPU loads models 20x slower (96s vs 5s) with no generation speed gain over CPU.
- llama.cpp crushes all: 22 tok/s, 2s loads on Intel Core Ultra.
- Fix NPU: Special export flags + openvino-genai; standard tools crash.
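Want to check numbers like these on your own machine? Here is a minimal, backend-agnostic sketch of how load time and tok/s could be measured separately. It is not the harness behind the figures above: `bench`, `BenchResult`, and `slowdown` are hypothetical names, and `load` is assumed to be any function you write that loads a model and returns a callable turning a prompt into a stream of tokens.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class BenchResult:
    load_seconds: float   # wall-clock time spent loading the model
    tokens: int           # number of tokens the backend produced
    gen_seconds: float    # wall-clock time spent generating them

    @property
    def tokens_per_second(self) -> float:
        # Guard against a zero-length timing window.
        return self.tokens / self.gen_seconds if self.gen_seconds > 0 else 0.0


def bench(load: Callable[[], Callable[[str], Iterable[str]]],
          prompt: str,
          clock: Callable[[], float] = time.perf_counter) -> BenchResult:
    """Time model load and token generation as two separate phases."""
    t0 = clock()
    generate = load()                           # hypothetical loader, supplied by you
    t1 = clock()
    tokens = sum(1 for _ in generate(prompt))   # drain the token stream and count it
    t2 = clock()
    return BenchResult(load_seconds=t1 - t0, tokens=tokens, gen_seconds=t2 - t1)


def slowdown(slow_seconds: float, fast_seconds: float) -> float:
    """How many times slower one measurement is than another."""
    return slow_seconds / fast_seconds
```

Wrap each backend (NPU, CPU, llama.cpp) in its own `load` function, run `bench` with the same prompt, and compare. `slowdown(96, 5)` gives the ratio behind the "20x" load-time claim. Keeping load and generation as separate timings matters here, since the NPU's problem is the load, not a collapse in generation speed.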