Does Intel NPU run LLMs faster than CPU?

No. In these benchmarks the CPU loads the model about 20x faster (5s vs 96s), and the NPU shows no generation speed gain over the CPU. The long compilation step on the NPU kills usability.
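A comparison like this comes down to two separate timings: how long the model takes to load (including any compilation) and how many tokens per second it generates afterwards. The sketch below is a minimal, backend-agnostic timing harness and not the benchmark used in the article. The `benchmark` helper and the stand-in `fake_load` / `fake_generate` callables are hypothetical names introduced for illustration; in practice you would pass in callables that wrap whichever runtime you are testing.

```python
import time
from typing import Callable, Tuple


def benchmark(load: Callable[[], object],
              generate: Callable[[object], int]) -> Tuple[float, float]:
    """Return (load_seconds, tokens_per_second).

    `load` builds and returns a pipeline object (this is where NPU
    compilation time would show up). `generate` runs the pipeline and
    returns the number of tokens it produced.
    """
    start = time.perf_counter()
    pipeline = load()
    load_seconds = time.perf_counter() - start

    start = time.perf_counter()
    n_tokens = generate(pipeline)
    gen_seconds = time.perf_counter() - start

    # Guard against a zero-length interval on very coarse clocks.
    tokens_per_second = n_tokens / gen_seconds if gen_seconds > 0 else float("inf")
    return load_seconds, tokens_per_second


if __name__ == "__main__":
    # Hypothetical stand-ins so the sketch runs without any LLM runtime.
    def fake_load() -> str:
        time.sleep(0.05)
        return "pipeline"

    def fake_generate(pipeline: object) -> int:
        time.sleep(0.05)
        return 10  # pretend 10 tokens were generated

    load_s, tok_s = benchmark(fake_load, fake_generate)
    print(f"load: {load_s:.2f}s, generation: {tok_s:.1f} tok/s")

    # Load-time ratio from the figures quoted in the article (96s vs 5s).
    print(f"NPU/CPU load-time ratio: {96 / 5:.1f}x")
```

Keeping load time and generation speed as separate numbers matters here, because the article's complaint about the NPU is almost entirely about the first one.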

How to fix Intel NPU LLM crashes?

Export the model with the optimum export command using --sym --ratio 1.0 --group-size 128, then load the result with the openvino-genai LLMPipeline. The standard optimum-intel path crashes on the NPU.
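The export-then-load recipe above can be sketched roughly as follows. This is an illustration under stated assumptions, not the article's exact commands: the `--weight-format int4` flag is an assumption (the article names only `--sym`, `--ratio 1.0`, and `--group-size 128`, which are int4 weight-compression options in `optimum-cli export openvino`), and the model ID and output directory are placeholders. `build_export_command` and `generate_on_npu` are hypothetical helper names.

```python
import shlex
from typing import List


def build_export_command(model_id: str, output_dir: str) -> List[str]:
    """Build the optimum-cli export command with the NPU-friendly flags."""
    return [
        "optimum-cli", "export", "openvino",
        "--model", model_id,
        "--weight-format", "int4",  # assumption: not stated in the article
        "--sym",                    # symmetric quantization
        "--ratio", "1.0",           # quantize all eligible layers to int4
        "--group-size", "128",
        output_dir,
    ]


def generate_on_npu(model_dir: str, prompt: str, max_new_tokens: int = 64) -> str:
    """Load the exported model on the NPU and generate a completion."""
    # Imported lazily so the command builder works without openvino-genai installed.
    import openvino_genai

    pipe = openvino_genai.LLMPipeline(model_dir, "NPU")
    return str(pipe.generate(prompt, max_new_tokens=max_new_tokens))


if __name__ == "__main__":
    # Placeholder model ID and directory; substitute your own.
    cmd = build_export_command("your-org/your-1.5b-model", "model-int4-npu")
    print(shlex.join(cmd))
    # After running the printed command:
    # print(generate_on_npu("model-int4-npu", "Why is the sky blue?"))
```

Expect the first `LLMPipeline` construction on the NPU to be slow, since that is where the compilation cost described in the article is paid.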

llama.cpp does better: 22 tok/s on a 1.5B model with near-instant (about 2s) loads. It beats both OpenVINO NPU and OpenVINO CPU for local inference.


You'd think Intel's NPU would crush local LLMs. Wrong. On a Core Ultra laptop, it loads in 96 seconds and trails the CPU.

theAIcatchup Apr 10, 2026 3 min read

NPU loads models 20x slower (96s vs 5s) with no generation speed gain over CPU.
llama.cpp crushes all: 22 tok/s, 2s loads on Intel Core Ultra.
Fix NPU: Special export flags + openvino-genai; standard tools crash.


#Core Ultra #Intel NPU #LLM inference #Llama.cpp #OpenVINO


Originally reported by Dev.to