AI PCs & Copilot+ 2026: NPUs, On-Device AI & Local Intelligence

A new category of personal computer has emerged — the AI PC, defined by a dedicated Neural Processing Unit (NPU) capable of running large AI models entirely on the device, without sending data to the cloud. In 2026, every major chip manufacturer ships NPUs, Microsoft has built AI features into Windows 11, and the race to bring local intelligence to consumers is accelerating.

1. What Is an AI PC?

The term AI PC refers to a personal computer equipped with a dedicated Neural Processing Unit (NPU) — a hardware accelerator specifically designed to run deep learning inference efficiently. AI PCs can execute large language model inference, image generation, real-time audio processing, and computer vision tasks locally, without an internet connection or cloud API call.

The industry does not have a single universal standard, but Microsoft's Copilot+ PC specification requires a minimum of 40 TOPS (Tera Operations Per Second) of NPU performance. Devices meeting this bar qualify for Copilot+ features in Windows 11.

AI PCs are distinct from earlier "AI-accelerated" machines (like those with discrete NVIDIA GPUs for local ML) in that the NPU is integrated on the main system-on-chip (SoC), keeping power consumption low enough for laptop battery life.

2. What Is an NPU?

A Neural Processing Unit (NPU) is a specialized processor optimized for the types of operations that dominate AI inference: matrix multiplications, convolutions, and activation functions. Compared to:

  • CPU — General-purpose. Flexible but inefficient for repeated tensor operations. 10–50 TOPS typical for AI workloads in a modern laptop CPU.
  • GPU — Parallel processing powerhouse, excellent for training and high-throughput inference, but power-hungry (50–350W typical). Not integrated in laptop SoCs at this power level.
  • NPU — Purpose-built for AI inference. Executes compute graphs with low power (2–5W), high throughput (40–120 TOPS), and low latency using fixed-function datapaths optimized for int8 and float16 arithmetic.
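
The TOPS figures above come from simple arithmetic: peak TOPS is usually quoted as MAC units × 2 operations (one multiply plus one add) × clock rate. A minimal sketch, with an illustrative (not vendor-published) MAC count and clock:

```python
def peak_tops(mac_units: int, clock_ghz: float) -> float:
    """Peak TOPS = MAC units x 2 ops (multiply + add) x clock in GHz / 1000."""
    ops_per_cycle = mac_units * 2             # each MAC counts as 2 operations
    return ops_per_cycle * clock_ghz / 1_000  # Gops -> Tops

# Illustrative: an NPU with 16,384 int8 MAC units clocked at ~1.4 GHz
print(round(peak_tops(16_384, 1.4), 1))  # 45.9
```

This is why NPU TOPS ratings are usually given for int8: the same datapath counts for fewer TOPS at float16 precision.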

How NPUs work

An NPU typically consists of a large array of MAC (Multiply-Accumulate) units and specialized SRAM for model weights and activations. The processor executes pre-compiled computational graphs (produced by ONNX Runtime, TFLite, or vendor compilers) rather than general-purpose instructions. This specialization delivers 10–100× better power efficiency for AI inference than a general-purpose CPU.
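
The MAC-array idea can be shown in miniature: integer multiplies accumulate into a wide register, and dequantization happens once on the final sum. A toy sketch of int8 inference arithmetic, not a real NPU kernel:

```python
def int8_dot(a_q, b_q, scale_a, scale_b):
    """Quantized dot product: int8 inputs, wide integer accumulator, float output.

    Mirrors what a MAC array does: multiplies stay in integer arithmetic,
    and dequantization by the tensor scales happens once at the end.
    """
    acc = 0  # int32 accumulator in real hardware
    for a, b in zip(a_q, b_q):
        acc += a * b                    # multiply-accumulate step
    return acc * scale_a * scale_b      # dequantize the final sum

# Values quantized with scale 0.1 (real value = q * scale):
a_q = [10, -20, 30]   # represents [1.0, -2.0, 3.0]
b_q = [5, 5, 5]       # represents [0.5, 0.5, 0.5]
print(round(int8_dot(a_q, b_q, 0.1, 0.1), 6))  # 1.0, matching the float dot product
```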

3. Copilot+ PCs Explained

Microsoft announced Copilot+ PCs in May 2024 — a new product category defined by specific hardware and software capabilities. Requirements:

  • Minimum 40 TOPS NPU performance.
  • Minimum 16 GB RAM.
  • Minimum 256 GB storage.
  • Compatible processor (Qualcomm Snapdragon X, Intel Core Ultra 200V/300, AMD Ryzen AI 300+).

Copilot+ PCs unlock a suite of AI features exclusive to this tier in Windows 11, including Recall, Cocreator in Paint, Live Captions with real-time translation, Super Resolution for video, and generative eraser in Photos.

As of early 2026, over 50 Copilot+ PC models are available from Microsoft (Surface), Dell, HP, Lenovo, Samsung, ASUS, and Acer at prices from $999 to $3,000+.

4. Qualcomm Snapdragon X Elite & X Plus

Qualcomm's Snapdragon X Elite (launched May 2024) was the first Copilot+ PC chip and remains a best-in-class performer in late 2026:

  • CPU: 12-core Oryon CPU at up to 3.8 GHz. Rivals Apple M3 in single-thread performance.
  • GPU: Adreno integrated GPU with 4.6 TFLOPS.
  • NPU: Qualcomm Hexagon NPU — 45 TOPS.
  • Memory bandwidth: Up to 136 GB/s (LPDDR5X-8448).
  • Power envelope: 23W sustained, much less under light loads.

The Snapdragon X Plus is a more affordable variant (8-core CPU, 45 TOPS NPU) for mainstream Copilot+ devices under $1,200.

In 2025, Qualcomm launched the Snapdragon X2 Elite with a 75 TOPS NPU and the first on-chip support for running 4-bit quantized 7B-parameter LLMs in real time.

5. Intel Core Ultra 200V Series

Intel's response to Snapdragon X is the Core Ultra 200V (codenamed Lunar Lake, late 2024):

  • CPU: 4 performance + 4 efficient cores (Lion Cove + Skymont).
  • GPU: Intel Arc 140V integrated GPU with real-time ray tracing and XeSS upscaling.
  • NPU: Intel AI Boost NPU 4.0 — 48 TOPS; total SoC AI performance ~120 TOPS (CPU + GPU + NPU combined).
  • Memory: On-package LPDDR5X (16 GB or 32 GB, soldered), 96 GB/s bandwidth.
  • Power: Configurable 15–30W TDP.

The Core Ultra 200H (Arrow Lake) targets higher-performance laptops and desktops with up to 16 cores and a 13 TOPS NPU; it trades Copilot+ eligibility for stronger CPU and GPU performance, making it better suited for gaming alongside AI workloads.

6. AMD Ryzen AI 300 Series

AMD's Ryzen AI 300 (codenamed Strix Point) is the strongest AMD entry in the AI PC race:

  • CPU: Up to 12 Zen 5 + Zen 5c cores.
  • GPU: Radeon 890M with 16 RDNA 3.5 compute units — the fastest integrated GPU in a laptop chip.
  • NPU: XDNA 2 architecture — 50 TOPS. The highest NPU performance among shipping x86 laptop chips.
  • Memory: LPDDR5X-8000, up to 96 GB/s bandwidth.
  • Power: 15–45W configurable TDP.

AMD also ships the Ryzen AI 9 HX 370 and AI 9 365 as top-tier AI PC processors targeting creators and developers who need strong CPU, GPU, and NPU simultaneously.

7. Apple M-Series Comparison

Apple's M-series chips have included a dedicated Neural Engine since M1 (2020). Apple does not participate in the Copilot+ framework, but macOS runs its own on-device AI stack:

  • M4 Neural Engine: 38 TOPS — below the 40 TOPS Copilot+ threshold.
  • M4 Pro Neural Engine: 38 TOPS per die; M4 Max scales further via memory bandwidth.
  • Apple Intelligence (iOS/macOS) runs on the Neural Engine for Siri enhancements, Writing Tools, and on-device image generation.
  • macOS uses Core ML, BNNS, and Metal Performance Shaders for on-device AI.

Apple's key advantage is unified memory architecture: the CPU, GPU, and Neural Engine all share the same memory pool at high bandwidth (up to 273 GB/s on M4 Pro), enabling larger models to run on-device than on Windows AI PCs with their soldered LPDDR5X.

8. NPU Performance Benchmarks

  Chip                                  NPU TOPS   Phi-3 Mini (tokens/s)   NPU power (W)
  Qualcomm Snapdragon X Elite              45              ~40                 ~3.5
  Intel Core Ultra 200V (48 TOPS NPU)      48              ~35                 ~4
  AMD Ryzen AI 9 HX 370                    50              ~45                 ~4.5
  Qualcomm Snapdragon X2 Elite             75              ~70                 ~5
  Apple M4 Pro Neural Engine               38              ~55*                ~3

  * Benefits from unified memory bandwidth.

Note: LLM token/s figures are approximate for Phi-3 Mini 4K INT4 quantized model. Real-world performance varies by model size, quantization, and software stack optimization.
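
These throughput figures are consistent with a rule of thumb: small-batch LLM decoding is memory-bandwidth bound, so tokens/s is roughly usable bandwidth divided by model size in bytes. A rough estimator (the 50% efficiency factor is an assumption, not a measured constant):

```python
def est_tokens_per_s(params_b: float, bits: int, bandwidth_gbs: float,
                     efficiency: float = 0.5) -> float:
    """Rough tokens/s for bandwidth-bound decoding.

    Each generated token reads every weight once, so throughput is capped
    at usable bandwidth / model bytes. `efficiency` is an assumed fudge
    factor for scheduling overhead and cache effects.
    """
    model_gb = params_b * bits / 8  # e.g. 3.8B params at 4 bits -> 1.9 GB
    return bandwidth_gbs * efficiency / model_gb

# Phi-3 Mini (3.8B, INT4) on a 136 GB/s part, assuming 50% efficiency
print(round(est_tokens_per_s(3.8, 4, 136.0)))  # 36
```

The estimate lands in the same range as the measured figures above, which is why memory bandwidth matters as much as TOPS for on-device LLMs.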

9. Windows Recall

Recall is Microsoft's most controversial and ambitious Copilot+ feature. It continuously takes screenshots of everything displayed on the screen and uses on-device AI (running on the NPU) to make this visual history searchable in natural language:

  • "Show me that Figma file from last Tuesday" → Recall surfaces screenshots from that session.
  • "Find the contract I was reading last week" → Recall identifies the document and jumps to the moment.
  • "What was the price shown in that email?" → Recall extracts the value from a screenshot.

Privacy design

  • All processing and storage is on-device only. Snapshots never leave the PC.
  • Recall data is stored in an encrypted database accessible only to the user, protected by Windows Hello authentication (biometrics required).
  • You can pause Recall at any time, set filters to exclude specific apps or websites, and delete all snapshots.
  • DRM-protected content (e.g., Netflix, Disney+) is automatically excluded from captures.

After a controversial preview launch in mid-2024 (initially delayed due to security concerns), Recall shipped as an opt-in feature in late 2024 with significantly hardened security architecture.
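
The core mechanic (OCR'd snapshot text stored in a local, queryable database) can be sketched with the standard library alone. This is a toy illustration of the on-device indexing idea, not Microsoft's implementation; the app names and timestamps are invented:

```python
import sqlite3

# In-memory DB stands in for Recall's encrypted on-device store
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE snapshots (ts TEXT, app TEXT, ocr_text TEXT)")

# Simulated OCR output from periodic screen captures
db.executemany(
    "INSERT INTO snapshots VALUES (?, ?, ?)",
    [
        ("2026-01-13T10:02", "Figma",   "homepage redesign v3 frames"),
        ("2026-01-14T15:30", "Outlook", "invoice total due: $1,240"),
        ("2026-01-15T09:10", "Edge",    "contract draft - NDA clause 4"),
    ],
)

def search(term: str):
    """Substring search over captured text; real Recall uses semantic search."""
    rows = db.execute(
        "SELECT ts, app FROM snapshots WHERE ocr_text LIKE ?", (f"%{term}%",)
    )
    return rows.fetchall()

print(search("contract"))  # [('2026-01-15T09:10', 'Edge')]
```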

10. Other Copilot+ Features

Live Captions with real-time translation

Transcribes any audio playing on your PC (system audio or microphone) in real time, entirely on-device. Copilot+ adds real-time translation from 44 languages into English — useful for international video calls and foreign-language content.

Cocreator in Paint

Generate and edit images in Microsoft Paint using natural language prompts, powered by a local diffusion model running on the NPU. Works entirely offline.

Super Resolution

Upscale any video playing in Windows to higher resolution using an on-device AI upscaler. Similar to NVIDIA RTX Video Super Resolution but powered by the NPU instead of a discrete GPU.

Generative Erase in Photos

Remove unwanted objects from photos using AI inpainting, processed locally on the NPU.

Windows Studio Effects

Real-time background blur, gaze correction (makes you appear to look directly at the camera even when reading), voice focus (background noise suppression), and portrait light — all running live on the NPU during video calls.

11. On-Device LLMs: Phi-3, Phi-4 & Local AI

Microsoft Research's Phi small language model family is purpose-designed for on-device inference on AI PC hardware:

Phi-3 Mini (3.8B parameters)

Runs at 30–50 tokens/second on AI PCs. Delivers coding assistance, Q&A, and summarization quality competitive with GPT-3.5 on most benchmarks, entirely offline. Available in 4K and 128K context variants.

Phi-3.5 Mini / MoE

The 3.5 generation adds multilingual capability and improved instruction following. The Mixture-of-Experts (MoE) variant activates only 6.6B of its 41.9B total parameters per token, enabling higher quality within NPU memory constraints.

Phi-4 (14B parameters)

Launched in late 2024. Achieves GPT-4o-class performance on math and coding benchmarks. Runs in INT4 quantization on AI PCs with 16 GB+ unified memory, though more slowly (~10–15 t/s on NPU; ~25 t/s using combined CPU+GPU offload).
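
The memory requirements above follow directly from parameter count and quantization width: weight bytes ≈ parameters × bits ÷ 8, plus runtime overhead. A quick estimator (the 20% overhead factor is an assumption):

```python
def model_footprint_gb(params_b: float, bits: int, overhead: float = 1.2) -> float:
    """Approximate RAM needed to host a quantized model.

    params_b: parameters in billions; bits: quantization width.
    `overhead` (assumed 20%) covers activations and runtime buffers.
    """
    return params_b * bits / 8 * overhead

for name, p in [("Phi-3 Mini", 3.8), ("Mistral 7B", 7.3), ("Phi-4", 14.0)]:
    print(f"{name}: ~{model_footprint_gb(p, 4):.1f} GB at INT4")
```

This is the arithmetic behind Phi-4's 16 GB+ requirement: roughly 8 GB for INT4 weights plus overhead, leaving headroom for the OS and applications.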

Other notable on-device models (2026)

  • Mistral 7B INT4 — Strong general-purpose model, runs well on AMD Ryzen AI 300.
  • Llama 3.2 3B — Meta's smallest Llama, designed for edge deployment. Excellent instruction following at 3B scale.
  • Gemma 3 1B/4B — Google's on-device model family, optimized for Arm NPUs.

12. Developer APIs: DirectML, ONNX Runtime, WinML

Developers targeting AI PCs have several options for accessing NPU acceleration:

ONNX Runtime

The cross-platform ML inference engine from Microsoft. Export any model to ONNX format, then deploy with the QNNExecutionProvider (Qualcomm), DmlExecutionProvider (DirectML/Intel), or VitisAIExecutionProvider (AMD Ryzen AI). Single API, multiple backends.

import numpy as np
import onnxruntime as ort

# Prefer the QNN execution provider (Snapdragon NPU), falling back to CPU
session = ort.InferenceSession(
    "model.onnx",
    providers=["QNNExecutionProvider", "CPUExecutionProvider"]
)

# Input name and shape must match the exported model's signature
input_data = np.zeros((1, 3, 224, 224), dtype=np.float32)
outputs = session.run(None, {"input": input_data})

DirectML

Microsoft's cross-vendor GPU/NPU API for Windows. Sits beneath ONNX Runtime's DmlExecutionProvider. Works on Intel, AMD, Qualcomm, and NVIDIA hardware from a single codebase.

Windows ML (WinML)

A higher-level Windows API for ML inference, layered on DirectML. Accessible from C#, C++, and Python. Simplest path for Windows-native app developers.

Vendor SDKs

  • Qualcomm AI Stack / QNN SDK — Direct access to Hexagon DSP, maximum performance on Snapdragon.
  • Intel OpenVINO — Intel's inference optimization toolkit; supports CPU, integrated GPU, and NPU on Core Ultra.
  • AMD Ryzen AI Software (Vitis AI + MLIR-AIE) — AMD's AI PC developer stack for Ryzen AI NPU access; ROCm covers the GPU side.

13. Real-World Use Cases for NPU

Real-time video call enhancement

Background removal, face framing (auto-crop to keep your face centered), eye contact correction, and noise cancellation all running simultaneously at <5W — during a 3-hour meeting your battery barely budges.

On-device code completion

Code editors like VS Code (with GitHub Copilot's local mode) and Cursor can route completion requests to a local Phi-4 model when offline or when privacy requirements prevent cloud API calls.

Document summarization

Process long PDFs, meeting transcripts, or research papers with a local LLM. No subscription, no data leaving the device, no token costs.

Real-time transcription

Local Whisper models (OpenAI's speech recognition) run at near-real-time speed on NPUs, enabling offline transcription of meetings and voice memos; with domain-adapted models, accuracy on specialist vocabulary can rival cloud services.

Image generation

Stable Diffusion XL Turbo with INT8 quantization generates 512×512 images in 3–5 seconds on a Snapdragon X Elite NPU, entirely offline.

14. Privacy: Why Local AI Matters

Running AI on-device instead of the cloud has fundamental privacy advantages:

  • Data sovereignty — Images, documents, voice recordings, and conversations never leave your device. No server retention, no training on your data.
  • No API keys / usage billing — Local inference has zero marginal cost per query.
  • Air-gap capability — Works fully offline. Important for classified environments, healthcare settings with strict data regulations, and enterprise IP protection.
  • Regulatory compliance — GDPR, HIPAA, and similar regulations may restrict sending certain data to cloud AI providers. On-device processing sidesteps these concerns entirely.

15. Battery & Efficiency

One of the most significant benefits of the AI PC era is battery life. Offloading inference to the NPU, rather than the CPU or a discrete GPU, cuts the power cost of AI features dramatically:

  • Microsoft Surface Pro 11 (Snapdragon X Elite): up to 14 hours mixed workload battery life.
  • Running Phi-3 Mini inference at 30 t/s on the NPU consumes approximately 3–4 W — a fraction of what a discrete GPU (typically 50–150W) or even a CPU (15–25W on an AI workload) would consume.
  • Intel Core Ultra 200V in 15W mode achieves comparable efficiency to Apple M3 for the first time in x86 history.
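
The battery arithmetic is straightforward: hours of runtime equal battery capacity in Wh divided by sustained draw in W. A quick sketch, assuming a typical 50 Wh thin-and-light battery:

```python
def runtime_hours(battery_wh: float, draw_w: float) -> float:
    """Hours of sustained operation at a constant power draw."""
    return battery_wh / draw_w

battery = 50.0  # assumed typical thin-and-light capacity, in Wh
for label, watts in [("NPU inference", 3.5), ("CPU inference", 20.0),
                     ("discrete GPU", 100.0)]:
    print(f"{label}: {runtime_hours(battery, watts):.1f} h")  # 14.3 / 2.5 / 0.5
```

The 14-hour figure for NPU-class power draw lines up with the Surface Pro 11 numbers above, while the same workload on a discrete GPU would drain the battery in about half an hour.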

16. Notable AI PC Devices in 2026

Snapdragon X powered

  • Microsoft Surface Pro 11 / Laptop 7 — The reference Copilot+ PC. Excellent build quality, 14h battery life, $999 starting.
  • Samsung Galaxy Book5 Pro — Premium 16-inch AMOLED display, $1,499.
  • ASUS ProArt PZ13 — Highly portable 13-inch 2-in-1 creator device with OLED.

Intel Core Ultra 200V powered

  • Dell XPS 13 (2025) — Flagship thin-and-light with Intel's best NPU, exceptional display.
  • HP OmniBook Ultra 14 — Business Copilot+ laptop with WiFi 7 and 5G option.
  • Lenovo ThinkPad X1 Carbon Gen 13 — Enterprise staple with Copilot+ AI features.

AMD Ryzen AI 300 powered

  • ASUS ROG Zephyrus G14 — Gaming + AI PC combination with the fastest integrated GPU available.
  • Lenovo IdeaPad 5x Gen 9 — Value Copilot+ laptop at $899.
  • MSI Prestige 16 AI Evo — Creator laptop with Ryzen AI + 32 GB RAM.

17. Buying Guide

Choose Snapdragon X if you:

  • Prioritize battery life above all else (>10 hours regular use).
  • Run ARM-native software (most modern apps including Office, Chrome, VS Code are ARM-native in 2026).
  • Want maximum NPU token generation speed for on-device LLMs.

Choose Intel Core Ultra 200V if you:

  • Need x86 application compatibility without emulation (niche legacy software, certain enterprise tools).
  • Want the best integration with Intel's software stack (OpenVINO, TensorFlow with Intel extensions).
  • Prioritize the overall platform (better PCIe expansion on some designs).

Choose AMD Ryzen AI 300 if you:

  • Need the best integrated GPU performance (gaming, GPU-accelerated creative work alongside AI).
  • Want x86 compatibility and the highest NPU TOPS.
  • Need 32 GB+ RAM options (more widely available in AMD configurations at this tier).

Key specs to check

  • NPU TOPS ≥ 40 for Copilot+ compliance.
  • RAM: 16 GB minimum; 32 GB recommended for 7B+ parameter LLMs.
  • Storage: 512 GB minimum; LLM model files are 4–8 GB each.
  • Display: OLED or high-refresh IPS for sustained productivity.

18. What Comes Next?

  • 100+ TOPS NPUs — Qualcomm's next-generation Snapdragon X3 (rumoured late 2026) is targeting 100+ TOPS, enabling real-time 13B parameter LLM inference.
  • On-device multimodal AI — Image-text models (LLaVA, Phi-3 Vision) running locally become practical as NPU performance grows and model quantization improves.
  • AI PC desktops — Intel and AMD are bringing high-TOPS NPUs to desktop CPUs; ASUS and MSI have launched AI PC desktops in 2026.
  • AI agents on-device — Rather than a single query→response cycle, persistent on-device agents that observe your screen and proactively assist (like an always-on Recall-integrated assistant) are in active development at Microsoft.
  • EU AI Act compliance — On-device AI can sidestep many GPAI transparency requirements, since model outputs are processed locally rather than provided as an API service.

19. FAQ

Do I need a Copilot+ PC to use AI features?
No — cloud AI (ChatGPT, Copilot.microsoft.com, Claude) works on any PC with an internet connection. Copilot+ unlocks on-device, offline AI features (Recall, real-time translation, local LLMs). If you only use cloud AI, any modern PC suffices.
Will my existing Windows apps run on Snapdragon X (ARM)?
Most mainstream apps (Chrome, Firefox, Edge, Office 365, VS Code, Adobe, Slack, Zoom) are natively ARM-compiled in 2026. x86 apps run via Windows on ARM x86 emulation. Gaming compatibility has improved significantly but remains the weakest area for Snapdragon X compared to Intel/AMD x86.
Can I upgrade the NPU later?
No. The NPU is integrated into the SoC and is not user-upgradeable. It is tied to the chip you buy.
Is 16 GB RAM enough for on-device LLMs?
Yes for models up to ~7B parameters in INT4 quantization (Phi-3 Mini, Mistral 7B, Llama 3.2 3B). For 13B models you need 32 GB, and for 30B+ you need 64 GB or a dedicated AI workstation.
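
Weights are only part of the RAM budget: the KV cache grows linearly with context length. A rough estimator (the layer, head, and head-dimension figures below are illustrative of a 7B-class model, not official specs):

```python
def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                seq_len: int, bytes_per: int = 2) -> float:
    """KV cache size: 2 (K and V) x layers x heads x head_dim x tokens x bytes."""
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per / 1e9

# Illustrative 7B-class config: 32 layers, 32 KV heads, head_dim 128,
# FP16 cache entries, 4K context
print(round(kv_cache_gb(32, 32, 128, 4096), 2))  # 2.15
```

A couple of extra gigabytes on top of the quantized weights is why 16 GB is the practical floor for 7B-class models rather than a comfortable ceiling.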
Do I need a Copilot+ PC to run Ollama locally?
No. Ollama works on any PC (Windows/Mac/Linux) using the CPU or GPU. A Copilot+ PC accelerates inference on the NPU specifically via ONNX Runtime with QNN/DirectML backends, not through Ollama's standard GGUF CPU path.

20. Glossary

NPU (Neural Processing Unit)
A dedicated hardware accelerator for AI inference workloads, integrated into modern laptop SoCs alongside the CPU and GPU.
TOPS (Tera Operations Per Second)
A measure of an NPU's performance: how many trillion arithmetic operations it can execute per second. The Copilot+ minimum is 40 TOPS.
Copilot+ PC
Microsoft's brand for Windows PCs meeting minimum NPU, RAM, and storage requirements that unlock exclusive AI features in Windows 11.
ONNX Runtime
Microsoft's cross-platform ML inference engine that can target NPUs, GPUs, and CPUs using the same model file and API.
INT4 / INT8 quantization
Techniques for compressing neural network weights from 32-bit or 16-bit floating point to 4-bit or 8-bit integers, reducing model size and inference cost at the cost of minor accuracy degradation.
Recall
A Windows 11 Copilot+ feature that continuously captures screenshots and uses on-device AI to make screen history searchable in natural language.
Apple Intelligence
Apple's brand for on-device and hybrid (Apple's private cloud) AI features available on M-series Macs and A17+ iPhones/iPads running iOS 18/macOS Sequoia and later.

21. Conclusion

AI PCs mark a genuine architectural shift in personal computing — the first time a dedicated, power-efficient AI accelerator has been a standard component of every new laptop. In 2026, the NPU is as fundamental to a modern laptop as the Wi-Fi chip.

For most users, the immediate practical benefit is better video calls, faster video editing, and smart assistants that respect privacy. For developers, the NPU opens a new deployment target — models running locally, off-line, at zero marginal cost, with data that never leaves the device.

When buying your next laptop or desktop, put NPU TOPS on your checklist alongside RAM and storage. In two years, local AI capability will feel as indispensable as a fast SSD feels today.