Microsoft Archives | Puget Systems

Exploring Hybrid CPU/GPU LLM Inference

Posted on March 20, 2025 by Jon Allman

A brief look at hybrid LLM inference that splits the workload between GPU/VRAM and CPU/RAM, using the KTransformers inference library.

What’s the deal with NPUs?

Posted on October 25, 2024 by Jon Allman

An introduction to NPU hardware and its growing presence beyond mobile computing devices.