A brief look into using a hybrid GPU/VRAM + CPU/RAM approach to LLM inference with the KTransformers inference library.
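
The hybrid idea is to keep as many model layers as fit in VRAM on the GPU and serve the remainder from system RAM on the CPU. As a rough illustration only, the sketch below uses Hugging Face transformers/accelerate offloading rather than KTransformers' own interface; the model name and memory limits are placeholder assumptions.

```python
# Illustrative sketch of GPU/VRAM + CPU/RAM hybrid inference using Hugging Face
# transformers + accelerate offloading. This is NOT the KTransformers API; the
# model name and memory limits below are placeholder assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-7B-Instruct"  # placeholder; substitute the model you actually run

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",                        # place as many layers as possible on the GPU...
    max_memory={0: "20GiB", "cpu": "64GiB"},  # ...and spill the rest to system RAM
)

prompt = "Explain hybrid GPU/CPU LLM inference in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```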

An introduction to NPU hardware and its growing presence outside of mobile computing devices.
Presenting local AI-powered software options for tasks such as image & text generation, automatic speech recognition, and frame interpolation.
The performance improvement of the new Zen4 Threadripper PRO over the Zen3 Threadripper PRO is very impressive!
My first recommendation for a Scientific and Engineering workstation CPU is now the AMD Zen4 architecture, either as Zen4 Threadripper PRO or as Zen4 EPYC for multi-socket systems.
Evaluating the LLM inference performance of GeForce RTX 40-Series GPUs using NVIDIA’s TensorRT-LLM benchmarking tool.
Results and thoughts from testing a variety of Stable Diffusion training methods across multiple GPUs.
This post is Part 2 in a series on how to configure a system for LLM deployments and development usage. Part 2 covers installing and configuring the container tools Docker and NVIDIA Enroot.
This post is Part 1 in a series on how to configure a system for LLM deployments and development usage. The configuration will be suitable for multi-user deployments and also useful for smaller development systems. Part 1 is about the base Linux server setup.
In this post I address the question that’s been on everyone’s mind: can you run a state-of-the-art Large Language Model on-prem? With *your* data and *your* hardware? At a reasonable cost?
This is a short note on setting up the Apache web server to allow system users to create personal websites and web apps in their home directories.