
Effects of CPU speed on GPU inference in llama.cpp

Posted on July 1, 2024 (September 25, 2024) by Jon Allman
Always look at the date when you read an article. Some of the content in this article is most likely out of date, as it was written on July 1, 2024. For newer information, see our more recent articles.

Table of Contents

  • Introduction
  • Test Setup
  • Results
  • Final Thoughts

Introduction

In our recent Puget Mobile vs. MacBook Pro for AI workflows article, we included performance testing with a smaller LLM, Meta-Llama-3-8B-Instruct, as a point of comparison between the two systems. Interestingly, when we ran Meta-Llama-3-8B-Instruct with both exllamav2 and llama.cpp on the Puget Mobile, the two achieved the exact same token generation speed despite being different inference libraries, which we theorized was due to a CPU bottleneck.


To determine if and when a specific CPU or platform change could have a sizable impact on LLM performance, we want to expand on that testing today, looking at another smaller-sized LLM, Phi-3-mini-4k-instruct (a 3.8B parameter model), across several platforms. If it’s true that GPU inference with smaller LLMs puts a heavier strain on the CPU, then we should find that Phi-3-mini is even more sensitive to CPU performance than Meta-Llama-3-8B-Instruct.

Although llama.cpp can be run as a CPU-only inference library, in addition to GPU or CPU/GPU hybrid modes, this testing was focused on determining what impact (if any) the CPU/platform choice has specifically during GPU inference.

Test Setup

As in our notebook comparison article, we used the llama-bench executable contained within the precompiled CUDA build of llama.cpp (build 3140) for our testing. However, in addition to the default options of 512 and 128 tokens for prompt processing (pp) and token generation (tg), respectively, we also included tests with 4096 tokens for each, filling the context window of the model. As we can see in the charts below, this has a significant performance impact and, depending on the use case of the model, may better represent the actual performance in day-to-day use.
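For readers who want to run a similar test, the sketch below assembles a llama-bench command line covering the prompt and generation sizes described above. The exact command we ran is not reproduced here, so treat this as an assumption: the helper name build_llama_bench_args is hypothetical, the model path is a placeholder for wherever the GGUF file lives, and the flags used (-m, -p, -n, -ngl, -o) are the standard llama-bench options for model, prompt sizes, generation sizes, GPU layer count, and output format.

```python
from typing import List, Sequence


def build_llama_bench_args(
    model_path: str,
    prompt_sizes: Sequence[int] = (512, 4096),
    gen_sizes: Sequence[int] = (128, 4096),
    gpu_layers: int = 99,
    exe: str = "llama-bench",
) -> List[str]:
    """Build an argument list for llama-bench (hypothetical helper).

    gpu_layers=99 is assumed here so that every layer of a small model
    is offloaded to the GPU, i.e. pure GPU inference.
    """
    return [
        exe,
        "-m", model_path,
        # Comma-separated lists make llama-bench run one test per value.
        "-p", ",".join(str(n) for n in prompt_sizes),
        "-n", ",".join(str(n) for n in gen_sizes),
        "-ngl", str(gpu_layers),
        "-o", "json",
    ]


if __name__ == "__main__":
    # Placeholder path: point this at your local copy of the model file.
    args = build_llama_bench_args("Phi-3-mini-4k-instruct-q4.gguf")
    print(" ".join(args))
```

The returned list can be passed directly to subprocess.run once llama-bench is on the PATH; keeping it as a list rather than a single string avoids shell quoting problems with model paths that contain spaces.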

For this testing, we looked at a wide range of modern platforms, including Intel Core, Intel Xeon W, AMD Ryzen, and AMD Threadripper PRO. We tested with both an NVIDIA GeForce RTX 4080 and RTX 4090 in order to see if different GPUs had an impact on performance. Note that we are only including single GPU configurations, as these smaller models are unlikely to be paired with dual or quad-GPU configurations. Full system specs for each platform we tested are listed below:

Shared System Specs

GPUs:
NVIDIA GeForce RTX 4080 16GB Founders Edition
NVIDIA GeForce RTX 4090 24GB Founders Edition
Driver Version: Studio 555.85
PSU: Super Flower LEADEX Platinum 1600W
Storage: Samsung 980 Pro 2TB
OS: Windows 11 Pro 64-bit (22631)

Platform Specs

Intel Core

Motherboard: Asus ProArt Z690-Creator WiFi
BIOS version: 3603
CPUs:
Intel Core i5-14600K
Intel Core i7-14700K
Intel Core i9-14900K
RAM: 2x DDR5-5600 32GB (64GB total)
Running at 5600 Mbps

Ryzen Desktop

Motherboard: Asus ProArt X670E-Creator WiFi
BIOS version: 1602
CPUs:
AMD Ryzen 7 7700X
AMD Ryzen 9 7900X
AMD Ryzen 9 7950X
RAM: 2x DDR5-5600 32GB (64GB total)
Running at 5200 Mbps

Xeon

Motherboard: Supermicro X12SPA-TF 64L
BIOS version: 1.4b
CPU: Intel Xeon W-3335
RAM: 8x DDR4-3200 16GB ECC Reg. (128GB total)
Running at 3200 Mbps

Threadripper

Motherboard: Asus Pro WS WRX90E-SAGE SE
BIOS version: 0404
CPUs:
AMD Ryzen Threadripper Pro 7985WX
AMD Ryzen Threadripper Pro 7995WX
RAM: 8x DDR5-5600 16GB (128GB total)
Running at 5200 Mbps

Benchmark Software

llama.cpp
build a9cae480 (3140)
Phi-3-mini-4k-instruct
Phi-3-mini-4k-instruct-q4.gguf

Results

[Charts: llama.cpp prompt processing results, RTX 4080 and RTX 4090]

Starting with prompt processing, with either the NVIDIA GeForce RTX 4080 or 4090, we found only minuscule differences in the results between platforms. Given that we are working with speeds of several thousand tokens per second, differences of a few hundred tokens per second at most are not only imperceptible but also fall within the margin of error of the tests.
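To put "a few hundred tokens per second" in proportion, it can help to express the gap between the fastest and slowest platform as a fraction of the fastest result. The snippet below is a minimal sketch of that arithmetic; the function name relative_spread is hypothetical, and the numbers in the example are illustrative placeholders, not our measured results.

```python
from typing import Mapping


def relative_spread(tokens_per_second: Mapping[str, float]) -> float:
    """Return (fastest - slowest) / fastest for a set of results."""
    values = list(tokens_per_second.values())
    if not values:
        raise ValueError("at least one result is required")
    fastest, slowest = max(values), min(values)
    return (fastest - slowest) / fastest


if __name__ == "__main__":
    # Placeholder values for illustration only (NOT measured data).
    example = {"platform_a": 5000.0, "platform_b": 4800.0}
    print(f"spread: {relative_spread(example):.1%}")
```

With placeholder inputs like these, a gap of 200 tokens per second at around 5,000 tokens per second works out to a spread of only a few percent, which is why such gaps are hard to separate from run-to-run variance.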

As a side note, these tests are an impressive showcase of the RTX 4090’s performance, with its pp4096 test results achieving 90% of the performance of the RTX 4080’s pp512 test despite an eightfold difference in the size of the context.

[Charts: llama.cpp token generation results, RTX 4080 and RTX 4090]

Moving on to token generation, we begin to see more significant differences between the platforms, generally tracking with each processor’s single-core performance. For example, the Xeon W-3335, with its modest Max Turbo speed of 4.0 GHz, consistently obtains the lowest performance of all of the CPUs tested. However, even this only represents about a 5% difference when compared to the other platforms.

Although the i9-14900K has the highest single-core clock speed of the processors we tested, it surprisingly only tied with the Threadripper Pro 7985WX in the RTX 4080-based test and even lagged slightly behind it in the RTX 4090-based test. Another surprise was finding the Ryzen 7 7700X ahead of the Ryzen 9 7950X despite its lower Max Boost. This might have been explained by the Ryzen 9 7950X’s dual Core Chiplet Die (CCD) design, which could introduce some overhead from inter-chiplet communication; however, the Ryzen 9 7900X is a dual-CCD design as well, and it performed similarly to the Ryzen 7 7700X.

To verify that CPU speed affects GPU inference at all, we did some additional testing, artificially limiting the 7900X to just 2.5GHz and 1.0GHz in two tests with an RTX 4090. When we did so, we received markedly lower results (pp4096: 167 t/s and 137 t/s, respectively), confirming that CPU performance does indeed affect GPU inference speed. Most current-generation CPUs are simply fast enough that the differences between the ones we tested are inconsequential. Ultimately, outside of the Xeon W-3335, the results between the processors tested are so close that making meaningful distinctions between them isn’t feasible with this test.
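One way to picture why clock speed matters only when the CPU is slowed dramatically is a simple toy model of token generation, in which each token costs a fixed amount of GPU time plus a fixed number of CPU cycles for scheduling work. This is an assumption made for illustration, not a description of llama.cpp internals, and every number below is a placeholder rather than a measurement from our testing. The function names are hypothetical.

```python
def tokens_per_second(
    gpu_seconds_per_token: float,
    cpu_cycles_per_token: float,
    cpu_clock_hz: float,
) -> float:
    """Toy model: per-token latency = GPU time + CPU cycles / clock."""
    cpu_seconds_per_token = cpu_cycles_per_token / cpu_clock_hz
    return 1.0 / (gpu_seconds_per_token + cpu_seconds_per_token)


def slowdown_ratio(
    gpu_seconds_per_token: float,
    cpu_cycles_per_token: float,
    fast_clock_hz: float,
    slow_clock_hz: float,
) -> float:
    """How many times faster the fast clock is than the slow one."""
    fast = tokens_per_second(gpu_seconds_per_token, cpu_cycles_per_token, fast_clock_hz)
    slow = tokens_per_second(gpu_seconds_per_token, cpu_cycles_per_token, slow_clock_hz)
    return fast / slow


if __name__ == "__main__":
    cycles = 5e6  # placeholder CPU cost per token (NOT measured)
    for label, gpu_s in (("smaller model", 0.002), ("larger model", 0.010)):
        ratio = slowdown_ratio(gpu_s, cycles, fast_clock_hz=5e9, slow_clock_hz=1e9)
        print(f"{label}: 5 GHz is {ratio:.2f}x faster than 1 GHz")
```

Under this model, the CPU term is a small share of the per-token latency at normal clock speeds, so modest clock differences barely register, while a large clock reduction makes it dominant. It also suggests why a smaller model, with less GPU time per token, would be more sensitive to the CPU than a larger one.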

Final Thoughts

Although single-core CPU speed does affect performance when executing GPU inference with llama.cpp, the impact is relatively small. It appears that almost any relatively modern CPU will not restrict performance in any significant way, and the performance of these smaller models is such that the user experience should not be affected. We may explore whether this holds true for other inference libraries, such as exllamav2, in a future article.

In general, this means that if you are using smaller LLMs that fit within a typical consumer-class GPU, you don’t have to worry much about which base platform and CPU you are using. Except in a few isolated instances, the choice should be largely inconsequential, with minimal impact on how fast your system is able to run the LLM. However, this test serves as a good reminder that no single component of a system exists in a vacuum; each is affected in some way by the other components in the system as a whole.

Tags: AMD, GPU, Intel, llama.cpp, NVIDIA, Performance
