Stable Diffusion Linux vs. Windows

Posted on April 1, 2024 by Jon Allman
Always look at the date when you read an article. Some of the content in this article is most likely out of date, as it was written on April 1, 2024. For newer information, see our more recent articles.

Table of Contents

  • Introduction
  • Resource Usage
  • Optimizations
  • Test Setup
  • SD-WebUI – NVIDIA
  • SD-WebUI-DirectML (fork) – AMD
  • SD-WebUI-Forge (fork) – NVIDIA
  • ComfyUI
  • Fooocus
  • InvokeAI
  • SD.Next
  • Conclusion

Introduction

There are a lot of choices to make when configuring a workstation, especially when it comes to which hardware will comprise the build. But it’s just as important to consider what software your system will be running. That’s precisely why Puget Systems tests a wide variety of software packages: so that we can make informed hardware recommendations based on a given workflow. One particular software choice that may be taken for granted is which Operating System (OS) to install on a system. It comes as little surprise that Windows continues to be a popular choice for professional workstations, and in 2023, about 90% of Puget Systems customers purchased a Windows-based system. Today, we’ll discuss the benefits and drawbacks of Windows-based workstations compared to Linux-based systems, specifically with regard to Stable Diffusion workflows, and provide performance results from our testing across various Stable Diffusion front-end applications.

We will discuss some other considerations with regard to the choice of OS, but our focus will largely be on testing performance with both an NVIDIA and an AMD GPU across several popular SD image generation frontends, including three forks of the ever-popular Stable Diffusion WebUI by AUTOMATIC1111.

Resource Usage

One of the common reasons for using Linux for AI is that it tends to consume less VRAM than Windows during normal usage. With a 4K screen and a single browser window open, we can expect Windows to reserve just over 1GB of VRAM. In comparison, Ubuntu’s default desktop environment, GNOME, uses about 700MB under the same conditions. Although that’s not a huge difference, it could be enough to prevent errors or performance drops from VRAM overflows in certain circumstances. Additionally, unlike Windows, a Linux-based OS can run without a GUI at all, dropping the desktop’s VRAM usage to effectively zero, which is great if you have a mid- or low-end GPU and need to squeeze every last bit of VRAM out of it. Although there are plenty of options for generating images via the command line, in practice, this would likely mean using two systems: one to run the SD backend as a server and another to connect to the front-end image generation GUI via a browser.
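
To see how much VRAM the desktop is holding on your own NVIDIA system before launching Stable Diffusion, something like the following minimal sketch may help. It assumes the `nvidia-smi` tool that ships with the NVIDIA driver is on the PATH; the helper names `parse_vram_csv` and `query_vram` and the sample numbers are ours, for illustration only.

```python
import subprocess

def parse_vram_csv(text):
    """Parse 'used, total' MiB pairs from nvidia-smi CSV output (one line per GPU)."""
    gpus = []
    for line in text.strip().splitlines():
        used, total = (int(field) for field in line.split(","))
        gpus.append({"used_mib": used, "total_mib": total, "free_mib": total - used})
    return gpus

def query_vram():
    """Ask nvidia-smi for current VRAM usage; requires an installed NVIDIA driver."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_vram_csv(out)

# Illustrative input only: a 16GB card with roughly 1GB held by the desktop.
sample = "1050, 16376\n"
print(parse_vram_csv(sample))
```

Calling `query_vram()` once with only the desktop running gives a baseline to compare between operating systems or desktop environments.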

Optimizations

Towards the end of 2023, a pair of optimization methods for Stable Diffusion models were released: NVIDIA TensorRT and Microsoft Olive for ONNX Runtime. Both of these options operate under the basic principle of converting SD checkpoints into quantized versions optimized for inference, resulting in improved image generation speeds. Because these optimizations were designed for ONNX Runtime, they are generally available regardless of which OS you choose. However, most implementations of Olive, such as Microsoft’s extension for AUTOMATIC1111’s SD-WebUI, are designed for use with DirectML, which relies on DirectX within Windows. We covered these optimizations in a previous article if you want a head-to-head comparison.

Although these optimizations are guaranteed to increase SD inference performance, they have drawbacks. First and foremost, any checkpoints and LoRAs must be quantized before they can benefit from the improved performance. This isn’t much of a problem on its own, as the quantization process typically takes no more than a few minutes. However, other variables must be considered during the conversion. In the case of TensorRT, the optimized model must also account for the expected width and height of the generation, along with batch sizes. Improvements have been made in this area with the introduction of “dynamic engines” that can accept a broader range of resolutions, but these reportedly come at a cost in performance and VRAM usage. In the case of Microsoft Olive, LoRA support is more limited, and LoRAs must be baked into the checkpoint during quantization.
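
The shape restriction described above can be pictured as a simple range check. The sketch below is not TensorRT’s API; `EngineProfile` is a hypothetical record we made up to show why a static engine rejects anything but the exact shape it was built for, while a dynamic engine accepts a range. The resolution and batch ranges are illustrative values.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EngineProfile:
    """Hypothetical record of the shapes an optimized engine was built for."""
    min_width: int
    max_width: int
    min_height: int
    max_height: int
    min_batch: int
    max_batch: int

    def accepts(self, width, height, batch_size):
        """Return True if the requested generation fits inside the built ranges."""
        return (
            self.min_width <= width <= self.max_width
            and self.min_height <= height <= self.max_height
            and self.min_batch <= batch_size <= self.max_batch
        )

# A static engine covers exactly one shape; a dynamic engine covers a range.
static_engine = EngineProfile(1024, 1024, 1024, 1024, 1, 1)
dynamic_engine = EngineProfile(768, 1280, 768, 1280, 1, 4)

print(static_engine.accepts(1024, 1024, 1))   # True
print(static_engine.accepts(1024, 1024, 4))   # False
print(dynamic_engine.accepts(1024, 1024, 4))  # True
```

Under this picture, changing only the batch size from 1 to 4 is enough to require a new static engine.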

For someone with a well-defined image generation workflow without much variation, these downsides may not be a problem, and the performance benefits may be well worth the initial setup time. However, for those who frequently experiment with various checkpoints, LoRAs, and image resolutions, the relative inflexibility of these optimizations makes them far less appealing.

Test Setup

Threadripper PRO Test Platform

CPU: AMD Ryzen Threadripper PRO 7985WX 64-Core
CPU Cooler: Asetek 836S-M1A 360mm Threadripper CPU Cooler
Motherboard: ASUS Pro WS WRX90E-SAGE SE
BIOS Version: 0404
RAM: 8x Kingston DDR5-5600 ECC Reg. 1R 16GB (128GB total)
GPUs:
  AMD Radeon RX 7900 XTX
    Driver Version: 23.Q4 (Windows)
    Driver Version: 6.3.6 (Ubuntu)
  NVIDIA GeForce RTX 4080
    Driver Version: 551.76 (Windows)
    Driver Version: 545.29.06 (Ubuntu)
PSU: Super Flower LEADEX Platinum 1600W
Storage:
  Samsung 980 Pro 2TB (Windows)
  Samsung 980 Pro 1TB (Ubuntu)
OS:
  Ubuntu 22.04.3 LTS (kernel 6.5.0-26-generic)
  Windows 11 Pro 23H2 (Build 22631.2396)

Generation Data

Prompt: photo of a serene lake reflecting milky way at night
Model: stable-diffusion-xl-base-1-0
Resolution: 1024×1024
Sampler: Euler
Scheduler: Normal
Steps: 20

Preceding each test was a quick warmup where a single image was generated to ensure everything was loaded and the pipeline was ready to go. Then, two tests were performed, the first consisting of a batch count of 10 images and batch size 1, and the second being a batch count of 1 and batch size 4. Each test was performed twice, and the results were averaged.
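
The procedure above (one warmup image, then timed runs that are averaged) could be sketched roughly as follows. This is not the harness used for the article; `generate` is a hypothetical callable standing in for whichever front end is under test, and we assume one iteration means one sampler step for a whole batch.

```python
import time

def benchmark(generate, steps, batch_count, batch_size, repeats=2):
    """Warm up once, then time `repeats` runs and average iterations per second.

    `generate(batch_size)` is a hypothetical callable that produces one batch
    and blocks until it finishes.
    """
    generate(1)  # warmup: generate a single image so the pipeline is loaded
    rates = []
    for _ in range(repeats):
        start = time.perf_counter()
        for _ in range(batch_count):
            generate(batch_size)
        elapsed = time.perf_counter() - start
        # Assumption: one iteration = one sampler step for a whole batch.
        rates.append(steps * batch_count / elapsed)
    return sum(rates) / len(rates)

def fake_generate(batch_size):
    """Stand-in workload so the sketch runs without a GPU."""
    time.sleep(0.001 * batch_size)

test_1 = benchmark(fake_generate, steps=20, batch_count=10, batch_size=1)
test_2 = benchmark(fake_generate, steps=20, batch_count=1, batch_size=4)
print(f"test 1: {test_1:.0f} it/s, test 2: {test_2:.0f} it/s")
```

The two calls mirror the two tests described above: ten batches of one image, then one batch of four.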

All Windows tests were performed with hardware-accelerated GPU scheduling (HAGS) both on and off. Additionally, all tests with the NVIDIA hardware were performed with both xFormers and scaled dot-product (SDP) cross-attention optimizations; the results below also refer to the latter as SDPA. As of the writing of this article, xFormers is not available for AMD GPUs, so only SDP was used in the AMD testing.

SD-WebUI – NVIDIA

[Charts: Stable Diffusion WebUI on NVIDIA. Batch count 10, batch size 1 and batch count 1, batch size 4, each shown for the standard tests and with TensorRT.]

Starting with the ubiquitous SD-WebUI, Ubuntu takes the performance lead, outperforming the Windows equivalent by about 5-8% in both the standard tests and with TensorRT. We also found a slight performance boost with HAGS disabled in all tests except the batch count 10 test using TensorRT. Interestingly, the SD-WebUI tests were the only time xFormers outperformed SDPA, as the results below show.

SD-WebUI-DirectML (fork) – AMD

[Charts: Stable Diffusion WebUI DirectML on AMD. Batch count 10, batch size 1 and batch count 1, batch size 4.]

A new option for AMD GPUs is ZLUDA, a translation layer that allows unmodified CUDA applications to run on AMD GPUs. However, the future of ZLUDA is unclear, as the CUDA EULA forbids reverse-engineering CUDA elements for translation targeting non-NVIDIA platforms.

We decided to run some tests, and surprisingly, we found several instances where ZLUDA within Windows outperformed ROCm 5.7 in Linux, such as within the DirectML fork of SD-WebUI. Compared to other options, ZLUDA does not appear to be meaningfully impacted by the presence of HAGS.

SD-WebUI-Forge (fork) – NVIDIA

[Charts: Stable Diffusion WebUI Forge on NVIDIA. Batch count 10, batch size 1 and batch count 1, batch size 4.]

With SD-WebUI-Forge, we start to see a pattern that continues fairly consistently throughout the rest of the results: Ubuntu with SDP cross-attention is the overall performance winner, and within Windows, disabling HAGS has a slight benefit to performance. It’s worth keeping in mind, though, that the difference in OS only represents an average 4-5% increase in iterations per second. Forge also achieved the most iterations per second overall, which is fitting considering it’s presented as being a more performant version of SD-WebUI.
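
For readers comparing their own numbers, percentages like these can be computed as a relative difference in iterations per second. The helper below is a minimal sketch; `percent_gain` is our own name, and the it/s values are hypothetical, chosen only to illustrate a 5% gap rather than taken from the charts.

```python
def percent_gain(candidate_its, baseline_its):
    """Relative speed difference of candidate over baseline, in percent."""
    if baseline_its <= 0:
        raise ValueError("baseline must be positive")
    return (candidate_its - baseline_its) / baseline_its * 100.0

# Hypothetical it/s values, not measurements from this article.
windows_its = 20.0
ubuntu_its = 21.0
print(f"{percent_gain(ubuntu_its, windows_its):.1f}%")  # 5.0%
```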

ComfyUI

[Charts: ComfyUI. Charts 1-2: NVIDIA, batch count 10, batch size 1 and batch count 1, batch size 4. Charts 3-4: AMD, the same two tests.]

In ComfyUI, the results were quite close between Ubuntu and Windows. Ubuntu came out slightly ahead when SDPA was used, but it also produced the worst result of all when using xFormers. This represents the only performance reduction we found in Ubuntu compared to Windows. Just as we saw in the SD-WebUI test, disabling HAGS led to a slight performance improvement.

On the AMD side (charts 3-4), Ubuntu + ROCm came out ahead by about 5% in the batch count 10 test but fell behind when the batch size was increased to 4. Once again, ZLUDA did not appear to be significantly affected by the HAGS option.

Fooocus

[Chart: Fooocus on NVIDIA. Batch count 10, batch size 1.]

Within Fooocus, the combination of Ubuntu and SDPA comes out ahead overall. The impact of HAGS is less clear, with a notable decrease in performance with xFormers. Fooocus does not appear to support image generation with batch sizes larger than 1, so we only have results for the batch count 10 test.

InvokeAI

[Chart: InvokeAI on NVIDIA. Batch count 10, batch size 1.]

Like Fooocus, InvokeAI does not support batch sizes larger than one, so only one set of results is provided here as well. As with the previous results, the Ubuntu + SDPA combination comes out ahead overall. Interestingly, this is one of the few examples where disabling HAGS slightly negatively impacted performance.

SD.Next

[Charts: SD.Next. Charts 1-2: NVIDIA, batch count 10, batch size 1 and batch count 1, batch size 4. Charts 3-4: AMD, the same two tests.]

Finally, the SD.Next results once again show that Ubuntu with SDPA is the most performant option overall. However, just like the InvokeAI results above, we found that disabling HAGS slightly decreased performance.

Looking at the AMD results (charts 3-4), we again find Ubuntu + ROCm 5.7 trailing slightly behind ZLUDA within Windows, but only by about 1%. The HAGS-on test did report a slightly higher speed, but by less than a single percentage point, further establishing that HAGS has little impact on ZLUDA performance.

Conclusion

Throughout our testing of the NVIDIA GeForce RTX 4080, we found that Ubuntu consistently provided a small performance benefit over Windows when generating images with Stable Diffusion and that, except for the original SD-WebUI (A1111), SDP cross-attention is a more performant choice than xFormers. Additionally, our results show that the Windows 11 default of leaving HAGS enabled is typically not the best option for performance. However, the performance impact of leaving HAGS enabled is generally minor, and in some applications, such as SD.Next and InvokeAI, leaving it enabled may be the better choice.

As long as the ZLUDA project remains available, we found that owners of the AMD Radeon RX 7900 XTX can expect similar performance between Ubuntu/ROCm and Windows/ZLUDA, with the HAGS setting in Windows having little to no effect on image generation speeds.

That said, even with the most generous comparison between the two, Ubuntu only provided a performance gain of about 9.5%, with most examples showing an improvement of around 5% or less. This means the performance argument for Ubuntu over Windows may not overcome the typical arguments against a switch from Windows, such as software compatibility. Yet the benefit of reduced VRAM usage could tip the scales towards Linux for less powerful GPUs with smaller amounts of VRAM. However, anyone looking to achieve the absolute fastest possible image generation speeds using Stable Diffusion should look beyond Windows 11.
