Stable Diffusion is a text-to-image latent diffusion model created by researchers and engineers from CompVis, Stability AI, and LAION. It is capable of generating photo-realistic images from any text input. Stable Diffusion is a breakthrough in speed and quality, meaning it can run on consumer GPUs: it needs under 10 GB of VRAM and generates images at 512x512 or 768x768 pixels in a few seconds.
To quickly try out the model, you can use the Stable Diffusion Space. For more control and faster generation, try the Stability AI DreamStudio beta.
The purpose of this tutorial is to demonstrate how to install Stable Diffusion 2.1 on Debian 11 Linux. Before we proceed with the installation, let's take a moment to review the software and hardware specifications of my laptop (Legion 5 Pro 16IAH7H), the Debian 11 virtual machine, and the Google Colab environment.
1. Host:
Hardware:
- Laptop Legion 5 Pro 16IAH7H
- CPU: 12th Gen Intel(R) Core(TM) i9-12900H
- GPU: NVIDIA GeForce RTX 3070 Ti, 150 W, 8GB vRAM
- RAM: 32GB
Software:
- Debian 11 bullseye
- VirtualBox Version 7.0.6 r155176
- Python: 3.10.6
- Nvidia Driver Version: 470.161.03, CUDA Version: 11.4
2. Guest VM Settings:
Hardware:
- CPU: SD 1.4 - 8 x CPU, Execution Cap 100%, Enabled PAE/NX and Nested VT-x/AMD-V; SD 2.1 - 14 x CPU
- GPU: VMSVGA
- RAM: SD 1.4 - 9379 MB or SD 2.1 - 17274 MB
- Acceleration: Enabled Nested Paging
Software:
- Debian: 11 bullseye
- Python: 3.9.2
3. Google Colab:
Hardware:
- CPU: Intel(R) Xeon(R) CPU @ 2.20GHz
- GPU: NVIDIA TU104GL [Tesla T4], 16GB vRAM
- RAM: 12985 MB
Software:
- Ubuntu 20.04.5 LTS
- Python: 3.9.16
- Nvidia Driver Version: 525.85.12, CUDA Version: 12.0
Before moving on, it is worth taking a moment to explore the concept of AI models. In AI, a model refers to a computer program or algorithm that is designed to learn and make predictions based on data. We train models with datasets. Training a model involves providing it with input data and the corresponding expected output or label, and then adjusting the model's parameters or weights so that it can make accurate predictions on new, unseen data.
The performance of the model is evaluated using metrics such as accuracy, precision, recall, and F1 score, which indicate how well the model is able to make predictions on new data.
To train a text-to-image AI model, we typically use a dataset of paired text and image examples. The dataset would consist of textual descriptions of images, paired with the corresponding images themselves.
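To make the idea of paired training data concrete, here is a minimal sketch of what such a dataset could look like in PyTorch. The captions.csv layout (with image_path and caption columns) and the TextImagePairs class are hypothetical and for illustration only; they are not part of the actual Stable Diffusion training code.

# Hypothetical paired text-image dataset: captions.csv is assumed to have
# "image_path" and "caption" columns.
import csv
from PIL import Image
from torch.utils.data import Dataset

class TextImagePairs(Dataset):
    def __init__(self, csv_path):
        with open(csv_path, newline="") as f:
            self.rows = list(csv.DictReader(f))

    def __len__(self):
        return len(self.rows)

    def __getitem__(self, idx):
        row = self.rows[idx]
        image = Image.open(row["image_path"]).convert("RGB")
        caption = row["caption"]
        # A real training loop would also tokenize the caption and convert
        # the image into a normalized tensor before feeding both to the model.
        return image, caption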
Stable Diffusion Versions
Higher versions have been trained for longer and are thus usually better in terms of image-generation quality than lower versions.
1. Stable Diffusion 1.4
Stable Diffusion 1.4 is the initial version of Stable Diffusion developed by CompVis. It was continued from stable-diffusion-v1-2 and trained for a further 225,000 steps at 512x512 resolution on "laion-aesthetics v2 5+". Additionally, the text-conditioning was dropped 10% of the time to improve classifier-free guidance sampling.
2. Stable Diffusion 2.0
StabilityAI utilized a significantly larger image dataset to train SD v2.0, but excluded adult content using the NSFW filter from LAION, a non-profit organization that creates models and datasets for AI researchers. This larger dataset resulted in improved performance for v2.0 in recognizing inanimate objects such as architecture and landscapes.
However, the NSFW filter was too strict and removed many safe-for-work images of people in the dataset, leading to a shortage of such images for training the model.
3. Stable Diffusion 2.1
Stable Diffusion 2.1 was released by StabilityAI on December 7, 2022, and it still incorporates a filter to exclude adult content. However, this version of the filter is less restrictive than the previous one. This change offers the best of both worlds: it keeps the improvements in rendering inanimate objects while restoring the model's ability to generate people.
1. Generating Stable Diffusion Images Locally on Debian 11 VirtualBox VM Using CPU
To generate images, we will be using the CPU, but note that this process is very slow: it takes around 20 minutes to generate an SD 2.1 image with dimensions of 768x768 and about 5 minutes and 30 seconds to generate an SD 1.4 image with dimensions of 512x512. For this purpose, we will use a VirtualBox machine with Debian 11 installed, and we recommend checking the software and hardware specifications above to ensure optimal performance.
To get started, install the required packages by running the following commands:
$ sudo apt install python3-pip
$ sudo pip3 install diffusers transformers accelerate
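If you prefer not to install packages system-wide, the same packages can be installed in a Python virtual environment instead (a sketch, assuming the python3-venv package is available on your Debian system):

$ sudo apt install python3-venv
$ python3 -m venv sd-env
$ source sd-env/bin/activate
$ pip install diffusers transformers accelerate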
You can test the installation by launching Python3 and importing the StableDiffusionPipeline class from diffusers:
$ python3
>>> from diffusers import StableDiffusionPipeline
>>> pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
Note: Python will start downloading the selected model weights on first use. If you want to download the SD 2.1 model instead, replace the line above with:
>>> pipe = StableDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1")
Remove the VAE encoder, as it is not needed for text-to-image generation (only the decoder is used to turn latents into images):
>>> del pipe.vae.encoder
>>> prompt = "astronaut riding a horse on mars"
>>> image = pipe(prompt).images[0]
>>> image.save("astronaut_rides_horse.png")
Figure 1 - CPU-Generated 512x512 Image Created with Stable Diffusion 1.4
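For convenience, the interactive steps above can also be collected into a single script. This is just a sketch that mirrors the commands already shown; the file name generate.py is arbitrary. Run it with python3 generate.py.

# generate.py - the interactive steps above collected into one script (CPU generation)
from diffusers import StableDiffusionPipeline

# Use "stabilityai/stable-diffusion-2-1" here to generate with SD 2.1 instead.
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")

# The VAE encoder is only needed for image-to-image tasks, so drop it to save memory.
del pipe.vae.encoder

prompt = "astronaut riding a horse on mars"
image = pipe(prompt).images[0]
image.save("astronaut_rides_horse.png")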
Comment: Is there a way to add negative prompts in the Google Colab method?
Reply: Use negative_prompt in the pipeline call, e.g.
image = pipe(prompt, negative_prompt="woods, trees, nature, bushes, grass", width=1024, height=768, num_inference_steps=20, guidance_scale=8.5).images[0]