Build PyTorch from Source with CUDA 11.8 (2024)

Update 8/10/2023: New version for CUDA 12.2 and cuDNN 8.9.3 is out at https://medium.com/@zhanwenchen/build-pytorch-from-source-with-cuda-12-2-1-with-ubuntu-22-04-b5b384b47ac!

The NVIDIA 525 driver provides excellent backwards compatibility with CUDA versions. Meanwhile, as of writing, PyTorch does not yet fully support CUDA 12 (see their CUDA 12 support progress tracker on GitHub). Today, we are going to learn how to go from zero to building the latest PyTorch with CUDA 11.8.

1. Install NVIDIA Driver 525

If you are on Ubuntu Desktop (with GUI), you can use the Additional Drivers interface for installing 525 (use the server-only option if you are installing this on a server instead of a desktop). If you do this, you can skip the rest of part 1.

First, update your system dependencies.

sudo apt update # optional but recommended
sudo apt upgrade # optional but recommended
sudo apt install nvidia-driver-525

Then restart your computer and see if your NVIDIA cards are successfully detected:

nvidia-smi

You should see something like

pct4et@riselab02:~$ nvidia-smi
Fri Feb 24 05:49:06 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.89.02 Driver Version: 525.89.02 CUDA Version: 12.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA RTX A5000 Off | 00000000:0F:00.0 Off | Off |
| 30% 31C P8 7W / 230W | 0MiB / 24564MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 NVIDIA RTX A5000 Off | 00000000:1F:00.0 Off | Off |
| 30% 32C P8 7W / 230W | 0MiB / 24564MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 2 NVIDIA RTX A5000 Off | 00000000:22:00.0 Off | Off |
| 30% 34C P8 11W / 230W | 0MiB / 24564MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 3 NVIDIA RTX A5000 Off | 00000000:25:00.0 Off | Off |
| 30% 34C P8 27W / 230W | 0MiB / 24564MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 4 NVIDIA RTX A5000 Off | 00000000:26:00.0 Off | Off |
| 30% 33C P0 62W / 230W | 0MiB / 24564MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 5 NVIDIA RTX A5000 Off | 00000000:88:00.0 Off | Off |
| 30% 33C P0 57W / 230W | 0MiB / 24564MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 6 NVIDIA RTX A5000 Off | 00000000:98:00.0 Off | Off |
| 30% 34C P0 59W / 230W | 0MiB / 24564MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 7 NVIDIA RTX A5000 Off | 00000000:9B:00.0 Off | Off |
| 30% 36C P0 62W / 230W | 0MiB / 24564MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 8 NVIDIA RTX A5000 Off | 00000000:9E:00.0 Off | Off |
| 30% 32C P0 62W / 230W | 0MiB / 24564MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 9 NVIDIA RTX A5000 Off | 00000000:9F:00.0 Off | Off |
| 30% 34C P0 63W / 230W | 0MiB / 24564MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+

2. Install CUDA

Installing with the runfile is actually better in this case: the local/network .deb approach requires some serious wrangling of Ubuntu PGP keys, which is extremely frustrating and can break your apt lists so that you can never update/upgrade your apt sources again.

2.1 Download and Install the Runfile

First, download the correct CUDA 11.8.0 installer for your system. For example, for Ubuntu 22.04, it is:
wget https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda_11.8.0_520.61.05_linux.run

Then install it with

sudo sh cuda_11.8.0_520.61.05_linux.run

Accept the EULA, but on the next screen, deselect the driver (the bundled 520.61.05 driver is older than the 525 you just installed). The installation runs silently and takes a while.

Last, edit your environment to include your CUDA binaries. Add the following line to the bottom of your .bashrc file:

export PATH="/usr/local/cuda/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda/lib64:$LD_LIBRARY_PATH"

And then refresh your environment with

source ~/.bashrc

Now register the CUDA libraries with the dynamic linker:

echo "/usr/local/cuda/lib64" | sudo tee -a /etc/ld.so.conf
sudo ldconfig

You can confirm that ldconfig worked with

ldconfig -p | grep cuda
2.2 Verify the CUDA Version

cat /usr/local/cuda/version.json | grep version -B 2

It should report version 11.8:

"cuda" : {
"name" : "CUDA SDK",
"version" : "11.8.20220929"
},

2.3 Verify the CUDA Installation with Samples

Samples are no longer under /usr/local/cuda/samples; they now live on GitHub at https://github.com/NVIDIA/cuda-samples:

git clone https://github.com/NVIDIA/cuda-samples.git

Now go to it

cd cuda-samples/

Then check out the samples tag matching your CUDA version; otherwise you may encounter compilation errors:

git checkout v11.8

First, install the FreeImage dependency for the code samples.

sudo apt install libfreeimage-dev

Compile the samples

# using all your available cores
make -j$(nproc) > compile.log 2>&1 &

There will be many warnings about deprecated older GPU targets; you can safely ignore them (the -Wno-deprecated-gpu-targets flag does not actually suppress them). Then monitor the compilation process:

tail -f compile.log

Note that this compilation may take a minute. Once the bottom of the log ends with “Finished building CUDA samples,” check the log to make sure there are no errors.
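If you script this setup, a quick scan of the log for error lines saves eyeballing it. A minimal sketch (a hypothetical helper, not part of the samples; adjust the patterns to your toolchain's output, and point it at compile.log on a real system):

```python
import re

def find_errors(log_text):
    """Return lines that look like compiler or make errors."""
    pattern = re.compile(r"\berror\b|make\[\d+\]: \*\*\*", re.IGNORECASE)
    return [line for line in log_text.splitlines() if pattern.search(line)]

# Embedded sample in place of open("compile.log").read():
sample_log = """\
nvcc warning : The 'compute_35' architecture is deprecated
Finished building CUDA samples
"""
print(find_errors(sample_log))  # -> [] (warnings are fine, errors are not)
```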

Then let’s run some samples:

cd Samples/4_CUDA_Libraries/matrixMulCUBLAS

Run the compiled example

./matrixMulCUBLAS

You should see something like

[Matrix Multiply CUBLAS] - Starting...
GPU Device 0: "Ampere" with compute capability 8.0
GPU Device 0: "NVIDIA A100-SXM4-80GB" with compute capability 8.0MatrixA(640,480), MatrixB(480,320), MatrixC(640,320)
Computing result using CUBLAS...done.
Performance= 10472.73 GFlop/s, Time= 0.019 msec, Size= 196608000 Ops
Computing result using host CPU...done.
Comparing CUBLAS Matrix Multiply with CPU results: PASS
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
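As a sanity check on the reported numbers: GFlop/s is just operations divided by elapsed time. A quick back-of-the-envelope in Python, using the figures from the sample output above (the printed time is rounded to three decimals, which accounts for the small gap from the reported 10472.73):

```python
ops = 196_608_000   # "Size" from the sample output above
time_ms = 0.019     # "Time" from the sample output (rounded)

# GFlop/s = operations / seconds / 1e9
gflops = ops / (time_ms * 1e-3) / 1e9
print(f"{gflops:.1f} GFlop/s")  # close to the reported 10472.73
```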

If you have multiple GPUs, you might be interested in seeing their communication links.

cd ~/cuda-samples/Samples/5_Domain_Specific/p2pBandwidthLatencyTest
./p2pBandwidthLatencyTest

You can also see your nvlink matrix by

nvidia-smi nvlink -s

2.4 Cleanup

rm -rf ~/cuda_11.8.0_520.61.05_linux.run ~/cuda-samples

3. Install cuDNN 8.8.0

3.1 Download cuDNN 8.8.0

First, download the deb file from NVIDIA. If you are doing this on an Ubuntu Desktop or another Linux distribution with a web browser, this is easy: go to the NVIDIA current cuDNN version download page. If the current version is no longer 8.x.x, go to the cuDNN archive page to download previous cuDNN versions.

But if you are on an Ubuntu Server, you have to reuse your logged-in browser cookie, because cuDNN downloads sit behind NVIDIA's auth wall. Open Chrome (another browser may work, but I prefer Chrome's devtools), log in to the cuDNN archive page, and open the Network tab. Click the download link for the local installer .deb for your architecture (most likely x86_64). Right-click the request and choose "Copy as cURL" (see https://stackoverflow.com/a/42028789); make sure you are not doing "Copy all as cURL." Paste it into your terminal, adding the output option -o cudnn.deb:

# Do not type this unmodified. This is just an example!
curl 'https://developer.download.nvidia.com/compute/cudnn/secure/8.8.0/local_installers/11.8/cudnn-local-repo-ubuntu2204-8.8.0.121_1.0-1_amd64.deb?YOUR_OWN_HASH' -H 'authority: developer.download.nvidia.com' -H 'accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7' -H 'accept-language: en-US,en;q=0.9,zh-CN;q=0.8,zh;q=0.7,zh-TW;q=0.6,de;q=0.5' -H 'cookie: YOUR_OWN_COOKIES!!!!!!!!' -H 'dnt: 1' -H 'referer: https://developer.nvidia.com/' -H 'sec-ch-ua: "Chromium";v="110", "Not A(Brand";v="24", "Google Chrome";v="110"' -H 'sec-ch-ua-mobile: ?0' -H 'sec-ch-ua-platform: "macOS"' -H 'sec-fetch-dest: document' -H 'sec-fetch-mode: navigate' -H 'sec-fetch-site: same-site' -H 'sec-fetch-user: ?1' -H 'upgrade-insecure-requests: 1' -H 'user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36' --compressed -o cudnn.deb

The resulting cudnn.deb should be about 857 MB. You can check this with

du cudnn.deb -h

Which should look like

857M cudnn.deb

3.2 Install cuDNN with Unzipped Deb

First, unpack the .deb

mkdir cudnn_install
mv cudnn.deb cudnn_install
cd cudnn_install
ar -x cudnn.deb

You will see two new files, control.tar.gz and data.tar.xz. The deb files we actually want are inside data.tar.xz, so extract it:

tar -xvf data.tar.xz

You’ll see a few new folders extracted. Go to the var folder and install the three packages in order:

cd var/cudnn-local-repo-ubuntu2204-8.8.0.121/
sudo dpkg -i libcudnn8_8.8.0.121-1+cuda11.8_amd64.deb
sudo dpkg -i libcudnn8-dev_8.8.0.121-1+cuda11.8_amd64.deb
sudo dpkg -i libcudnn8-samples_8.8.0.121-1+cuda11.8_amd64.deb

You should be all set.

3.3 Verify CUDNN Installation on System

Issue the following command to make sure that the correct CUDNN libraries have in fact been installed on your system:

cat /usr/include/x86_64-linux-gnu/cudnn_version_v8.h | grep CUDNN_MAJOR -A 2

The output should look something like this:

#define CUDNN_MAJOR 8
#define CUDNN_MINOR 8
#define CUDNN_PATCHLEVEL 0
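To fold this check into a script, the three defines combine into a single version string. A sketch, parsing the header text (embedded here so it runs anywhere; on a real system, read the cudnn_version_v8.h path used above instead):

```python
import re

# On a real system:
# with open("/usr/include/x86_64-linux-gnu/cudnn_version_v8.h") as f:
#     header = f.read()
header = """
#define CUDNN_MAJOR 8
#define CUDNN_MINOR 8
#define CUDNN_PATCHLEVEL 0
"""

parts = {
    key: int(val)
    for key, val in re.findall(r"#define CUDNN_(MAJOR|MINOR|PATCHLEVEL) (\d+)", header)
}
version = f"{parts['MAJOR']}.{parts['MINOR']}.{parts['PATCHLEVEL']}"
print(version)  # -> 8.8.0
```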

However, given so many factors at play, your NVIDIA tooling may not be 100% correct. To double-check, proceed to the next verification step: verify CUDNN with sample scripts.

3.4 Verify CUDNN Installation with Sample Scripts

CUDNN ships with its own samples (which you already got by installing the libcudnn8-samples apt package). Note that the sample location and dependencies have changed from cuDNN 7.x.x to 8.x.x, so you can no longer use the v7 instructions. Like the CUDA 11.8 samples, cuDNN v8 samples also require FreeImage (which you installed earlier).

The other differences are the samples path and the make structure: the samples now live in /usr/src/cudnn_samples_v8, and you can no longer compile all the samples with a single sudo make command (which you can still do with the CUDA samples). Let's just do the mnistCUDNN check:

cd /usr/src/cudnn_samples_v8/mnistCUDNN
sudo make -j$(nproc)
./mnistCUDNN

The output should look something like this, with a “Test Passed” at the end:

Executing: mnistCUDNN
cudnnGetVersion() : 8800 , CUDNN_VERSION from cudnn.h : 8800 (8.8.0)
Host compiler version : GCC 9.4.0
There are 1 CUDA capable devices on your machine :
device 0 : sms 84 Capabilities 8.6, SmClock 1950.0 Mhz, MemSize (Mb) 24247, MemClock 10501.0 Mhz, Ecc=0, boardGroupID=0
Using device 0
Testing single precision
Loading binary file data/conv1.bin
Loading binary file data/conv1.bias.bin
Loading binary file data/conv2.bin
Loading binary file data/conv2.bias.bin
Loading binary file data/ip1.bin
Loading binary file data/ip1.bias.bin
Loading binary file data/ip2.bin
Loading binary file data/ip2.bias.bin
Loading image data/one_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 2057744 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.012192 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.015360 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.016384 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.042784 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.087840 time requiring 2057744 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.165664 time requiring 184784 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 128848 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 128000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 1433120 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.052224 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.062464 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.069632 time requiring 1433120 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.086016 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.089856 time requiring 128000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.090112 time requiring 128848 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Resulting weights from Softmax:
0.0000000 0.9999399 0.0000000 0.0000000 0.0000561 0.0000000 0.0000012 0.0000017 0.0000010 0.0000000
Loading image data/three_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 2057744 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.011264 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.011264 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.014336 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.029888 time requiring 2057744 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.039936 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.046080 time requiring 178432 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 128848 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 128000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 1433120 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.035840 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.040960 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.040960 time requiring 1433120 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.062432 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.069632 time requiring 128848 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.088064 time requiring 128000 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Resulting weights from Softmax:
0.0000000 0.0000000 0.0000000 0.9999288 0.0000000 0.0000711 0.0000000 0.0000000 0.0000000 0.0000000
Loading image data/five_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 0.9999820 0.0000154 0.0000000 0.0000012 0.0000006
Result of classification: 1 3 5

Test passed!

Testing half precision (math in single precision)
Loading binary file data/conv1.bin
Loading binary file data/conv1.bias.bin
Loading binary file data/conv2.bin
Loading binary file data/conv2.bias.bin
Loading binary file data/ip1.bin
Loading binary file data/ip1.bias.bin
Loading binary file data/ip2.bin
Loading binary file data/ip2.bias.bin
Loading image data/one_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 5632 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 28800 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 2057744 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.012288 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.021504 time requiring 28800 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.030688 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.034688 time requiring 2057744 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.035840 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.039936 time requiring 5632 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 51584 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 64000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 1433120 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.036864 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.038912 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.045056 time requiring 1433120 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.064512 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.081920 time requiring 64000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.116896 time requiring 51584 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Resulting weights from Softmax:
0.0000001 1.0000000 0.0000001 0.0000000 0.0000563 0.0000001 0.0000012 0.0000017 0.0000010 0.0000001
Loading image data/three_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 5632 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 28800 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 2057744 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.011264 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.019456 time requiring 28800 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.028672 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.030720 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.030720 time requiring 2057744 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.035840 time requiring 5632 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 51584 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 64000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 1433120 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.033664 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.038912 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.045248 time requiring 1433120 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.064320 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.080896 time requiring 64000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.113664 time requiring 51584 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Resulting weights from Softmax:
0.0000000 0.0000000 0.0000000 1.0000000 0.0000000 0.0000714 0.0000000 0.0000000 0.0000000 0.0000000
Loading image data/five_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 1.0000000 0.0000154 0.0000000 0.0000012 0.0000006
Result of classification: 1 3 5

Test passed!

3.5 Cleanup

Remove the files we just downloaded:

cd ~
rm -r cudnn_install

4. Install Miniconda3

You should use the latest from the repo, https://repo.anaconda.com/miniconda, instead of the stale versions on https://docs.conda.io/en/latest/miniconda.html#linux-installers. Pro tip: when you have to agree to a long agreement to install a library on the command line, you can hit q to quickly skip to the bottom where you type yes/accept, etc.

cd ~
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
sha256sum Miniconda3-latest-Linux-x86_64.sh
# Verify that the output matches the one online
sh Miniconda3-latest-Linux-x86_64.sh
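If you would rather automate the checksum comparison than eyeball it, Python's hashlib computes the same digest as sha256sum. A minimal sketch (the expected-hash line is left as a placeholder; paste in the hash published on the Miniconda repo page):

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Stream the file in chunks so large installers don't load into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()

# expected = "PASTE_THE_PUBLISHED_HASH_HERE"  # from repo.anaconda.com/miniconda
# assert sha256_of("Miniconda3-latest-Linux-x86_64.sh") == expected
```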

Say yes to conda init. It will save you lots of time.

Lastly, clean up with

rm ~/Miniconda3-latest-Linux-x86_64.sh

5. Build PyTorch from Source

We are finally in the final stretch: building from source. Some instructions are adapted from https://github.com/pytorch/pytorch#from-source.

5.1 Install PyTorch Prerequisites

Use a new environment with a custom Python version. Say your codebase prefers 3.8 over 3.9, which is the case for me (I have to use the maskrcnn_benchmark codebase, which is friendlier to 3.8). Let's call my environment rcnn (you can change this to whatever you want; just be consistent).

$ source ~/.bashrc
(base) $ conda create -n rcnn python=3.8

Now activate the environment, and make sure all of the following steps run inside it so nothing gets installed into your base environment:

(base) $ conda activate rcnn

Your shell prompt will now say rcnn instead of base.

(rcnn) $ # conda environment activated!

Now install the PyTorch Linux dependencies:

conda install astunparse numpy ninja pyyaml mkl mkl-include setuptools cmake cffi typing_extensions future six requests dataclasses
conda install -c pytorch magma-cuda118

Clone the PyTorch repository (this takes a long time because --recursive pulls in all submodules):

git clone --recursive git@github.com:pytorch/pytorch.git
cd pytorch

And then switch to the version tag you want. For example, I want v1.13.1:

git checkout v1.13.1 # Or any version you want

Then we want to update the PyTorch dependencies in the folder:

git submodule sync
git submodule update --init --recursive --jobs 0

(Optional) If you have a network cluster, you may benefit from OpenMPI support. I haven’t noticed any positive improvement on my local workstation or laptop (if anything, it decreased `h5py` I/O performance for me).

sudo apt install openmpi-bin

Finally, point CMake at your conda environment and start the build:

export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
python setup.py install

This is going to take a long time.

5.2 Verify the PyTorch Installation

You MUST leave the PyTorch source directory to do this. Otherwise, Python will pick up the torch source tree under your ~/pytorch build folder instead of the installed package; that copy is not importable and will raise an error even if you installed PyTorch correctly.

# You are probably under ~/pytorch right now. Get out of there!
$ cd # This is equivalent to cd ~, but you can go elsewhere
$ python
>>> import torch
>>> torch.rand(2, 3, device='cuda') @ torch.rand(3, 2, device='cuda')
>>> exit() # Get out of the Python shell.

The output should look something like this (except the numbers won’t be the same because of randomness) before exiting the Python shell:

tensor([[0.5708, 0.6166],
        [0.6130, 0.7249]], device='cuda:0')

6. (Optional) Install Pillow-SIMD and libjpeg-turbo

If your code uses Pillow, you can further optimize your vision pipeline with libjpeg-turbo and Pillow-SIMD (although torchvision.io is supposed to be better if you are writing your own code).

First, download the libjpeg-turbo source:

wget https://github.com/libjpeg-turbo/libjpeg-turbo/archive/refs/tags/2.1.5.1.tar.gz

Unzip it

tar -xzf 2.1.5.1.tar.gz

Go to the extracted directory:

cd libjpeg-turbo-2.1.5.1/

Install the yasm dependency:

sudo apt install yasm

Then build and install libjpeg-turbo and Pillow-SIMD from source following this gist:

https://gist.github.com/soumith/01da3874bf014d8a8c53406c2b95d56b

7. Install TorchVision from Source

Download the TorchVision source

git clone https://github.com/pytorch/vision.git
cd vision

Before building, make sure your libjpeg-turbo and Pillow-SIMD are installed. You can check which Pillow distribution the torchvision build will pick up (get_dist is a helper defined in torchvision's setup.py):

python
from setup import get_dist
get_dist('pillow')

Lastly, build and install torchvision. Follow the instructions on https://github.com/pytorch/vision and find a release tag that matches your built PyTorch version; for PyTorch v1.13.1, that is v0.14.1.
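Because TorchVision releases are pinned to specific PyTorch releases, the tag you check out must match the PyTorch you just built. A small lookup sketch with a few known pairs from the torchvision compatibility table (verify against the table for your exact version):

```python
# A few known (PyTorch -> TorchVision) release pairs from the
# torchvision README compatibility table; extend as needed.
TORCHVISION_FOR_TORCH = {
    "1.13.1": "0.14.1",
    "1.12.1": "0.13.1",
    "1.11.0": "0.12.0",
}

def torchvision_tag(torch_version):
    """Return the git tag to check out in pytorch/vision for a torch version."""
    tv = TORCHVISION_FOR_TORCH.get(torch_version)
    if tv is None:
        raise ValueError(f"No known torchvision pairing for torch {torch_version}")
    return f"v{tv}"

print(torchvision_tag("1.13.1"))  # -> v0.14.1
```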

Check out a specific version

git checkout v0.14.1
python setup.py install

Voila!
