tags : Floating Point, Concurrency, Flynn’s Taxonomy, Machine Learning
Learning resources
- GPUs Go Brrr · Hazy Research
- Are GPUs For You
- GPU Programming: When, Why and How? documentation
- https://dl.acm.org/doi/pdf/10.1145/3570638
- What Every Developer Should Know About GPU Computing
- What is a flop? | Hacker News
- Course on CUDA Programming
- Can we 10x Rust hashmap throughput? - by Win Wang
- 1. Introduction — parallel-thread-execution 8.1 documentation
- Udacity CS344: Intro to Parallel Programming | NVIDIA Developer
- AUB Spring 2021 El Hajj - YouTube
- How GPU Computing Works | GTC 2021 - YouTube
- The CUDA Parallel Programming Model - 1. Concepts - Fang’s Notebook
- Convolutions with cuDNN – Peter Goldsborough
- https://medium.com/@penberg/demystifying-gpus-for-cpu-centric-programmers-e24934a620f1
Performance
- Typically measured in floating point operations per second, i.e. FLOPS/GFLOPS
- Performance is good if the number of floating point operations per memory access (the arithmetic intensity) is high; see the SAXPY sketch below for a counterexample
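For intuition, here's a minimal CUDA sketch of SAXPY, a classic low-arithmetic-intensity kernel: it does 2 FLOPs per element while moving 12 bytes, so it's bound by memory bandwidth rather than FLOPS.

```cuda
#include <cuda_runtime.h>

// SAXPY: y[i] = a*x[i] + y[i]. Per element: 2 FLOPs (multiply + add) vs
// 12 bytes of traffic (read x[i], read y[i], write y[i]), i.e. roughly
// 0.17 FLOPs/byte -- far too low to keep the ALUs busy, so throughput is
// limited by memory bandwidth, not peak FLOPS.
__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}
```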
Floating Point support
See Floating Point
- GPUs support half, single, and double precisions; double precision support on GPUs is fairly recent
- GPU vendors have their own custom formats and varying levels of support for them
F32
- float32 is very widely used in gaming
- float32 multiplication is really a 24-bit multiplication (23 stored significand bits plus the implicit leading 1), which is about 1/2 the cost of a 32-bit multiplication. So an int32 multiplication is about 2x as expensive as a float32 multiplication.
- On modern desktop GPUs, float64 throughput is a fraction of float32 throughput; the ratio varies a lot by product line (consumer NVIDIA cards are often 1/32 or worse, datacenter parts closer to 1/2)
Nvidia GPUs
CUDA core
- CUDA cores: each core can do only one multiply-accumulate (MAC) on 2 FP32 values per clock
  - e.g. x += x*y (see the FMA sketch below)
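A minimal sketch of that MAC as a CUDA kernel. fmaf is CUDA's standard fused multiply-add math function; a CUDA core executes it as a single instruction, which FLOPS marketing counts as 2 FLOPs.

```cuda
// Each thread performs one fused multiply-add: fmaf(a, b, c) = a*b + c,
// computed with a single rounding step. fmaf(x[i], y[i], x[i]) is exactly
// the x += x*y example above.
__global__ void mac(float *x, const float *y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] = fmaf(x[i], y[i], x[i]);  // x += x*y
}
```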
Tensor core
- Tensor core can take a
4x4 FP16
matrix and multiply it by another4x4 FP16
matrix then add either aFP16/FP32 4x4
matrix to the resulting product and return it as a new matrix. - Certain Tensor cores added support for
INT8
andINT4
precision modes for quantization. - Now there are various architecture variants that Nvdia build upon, Like Turing Tensor, Ampere Tensor etc.
See Category:Nvidia microarchitectures - Wikipedia
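In CUDA you drive Tensor cores through the WMMA API (nvcuda::wmma, compute capability 7.0+), which exposes warp-wide 16x16x16 tiles built out of those 4x4 hardware ops. A minimal sketch, assuming a 16x16 row-major A, col-major B, and an FP32 accumulator:

```cuda
#include <mma.h>
#include <cuda_fp16.h>
using namespace nvcuda;

// One warp cooperatively computes D = A*B + C for a single 16x16 tile on
// Tensor cores: A and B are FP16, the accumulator is FP32.
// Launch with exactly 32 threads (one warp).
__global__ void wmma_tile(const half *a, const half *b, float *d) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> acc;

    wmma::fill_fragment(acc, 0.0f);          // C = 0
    wmma::load_matrix_sync(a_frag, a, 16);   // leading dimension 16
    wmma::load_matrix_sync(b_frag, b, 16);
    wmma::mma_sync(acc, a_frag, b_frag, acc);  // the Tensor core matrix MAC
    wmma::store_matrix_sync(d, acc, 16, wmma::mem_row_major);
}
```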
RAM
???
VRAM
- Memory determines how big the model is allowed to be: weights, activations, and intermediate buffers all have to fit in VRAM (a quick way to check what's available is sketched below)
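A minimal sketch using the runtime's cudaMemGetInfo to query VRAM; the 7B sizing in the comment is just back-of-envelope arithmetic.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    size_t free_b = 0, total_b = 0;
    cudaMemGetInfo(&free_b, &total_b);  // free/total VRAM on current device
    printf("VRAM: %.1f GiB free of %.1f GiB\n",
           free_b / 1073741824.0, total_b / 1073741824.0);
    // Back-of-envelope: a 7B-parameter model in FP16 needs 7e9 * 2 bytes
    // ~= 14 GB for the weights alone, before activations / KV cache.
    return 0;
}
```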
Frameworks
- OpenCL: Dominant open GPGPU computing language
- OpenAI Triton: Language and compiler for parallel programming
- CUDA: Dominant proprietary framework
More on CUDA
- Graphics cards support up to a certain CUDA version. E.g. running nvidia-smi on my card shows CUDA 12.1; that's the highest version the driver supports, it doesn't mean CUDA is installed
- So I can install a cudatoolkit around that version
- But cudatoolkit is separate from the nvidia driver: you can install cudatoolkit without the driver (enough to compile code), but you need the driver to actually run anything on the GPU (see the version-check sketch after this list)
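The driver/toolkit split is visible from the runtime API too; a minimal sketch using cudaDriverGetVersion and cudaRuntimeGetVersion (the former reports the highest CUDA version the installed driver supports, the latter the runtime this binary was built against):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int drv = 0, rt = 0;
    cudaDriverGetVersion(&drv);   // e.g. 12010 = CUDA 12.1 (driver's max)
    cudaRuntimeGetVersion(&rt);   // version of the linked CUDA runtime
    printf("driver supports CUDA %d.%d, runtime is CUDA %d.%d\n",
           drv / 1000, (drv % 1000) / 10, rt / 1000, (rt % 1000) / 10);
    return 0;
}
```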
Pytorch
- E.g. to run PyTorch you don't need cudatoolkit, because the prebuilt binaries ship their own CUDA runtime and math libs
- Local CUDA toolkit will be used if we build PyTorch from source etc.
- If pytorch-cuda is built with cuda11.7, do you need cuda11.7 installed on your machine? Per the point above, the prebuilt binaries bundle the runtime; what actually has to be new enough is the driver, which must support CUDA 11.7
- nvcc is the CUDA compiler
- torchaudio: https://pytorch.org/audio/main/installation.html
Setting up CUDA on NixOS
- So installing nvidia drivers is a different game, which has nothing to do with CUDA. Figure that shit out first; it should go in configuration.nix or whatever configures the system.
- Now for the CUDA runtime there are a few knobs, but most importantly LD_LIBRARY_PATH should not be set globally. See this: Problems with rmagik / glibc: `GLIBCXX_3.4.32' not found - #7 by rgoulter - Help - NixOS Discourse
- So install all CUDA stuff in a flake, and we should be good.
- Check versions
  - nvidia-smi will give the CUDA driver version
  - After installing pkgs.cudaPackages.cudatoolkit you'll have nvcc in your path
  - Running nvcc --version will give the local CUDA version
- For the flake:

```nix
postShellHook = ''
  # export LD_DEBUG=libs; # debugging
  export LD_LIBRARY_PATH="${pkgs.lib.makeLibraryPath [
    pkgs.stdenv.cc.cc
    # pkgs.libGL
    # pkgs.glib
    # pkgs.zlib
    # NOTE: for why we need to set it to "/run/opengl-driver", check following:
    # - This is primarily to get libcuda.so which is part of the nvidia
    #   kernel driver installation and not part of cudatoolkit
    # - https://github.com/NixOS/nixpkgs/issues/272221
    # - https://github.com/NixOS/nixpkgs/issues/217780
    # NOTE: Instead of using /run/opengl-driver we could do
    #   pkgs.linuxPackages.nvidia_x11 but that'd get another version of
    #   libcuda.so which is not compatible with the original driver, so we
    #   need to refer to the stuff directly installed on the OS
    "/run/opengl-driver"
    # "${pkgs.cudaPackages.cudatoolkit}"
    "${pkgs.cudaPackages.cudnn}"
  ]}"
'';
```
- Other packages
  - Sometimes we need to add these to LD_LIBRARY_PATH directly:
    - pkgs.cudaPackages.cudatoolkit
    - pkgs.cudaPackages.cudnn
    - pkgs.cudaPackages.libcublas
    - pkgs.cudaPackages.cuda_cudart
    - pkgs.cudaPackages.cutensor