tags : Floating Point, Concurrency, Flynn’s Taxonomy, Machine Learning

Learning resources

Performance

  • Typically measured in floating point operations per second (FLOPS), often quoted as GFLOPS/TFLOPS
  • GPUs perform well when the number of floating point operations per memory access (the arithmetic intensity) is high
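The FLOPs-per-memory-access idea can be sketched with a back-of-the-envelope calculation for a square matrix multiply (illustrative numbers only; real kernels tile and reuse data in caches, so effective intensity is higher):

```python
# Arithmetic intensity (FLOPs per byte) for an N x N float32 matmul:
# 2*N^3 FLOPs, ~3*N^2 * 4 bytes moved (read A, read B, write C),
# assuming each matrix crosses memory exactly once.
def matmul_arithmetic_intensity(n: int, bytes_per_elem: int = 4) -> float:
    flops = 2 * n ** 3
    bytes_moved = 3 * n ** 2 * bytes_per_elem
    return flops / bytes_moved

print(matmul_arithmetic_intensity(6))     # tiny matrix: 1.0 FLOP/byte (memory-bound)
print(matmul_arithmetic_intensity(6144))  # big matrix: 1024.0 FLOPs/byte (compute-bound)
```

Intensity grows linearly with N (it is N/6 here), which is why large matrix multiplies are the workload GPUs shine at.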

Floating Point support

See Floating Point

  • GPUs support half (FP16), single (FP32) and double (FP64) precision
  • Double precision support on GPUs is fairly recent
  • GPU vendors also have their own custom formats and varying levels of support (e.g. Nvidia’s TF32)

F32

float32 is very widely used in gaming.

  • float32 multiplication is really a 24-bit multiplication (the significand is 24 bits: 23 stored + 1 implicit), which is about 1/2 the cost of a full 32-bit multiplication. So an int32 multiplication is about 2x as expensive as a float32 multiplication.
  • On modern desktop GPUs, the difference in performance (FLOPS) between float32 and float64 is close to 4x
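The 24-bit claim is easy to verify: float32 stores a 23-bit significand plus one implicit leading bit, so integers above 2^24 stop being exactly representable. A quick check using a stdlib struct round-trip:

```python
import struct

def to_f32(x: float) -> float:
    # Round-trip through IEEE-754 binary32 to apply float32 rounding.
    return struct.unpack('f', struct.pack('f', x))[0]

# 24 significand bits => 2**24 + 1 is not representable and rounds away.
assert to_f32(2**24) == 2**24
assert to_f32(2**24 + 1) == 2**24       # the +1 is lost
assert to_f32(2**23 + 1) == 2**23 + 1   # still fits in 24 bits, exact
```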

Nvidia GPUs

CUDA core

  • Each CUDA core can do one multiply-accumulate (MAC) on two FP32 values per clock
  • eg. x += y * z

Tensor core

  • A Tensor core can take a 4x4 FP16 matrix, multiply it by another 4x4 FP16 matrix, then add a third 4x4 matrix (FP16 or FP32) to the resulting product and return it as a new matrix, i.e. D = A×B + C
  • Later Tensor core generations added support for INT8 and INT4 precision modes for quantization.
  • Nvidia keeps iterating on the design across architecture generations, e.g. Turing Tensor cores, Ampere Tensor cores, etc.
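The D = A×B + C operation can be emulated in plain Python to make its shape concrete. This is only a functional sketch: the hardware performs the whole thing as one fused step on FP16 inputs with FP16/FP32 accumulation, whereas this just spells out the arithmetic:

```python
def mma_4x4(a, b, c):
    """Functional model of a tensor-core matrix-multiply-accumulate:
    D = A @ B + C for 4x4 matrices given as lists of lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(4)) + c[i][j] for j in range(4)]
        for i in range(4)
    ]

identity = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
zeros = [[0] * 4 for _ in range(4)]
a = [[i * 4 + j for j in range(4)] for i in range(4)]

assert mma_4x4(a, identity, zeros) == a  # A @ I + 0 == A
```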

See Category:Nvidia microarchitectures - Wikipedia

RAM

???

VRAM

  • Memory (VRAM) = how big the model (weights plus activations) is allowed to be
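A rough sketch of the “memory = model size” rule, counting only the bytes needed to hold the weights (ignores activations, KV cache, optimizer state and framework overhead; the 7B parameter count is just a hypothetical example):

```python
def model_vram_gib(n_params: float, bytes_per_param: int) -> float:
    """Lower bound on VRAM: weights only, in GiB."""
    return n_params * bytes_per_param / 2**30

# Hypothetical 7-billion-parameter model:
print(round(model_vram_gib(7e9, 2), 1))  # FP16 weights: ~13.0 GiB
print(round(model_vram_gib(7e9, 4), 1))  # FP32 weights: ~26.1 GiB
```

This is why quantization (INT8/INT4) matters: halving bytes per parameter halves the floor on required VRAM.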

Frameworks

  • OpenCL: Dominant open GPGPU computing language
  • OpenAI Triton: Language and compiler for parallel programming
  • CUDA: Dominant proprietary framework

More on CUDA

  • Graphics cards support up to a certain CUDA version. E.g. when nvidia-smi is run on my card it shows CUDA 12.1; that is the maximum CUDA version the driver supports, it doesn’t mean CUDA is installed.
  • So I can install a cudatoolkit at or below that version.
  • But cudatoolkit is separate from the Nvidia driver. You can install the toolkit without the driver (e.g. just to compile), but you need the driver to actually run anything on the GPU.

Pytorch

  • Eg. to run PyTorch you don’t need cudatoolkit, because the official binaries ship their own CUDA runtime and math libs.
  • A local CUDA toolkit is only used when building PyTorch from source (or compiling custom CUDA extensions).
  • If pytorch-cuda is built with CUDA 11.7, the binary bundles the 11.7 runtime; you only need an Nvidia driver new enough to support CUDA 11.7, not a local 11.7 toolkit.
  • nvcc is the cuda compiler
  • torchaudio: https://pytorch.org/audio/main/installation.html

Setting up CUDA on NixOS

  • Installing the Nvidia driver is a separate game which has nothing to do with CUDA. Figure that out first; it belongs in configuration.nix or whatever configures the system.
  • Now for the CUDA runtime there are a few knobs, but most importantly LD_LIBRARY_PATH should not be set globally. See this: Problems with rmagik / glibc: `GLIBCXX_3.4.32’ not found - #7 by rgoulter - Help - NixOS Discourse
  • So install all CUDA stuff in a flake, and we should be good.
  • Check versions
    • nvidia-smi will show the maximum CUDA version the driver supports
    • After installing pkgs.cudaPackages.cudatoolkit you’ll have nvcc in your path.
      • Running nvcc --version will give local cuda version
  • For flake
    postShellHook = ''
    #export LD_DEBUG=libs; # debugging
     
    export LD_LIBRARY_PATH="${pkgs.lib.makeLibraryPath [
      pkgs.stdenv.cc.cc
      # pkgs.libGL
      # pkgs.glib
      # pkgs.zlib
     
      # NOTE: for why we need to set it to "/run/opengl-driver", check following:
      # - This is primarily to get libcuda.so which is part of the
      #   nvidia kernel driver installation and not part of
      #   cudatoolkit
      # - https://github.com/NixOS/nixpkgs/issues/272221
      # - https://github.com/NixOS/nixpkgs/issues/217780
      # NOTE: Instead of using /run/opengl-driver we could do
      #       pkgs.linuxPackages.nvidia_x11 but that'd get another
      #       version of libcuda.so which is not compatible with the
      #       original driver, so we need to refer to the stuff
      #       directly installed on the OS
      "/run/opengl-driver"
     
      # "${pkgs.cudaPackages.cudatoolkit}"
      "${pkgs.cudaPackages.cudnn}"
    ]}"
    '';
  • Other packages
    • sometimes we need to add these to LD_LIBRARY_PATH directly
    • pkgs.cudaPackages.cudatoolkit
    • pkgs.cudaPackages.cudnn
    • pkgs.cudaPackages.libcublas
    • pkgs.cudaPackages.cuda_cudart
    • pkgs.cudaPackages.cutensor