tags: Programming Languages, Machine Learning, Computer Vision

These notes are to understand what has happened in AI post-2020, and to keep track of things.

NLP

See NLP (Natural Language Processing)

Transformers

Transformers FAQ

What is zero-shot/one-shot/few-shot learning?

  • Full training: not on the shot spectrum at all; we train on the whole dataset to get predictions. The “shot” variants below apply to pre-trained models.
  • few-shot: we give the model a few examples of the task at inference time.
  • one-shot: we give a single example and the model can do the task.
  • zero-shot
    • We do not give any concrete examples at all.
    • Instead we instruct the model in a different way (e.g. prompting); see the sketch after this list.
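
A minimal sketch of the difference, purely in terms of prompts. The review strings are made-up examples and no real model is called; the point is that the pre-trained weights never change, only the prompt does:

```python
# Zero-shot: instruction only, no examples.
zero_shot = "Classify the sentiment of this review: 'I loved this movie.'\nSentiment:"

# One-shot: a single worked example before the real input.
one_shot = (
    "Review: 'Terrible plot.' Sentiment: negative\n"
    "Review: 'I loved this movie.' Sentiment:"
)

# Few-shot: a handful of worked examples before the real input.
few_shot = (
    "Review: 'Terrible plot.' Sentiment: negative\n"
    "Review: 'Great acting!' Sentiment: positive\n"
    "Review: 'Boring and too long.' Sentiment: negative\n"
    "Review: 'I loved this movie.' Sentiment:"
)
```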

What is attention?

  • In each layer of the network
    • Encoders take words from an input sentence and convert them into a representation.
    • Each decoder takes all the encoders’ representations and transforms them into words in another language.
  • Because each decoder gets all the encoders’ output, it has a wider context, and that is what enables attention (sketched below).
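
A minimal numpy sketch of scaled dot-product attention, the core operation; the shapes and data are toy assumptions:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Score every query against every key, scaled by sqrt(d_k).
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax over the keys: how much each position attends to every other one.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Output is a weighted mix of the values -- the "wider context".
    return weights @ V

# Toy example: 3 decoder positions attending over 5 encoder outputs of dim 4.
Q = np.random.randn(3, 4)
K = np.random.randn(5, 4)
V = np.random.randn(5, 4)
out = scaled_dot_product_attention(Q, K, V)  # shape (3, 4)
```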

Usage (Modality)

  • Text
    • Text classification
    • Text generation
    • Summarization
  • Audio
    • Audio classification
    • Automatic speech recognition
  • Vision
    • Object detection
    • Image classification
    • Image segmentation
  • Multimodal
    • Visual QA
    • Document QA
    • Image captioning
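
The Hugging Face transformers library exposes most of these tasks through a single pipeline interface; a small sketch (the default model for the task is downloaded on first use):

```python
from transformers import pipeline

# Text classification as a one-liner.
classifier = pipeline("text-classification")
print(classifier("Transformers are surprisingly easy to use."))

# The same interface covers the other modalities, e.g.:
#   pipeline("automatic-speech-recognition")
#   pipeline("object-detection")
#   pipeline("image-classification")
```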

Diffusion models

Fine-tuning diffusion models (Stable Diffusion)

Dreambooth

Textual Inversion

LoRA
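
The core LoRA idea: freeze the pre-trained weight W and learn only a low-rank update BA on top of it. A minimal numpy sketch; the dimensions are illustrative assumptions:

```python
import numpy as np

d, k, r = 768, 768, 8              # r << d, k: the low-rank bottleneck

W = np.random.randn(d, k)          # frozen pre-trained weight, never updated
A = np.random.randn(r, k) * 0.01   # trainable low-rank factor
B = np.zeros((d, r))               # starts at zero, so initially W + BA == W

def lora_forward(x):
    # Original frozen path plus the low-rank correction; only A and B train.
    return x @ W.T + x @ (B @ A).T

x = np.random.randn(2, k)
y = lora_forward(x)                # shape (2, d)
```

Because only A and B are trained, the adapter is tiny compared to the full model and can be shipped separately.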

Depth-2-Img

ControlNet

  • It’s a training strategy, a way of doing fine-tuning.
  • Unlike Dreambooth and LoRA, ControlNet keeps the original model frozen and trains a complementary external network alongside it.
  • The complementary external model can be distributed independently, or it can be baked into a single combined model.
  • The complementary model is specific to the frozen main model it was trained against, so it only works with that version; we need to care about compatibility (see the sketch below).
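
A sketch of using a ControlNet with the diffusers library; the model IDs are the commonly used Canny ControlNet and its matching Stable Diffusion 1.5 base, and the edge-map path is a placeholder:

```python
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# The ControlNet is loaded as a separate model...
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
# ...and attached to the frozen base model it was trained against.
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

edges = Image.open("edges.png")  # pre-computed Canny edge map (placeholder path)
image = pipe("a futuristic house, photorealistic", image=edges).images[0]
image.save("out.png")
```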

TTS

Bark

Tortoise

Piper

StyleTTS

STT

Whisper
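
A minimal sketch with the openai-whisper package; the audio path is a placeholder:

```python
import whisper  # pip install openai-whisper

model = whisper.load_model("base")      # sizes: tiny, base, small, medium, large
result = model.transcribe("audio.mp3")  # placeholder path
print(result["text"])
```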