tags : Programming Languages, Machine Learning, Computer Vision

These notes are for understanding what happened in AI after 2020, and for keeping track of developments.


See NLP (Natural Language Processing)


Transformers FAQ

zero-shot/one-shot/few-shot learning?

  • Full training: This is not on the “shot” spectrum at all; we train the model on the whole dataset to get predictions. The “shot” approaches are things we do on top of pre-trained models.
  • few-shot: we give the model a few examples of the task.
  • one-shot: we give it a single example, and it is able to do the task.
  • zero-shot
    • We do not give any concrete examples at all.
    • Instead we instruct the model in a different way (e.g. prompting).
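
The difference between the three can be sketched as plain prompt construction. This is only an illustration: `build_prompt`, the sentiment task, and the example sentences are made up for these notes, and no particular model or API is assumed.

```python
def build_prompt(task, query, examples=()):
    """Build a prompt containing zero, one, or several worked examples."""
    parts = [task]
    # Each (input, output) pair is one "shot" shown to the model.
    for text, label in examples:
        parts.append(f"Input: {text}\nOutput: {label}")
    # The actual query is left open for the model to complete.
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)


# Hypothetical task and examples, for illustration only.
task = "Classify the sentiment of the input as positive or negative."
examples = [
    ("I loved this film.", "positive"),
    ("The food was awful.", "negative"),
]
query = "The service was slow."

zero_shot = build_prompt(task, query)               # instruction only
one_shot = build_prompt(task, query, examples[:1])  # one example
few_shot = build_prompt(task, query, examples)      # a few examples

print(few_shot)
```

In all three cases the pre-trained model's weights stay untouched; only the text handed to it changes.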

What is attention?

  • In each layer of an encoder-decoder network (e.g. for translation)
    • Encoders take the words of an input sentence and convert them into representations.
    • Each decoder takes the encoders’ representations of all the words and transforms them into words in another language.
  • Because the decoders see the encoders’ output for every input word, they get a wider context and can weigh (attend to) the input words most relevant to each output word. This is what enables attention.
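
A minimal sketch of that weighing step, assuming scaled dot-product attention with the decoder state as the query and the encoder outputs as both keys and values. The vectors are toy numbers, and the learned projection matrices a real Transformer applies to queries, keys, and values are left out.

```python
import math


def softmax(xs):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_attention(queries, keys, values):
    """Each query mixes the values according to its similarity to the keys."""
    d = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(dimension).
        scores = [dot(q, k) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        out = [
            sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Toy encoder outputs: one vector per input word (made-up numbers).
encoder_out = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Toy decoder state for the word currently being generated.
decoder_state = [[2.0, 0.0]]

outputs, weights = cross_attention(decoder_state, encoder_out, encoder_out)
print(weights[0])  # one weight per input word
```

The weights show how much each input word contributes to the current output word; here the first and third input words get more weight than the second because they are more similar to the decoder state.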

Usage (Modality)

  • Text
    • Text classification
    • Text generation
    • Summarization
  • Audio
    • Audio classification
    • Automatic speech recognition
  • Vision
    • Object detection
    • Image classification
    • Image segmentation
  • Multimodal
    • Visual QA
    • Document QA
    • Image captioning

Diffusion models

Fine-tuning diffusion models (Stable Diffusion)


Textual Inversion




  • It’s a training strategy, a way of fine-tuning.
  • It differs from Dreambooth in that it keeps the original model frozen, whereas Dreambooth updates the model’s own weights. (LoRA also leaves the base weights frozen, but it trains small adapter matrices rather than a token embedding.)

  • The complementary external model can be distributed independently or baked into one model.
  • The complementary model is specific to the frozen main model, so it only works with that version and we need to take care of compatibility.
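
The "frozen main model plus small trainable add-on" idea can be sketched with a toy example. Everything here is an assumption made for illustration: a fixed linear map stands in for the frozen model, a single vector stands in for the new token's embedding, and a squared error stands in for the real training loss. Actual textual inversion trains the embedding against a diffusion model's denoising objective.

```python
import random

random.seed(0)
DIM = 4

# Stand-in for the frozen main model: a fixed linear map, never updated.
frozen_weights = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(DIM)]
frozen_snapshot = [row[:] for row in frozen_weights]


def model(embedding):
    """Run the frozen 'model' on an embedding vector."""
    return [sum(w * x for w, x in zip(row, embedding)) for row in frozen_weights]


def loss(embedding, target):
    return sum((o - t) ** 2 for o, t in zip(model(embedding), target))


# Hypothetical target: what the model should output for the new concept.
target = model([0.5, -0.2, 0.1, 0.9])

# The only trainable parameters: the embedding of the new placeholder token.
new_token = [0.0] * DIM
initial_loss = loss(new_token, target)

lr = 0.02
for _ in range(500):
    err = [o - t for o, t in zip(model(new_token), target)]
    # Gradient of the squared error with respect to the embedding only.
    grad = [
        2 * sum(err[i] * frozen_weights[i][j] for i in range(DIM))
        for j in range(DIM)
    ]
    new_token = [x - lr * g for x, g in zip(new_token, grad)]

final_loss = loss(new_token, target)
print(initial_loss, final_loss)
```

The learned `new_token` vector is the small artifact that could be shipped on its own. It was fitted against this particular `frozen_weights`, so pairing it with different weights gives no guarantee it still means the same thing, which is the compatibility concern noted above.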






