tags: Programming Languages, Machine Learning, Computer Vision
These notes are for understanding what has happened in AI since 2020, and for keeping track of new developments.
NLP
See NLP (Natural Language Processing)
Transformers
Transformers FAQ
zero-shot/one-shot/few-shot learning?
- Full training: this is not on the shot spectrum at all; we train on the whole dataset to get predictions. The “shot” approaches are applied to already pre-trained models.
- few-shot: we give the model a few examples of the task.
- one-shot: we give the model a single example and it can do the task.
- zero-shot
  - We give no concrete examples at all.
  - Instead, we instruct the model in a different way (e.g. prompting).
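In prompting terms, the only difference between zero-, one- and few-shot is how many worked examples precede the real query. The sketch below builds such prompts as plain strings; `build_prompt`, the "Input:/Output:" layout and the sentiment task are hypothetical choices for illustration, and no model is actually called.

```python
def build_prompt(instruction, examples, query):
    """Assemble a text prompt: instruction, k worked (input, output) examples, then the query.

    k = 0 gives a zero-shot prompt, k = 1 one-shot, k > 1 few-shot.
    (Hypothetical layout; real models and APIs may expect a different format.)
    """
    lines = [instruction, ""]
    for inp, out in examples:
        lines += [f"Input: {inp}", f"Output: {out}", ""]
    lines += [f"Input: {query}", "Output:"]  # the model would continue from here
    return "\n".join(lines)


instruction = "Classify the sentiment of the input as positive, negative or mixed."
demos = [
    ("I loved this film.", "positive"),
    ("The plot made no sense.", "negative"),
    ("Great acting, weak ending.", "mixed"),
]
query = "What a waste of two hours."

zero_shot = build_prompt(instruction, [], query)       # instruction only
one_shot = build_prompt(instruction, demos[:1], query)  # one worked example
few_shot = build_prompt(instruction, demos, query)      # several worked examples

if __name__ == "__main__":
    print(few_shot)
```

The model weights are the same in all three cases; only the text it is conditioned on changes.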
What is attention?
- Attention is used in each layer of the network.
- Encoders take the words of an input sentence and convert them into representations.
- Each decoder takes the encoders’ representations of all the words and transforms them into words in another language (e.g. in translation).
- Because decoders receive the encoders’ output for every input word, they have a wider context to draw on; this is what enables attention.
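The core computation behind the bullets above is scaled dot-product attention, softmax(Q Kᵀ / √d_k) V, from the original Transformer paper. The sketch below implements it on plain Python lists; it assumes Q, K and V are given directly (no learned projections, no multiple heads, no masking), and the toy vectors standing in for encoder states and a decoder query are made up.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention on lists of vectors.

    queries: n vectors of length d_k; keys: m vectors of length d_k;
    values: m vectors of length d_v. Returns (n outputs of length d_v, n x m weights).
    """
    d_k = len(keys[0])
    scale = 1.0 / math.sqrt(d_k)
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by 1/sqrt(d_k).
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        w = softmax(scores)
        weights.append(w)
        # The output is a weighted average of the value vectors.
        out = [sum(w[j] * values[j][c] for j in range(len(values)))
               for c in range(len(values[0]))]
        outputs.append(out)
    return outputs, weights


# Encoder-decoder attention: the query comes from the decoder,
# keys and values come from the encoder's output for each input word.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy values, one per input word
decoder_query = [[2.0, 0.0]]
context, attn_weights = attention(decoder_query, encoder_states, encoder_states)
```

Each row of `attn_weights` sums to 1, so the decoder's context vector is a blend of all input positions rather than a single one.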
Usage (Modality)
- Text
  - Text classification
  - Text generation
  - Summarization
- Audio
  - Audio classification
  - Automatic speech recognition
- Vision
  - Object detection
  - Image classification
  - Image segmentation
- Multimodal
  - Visual QA
  - Document QA
  - Image captioning
Diffusion models
- These differ from transformers in architecture, training process, inference procedure, use cases, etc.
- See
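To make the training difference concrete, here is a sketch of the forward (noising) process used by DDPM-style diffusion models, which is one common formulation (assumed here; Stable Diffusion applies the same idea in a latent space). It assumes a linear schedule of noise variances β from 1e-4 to 0.02 over T = 1000 steps and jumps straight to step t via x_t = √ᾱ_t · x_0 + √(1 − ᾱ_t) · ε. The four-pixel "image" is a made-up toy.

```python
import math
import random


def linear_beta_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (assumed DDPM-style defaults)."""
    return [beta_start + (beta_end - beta_start) * i / (T - 1) for i in range(T)]


def alpha_bars(betas):
    """Cumulative products alpha_bar_t = prod_{s<=t} (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def q_sample(x0, t, abar, rng):
    """Noise a clean sample x0 directly to step t (0-based index into abar).

    Returns (x_t, eps). In training, a denoising network would be asked to
    predict eps from (x_t, t), typically with a mean-squared-error loss.
    """
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    a = abar[t]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return xt, eps


betas = linear_beta_schedule()
abar = alpha_bars(betas)
rng = random.Random(0)
x0 = [0.5, -0.25, 1.0, 0.0]                 # toy "image" with four pixels
x_early, _ = q_sample(x0, 10, abar, rng)    # still close to x0
x_late, _ = q_sample(x0, 999, abar, rng)    # almost pure Gaussian noise
```

Inference runs the other way: start from pure noise and apply the learned denoiser step by step, which is why sampling takes many network evaluations, unlike a single forward pass.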
Fine-tuning diffusion models (Stable Diffusion)
DreamBooth
Textual Inversion
LoRA
Depth-2-Img
ControlNet
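- It’s a training strategy, a way of doing fine-tuning.
- Unlike DreamBooth, which updates the original model’s weights, ControlNet freezes the original model and trains a separate copy of its encoder blocks that takes an extra conditioning input (e.g. edges, depth, pose). LoRA also keeps the base weights frozen, but adds small low-rank updates to existing layers instead of a separate conditioning network.
- The complementary external model can be distributed on its own or baked into a single model.
- The complementary model is trained against a specific frozen base model, so it only works with that base version; we need to keep track of compatibility.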
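The frozen-base-plus-complementary-model idea can be shown on a single toy linear layer. This is a simplified sketch, not the real architecture: actual ControlNet copies the U-Net encoder blocks and connects them through 1×1 "zero convolutions" (and also passes the condition through a zero-initialized layer, omitted here). All class and function names below are hypothetical. Because the trainable copy starts from the base weights and its output passes through a zero-initialized projection, the combined output initially equals the frozen model's output, and the branch is tied to that particular base.

```python
import copy


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


class FrozenBlock:
    """Stands in for a pretrained block whose weights are never updated."""

    def __init__(self, W):
        self.W = W

    def __call__(self, x):
        return matvec(self.W, x)


class ControlBranch:
    """Hypothetical trainable copy of the block plus a zero-initialized output projection."""

    def __init__(self, base):
        self.W = copy.deepcopy(base.W)           # copy initialized from the base weights
        n = len(base.W)
        self.Z = [[0.0] * n for _ in range(n)]   # plays the role of a zero convolution

    def __call__(self, x, cond):
        h = matvec(self.W, [xi + ci for xi, ci in zip(x, cond)])
        return matvec(self.Z, h)


def controlled_forward(base, branch, x, cond):
    # Frozen output plus the branch's (initially zero) contribution.
    return [b + e for b, e in zip(base(x), branch(x, cond))]


base = FrozenBlock([[1.0, 2.0], [0.5, -1.0]])
branch = ControlBranch(base)
x, cond = [1.0, 1.0], [0.3, -0.7]     # cond stands in for e.g. an edge or depth map
before = controlled_forward(base, branch, x, cond)   # identical to base(x)
branch.Z[0][0] = 0.1                  # pretend training has nudged the projection
after = controlled_forward(base, branch, x, cond)
```

Only `branch` would be saved and shared; `base.W` is untouched, which is why the branch can be distributed separately but only makes sense next to the base it was trained against.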
TTS
Bark
- https://github.com/suno-ai/bark
- https://github.com/coqui-ai/TTS
- https://github.com/serp-ai/bark-with-voice-clone
tortoise
- https://github.com/neonbjb/tortoise-tts
- https://git.ecker.tech/mrq/ai-voice-cloning
- https://github.com/facebookresearch/fairseq/tree/main/examples/mms