tags : Open Source LLMs, NLP (Natural Language Processing), Machine Learning, Modern AI Stack, Information Retrieval
Embeddings are related to the idea behind “Autoencoders” in NLP (Natural Language Processing)
FAQ
Can we reverse embedding?
- Only approximately; reversing an embedding back to its input is probabilistic (the mapping is lossy).
What about RAG?
See RAG
Does having embeddings also mean we’ll be able to do cosine similarity?
NO, not necessarily.
- Mathematically, yes: you can always compute a distance or similarity between two vectors (e.g., Euclidean distance, cosine similarity, Manhattan distance). The math will always work.
- But does that calculated distance/similarity mean anything useful or reliable in the context of the problem you’re trying to solve?
The training process shapes the space so that this property holds.
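A minimal numpy sketch of this point: cosine similarity is always computable, but the numbers are only meaningful if training shaped the space that way. The toy 4-d vectors below are made up for illustration; real models output hundreds of dimensions.

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity = dot product divided by the product of the norms.
    a, b = np.asarray(a, dtype=np.float32), np.asarray(b, dtype=np.float32)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy "embeddings" (hypothetical values, not from a real model).
cat = [0.9, 0.1, 0.0, 0.2]
dog = [0.8, 0.2, 0.1, 0.3]
car = [0.0, 0.9, 0.8, 0.1]

# The math always works; whether a high score means "semantically close"
# depends entirely on how the embedding space was trained.
print(cosine_similarity(cat, dog))  # higher
print(cosine_similarity(cat, car))  # lower
```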
There are different kinds of embeddings, e.g.
- OpenAI’s embedding models (e.g., text-embedding-3) are separate from the internal token embeddings used in LLMs like GPT-4/O3.
- Embedding models are optimized for semantic tasks like search and similarity, while LLM embeddings serve as internal input processing.
- They may share architectural ideas but are trained and used for different purposes.
- This is the same reason we can’t do real similarity search with LLaVA and instead need something like CLIP.
More Clarity
Token embeddings and the embeddings/vectors output by embedding models are separate concepts.
token embeddings
- numerous token embeddings (one per token) which become contextualized as they propagate through the transformer
embeddings/vectors
- a single vector/embedding that is output by an embedding model
- one per input item, such as a long text, a photo, or a document screenshot
- some embedding models output multiple vectors per input, depending on the use case (e.g., BGE M3)
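The distinction can be sketched in numpy: many per-token vectors get pooled into one vector per input. Mean pooling is shown here as one common strategy (the shapes and values are made up; other models use the CLS token or learned pooling).

```python
import numpy as np

# Pretend transformer output for a 5-token input: one 8-d vector per token.
rng = np.random.default_rng(0)
token_embeddings = rng.normal(size=(5, 8)).astype(np.float32)

# Mean pooling: collapse the per-token vectors into one sentence embedding.
sentence_embedding = token_embeddings.mean(axis=0)

# L2-normalize so cosine similarity reduces to a plain dot product.
sentence_embedding /= np.linalg.norm(sentence_embedding)

print(sentence_embedding.shape)  # one vector for the whole input
```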
Types of embedding
By Modality
Text Embedding
Finding the Best Open-Source Embedding Model for RAG | Timescale
ModernBERT (ColBERT)
Image Embedding
MultiModal Embedding
- multimodal embeddings where image and text representations are mapped into a common space.
By Purpose
Comparison
LLM Generation
numerous token embeddings (one per token) which become contextualized as they propagate through the transformer
By Language Support
Multilingual
Multilingual Embedding leaderboard: MTEB Leaderboard - a Hugging Face Space by mteb
By Architecture
SAE
Multi Vectors
(ColBERT, ModernBERT, BGE M3)
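Multi-vector models are typically scored with ColBERT-style late interaction ("MaxSim"): each query token vector takes its best match among document token vectors, and the maxima are summed. A numpy sketch with made-up shapes, assuming rows are L2-normalized:

```python
import numpy as np

def l2norm(x):
    # Normalize each row so dot products are cosine similarities.
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def maxsim(query_vecs, doc_vecs):
    # Late interaction: best document-token match per query token, summed.
    sims = query_vecs @ doc_vecs.T        # (n_query_tokens, n_doc_tokens)
    return float(sims.max(axis=1).sum())

rng = np.random.default_rng(1)
q = l2norm(rng.normal(size=(4, 16)).astype(np.float32))  # 4 query token vectors

# d1 contains the query's own token vectors plus noise; d2 is unrelated.
d1 = l2norm(np.vstack([q, rng.normal(size=(6, 16))]).astype(np.float32))
d2 = l2norm(rng.normal(size=(10, 16)).astype(np.float32))

print(maxsim(q, d1) > maxsim(q, d2))  # the overlapping doc scores higher
```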
Deployment
Storing embeddings
VectorDBs / Vector Stores
“Binding generated embeddings to source data, so the vectors automatically update when the data changes is exactly how things should be.”
| Vector Store | Type | Search Algorithm | Performance Characteristics |
|---|---|---|---|
| vectorlite | SQLite extension | HNSW (approximate nearest neighbors) | Faster for large vector sets at the cost of some accuracy |
| sqlite-vec | SQLite extension | Brute force | More accurate but slower with large vector sets |
| usearch | SQLite extension | Brute force | Similar to sqlite-vec; only exposes vector distance functions |
| Qdrant | Standalone vector DB | Not specified | Works well but “heavier” for many applications |
sqlite-vec
The problem with Parquet is it’s static. Not good for use cases that involve continuous writes and updates. Although I have had good results with DuckDB and Parquet files in object storage. Fast load times.
If you host your own embedding model, then you can transmit numpy float32 compressed arrays as bytes, then decode back into numpy arrays.
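Round-tripping float32 embeddings as compressed bytes, as described, might look like this (the 384-d size is just an example):

```python
import zlib
import numpy as np

# An embedding from a (hypothetical) self-hosted model.
embedding = np.random.default_rng(2).normal(size=384).astype(np.float32)

# Encode: raw float32 bytes, then zlib-compress for transmission.
payload = zlib.compress(embedding.tobytes())

# Decode on the other side: decompress and reinterpret as float32.
restored = np.frombuffer(zlib.decompress(payload), dtype=np.float32)

print(np.array_equal(embedding, restored))  # True — lossless round trip
```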
Personally I prefer using SQLite with usearch extension. Binary vectors then rerank top 100 with float32. It’s about 2 ms for ~20k items, which beats LanceDB in my tests. Maybe Lance wins on bigger collections. But for my use case it works great, as each user has their own dedicated SQLite file.
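The binary-vectors-then-rerank scheme can be sketched in numpy: quantize each dimension to its sign (1 bit/dim), shortlist by Hamming distance, then rerank the top 100 with full float32 cosine similarity. Sizes and dimensions here are made up.

```python
import numpy as np

rng = np.random.default_rng(3)
docs = rng.normal(size=(20_000, 256)).astype(np.float32)
query = rng.normal(size=256).astype(np.float32)

# Binary quantization: keep only the sign of each dimension (1 bit/dim).
doc_bits = np.packbits(docs > 0, axis=1)   # (20000, 32) uint8
query_bits = np.packbits(query > 0)        # (32,) uint8

# Stage 1: cheap Hamming-distance shortlist over the binary codes.
xor = np.bitwise_xor(doc_bits, query_bits)
hamming = np.unpackbits(xor, axis=1).sum(axis=1)
top100 = np.argsort(hamming)[:100]

# Stage 2: rerank only the candidates with full float32 cosine similarity.
cands = docs[top100]
sims = cands @ query / (np.linalg.norm(cands, axis=1) * np.linalg.norm(query))
best = top100[np.argsort(-sims)]

print(best[:5])  # final ranking of document indices
```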
- sqlite-vec v0.1.0 Launch Party Recording! - YouTube
- https://news.ycombinator.com/item?id=40244090: initial v0.1.0 release will only have linear scans, but I want to support ANN indexes like IVF/HNSW in the future!
Vector Tile
Serving embeddings
See Deploying ML applications (applied ML)
- https://huggingface.co/spaces/TIGER-Lab/MMEB-Leaderboard
- https://huggingface.co/spaces/vidore/vidore-leaderboard (for colpali and likes)
- https://huggingface.co/spaces/mteb/leaderboard 🌟
Providers
- Nomic
- Nomic Blog: The Nomic Embedding Ecosystem
- They also have a voyage-3 contender now (similar to ColPali)
- Voyager
- Jina
- OpenAI
- CoHere
Self-hosting
There’s also https://github.com/michaelfeil/infinity
| Feature/Aspect | Text Embeddings Inference | Ollama | vLLM |
|---|---|---|---|
| Primary Use Case | Production embedding serving | Local development & testing | LLM inference with embedding support |
| Implementation | Rust | Go | Python |
| Setup Complexity | Low | Very Low | High |
| Resource Usage | Minimal | Moderate | High |
| GPU Support | Yes | Yes | Yes (optimized) |
| CPU Support | Yes | Yes | Limited |
| Model Types | Embedding only | Both LLM and embeddings | Both LLM and embeddings |
| Production Ready | Yes | Limited | Yes |
| Deployment Type | Microservice | Local/Container | Distributed service |
| Customization | Limited | High | High |
| Throughput | Very high (embeddings) | Moderate | High (both) |
| Community Support | Growing | Active | Very active |
| Architecture Support | x86, ARM | x86, ARM | Primarily x86 |
| Container Support | Yes | Yes | Yes |
| Monitoring/Metrics | Basic | Basic | Extensive |
| Hot-reload Support | No | Yes | No |
| Memory Efficiency | High | Moderate | Varies (KV-cache focused) |
| Documentation Quality | Good | Excellent | Excellent |
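Both Text Embeddings Inference and vLLM can expose an OpenAI-compatible `/v1/embeddings` route. A minimal client sketch, where `BASE_URL` and `MODEL` are placeholders for your own deployment (no network call is made here; only the payload construction and response parsing are shown):

```python
import json

BASE_URL = "http://localhost:8080"       # placeholder for your server
MODEL = "BAAI/bge-small-en-v1.5"         # placeholder model name

def build_request(texts, model=MODEL):
    # The OpenAI embeddings API takes {"model": ..., "input": [...]}.
    return {
        "url": f"{BASE_URL}/v1/embeddings",
        "body": json.dumps({"model": model, "input": texts}),
    }

def parse_response(raw):
    # Responses carry one {"embedding": [...]} object per input, in order.
    return [item["embedding"] for item in json.loads(raw)["data"]]

req = build_request(["hello world", "embeddings are underrated"])
print(req["url"])
```

POST `req["body"]` to `req["url"]` with any HTTP client, then feed the response text to `parse_response`.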
Learning resources
- Semantic search engine for ArXiv, biorxiv and medrxiv | Hacker News
- https://github.com/erikbern/ann-benchmarks
- https://huggingface.co/hkunlp/instructor-xl (embeddings) 🌟
- Don’t use cosine similarity carelessly | Hacker News
- The secret ingredients of word2vec (2016) | Hacker News
- Evaluating Similarity Methods: Speed vs. Precision 🌟
- Nomic Blog: Data Maps, Part 2: Embeddings Are For So Much More Than RAG
- Understanding pgvector’s HNSW Index Storage in Postgres | Lantern Blog
- sqlite-vec
- How does cosine similarity work? | Hacker News
- https://simonwillison.net/2023/Oct/23/embeddings/
- Binary vector embeddings are so cool | Lobsters
- Embeddings are underrated | Lobsters
- Embeddings are underrated | Hacker News
- Bengaluru System Meetup: Understanding sqlite-vec - YouTube
- https://pamacha.observablehq.cloud/spherical-umap/?s=35
- Exploring Hacker News by mapping and analyzing 40 million posts and comments for fun | Wilson Lin
- Embedding + CRDT (https://x.com/JungleSilicon/status/1867603691005706515)
- Hacker News Data Map [180MB] | Hacker News
- Deploying
- https://blog.brunk.io/posts/similarity-search-with-duckdb/
- https://simonwillison.net/2024/May/10/exploring-hacker-news-by-mapping-and-analyzing-40-million-posts/
- https://modal.com/blog/embedding-wikipedia
- https://modal.com/blog/fine-tuning-embeddings
- https://modal.com/docs/examples/text_embeddings_inference
- https://docs.vllm.ai/en/latest/getting_started/examples/openai_embedding_client.html