tags : Open Source LLMs, NLP (Natural Language Processing), Machine Learning, Modern AI Stack, Information Retrieval

Embeddings are closely related to what we discuss under “Autoencoders” in NLP (Natural Language Processing)

FAQ

Can we reverse an embedding?

  • Only approximately: reversing an embedding back to its original input is probabilistic, not exact.

What about RAG?

See RAG

Does having embeddings also mean we’ll be able to do meaningful similarity comparisons?

NO.

  • Mathematically, yes: you can always calculate a distance or similarity between two vectors (e.g., Euclidean distance, cosine similarity, Manhattan distance). The math will always work.
  • But does that calculated distance/similarity mean anything useful or reliable in the context of the problem you’re trying to solve?
  • Only if the training process shaped the embedding space so that closeness corresponds to semantic similarity does the number carry meaning (see the sketch below).
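
A minimal sketch of the “the math always works” point: cosine similarity is just a formula over two vectors, whether or not the space was trained to make the result meaningful (pure numpy; the vector values are only illustrative).

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity = dot(a, b) / (|a| * |b|); always computable for non-zero vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Two arbitrary vectors: the formula returns a number either way.
# Whether that number reflects semantic similarity depends entirely on
# how the embedding space was trained.
v1 = np.array([0.1, 0.8, -0.3])
v2 = np.array([0.2, 0.7, -0.1])
print(cosine_similarity(v1, v2))
```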

There are different kinds of embeddings, e.g.:

  • OpenAI’s embedding models (e.g., text-embedding-3) are separate from the internal token embeddings used in LLMs like GPT-4/O3.
  • Embedding models are optimized for semantic tasks like search and similarity, while LLM embeddings serve as internal input processing.
  • They may share architectural ideas but are trained and used for different purposes.
  • This is the same reason we can’t do real similarity search with LLaVA and instead need something like CLIP.

More Clarity

  • Token embeddings and the embedding vectors output by embedding models are separate concepts (a short sketch contrasting the two follows this list).
  • Token embeddings
    • numerous token embeddings (one per token), which become contextualized as they propagate through the transformer
  • Embedding vectors
    • a single vector/embedding that is output by an embedding model
    • one per input item, such as a long text, a photo, or a document screenshot
    • There are also embedding models that output multiple vectors depending on the use case (e.g., BGE M3).
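
A rough sketch of that distinction, assuming Hugging Face transformers and sentence-transformers are installed; the model names (bert-base-uncased, BAAI/bge-small-en-v1.5) are just illustrative choices:

```python
from transformers import AutoTokenizer, AutoModel
from sentence_transformers import SentenceTransformer

text = "Embeddings map inputs to vectors."

# Token embeddings inside an encoder/LLM: one (contextualized) vector per token.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")
inputs = tokenizer(text, return_tensors="pt")
token_vectors = encoder(**inputs).last_hidden_state   # shape: (1, num_tokens, hidden_dim)
print(token_vectors.shape)

# Embedding model: one vector for the whole input.
embedder = SentenceTransformer("BAAI/bge-small-en-v1.5")
sentence_vector = embedder.encode(text)               # shape: (embedding_dim,)
print(sentence_vector.shape)
```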

Types of embedding

By Modality

Text Embedding

Finding the Best Open-Source Embedding Model for RAG | Timescale

Image Embedding

MultiModal Embedding

  • Multimodal embeddings map image and text representations into a common space (see the sketch below).
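
A hedged sketch of the shared-space idea using CLIP via Hugging Face transformers; the model name and image path are illustrative, and the point is only that text and image features land in the same vector space and can be compared directly:

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("cat.jpg")                      # illustrative path
texts = ["a photo of a cat", "a photo of a dog"]

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
text_features = model.get_text_features(
    input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"]
)
image_features = model.get_image_features(pixel_values=inputs["pixel_values"])

# Both live in the same space, so cosine similarity across modalities is meaningful.
text_features = text_features / text_features.norm(dim=-1, keepdim=True)
image_features = image_features / image_features.norm(dim=-1, keepdim=True)
print(image_features @ text_features.T)            # similarity of the image to each caption
```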

By Purpose

Comparison

LLM Generation

numerous token embeddings (one per token) which become contextualized as they propagate through the transformer

By Language Support

Multilingual

Multilingual Embedding leaderboard: MTEB Leaderboard - a Hugging Face Space by mteb

By Architecture

SAE

Multi-Vector

(ColBERT, ModernBERT, BGE M3)
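
A minimal numpy sketch of the ColBERT-style late-interaction (“MaxSim”) scoring that multi-vector retrieval uses; the toy matrices stand in for per-token query and document embeddings:

```python
import numpy as np

def maxsim_score(query_vecs: np.ndarray, doc_vecs: np.ndarray) -> float:
    """ColBERT-style late interaction: for each query token vector, take the max
    similarity over all document token vectors, then sum over query tokens."""
    # Normalize so the dot product equals cosine similarity.
    q = query_vecs / np.linalg.norm(query_vecs, axis=1, keepdims=True)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    sim = q @ d.T                     # (num_query_tokens, num_doc_tokens)
    return float(sim.max(axis=1).sum())

# Toy example: 3 query token vectors, 5 document token vectors, dimension 4.
rng = np.random.default_rng(0)
query_vecs = rng.normal(size=(3, 4))
doc_vecs = rng.normal(size=(5, 4))
print(maxsim_score(query_vecs, doc_vecs))
```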

Deployment

Storing embeddings

VectorDBs / Vector Stores

“Binding generated embeddings to source data, so the vectors automatically update when the data changes is exactly how things should be.”

| Vector Store | Type | Search Algorithm | Performance Characteristics |
| --- | --- | --- | --- |
| vectorlite | SQLite extension | HNSW (approximate nearest neighbors) | Faster for large vector sets at the cost of some accuracy |
| sqlite-vec | SQLite extension | Brute force | More accurate but slower with large vector sets |
| usearch | SQLite extension | Brute force | Similar to sqlite-vec; only exposes vector distance functions |
| Qdrant | Standalone vector DB | Not specified | Works well but “heavier” for many applications |
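
A hedged sketch of the sqlite-vec row above in practice, assuming the sqlite-vec Python package is installed; the table/column names and dimensions are illustrative, and the exact KNN query syntax may differ between versions:

```python
import sqlite3
import sqlite_vec  # pip install sqlite-vec (assumed available)

db = sqlite3.connect(":memory:")
db.enable_load_extension(True)
sqlite_vec.load(db)
db.enable_load_extension(False)

# Virtual table holding 4-dimensional float vectors (dimension is illustrative).
db.execute("CREATE VIRTUAL TABLE vec_items USING vec0(embedding float[4])")

items = {1: [0.1, 0.2, 0.3, 0.4], 2: [0.9, 0.8, 0.7, 0.6]}
for rowid, vec in items.items():
    db.execute(
        "INSERT INTO vec_items(rowid, embedding) VALUES (?, ?)",
        (rowid, sqlite_vec.serialize_float32(vec)),
    )

# Brute-force nearest-neighbour query against a query vector.
query = sqlite_vec.serialize_float32([0.1, 0.2, 0.3, 0.35])
rows = db.execute(
    "SELECT rowid, distance FROM vec_items WHERE embedding MATCH ? ORDER BY distance LIMIT 2",
    (query,),
).fetchall()
print(rows)
```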
  • sqlite-vec

    The problem with Parquet is it’s static. Not good for use cases that involve continuous writes and updates. Although I have had good results with DuckDB and Parquet files in object storage. Fast load times.

    If you host your own embedding model, then you can transmit numpy float32 compressed arrays as bytes, then decode back into numpy arrays.

    Personally I prefer using SQLite with usearch extension. Binary vectors then rerank top 100 with float32. It’s about 2 ms for ~20k items, which beats LanceDB in my tests. Maybe Lance wins on bigger collections. But for my use case it works great, as each user has their own dedicated SQLite file.
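
    The comment above mentions transmitting numpy float32 arrays as raw bytes; a minimal round-trip sketch (the dtype and shape handling is the part that is easy to get wrong):

```python
import numpy as np

embedding = np.random.rand(384).astype(np.float32)  # 384 dimensions is illustrative

# Encode: raw float32 bytes (optionally compress further, e.g. with zlib).
payload = embedding.tobytes()

# Decode: the receiver must know the dtype (and shape, if multi-dimensional).
restored = np.frombuffer(payload, dtype=np.float32)

assert np.array_equal(embedding, restored)
```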

Serving embeddings

See Deploying ML applications (applied ML)

Providers

Self-hosting

Additionally, there’s also: https://github.com/michaelfeil/infinity

| Feature/Aspect | Text Embeddings Inference | Ollama | vLLM |
| --- | --- | --- | --- |
| Primary Use Case | Production embedding serving | Local development & testing | LLM inference with embedding support |
| Implementation | Rust | Go | Python |
| Setup Complexity | Low | Very Low | High |
| Resource Usage | Minimal | Moderate | High |
| GPU Support | Yes | Yes | Yes (optimized) |
| CPU Support | Yes | Yes | Limited |
| Model Types | Embedding only | Both LLM and embeddings | Both LLM and embeddings |
| Production Ready | Yes | Limited | Yes |
| Deployment Type | Microservice | Local/container | Distributed service |
| Customization | Limited | High | High |
| Throughput | Very high (embeddings) | Moderate | High (both) |
| Community Support | Growing | Active | Very active |
| Architecture Support | x86, ARM | x86, ARM | Primarily x86 |
| Container Support | Yes | Yes | Yes |
| Monitoring/Metrics | Basic | Basic | Extensive |
| Hot-reload Support | No | Yes | No |
| Memory Efficiency | High | Moderate | Varies (KV-cache focused) |
| Documentation Quality | Good | Excellent | Excellent |
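
A hedged sketch of calling a running Text Embeddings Inference server over HTTP; the /embed route and request shape follow TEI’s documented API, but the port and model choice here are assumptions of this example:

```python
import requests

# Assumes TEI is already running locally, e.g.:
#   docker run -p 8080:80 ghcr.io/huggingface/text-embeddings-inference:cpu-latest \
#       --model-id BAAI/bge-small-en-v1.5
resp = requests.post(
    "http://localhost:8080/embed",
    json={"inputs": ["What are embeddings?", "How do I store vectors?"]},
    timeout=30,
)
resp.raise_for_status()
vectors = resp.json()   # list of embedding vectors, one per input string
print(len(vectors), len(vectors[0]))
```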

Learning resources