tags : Open Source LLMs, NLP (Natural Language Processing), Machine Learning, Modern AI Stack, Information Retrieval

Embeddings are related to what we talk about when we talk about “Autoencoders” in NLP (Natural Language Processing).

What

  • Reversing embeddings (recovering the original input from its vector) is probabilistic, not exact.

RAG

See RAG

VectorDBs / Vector Stores

“Binding generated embeddings to source data, so the vectors automatically update when the data changes is exactly how things should be.”
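
A minimal sketch of what that binding could look like, assuming a hypothetical `docs(id, body, body_hash, embedding)` SQLite table and an `embed()` stub: store a hash of the source text next to each vector and re-embed only the rows whose hash no longer matches.

```python
import hashlib
import sqlite3
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder embedding function; replace with a real model call."""
    return np.zeros(384, dtype=np.float32)

def sync_embeddings(con: sqlite3.Connection) -> None:
    """Re-embed only the rows whose source text changed since the last sync."""
    rows = con.execute("SELECT id, body, body_hash FROM docs").fetchall()
    for doc_id, body, old_hash in rows:
        new_hash = hashlib.sha256(body.encode()).hexdigest()
        if new_hash != old_hash:  # source changed, so the stored vector is stale
            vec = embed(body)
            con.execute(
                "UPDATE docs SET embedding = ?, body_hash = ? WHERE id = ?",
                (vec.tobytes(), new_hash, doc_id),
            )
    con.commit()
```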

| Vector Store | Type | Search Algorithm | Performance Characteristics |
| --- | --- | --- | --- |
| vectorlite | SQLite extension | HNSW (Approximate Nearest Neighbors) | Faster for large vector sets at the cost of some accuracy |
| sqlite-vec | SQLite extension | Brute force | More accurate but slower with large vector sets |
| usearch | SQLite extension | Brute force | Similar to sqlite-vec, only exposes vector distance functions |
| Qdrant | Standalone vector DB | Not specified | Works well but “heavier” for many applications |

sqlite-vec

The problem with Parquet is that it’s static: it’s not a good fit for use cases involving continuous writes and updates. That said, I have had good results with DuckDB reading Parquet files from object storage, with fast load times.
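
On the read side, a minimal sketch of querying Parquet embeddings straight out of object storage with DuckDB; the bucket path, region, and column names here are placeholders:

```python
import duckdb

con = duckdb.connect()
# httpfs lets DuckDB read Parquet directly from S3-compatible object storage.
con.execute("INSTALL httpfs;")
con.execute("LOAD httpfs;")
con.execute("SET s3_region = 'us-east-1';")  # plus credentials as needed

# Hypothetical layout: one row per document with an `embedding` list column.
rows = con.execute(
    """
    SELECT id, embedding
    FROM read_parquet('s3://my-bucket/embeddings/*.parquet')
    LIMIT 5
    """
).fetchall()
```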

If you host your own embedding model, you can transmit embeddings as compressed numpy float32 arrays serialized to bytes, then decode them back into numpy arrays on the receiving side.
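
A minimal sketch of that round trip, assuming zlib for the compression and a 384-dimensional model (both are just example choices):

```python
import zlib
import numpy as np

def encode_embedding(vec: np.ndarray) -> bytes:
    """Serialize a float32 embedding to compressed bytes for transport."""
    return zlib.compress(vec.astype(np.float32).tobytes())

def decode_embedding(payload: bytes, dim: int = 384) -> np.ndarray:
    """Decompress bytes back into a float32 numpy array."""
    return np.frombuffer(zlib.decompress(payload), dtype=np.float32).reshape(dim)

# Round trip: the compression is lossless, so the vectors match exactly.
original = np.random.rand(384).astype(np.float32)
restored = decode_embedding(encode_embedding(original))
assert np.array_equal(original, restored)
```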

Personally I prefer SQLite with the usearch extension: binary vectors for the initial search, then rerank the top 100 with float32. It takes about 2 ms for ~20k items, which beats LanceDB in my tests. Maybe Lance wins on bigger collections, but for my use case it works great, since each user has their own dedicated SQLite file.
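
This is not the usearch API itself, but a numpy-only sketch of the pattern: a coarse Hamming-distance search over packed binary vectors, followed by an exact float32 cosine rerank of the top 100 candidates.

```python
import numpy as np

def binarize(vecs: np.ndarray) -> np.ndarray:
    """Quantize float32 embeddings to packed binary vectors (1 bit per dimension)."""
    return np.packbits(vecs > 0, axis=-1)

def search(query: np.ndarray, corpus_f32: np.ndarray, corpus_bin: np.ndarray, k: int = 10) -> np.ndarray:
    """Coarse binary search, then rerank the 100 closest candidates with float32 cosine."""
    q_bin = binarize(query[None, :])[0]
    # Hamming distance: XOR the packed bytes, then count the set bits per row.
    hamming = np.unpackbits(corpus_bin ^ q_bin, axis=-1).sum(axis=-1)
    candidates = np.argsort(hamming)[:100]
    # Exact cosine similarity on the float32 originals for the shortlist.
    cand = corpus_f32[candidates]
    sims = cand @ query / (np.linalg.norm(cand, axis=1) * np.linalg.norm(query))
    return candidates[np.argsort(-sims)[:k]]

# Example: ~20k random vectors, query with the first one.
corpus = np.random.rand(20_000, 384).astype(np.float32) - 0.5
corpus_bin = binarize(corpus)
top_ids = search(corpus[0], corpus, corpus_bin, k=10)
```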

Polars

Vector Tile

PostgreSQL (pgvector)

DuckDB

Learning resources

Self-hosting Embeddings

See Deploying ML applications (applied ML)

There is also https://github.com/michaelfeil/infinity

| Feature/Aspect | Text Embeddings Inference | Ollama | vLLM |
| --- | --- | --- | --- |
| Primary Use Case | Production embedding serving | Local development & testing | LLM inference with embedding support |
| Implementation | Rust | Go | Python |
| Setup Complexity | Low | Very Low | High |
| Resource Usage | Minimal | Moderate | High |
| GPU Support | Yes | Yes | Yes (Optimized) |
| CPU Support | Yes | Yes | Limited |
| Model Types | Embedding only | Both LLM and Embeddings | Both LLM and Embeddings |
| Production Ready | Yes | Limited | Yes |
| Deployment Type | Microservice | Local/Container | Distributed Service |
| Customization | Limited | High | High |
| Throughput | Very High (embeddings) | Moderate | High (both) |
| Community Support | Growing | Active | Very Active |
| Architecture Support | x86, ARM | x86, ARM | Primarily x86 |
| Container Support | Yes | Yes | Yes |
| Monitoring/Metrics | Basic | Basic | Extensive |
| Hot-reload Support | No | Yes | No |
| Memory Efficiency | High | Moderate | Varies (KV-cache focused) |
| Documentation Quality | Good | Excellent | Excellent |
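
To make the first column concrete: Text Embeddings Inference serves embeddings over a small HTTP API with an `/embed` route. A minimal client sketch, assuming an instance running locally on port 8080 (verify the route against the TEI version you deploy):

```python
import requests

# Assumes a Text Embeddings Inference container is listening on localhost:8080.
resp = requests.post(
    "http://localhost:8080/embed",
    json={"inputs": ["first document", "second document"]},
    timeout=10,
)
resp.raise_for_status()
embeddings = resp.json()  # one float vector per input string
print(len(embeddings), len(embeddings[0]))
```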

Examples