tags : Image Compression, OCR
Resources
- https://excalibur-py.readthedocs.io/en/master/ : table extraction
- ColPali
- Comments
- It’s not an extraction replacement (I wondered the same thing). It’s retrieval that can bypass extraction.
- I’d misunderstood it as a vision LLM that could extract information from PDFs, it looks like it’s more of an embedding model that can represent a page from a PDF as a set of vectors, which means “will it just refuse to run” isn’t actually a concern (unlike Claude 3 Vision etc)
- In essence; ColPali is just an adapter on PaliGemma for retrieval task, while PaliGemma itself can be used for many others tasks. As the authors point out, you can remove or not use the adapter and also have it use the general capabilities to read the page.
- https://huggingface.co/blog/manu/colpali
- https://blog.vespa.ai/retrieval-with-vision-language-models-colpali/
- Comments
- PDF/A - Wikipedia
- https://github.com/ocrmypdf/OCRmyPDF (uses pike, pretty useful)
- Processing