🐏 mogoz

PDF

May 25, 2025, 1 min read

tags : Image Compression, OCR

https://github.com/dantetemplar/pdf-extraction-agenda 🌟

Resources

- https://excalibur-py.readthedocs.io/en/master/ : table extraction
  - Reducto Document Ingestion API (extraction dataset)
- ColPali (see OCR)
  - Ingesting PDFs and why Gemini 2.0 changes everything | Hacker News
  - Tips for using Gemini 2.0 for PDF ingestion | Hacker News
- Doctly AI
- PDF/A - Wikipedia
- https://github.com/ocrmypdf/OCRmyPDF (uses pikepdf, pretty useful)
- Ask HN: What is the best method for turning a scanned book as a PDF into text? | Hacker News
- Processing
  - https://zerodha.tech/blog/1-5-million-pdfs-in-25-minutes/
  - https://gotenberg.dev/

PDF generation

Using Pandoc and Typst to Produce PDFs | Hacker News
