🐏 mogoz

PDF

Apr 20, 2025

tags : Image Compression, OCR

Resources

  • https://excalibur-py.readthedocs.io/en/master/ : table extraction
    • Reducto Document Ingestion API (extraction dataset)
  • ColPali (see OCR)
    • Ingesting PDFs and why Gemini 2.0 changes everything | Hacker News
    • Tips for using Gemini 2.0 for PDF ingestion | Hacker News
  • Doctly AI
  • PDF/A - Wikipedia
  • https://github.com/ocrmypdf/OCRmyPDF (uses pikepdf, pretty useful)
  • Ask HN: What is the best method for turning a scanned book as a PDF into text? | Hacker News
  • Processing
    • https://zerodha.tech/blog/1-5-million-pdfs-in-25-minutes/
    • https://gotenberg.dev/

PDF generation

  • Using Pandoc and Typst to Produce PDFs | Hacker News
