๐Ÿ mogoz

PDF

May 25, 2025, 1 min read

tags : Image Compression, OCR

https://github.com/dantetemplar/pdf-extraction-agenda 🌟

Resources

  • https://excalibur-py.readthedocs.io/en/master/ : table extraction
    • Reducto Document Ingestion API (extraction dataset)
  • ColPali (see OCR)
    • Ingesting PDFs and why Gemini 2.0 changes everything | Hacker News
    • Tips for using Gemini 2.0 for PDF ingestion | Hacker News
  • Doctly AI
  • PDF/A - Wikipedia
  • https://github.com/ocrmypdf/OCRmyPDF (uses pikepdf, pretty useful)
  • Ask HN: What is the best method for turning a scanned book as a PDF into text? | Hacker News
  • Processing
    • https://zerodha.tech/blog/1-5-million-pdfs-in-25-minutes/
    • https://gotenberg.dev/

PDF generation

  • Using Pandoc and Typst to Produce PDFs | Hacker News
