tags : Image Compression, OCR
Resources
- https://excalibur-py.readthedocs.io/en/master/ : table extraction
- Reducto Document Ingestion API (extraction dataset)
- ColPali (see OCR)
- Doctly AI
- PDF/A - Wikipedia
- https://github.com/ocrmypdf/OCRmyPDF (uses pikepdf, pretty useful)
- Ask HN: What is the best method for turning a scanned book as a PDF into text? | Hacker News
- Processing