Announcing Surya OCR 2:
- 650M params
- 83.3% olmocr bench score (top under 3B)
- 87% on internal 91-lang benchmark
- 5 pages/s on RTX 5090
- Runs on CPU, GPU, MPS

Get started with `pip install surya-ocr` and `surya_ocr file.pdf`. Needs llama.cpp (CPU) or Docker (GPU).
More info:
- Model: https://huggingface.co/datalab-to/surya-ocr-2
- GitHub: https://github.com/datalab-to/surya
- Blog post: https://www.datalab.to/blog/surya-2
- Playground: https://www.datalab.to/playground
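
If you have a folder of PDFs rather than a single file, the quick-start command above can be wrapped in a short script. This is only a sketch: it relies solely on the `surya_ocr file.pdf` invocation from this post, and the helper names `build_command` and `ocr_directory` are made up for illustration. It passes no extra flags, so check the README for the real options.

```python
import shutil
import subprocess
from pathlib import Path


def build_command(pdf_path):
    """Build the CLI call from the post: `surya_ocr <file>` (no extra flags assumed)."""
    return ["surya_ocr", str(pdf_path)]


def ocr_directory(directory):
    """Run the CLI once per PDF in `directory` and return the processed paths.

    Hypothetical helper: it simply shells out to the installed `surya_ocr` command.
    """
    if shutil.which("surya_ocr") is None:
        raise RuntimeError("surya_ocr not found; run `pip install surya-ocr` first")
    processed = []
    for pdf in sorted(Path(directory).glob("*.pdf")):
        # check=True raises CalledProcessError if the CLI exits with an error
        subprocess.run(build_command(pdf), check=True)
        processed.append(pdf)
    return processed
```

Calling `ocr_directory("scans")` (with `scans` standing in for your own folder) would process each PDF in turn, one CLI call per file.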

We see 5 pages/second on an RTX 5090 (at a concurrency of 128) and 0.1 pages/second on an M1 MacBook. There are a few performance levers you can tune (see the README).
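
To get a feel for what those throughput numbers mean in practice, here is a rough back-of-the-envelope sketch. The 1,000-page job is a hypothetical workload, and the estimate assumes throughput stays constant and ignores model load time, so treat the output as an order-of-magnitude guide only.

```python
def estimated_seconds(num_pages, pages_per_second):
    """Estimate wall-clock time assuming a constant pages/second rate."""
    if pages_per_second <= 0:
        raise ValueError("pages_per_second must be positive")
    return num_pages / pages_per_second


# Throughput figures quoted in this post
THROUGHPUT = {
    "RTX 5090 (128 concurrency)": 5.0,
    "M1 MacBook": 0.1,
}

NUM_PAGES = 1000  # hypothetical job size, for illustration only

for hardware, rate in THROUGHPUT.items():
    minutes = estimated_seconds(NUM_PAGES, rate) / 60
    print(f"{hardware}: ~{minutes:.1f} min for {NUM_PAGES} pages")
```

Under these assumptions the same job takes a few minutes on the RTX 5090 and close to three hours on the M1, a 50x gap.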

Surya 2 improves accuracy significantly across tables, handwriting, forms, math, and layout. Here are a few examples.

Here are results across a few top languages. You can see the full multilingual results here: https://github.com/datalab-to/surya/blob/master/static/docs/multilingual.md

Surya still makes small single-character mistakes on some languages, especially with handwriting - we're actively working on this.
And now that Surya is updated, expect an update to Marker soon.