⚙️ LLM Engines: Text & Vision

Airparser uses Large Language Model (LLM) engines to extract structured data from documents.

Currently, the Vision engine is the default engine for all new inboxes. You can still switch to the Text engine if it better fits your use case.

Below is an overview of both engines, their differences, and when to use each one.

Vision engine (default)

The Vision engine analyzes documents directly in their original visual form.
It processes the full layout of the document, including text placement, tables, formatting, and images — without relying on OCR as an intermediate step.

Because the model “sees” the document as a whole, it usually provides higher accuracy for complex or visually rich documents.

Key benefits

  • Preserves layout, tables, and formatting

  • Works well with scanned documents and images

  • Better at understanding complex document structures

  • No information loss caused by OCR text conversion


Text engine

The Text engine is Airparser’s original extraction engine. It works by converting documents into text first and then running the LLM on that text content.

For scanned documents or images, an OCR (Optical Character Recognition) step is required before extraction. While this approach is fast and reliable in many cases, OCR may remove or distort visual details such as table structure, text alignment, fonts, or colors.

When the Text engine works well

  • Clean, machine-generated documents

  • Emails and simple PDFs

  • Documents where layout is not important

  • Large text-heavy documents

Vision engine limitations

While powerful, the Vision engine has some limitations to be aware of:

  • Page limit: Only the first 10 pages of a document are analyzed

  • Document alignment: We recommend deskewing (rotating) PDFs and images before importing them

  • Text clarity: Very small, blurry, or low-quality text may be misinterpreted

  • Captchas: Captchas are not supported

  • Human identification: The engine does not identify or recognize people

  • Medical data: Not suitable for medical imaging or diagnostics (e.g. MRIs, X-rays)

Choosing the right LLM engine

For most users, the Vision engine is the best starting point and is therefore used by default.

You may want to switch to the Text engine if:

  • Your documents are simple and text-based

  • Layout and visual structure are not important

  • You are processing large documents (more than 10 pages)

  • Speed and cost predictability are a priority

You can always create multiple inboxes with different engines to test which one performs best for your specific documents.
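The rules of thumb above can be sketched as a small decision helper. This is an illustrative Python sketch only: `DocumentProfile` and `suggest_engine` are hypothetical names, not part of Airparser's API, and in Airparser the engine is chosen per inbox rather than in code. The sketch encodes only the page-count, scanned-input, and layout criteria; it ignores speed and cost.

```python
from dataclasses import dataclass

# From this article: the Vision engine analyzes only the first 10 pages.
VISION_PAGE_LIMIT = 10


@dataclass
class DocumentProfile:
    """Hypothetical summary of a document (not an Airparser API object)."""
    page_count: int
    is_scanned_or_image: bool  # scans and photos need OCR on the Text engine
    layout_matters: bool       # tables, alignment, or formatting carry meaning


def suggest_engine(doc: DocumentProfile) -> str:
    """Suggest "vision" or "text" using this article's rules of thumb."""
    if doc.page_count > VISION_PAGE_LIMIT:
        # Vision would skip every page after the 10th, so prefer Text.
        return "text"
    if doc.is_scanned_or_image or doc.layout_matters:
        # Vision reads the page directly, avoiding OCR loss of layout.
        return "vision"
    # Simple, text-based document where layout is not important.
    return "text"


if __name__ == "__main__":
    print(suggest_engine(DocumentProfile(3, True, True)))    # vision
    print(suggest_engine(DocumentProfile(25, True, True)))   # text
    print(suggest_engine(DocumentProfile(2, False, False)))  # text
```

For a scanned document longer than 10 pages, neither engine is a perfect fit: the sketch prefers Text so that no pages are dropped, accepting possible OCR losses. Running real samples through two inboxes with different engines remains the more reliable test.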

Supported formats

Both engines support the same document formats, including:

  • Emails

  • PDFs

  • Images (JPG, PNG, etc.)

  • Word documents

  • Excel

  • Text files

  • HTML and more

Pricing

The Vision engine uses the same pricing model as the Text engine.

1 PDF page = 1 credit.
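Because both engines share this rate, estimating credit usage for a batch of PDFs is simple arithmetic. The Python sketch below assumes every page of each PDF is billed at 1 credit; it does not cover how other formats such as emails or images are counted, and `estimate_pdf_credits` is a hypothetical helper, not part of Airparser.

```python
from typing import Sequence

# From this article: 1 PDF page = 1 credit, for both the Text and Vision engines.
CREDITS_PER_PDF_PAGE = 1


def estimate_pdf_credits(page_counts: Sequence[int]) -> int:
    """Estimate credits for a batch of PDFs (hypothetical helper).

    Takes one page count per document and assumes every page is billed.
    """
    if any(n < 0 for n in page_counts):
        raise ValueError("page counts must be non-negative")
    return sum(page_counts) * CREDITS_PER_PDF_PAGE


if __name__ == "__main__":
    # Three PDFs with 3, 12, and 1 pages.
    print(estimate_pdf_credits([3, 12, 1]))  # 16
```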

