⚙️ GPT Engine: Text & Vision
GPT Text Engine
Historically, Airparser has utilized a single engine, which we refer to as the "Text engine". When users upload documents (such as emails, PDFs, scanned images, or Word documents), we execute a complex series of data pre-processing, preparation, and extraction chains. The core of this extraction process is performed by the GPT engine, which is designed to analyze text data exclusively.
For scanned documents (like photos of invoices), we employ an OCR (Optical Character Recognition) engine to convert the image into a specialized text format that the GPT engine can process. While effective, this approach has a notable drawback: it may result in the loss of crucial information such as text formatting, tables, text styles, images, and colors, which can be critical in certain use cases.
To address this limitation, Airparser now offers an alternative GPT engine that may provide enhanced precision for specific scenarios.
GPT Vision Engine
The GPT Vision engine addresses the primary disadvantage of the Text engine by eliminating the OCR step. The GPT engine analyzes the document directly, so formatting, tables, and other visual details are not lost in a conversion to text.
GPT Vision Engine Limitations
- Maximum 10 "pages" per document: For lengthy emails or HTML documents, the Vision engine may only analyze the first 10 pages.
- PDFs: Only the first 10 pages are analyzed.
- Language support: The Vision engine performs best with English content, but it also supports other languages, including those written in non-Latin scripts.
- Document alignment: We recommend deskewing (rotating) your documents (PDFs and images) before importing them to Airparser's Vision inbox.
- Text clarity: Small or unclear text may be misinterpreted.
- CAPTCHAs: The Vision engine cannot interpret CAPTCHAs.
- Human identification: The Vision engine does not identify people.
- Medical data: The Vision engine is not suitable for analyzing medical data (MRIs, etc.).
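To make the 10-page limit concrete, here is a minimal sketch of a check you could run on your side before importing a document. The function names and the `VISION_PAGE_LIMIT` constant are hypothetical and are not part of any Airparser API; the sketch only assumes that you already know the document's page count.

```python
# Hypothetical helper, not part of Airparser: estimates how much of a
# document the Vision engine would cover, based on the 10-page limit.
VISION_PAGE_LIMIT = 10


def pages_analyzed_by_vision(total_pages: int) -> int:
    """Return how many pages fall within the Vision engine's limit."""
    if total_pages < 0:
        raise ValueError("total_pages must be non-negative")
    return min(total_pages, VISION_PAGE_LIMIT)


def pages_skipped_by_vision(total_pages: int) -> int:
    """Return how many trailing pages fall outside the limit."""
    return total_pages - pages_analyzed_by_vision(total_pages)
```

For example, a 25-page PDF would have its first 10 pages analyzed and the remaining 15 skipped, which may be a reason to split the file or use the Text engine instead.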
Choosing the Right GPT Engine
For most of our users, the Text engine is sufficient. If you are unsure which engine to use, we recommend starting with the Text engine. You can always create a new inbox with the Vision engine later to determine which works better for your specific use case.
The Vision engine might be more suitable if:
- Your document layout is complex and contains intricate formatting and tables.
- You need to analyze images rather than text (e.g., determining the content of a photo uploaded by your customer).
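The guidance above can be written as a simple rule of thumb. This is only an illustrative sketch: `suggest_engine` and its parameters are hypothetical names, not an Airparser feature, and the result is a starting point rather than a guarantee of better extraction.

```python
# Hypothetical rule of thumb based on the criteria listed above.
def suggest_engine(complex_layout: bool, image_content: bool) -> str:
    """Suggest "vision" when either criterion applies, otherwise "text"."""
    if complex_layout or image_content:
        return "vision"
    # The Text engine is the recommended default when unsure.
    return "text"
```

If the suggestion is "vision", you can still create a second inbox with the Text engine and compare the results on the same documents.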
Supported Formats
Both Text and Vision engines support the same document formats, including emails, PDFs, images, Word documents, text files, and more.
Pricing
The GPT Vision engine uses the same pricing model as the Text engine, with 1 PDF page equivalent to 1 credit.
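As a rough illustration of the pricing rule, the sketch below estimates credit usage for a batch of PDFs at 1 credit per page. The `estimate_credits` function is hypothetical. It also assumes that every page of each PDF is counted, which this article does not confirm for documents longer than the Vision engine's 10-page limit, and it does not cover non-PDF formats.

```python
from typing import Iterable

CREDITS_PER_PDF_PAGE = 1  # from the pricing rule above


def estimate_credits(pdf_page_counts: Iterable[int]) -> int:
    """Estimate credits for a batch of PDFs, assuming every page is counted."""
    total = 0
    for pages in pdf_page_counts:
        if pages < 0:
            raise ValueError("page counts must be non-negative")
        total += pages * CREDITS_PER_PDF_PAGE
    return total


# Three PDFs with 3, 12, and 1 pages
print(estimate_credits([3, 12, 1]))  # 16
```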