Data Extraction | Airparser Knowledge Base

⚙️ LLM Engines: Text & Vision

Text Engine: Historically, Airparser has used a single engine, which we call the "Text engine". When users upload documents (such as emails, PDFs, scanned images, or Word documents), we execute a complex series of data pre-processing, preparati ...

🧠 LLM Parsing Models

Airparser allows you to select the parsing model for each Inbox independently. This means you can fine-tune the parsing performance for different document types without affecting other inboxes. You can change the model at any time from the ...

🔗 Extracting URLs From an Email or HTML Document

By default, Airparser doesn't always parse hidden URLs of links, buttons, and images. You can activate the email and image parsing features from the Inbox Settings page > "Advanced Settings" section. Reparse your documents to see changes. Alternati ...
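To illustrate what "hidden URLs" means here, the following is a minimal sketch using only the Python standard library. It is not Airparser's implementation; the class `UrlCollector`, the function `extract_hidden_urls`, and the sample HTML are hypothetical names and data chosen for illustration. It shows that the target of a link or image lives in an `href` or `src` attribute, so it is absent from the visible text unless the parser is told to keep it.

```python
from html.parser import HTMLParser


class UrlCollector(HTMLParser):
    """Collects URLs stored in href/src attributes rather than in visible text."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.urls.append((tag, value))


def extract_hidden_urls(html):
    """Return (tag, url) pairs for every href/src attribute in the HTML."""
    collector = UrlCollector()
    collector.feed(html)
    collector.close()
    return collector.urls


# Hypothetical sample: the visible text is only "See your invoice".
sample = (
    '<p>See <a href="https://example.com/invoice/42">your invoice</a> '
    '<img src="https://example.com/logo.png" alt="logo"></p>'
)
print(extract_hidden_urls(sample))
```

In this sample the visible text never mentions either address, which is why a text-only extraction would miss them.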

📦 Extracting XML and XML ADF

Airparser supports automatic XML extraction from emails and documents. This feature allows you to extract structured XML data without needing an extraction schema or using large language models (LLMs). If your email or document already contains structure ...
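As a rough illustration of schema-free XML extraction, the sketch below locates an XML fragment embedded in plain email text and parses it with the Python standard library. It is an assumption about how such a step could work, not a description of Airparser's internals; `extract_xml` and the sample message are hypothetical, and the sample is a simplified ADF-like fragment rather than a complete ADF document.

```python
import re
import xml.etree.ElementTree as ET


def extract_xml(text):
    """Return the root of the first well-formed XML element in text, or None."""
    # Heuristic (an assumption of this sketch): try each opening tag and pair
    # it with the last closing tag of the same name.
    for match in re.finditer(r"<([A-Za-z_][\w.-]*)[\s>]", text):
        name = match.group(1)
        end = text.rfind(f"</{name}>")
        if end == -1 or end < match.start():
            continue
        # len("</" + name + ">") == len(name) + 3
        candidate = text[match.start(): end + len(name) + 3]
        try:
            return ET.fromstring(candidate)
        except ET.ParseError:
            continue
    return None


# Hypothetical email body containing an embedded XML fragment.
email_body = """Hello,

A new lead is attached below.
<adf><prospect><customer><contact><name>Jane Doe</name></contact></customer></prospect></adf>
Regards"""

root = extract_xml(email_body)
print(root.tag)
print(root.findtext("prospect/customer/contact/name"))
```

Because the data is already structured, the field values can be read directly from the parsed tree without a schema or a language model.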

🚧 Parsing Tips and How to Fix Common Issues

Why are only some pages of my document being parsed? The LLM-powered parser operates within a defined context window, which limits the maximum document size it can process for data extraction. Currently, a key limitation of all LLM engines is their inab ...