❓FAQ: Data Extraction

Can I choose whether Airparser processes the email body or attachments?

Yes. In each inbox you can control exactly what Airparser processes:

  • Process emails and attachments (default) – parses both the email body and all attachments.

  • Process emails, skip attachments – only the email body is parsed.

  • Process attachments, skip emails – only attachments (PDF, DOCX, images, etc.) are parsed.

You can change this anytime in your inbox settings.

How do I instruct Airparser to parse only certain document types?

By default, Airparser attempts to parse all incoming documents. To restrict parsing to certain document types, you have two options:

  1. Use post-processing to automatically set the status of documents to 'Skipped' based on their file type. Skipped documents are not exported and do not trigger your integrations. Learn how to write simple post-processing code for this.

  2. Use an automation platform such as Zapier or Make to build a more complex import flow that filters documents and sends only the ones you need to Airparser.
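Option 1 comes down to a file-type check. The sketch below is hypothetical: it assumes your post-processing step can read the document's original filename, and it only decides *whether* to skip. The actual variable names and the way you set the 'Skipped' status come from Airparser's post-processing documentation, not from this example.

```python
from pathlib import PurePath

# Hypothetical allow-list: only these file types should be parsed and exported.
ALLOWED_EXTENSIONS = {".pdf", ".docx"}


def should_skip(filename: str) -> bool:
    """Return True if the document's file type is not in the allow-list.

    `filename` is an assumed input; Airparser's real post-processing
    environment may expose file metadata under a different name.
    """
    suffix = PurePath(filename).suffix.lower()  # ".PDF" -> ".pdf"
    return suffix not in ALLOWED_EXTENSIONS


if __name__ == "__main__":
    for name in ["invoice.PDF", "photo.jpg", "contract.docx", "notes"]:
        print(name, "->", "Skipped" if should_skip(name) else "Parse")
```

The comparison is case-insensitive, and a name with no extension is treated as not allowed, so adjust the rule if some documents you want to keep arrive without one.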

I’m parsing emails with attachments. Can I handle both the email and the attachment in the same extraction?

Yes! You can combine the parsed data from both the email and its attachment into a single document.

To do this:

  1. Go to your Inbox settings.

  2. Enable the "Parent Data" meta field.
    This will inject the parsed email data into the parsed data of the attachment, giving you one unified JSON object.

Additionally, you can use a post-processing step to set the status of the parsed email to 'Skipped' so it doesn't trigger any integrations.
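To picture what "Parent Data" produces, here is a rough plain-Python sketch of the merge. The key name `parent_data` and all field names are hypothetical placeholders, not Airparser's actual output format; open a parsed attachment in your inbox to see the real structure.

```python
import json


def merge_with_parent(attachment_data: dict, email_data: dict,
                      key: str = "parent_data") -> dict:
    """Return a copy of the attachment's parsed fields with the parent
    email's parsed fields nested under `key` (a hypothetical name)."""
    merged = dict(attachment_data)  # shallow copy; the input is left unchanged
    merged[key] = dict(email_data)
    return merged


email = {"sender": "billing@vendor.example", "subject": "Invoice 1042"}
attachment = {"invoice_number": "1042", "total": 199.0}

print(json.dumps(merge_with_parent(attachment, email), indent=2))
```

The attachment's own fields stay at the top level and the email's fields travel alongside them, which is why a single JSON object per attachment is enough for your integrations.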

I notice that some of the extracted data isn't correct. How can I improve parsing quality?

For most documents, Airparser extracts data with high accuracy. However, when a document's structure is especially complex, the extracted data may be incomplete or inaccurate. To improve results, we offer several LLM engines, both text-based and vision-based. You can switch between them on the Inbox settings page to find the best fit for your use case.

How do I parse different document formats that contain the same type of data?

If you receive documents in many different formats (PDFs, emails, spreadsheets, different vendor templates, etc.) but need to extract the same fields every time, you do not need multiple inboxes or multiple schemas.

Airparser’s LLM engine is designed to extract structured data independently of the document’s layout. This means:

  • Different vendors or platforms

  • Different formats (PDF, email body, attachment, DOCX, etc.)

  • Different layouts and structures

…can all be parsed using a single inbox and a single extraction schema, as long as the fields you need are the same.

How to set this up

  1. Create one inbox.

  2. Define a schema listing the fields you want to extract.

  3. Airparser will extract those fields from any supported document format sent to that inbox.

Controlling what gets parsed (email body vs attachments)

If your data is always inside attachments, you can configure the inbox to parse attachments only (Process attachments, skip emails).
If some senders put data in the email body, you can allow parsing of both.

Exporting the results

Since all formats produce the same schema output, you can export everything to:

  • Google Sheets

  • Webhooks

  • Zapier / Make / n8n

  • API

…without needing separate integrations per format.
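Because every document yields the same fields, downstream code needs only one code path. As a rough illustration, this Python sketch writes parsed results to CSV with a fixed column order. The field names are hypothetical, and the input is assumed to be a list of parsed-data dictionaries, such as you might collect from webhook or API payloads.

```python
import csv
import io

# Hypothetical schema: the same fields are extracted from every format.
SCHEMA_FIELDS = ["vendor", "invoice_number", "total"]


def to_csv(documents: list[dict]) -> str:
    """Write parsed documents to CSV using one fixed column order.

    Missing fields become empty cells and extra keys are ignored, so a
    single export path handles PDFs, email bodies, and attachments alike.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=SCHEMA_FIELDS,
                            extrasaction="ignore", restval="")
    writer.writeheader()
    writer.writerows(documents)
    return buffer.getvalue()


docs = [
    {"vendor": "Acme", "invoice_number": "A-1", "total": 120.5},  # e.g. from a PDF
    {"vendor": "Globex", "invoice_number": "G-7", "total": 80},   # e.g. from an email body
    {"vendor": "Initech", "invoice_number": "I-3"},               # total not found
]
print(to_csv(docs))
```

Nothing in the export logic branches on the source format; the schema alone determines the columns.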

When would I need multiple inboxes?

You only need multiple inboxes if the fields you extract differ across document types.
If the fields are the same, one inbox is the most efficient setup.

