Question 1

Will it work on any invoice?

Accepted Answer

It works best on text-based PDFs (the kind your accounting software exports). Scanned, image-only PDFs need OCR first; chain them through `ocr-pro` or `pdf-vision-ocr`. The heuristics are tuned for English-language invoices that use `$`, `£`, `€`, or similar currency symbols.

Question 2

How accurate is the total detection?

Accepted Answer

It works in two passes. The first looks for labelled amounts ("Total:", "Amount Due", "Grand Total"); if none is found, the second falls back to the largest currency value in the document. Accuracy is high on well-structured invoices, but on unusual layouts (e.g., totals embedded in narrative text) the detected total may be wrong. The `confidence` field and the `warnings` list flag low-confidence extractions.
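A minimal TypeScript sketch of the two-pass idea, working on plain text already extracted from the PDF. The names `detectTotal`, `findAmounts`, `Money`, the label regex, and the two-level `"high"`/`"low"` confidence are all hypothetical; the tool's actual code and confidence scale are not documented here. It assumes `$`-style amounts with comma thousands separators and a dot decimal, and when several labelled lines exist it takes the last one, on the assumption that the final total sits near the bottom.

```typescript
// Hypothetical result shapes; the tool's real types are not documented here.
interface Money {
  value: number; // parsed numeric amount, e.g. 1250.5
  raw: string; // exact matched text, e.g. "$1,250.50"
}

interface TotalResult {
  total: Money | null;
  confidence: "high" | "low"; // assumption: a two-level scale
  warnings: string[];
}

// Assumed label set for pass 1. "Subtotal" does not match \btotal\b.
const TOTAL_LABEL = /\b(?:grand total|total|amount due)\b/i;

// Escape regex metacharacters so a symbol like "$" matches literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Collect every "<symbol><amount>" on a line.
// Assumes comma thousands separators and a dot decimal separator.
function findAmounts(line: string, symbol: string): Money[] {
  const re = new RegExp(
    escapeRegExp(symbol) + "\\s?(\\d+(?:,\\d{3})*(?:\\.\\d+)?)",
    "g"
  );
  const out: Money[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(line)) !== null) {
    out.push({ value: parseFloat(m[1].replace(/,/g, "")), raw: m[0] });
  }
  return out;
}

function detectTotal(text: string, symbol: string = "$"): TotalResult {
  const lines = text.split(/\r?\n/);
  const warnings: string[] = [];

  // Pass 1: the last amount on the last line carrying a total-style label.
  let labelled: Money | null = null;
  for (const line of lines) {
    if (!TOTAL_LABEL.test(line)) continue;
    const amounts = findAmounts(line, symbol);
    if (amounts.length > 0) labelled = amounts[amounts.length - 1];
  }
  if (labelled !== null) {
    return { total: labelled, confidence: "high", warnings };
  }

  // Pass 2: fall back to the largest currency value anywhere in the text.
  let largest: Money | null = null;
  for (const line of lines) {
    for (const a of findAmounts(line, symbol)) {
      if (largest === null || a.value > largest.value) largest = a;
    }
  }
  warnings.push(
    largest === null
      ? "No currency values found."
      : "No labelled total found; using the largest amount."
  );
  return { total: largest, confidence: "low", warnings };
}
```

Under these assumptions, an invoice whose text contains no total label gets its largest amount back with `confidence: "low"` and one warning, which is the fallback behaviour the answer describes.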

Question 3

Does it support non-USD currencies?

Accepted Answer

Yes. Set the currency symbol parameter to `£`, `€`, `¥`, or any other short prefix; the detection logic works the same way for any single-symbol currency.
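To illustrate why any short prefix can work, here is a hedged TypeScript sketch of symbol-parameterised matching. `findAmounts` and `Money` are hypothetical names, not the tool's API. The key step is escaping the symbol before building the regex, so prefixes containing regex metacharacters (such as `$` or `R$`) match literally. The sketch assumes comma thousands separators and a dot decimal, so a European-formatted amount like `€1.234,50` would need different parsing.

```typescript
// A parsed currency amount: numeric value plus the exact matched text.
interface Money {
  value: number;
  raw: string;
}

// Escape regex metacharacters so symbols like "$" or "R$" match literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Find every "<symbol><amount>" occurrence in a line of text.
// An optional single space is allowed between symbol and digits.
function findAmounts(line: string, symbol: string): Money[] {
  const re = new RegExp(
    escapeRegExp(symbol) + "\\s?(\\d+(?:,\\d{3})*(?:\\.\\d+)?)",
    "g"
  );
  const out: Money[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(line)) !== null) {
    // Strip thousands separators before converting to a number.
    out.push({ value: parseFloat(m[1].replace(/,/g, "")), raw: m[0] });
  }
  return out;
}
```

For example, `findAmounts("€10 and $3.50", "$")` returns only the `$3.50` amount, while passing `"€"` returns only `€10`.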

Question 4

Is anything sent to a server?

Accepted Answer

No. PDF parsing (with pdf.js), the heuristics, and the rest of the extraction all run in your browser. You can disconnect from the network mid-extraction and it will still finish.

Question 5

Can it extract line items?

Accepted Answer

Yes. Any row that ends in a currency value and isn't the total, subtotal, or tax becomes a line item. The description is everything before the amount, and the amount is parsed into a structured `{ value, raw }` object.
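The row rule can be sketched in TypeScript as follows. This is a minimal illustration under stated assumptions, not the tool's implementation: `extractLineItems`, `LineItem`, and `SUMMARY_LABEL` are hypothetical names, the keyword list used to recognise total/subtotal/tax rows is an assumption (it also skips "Amount Due" rows), and amounts are assumed to use comma thousands separators and a dot decimal.

```typescript
interface Money {
  value: number; // parsed numeric amount
  raw: string; // matched text including the symbol, e.g. "$19.98"
}

interface LineItem {
  description: string; // everything before the amount, trimmed
  amount: Money;
}

// Assumed keywords marking summary rows that are not line items.
const SUMMARY_LABEL = /\b(?:subtotal|total|amount due|tax)\b/i;

// Escape regex metacharacters so the symbol matches literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function extractLineItems(text: string, symbol: string = "$"): LineItem[] {
  // Group 1: description; group 2: full amount text; group 3: digits only.
  // The amount must be the last thing on the row (trailing spaces allowed).
  const rowRe = new RegExp(
    "^(.*?)\\s*(" +
      escapeRegExp(symbol) +
      "\\s?(\\d+(?:,\\d{3})*(?:\\.\\d+)?))\\s*$"
  );
  const items: LineItem[] = [];
  for (const row of text.split(/\r?\n/)) {
    const m = rowRe.exec(row);
    // Skip rows without a trailing amount, and total/subtotal/tax rows.
    if (m === null || SUMMARY_LABEL.test(row)) continue;
    items.push({
      description: m[1].trim(),
      amount: { value: parseFloat(m[3].replace(/,/g, "")), raw: m[2] },
    });
  }
  return items;
}
```

On a typical invoice body, rows such as `Widget x2   $19.98` become items, while `Subtotal`, `Tax`, and `Total:` rows are skipped even though they also end in an amount.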

PDF Extract Data

