Extract data from single input
POST /documents/single
Receives a single file input, attempts to classify the document to identify its type, and then extracts structured data from it. The structured data is returned as a collection of inter-related nodes forming a directed acyclic graph (DAG); a relationship between two nodes is expressed by using the other node's URI as a property value.
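The exact JSON shape of the nodes is not specified here, so the sketch below assumes, hypothetically, that the response is a list of node objects, each with a "uri" key, and that any property value (or list element) equal to another node's URI is a relationship. Under that assumption it collects the edges and orders the nodes with Kahn's algorithm, which also confirms that the graph is acyclic. The function names are illustrative, not part of the API.

```python
from collections import deque
from typing import Any


def build_edges(nodes: list[dict[str, Any]]) -> dict[str, list[str]]:
    """Map each node URI to the URIs of the nodes it references.

    Hypothetical node shape: every node has a "uri" key, and a relationship
    is any property value (or list element) equal to another node's URI.
    """
    uris = {node["uri"] for node in nodes}
    edges: dict[str, list[str]] = {}
    for node in nodes:
        targets: list[str] = []
        for key, value in node.items():
            if key == "uri":
                continue
            # Treat scalar and list-valued properties uniformly.
            values = value if isinstance(value, list) else [value]
            for v in values:
                if isinstance(v, str) and v in uris:
                    targets.append(v)
        edges[node["uri"]] = targets
    return edges


def topological_order(edges: dict[str, list[str]]) -> list[str]:
    """Order URIs so each node precedes the nodes it references (Kahn's algorithm).

    Raises ValueError if the graph contains a cycle, i.e. is not a DAG.
    """
    in_degree = {uri: 0 for uri in edges}
    for targets in edges.values():
        for t in targets:
            in_degree[t] += 1
    # Start from nodes that nothing references.
    queue = deque(uri for uri, d in in_degree.items() if d == 0)
    order: list[str] = []
    while queue:
        uri = queue.popleft()
        order.append(uri)
        for t in edges[uri]:
            in_degree[t] -= 1
            if in_degree[t] == 0:
                queue.append(t)
    # Any node left unvisited sits on a cycle.
    if len(order) != len(edges):
        raise ValueError("node graph contains a cycle")
    return order
```

Reversing the resulting order puts referenced nodes first, which is convenient when building client-side objects bottom-up.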
Ensure either a content type or a file extension is supplied with the form data. A PDF with no text content, or an image file, is processed with machine learning to identify the text and its bounding boxes before classification and extraction. A PDF that already contains text uses that content directly, which is slightly faster.
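A minimal sketch, using only the Python standard library, of a multipart request whose file part carries both a filename with an extension and an explicit Content-Type guessed from it. The form field name document comes from the payload rules for this endpoint; the base URL and the helper name build_single_request are hypothetical placeholders, and any authentication the real API requires is not shown. The request is built but not sent.

```python
import mimetypes
import urllib.request
import uuid

# Hypothetical host; substitute the real API base URL.
BASE_URL = "https://api.example.com"


def build_single_request(filename: str, data: bytes) -> urllib.request.Request:
    """Build (but do not send) a multipart POST to /documents/single.

    The file part includes the original filename (so the extension is
    visible) and an explicit Content-Type inferred from that extension.
    """
    content_type, _ = mimetypes.guess_type(filename)
    if content_type is None:
        raise ValueError(f"cannot infer a content type for {filename!r}")
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="document"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return urllib.request.Request(
        f"{BASE_URL}/documents/single",
        data=head + data + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```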
The returned nodes vary with the document type and the data it contains.
This endpoint is synchronous and returns a response within roughly ten seconds for most documents. For multi-page image capture or long PDFs, use an extraction session instead, since it allows background processing while new pages are still being captured.
The payload must contain exactly one of document (a multipart/form-data file upload) or uploadId (a JSON reference to an upload previously created via POST /documents/upload). The filename field is only used together with uploadId.
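The "exactly one of" rule can be enforced before a request is sent. The helper below, choose_payload, is hypothetical and not part of the API: it reports whether the file should go up as multipart form data, or returns a JSON body for the uploadId case, adding filename only alongside uploadId. The key names uploadId and filename come from the text above; the flat top-level JSON layout is an assumption.

```python
import json
from typing import Optional


def choose_payload(
    document: Optional[bytes] = None,
    upload_id: Optional[str] = None,
    filename: Optional[str] = None,
) -> tuple[str, Optional[bytes]]:
    """Enforce "exactly one of document or uploadId" before sending.

    Returns ("multipart", None) when the file should be sent as form data,
    or ("json", body) with an encoded JSON body for the uploadId case.
    """
    # Both present, or both missing, violates the rule.
    if (document is None) == (upload_id is None):
        raise ValueError("supply exactly one of document or uploadId")
    if document is not None:
        # filename only applies to uploadId, so it is not sent here.
        return "multipart", None
    # Assumed flat JSON layout using the documented field names.
    body = {"uploadId": upload_id}
    if filename is not None:
        body["filename"] = filename
    return "json", json.dumps(body).encode("utf-8")
```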
Responses
- 200: Document extraction completed without error.
- 400: Invalid file type (not PDF, JPEG, or PNG), missing document field, or payload exceeding 4.5 MB.
- 500: OCR processing failure, document classification error, or extraction pipeline error.
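A client can catch the 400 conditions before uploading and label responses using the documented meanings. This sketch rests on assumptions: the allowed MIME types are inferred from "PDF/JPEG/PNG", and because the unit behind "4.5 MB" is not documented, it uses the stricter decimal reading of 4,500,000 bytes and checks the size of the whole request payload rather than only the file. The names preflight and describe_status are hypothetical.

```python
from typing import Optional

# Assumed MIME types for the PDF/JPEG/PNG formats named in the 400 response.
ALLOWED_TYPES = frozenset({"application/pdf", "image/jpeg", "image/png"})
# Assumption: "4.5 MB" read as decimal megabytes, the stricter interpretation.
MAX_PAYLOAD_BYTES = 4_500_000

# Documented meaning of each status code for this endpoint.
STATUS_MEANING = {
    200: "extraction completed",
    400: "request rejected: check file type, document field, and payload size",
    500: "server-side failure in OCR, classification, or extraction",
}


def preflight(content_type: str, payload_size: int) -> Optional[str]:
    """Return a reason the request would likely get a 400, or None if it looks valid."""
    if content_type not in ALLOWED_TYPES:
        return f"unsupported content type {content_type!r}"
    if payload_size > MAX_PAYLOAD_BYTES:
        return f"payload of {payload_size} bytes exceeds the 4.5 MB limit"
    return None


def describe_status(status: int) -> str:
    """Translate a response status into the meaning documented for this endpoint."""
    return STATUS_MEANING.get(status, f"undocumented status {status}")
```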