Extract data from single input

POST /documents/single

Receives a single file input and attempts to classify the document to identify its type, then extract structured data from it. Structured data is presented as a collection of inter-related nodes (a directed acyclic graph), with each relationship between nodes described by using the URI of another node as a property value.
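As a rough illustration of how such a graph might be traversed on the client, the sketch below indexes nodes by URI and swaps URI-valued properties for the node they point to. The node shapes, the `id` key, the property names, and the URIs are all hypothetical; they are not taken from the actual response schema.

```python
from typing import Any, Dict, List

# Hypothetical nodes: the "id" key, property names, and URIs are assumptions
# made for illustration only.
nodes: List[Dict[str, Any]] = [
    {"id": "urn:node:invoice-1", "type": "Invoice",
     "supplier": "urn:node:org-1", "total": "120.00"},
    {"id": "urn:node:org-1", "type": "Organization", "name": "Example Ltd"},
]


def index_nodes(items: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Map each node's URI to the node itself."""
    return {item["id"]: item for item in items}


def resolve(node: Dict[str, Any], index: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Return a copy of node where any property value that is the URI of a
    known node is replaced by that node (one level deep)."""
    resolved: Dict[str, Any] = {}
    for key, value in node.items():
        if key != "id" and isinstance(value, str) and value in index:
            resolved[key] = index[value]
        else:
            resolved[key] = value
    return resolved


index = index_nodes(nodes)
invoice = resolve(index["urn:node:invoice-1"], index)
print(invoice["supplier"]["name"])  # Example Ltd
```

Resolving only one level at a time keeps the helper simple; because the graph is acyclic, calling it repeatedly on the resolved children would also terminate.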

Ensure that either a content type or a file extension is supplied with the form data. An image, or a PDF with no text content, is processed with machine learning to identify text and its bounding boxes before classification and extraction. A PDF with text content uses that content directly and is slightly faster.
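One way a client might check this requirement before uploading is sketched below, using Python's standard `mimetypes` module to guess a content type from the filename. The helper name `describe_document_part` is hypothetical, and the check reflects only the rule stated above, not any validation the service itself performs.

```python
import mimetypes
from pathlib import Path
from typing import Optional, Tuple


def describe_document_part(path: str) -> Tuple[str, Optional[str]]:
    """Return (filename, content_type) for the file part of the form data.

    Raises ValueError when the filename has no extension and no content
    type can be guessed, since one of the two should be supplied.
    """
    name = Path(path).name
    content_type, _encoding = mimetypes.guess_type(name)
    if content_type is None and not Path(name).suffix:
        raise ValueError(f"{name!r} has no extension and no known content type")
    return name, content_type


print(describe_document_part("scans/invoice.pdf"))
```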

The returned nodes vary depending on the type of document and data within.

This endpoint is synchronous and will return a response within roughly ten seconds for most documents. For multi-page image capture or long PDFs, an extraction session should be used instead as this allows for background processing while new pages are being captured.

The payload must contain exactly one of document (multipart/form-data file upload) or uploadId (JSON reference to a previously created upload via POST /documents/upload). The filename field is only used with uploadId.
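The "exactly one of" rule can be enforced on the client before a request is sent. The sketch below is a minimal illustration: the function name `build_body` and the shape of the returned dictionary are assumptions, and only the field names `document`, `uploadId`, and `filename` come from the description above.

```python
from pathlib import Path
from typing import Any, Dict, Optional


def build_body(document_path: Optional[str] = None,
               upload_id: Optional[str] = None,
               filename: Optional[str] = None) -> Dict[str, Any]:
    """Describe the request body, requiring exactly one of the two inputs.

    The returned dictionary is a plain description for illustration; it is
    not sent anywhere by this function.
    """
    if (document_path is None) == (upload_id is None):
        raise ValueError("supply exactly one of document_path or upload_id")

    if document_path is not None:
        # File upload: filename is ignored here because it is only used
        # together with uploadId.
        return {
            "encoding": "multipart/form-data",
            "fields": {"document": Path(document_path).name},
        }

    body: Dict[str, Any] = {"uploadId": upload_id}
    if filename is not None:
        body["filename"] = filename
    return {"encoding": "application/json", "body": body}


# "abc123" is a placeholder identifier, not a real upload.
print(build_body(upload_id="abc123", filename="invoice.pdf"))
```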

This endpoint supports the RFC 7240 Prefer header for document class and context preferences. Use document-class to specify one or more expected class identifiers (comma-separated), and/or document-context for broader category matching. When combined with handling=strict, a 412 response is returned if the classified document does not match. Without strict handling, a mismatch is reported as a warning in the expectations summary instead.
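A Prefer header value for these preferences could be assembled as in the sketch below. RFC 7240 separates preferences with commas, so the sketch places the comma-separated class list inside a quoted string; whether this service expects that exact quoting is an assumption, as are the helper name `build_prefer_header` and the class identifiers `invoice` and `receipt`.

```python
from typing import Iterable, Optional


def build_prefer_header(document_classes: Iterable[str] = (),
                        document_context: Optional[str] = None,
                        strict: bool = False) -> str:
    """Build a Prefer header value from the given preferences."""
    preferences = []
    if strict:
        preferences.append("handling=strict")
    classes = list(document_classes)
    if classes:
        # Quoted so that commas inside the value are not read as
        # separators between preferences (assumed format).
        preferences.append('document-class="{}"'.format(",".join(classes)))
    if document_context:
        preferences.append('document-context="{}"'.format(document_context))
    return ", ".join(preferences)


# Hypothetical class identifiers, for illustration only.
print(build_prefer_header(["invoice", "receipt"], strict=True))
# handling=strict, document-class="invoice,receipt"
```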

Request

Responses

Document extraction completed without error
