Documents Extract

This endpoint extracts text from an uploaded document, optionally using OCR.

Headers

Authorization string Required

Bearer authentication of the form Bearer <token>, where token is your auth token.

Toxicity boolean Optional
Whether to check the output for toxicity.
Pii string Optional
Whether to check the output for PII, and what to do if any is found. Supported values are 'replace' and 'block'.
Replace-Method string Optional
Method used to replace any PII that is found. Supported values are 'category', 'fake', 'mask', and 'random'.
Injection boolean Optional
Whether to check the output for a prompt injection.
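
The headers above could be assembled as in the following minimal sketch. The function name build_headers is hypothetical, and sending boolean headers as lowercase "true"/"false" strings is an assumption; the header names and supported values come from the list above.

```python
from typing import Dict, Optional

# Supported values as listed in the header descriptions.
PII_ACTIONS = {"replace", "block"}
REPLACE_METHODS = {"category", "fake", "mask", "random"}


def build_headers(
    token: str,
    toxicity: Optional[bool] = None,
    pii: Optional[str] = None,
    replace_method: Optional[str] = None,
    injection: Optional[bool] = None,
) -> Dict[str, str]:
    """Assemble request headers, omitting any optional header left as None."""
    headers = {"Authorization": f"Bearer {token}"}
    if toxicity is not None:
        # Assumption: booleans are serialized as lowercase "true"/"false".
        headers["Toxicity"] = str(toxicity).lower()
    if pii is not None:
        if pii not in PII_ACTIONS:
            raise ValueError(f"Pii must be one of {sorted(PII_ACTIONS)}")
        headers["Pii"] = pii
    if replace_method is not None:
        if replace_method not in REPLACE_METHODS:
            raise ValueError(
                f"Replace-Method must be one of {sorted(REPLACE_METHODS)}"
            )
        headers["Replace-Method"] = replace_method
    if injection is not None:
        # Same boolean serialization assumption as for Toxicity.
        headers["Injection"] = str(injection).lower()
    return headers
```

Rejecting unsupported values before the request is sent is a design choice of this sketch, not documented behavior of the endpoint.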

Request

This endpoint expects a multipart form containing a file.
file file Required
The document file to upload.
embedImages boolean Required
Whether to embed images from the document.
outputFormat string Required
The output format for the content of the document.
chunkDocument boolean Required
Whether to separate the document into chunks.
chunkSize integer Required
The size of each chunk when the document is chunked.
enableOCR boolean Required
Whether to enable OCR for document parsing.
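
As a rough illustration, the multipart form described above might be built and sent as follows. This is a sketch under stated assumptions: EXTRACT_URL is a placeholder because this page does not give the endpoint address, the function names are hypothetical, booleans are assumed to be sent as "true"/"false" strings, and the third-party requests library is assumed to be installed.

```python
from typing import Any, Dict

# Placeholder only: the endpoint address is not stated on this page.
EXTRACT_URL = "https://api.example.com/documents/extract"


def build_form_fields(
    embed_images: bool,
    output_format: str,
    chunk_document: bool,
    chunk_size: int,
    enable_ocr: bool,
) -> Dict[str, str]:
    """Return the non-file form fields, all serialized as strings."""

    def as_flag(value: bool) -> str:
        # Assumption: booleans are sent as lowercase "true"/"false".
        return "true" if value else "false"

    return {
        "embedImages": as_flag(embed_images),
        "outputFormat": output_format,
        "chunkDocument": as_flag(chunk_document),
        "chunkSize": str(chunk_size),
        "enableOCR": as_flag(enable_ocr),
    }


def extract_document(path: str, token: str, fields: Dict[str, str]) -> Any:
    """Upload the file at `path` together with the form fields."""
    import requests  # third-party; imported here so the helper above needs only the stdlib

    with open(path, "rb") as fh:
        response = requests.post(
            EXTRACT_URL,
            headers={"Authorization": f"Bearer {token}"},
            files={"file": fh},  # requests encodes this as multipart/form-data
            data=fields,
            timeout=60,
        )
    response.raise_for_status()
    return response.json()
```

Keeping field construction separate from the network call makes the serialization easy to check without contacting the service.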

Response

Successful response.
title string or null
The parsed document title.
contents string or null
The parsed document contents.
count integer or null
The word count for the document.
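
Because every response field may be null, client code should treat each one as optional. The sketch below assumes the response body is a JSON object with the three fields listed above; ExtractResult and parse_extract_response are hypothetical names.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExtractResult:
    title: Optional[str]     # parsed document title, or None
    contents: Optional[str]  # parsed document contents, or None
    count: Optional[int]     # word count, or None


def parse_extract_response(body: str) -> ExtractResult:
    """Decode a JSON response body; missing or null fields become None."""
    data = json.loads(body)
    return ExtractResult(
        title=data.get("title"),
        contents=data.get("contents"),
        count=data.get("count"),
    )
```

A caller can then check, for example, whether result.contents is None before processing the text.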

Errors