Documents Extract

This endpoint extracts text from an uploaded document, optionally using OCR.

Authentication

Authorization (Bearer)

Bearer authentication of the form Bearer <token>, where token is your auth token.
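
As a minimal sketch of the header format described above, the helper below builds the Authorization header from a token. The function name `auth_header` is illustrative and not part of the API.

```python
def auth_header(token: str) -> dict:
    """Return the Authorization header in the form 'Bearer <token>'."""
    if not token:
        raise ValueError("token must be a non-empty string")
    return {"Authorization": f"Bearer {token}"}
```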

Headers

Toxicity (boolean, optional)
Whether to check the output for toxicity.
Pii (string, optional)
Whether to check the output for PII, and what to do if any is found. Supported values are 'replace' and 'block'.
Replace-Method (string, optional)
Method used to replace any PII that is found. Supported values are 'category', 'fake', 'mask', and 'random'.
Entity-List (list of strings, optional)
An array of entity types that the PII check should ignore.
Injection (boolean, optional)
Whether to check the output for a prompt injection.
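
The optional headers above could be assembled as in the following sketch, which validates the documented values before sending. Two details are assumptions, since this page does not specify them: booleans are sent as the lowercase strings "true" and "false", and Entity-List is serialized as a comma-separated string. The function name `build_check_headers` is illustrative.

```python
from typing import Optional, Sequence

# Supported values as listed in the header descriptions.
PII_ACTIONS = ("replace", "block")
REPLACE_METHODS = ("category", "fake", "mask", "random")


def build_check_headers(
    toxicity: Optional[bool] = None,
    pii: Optional[str] = None,
    replace_method: Optional[str] = None,
    entity_list: Optional[Sequence[str]] = None,
    injection: Optional[bool] = None,
) -> dict:
    """Build the optional check headers, omitting any that are not set."""
    headers = {}
    if toxicity is not None:
        # Assumption: booleans are sent as lowercase "true"/"false".
        headers["Toxicity"] = "true" if toxicity else "false"
    if pii is not None:
        if pii not in PII_ACTIONS:
            raise ValueError(f"Pii must be one of {PII_ACTIONS}")
        headers["Pii"] = pii
    if replace_method is not None:
        if replace_method not in REPLACE_METHODS:
            raise ValueError(f"Replace-Method must be one of {REPLACE_METHODS}")
        headers["Replace-Method"] = replace_method
    if entity_list:
        # Assumption: the list is serialized as a comma-separated string.
        headers["Entity-List"] = ",".join(entity_list)
    if injection is not None:
        headers["Injection"] = "true" if injection else "false"
    return headers
```

Rejecting unsupported values on the client side gives a clearer error than waiting for the server to respond.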

Request

This endpoint expects a multipart form containing a file.
file (file, required)
The document file to upload.
embedImages (boolean, optional)
Whether to embed images from the document.
outputFormat (string, optional)
The output format for the content of the document.
chunkDocument (boolean, optional)
Whether to separate the document into chunks.
chunkSize (integer, optional)
The size of each chunk when the document is separated into chunks.
enableOCR (boolean, optional)
Whether to enable OCR for document parsing.
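
A multipart request with these fields might be built as follows, using only the Python standard library. This is a sketch under several assumptions: the URL in `EXTRACT_URL` is a placeholder (this page does not give the host or path), the HTTP method is assumed to be POST, and boolean form values are assumed to be sent as "true" or "false". The helper names are illustrative. The request is only constructed here, not sent.

```python
import urllib.request
import uuid
from typing import Optional

# Placeholder URL: the real host and path are not given on this page.
EXTRACT_URL = "https://api.example.com/documents/extract"


def encode_multipart(fields: dict, filename: str, file_bytes: bytes):
    """Encode text fields plus one part named 'file' as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{value}\r\n"
            ).encode("utf-8")
        )
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    parts.append(file_header + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_extract_request(
    token: str,
    filename: str,
    file_bytes: bytes,
    embed_images: Optional[bool] = None,
    output_format: Optional[str] = None,
    chunk_document: Optional[bool] = None,
    chunk_size: Optional[int] = None,
    enable_ocr: Optional[bool] = None,
) -> urllib.request.Request:
    """Build (but do not send) a request carrying the file and optional fields."""
    optional = {
        "embedImages": embed_images,
        "outputFormat": output_format,
        "chunkDocument": chunk_document,
        "chunkSize": chunk_size,
        "enableOCR": enable_ocr,
    }
    fields = {}
    for name, value in optional.items():
        if value is None:
            continue  # leave unset fields out of the form
        if isinstance(value, bool):
            value = "true" if value else "false"  # assumed boolean encoding
        fields[name] = str(value)
    body, content_type = encode_multipart(fields, filename, file_bytes)
    return urllib.request.Request(
        EXTRACT_URL,
        data=body,
        method="POST",  # assumption: uploads use POST
        headers={"Authorization": f"Bearer {token}", "Content-Type": content_type},
    )
```

Once the real URL is substituted, passing the returned object to `urllib.request.urlopen` would send the request.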

Response

Successful response.
title (string)
The parsed document title.
contents (string)
The parsed document contents.
count (integer)
The word count for the document.
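
The response fields above could be read into a typed object as in this sketch. It assumes the response body is a JSON object, which this page does not state explicitly. The class and function names are illustrative.

```python
import json
from dataclasses import dataclass


@dataclass
class ExtractResult:
    title: str     # the parsed document title
    contents: str  # the parsed document contents
    count: int     # the word count for the document


def parse_extract_response(raw: bytes) -> ExtractResult:
    """Parse a response body, assumed to be JSON, into an ExtractResult."""
    data = json.loads(raw)
    return ExtractResult(
        title=data["title"],
        contents=data["contents"],
        count=int(data["count"]),
    )
```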

Errors

400: Bad Request Error
403: Forbidden Error
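
A client might map the two documented status codes to distinct exceptions, as in the sketch below. The exception classes and the function name are illustrative. This page lists only the status codes, so the error body format is not assumed and is passed through as plain text.

```python
class BadRequestError(Exception):
    """Raised for a 400 response."""


class ForbiddenError(Exception):
    """Raised for a 403 response."""


def raise_for_extract_error(status: int, body: str = "") -> None:
    """Raise a specific exception for the documented error codes."""
    if status == 400:
        raise BadRequestError(body)
    if status == 403:
        raise ForbiddenError(body)
    # Any other status is left for the caller to handle.
```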