Perform document OCR

POST /documents/:documentId/ocr

Requests optical character recognition (OCR) for a document, extracting text and data from it.

The Tesseract engine is available in all editions. The Textract engine and the TABLES and FORMS parse types are available as an Add-On Module.
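The availability rule above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: the function name `requires_add_on` is hypothetical, and it only encodes the statement that the TEXTRACT engine and the FORMS and TABLES parse types belong to the Add-On Module (value names are taken from the Request parameters listed below).

```python
# Values that, per the note above, are only available with the Add-On Module.
ADD_ON_ENGINES = {"TEXTRACT"}
ADD_ON_PARSE_TYPES = {"FORMS", "TABLES"}


def requires_add_on(ocr_engine="TESSERACT", parse_types=("TEXT",)):
    """Return True if the chosen options need the Add-On Module.

    Hypothetical helper: it mirrors the edition note in this page and
    does not call the API.
    """
    if ocr_engine in ADD_ON_ENGINES:
        return True
    # Any FORMS or TABLES parse type also requires the add-on.
    return any(parse_type in ADD_ON_PARSE_TYPES for parse_type in parse_types)


print(requires_add_on("TESSERACT", ["TEXT"]))
print(requires_add_on("TEXTRACT", ["TEXT", "TABLES"]))
```

A check like this may help surface an unsupported combination early, though the server remains the authority on what a given installation supports.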

Request

documentId string required

Document Identifier

siteId string

Site Identifier

parseTypes string[]

OCR parse types

Possible values: [TEXT, FORMS, TABLES]

addPdfDetectedCharactersAsText boolean

Rewrite the PDF document, converting any image text to searchable text

ocrEngine OcrEngine (string)

Type of OCR Engine to use

Possible values: [TESSERACT, TEXTRACT]

ocrNumberOfPages string

Number of pages to OCR, counted from the start of the document (-1 for all pages)

ocrOutputType OcrOutputType (string)

OCR engine output format (Textract TABLES only)

Possible values: [CSV]
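As an illustration of how the parameters above might fit together, here is a minimal sketch that builds (but does not send) the POST request with Python's standard library. Several things are assumptions not stated on this page: the base URL `https://api.example.com` is a placeholder, `siteId` is assumed to be a query parameter, the remaining fields are assumed to go in a JSON body, and authentication is left out entirely. The helper name `build_ocr_request` is hypothetical.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request

# Placeholder base URL (assumption); replace with your own API endpoint.
BASE_URL = "https://api.example.com"


def build_ocr_request(document_id, site_id=None, parse_types=("TEXT",),
                      ocr_engine="TESSERACT", number_of_pages="-1"):
    """Build a POST /documents/:documentId/ocr request without sending it."""
    url = f"{BASE_URL}/documents/{quote(document_id, safe='')}/ocr"
    if site_id is not None:
        # Assumption: siteId is passed as a query parameter.
        url += "?" + urlencode({"siteId": site_id})
    # Assumption: the remaining parameters are sent as a JSON body.
    body = {
        "parseTypes": list(parse_types),
        "ocrEngine": ocr_engine,
        # The listing types ocrNumberOfPages as a string; "-1" means all pages.
        "ocrNumberOfPages": number_of_pages,
    }
    return Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_ocr_request("doc-123", site_id="default")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

The request object could then be passed to `urllib.request.urlopen` once the real endpoint and whatever authorization your deployment requires are in place.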

200 OK

Response Headers

Schema

message string

OCR processing message

{
  "message": "string"
}
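A client would typically read the `message` field from the 200 response shown above. The sketch below parses a response body of that shape; the function name `ocr_message` is hypothetical, and the sample body is simply the schema example from this page, not a real server reply.

```python
import json


def ocr_message(response_body):
    """Extract the 'message' string from an OCR response body."""
    payload = json.loads(response_body)
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object")
    message = payload.get("message")
    if not isinstance(message, str):
        raise ValueError("expected a string 'message' field")
    return message


# Sample body copied from the schema example above.
print(ocr_message('{"message": "string"}'))
```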