
Perform document OCR

POST /documents/:documentId/ocr

Requests optical character recognition (OCR) for a document, extracting text and data from it.

The Tesseract engine is available in all editions; the Textract engine and the tables and forms options are available only as an Add-On Module.

Request

Path Parameters

    documentId string required

    Document Identifier

Query Parameters

    siteId string

    Site Identifier
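
As a rough illustration of how the path and query parameters above combine, the following sketch assembles the request URL. The base URL and the `ocr_url` helper are assumptions made for this example and are not part of the API; only the `/documents/:documentId/ocr` path and the `siteId` query parameter come from this page.

```python
from urllib.parse import quote, urlencode


def ocr_url(base_url, document_id, site_id=None):
    """Build the OCR endpoint URL (hypothetical helper, not part of the API)."""
    # Percent-encode the document ID so it is safe to place in the path.
    url = f"{base_url.rstrip('/')}/documents/{quote(document_id, safe='')}/ocr"
    # siteId is optional; append it only when the caller supplies one.
    if site_id is not None:
        url += "?" + urlencode({"siteId": site_id})
    return url


# "https://api.example.com" is a placeholder host, not a real deployment.
print(ocr_url("https://api.example.com", "doc-123", site_id="finance"))
```

Leaving `site_id` as `None` omits the query string entirely, which matches `siteId` being listed without the required marker.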

Body

    parseTypes string[]

    OCR parse types: TEXT, FORMS, TABLES

    addPdfDetectedCharactersAsText boolean

    Rewrite the PDF document, converting any text in images to searchable text

    ocrEngine OcrEngine (string)

    Type of OCR engine to use

    Possible values: [TESSERACT, TEXTRACT]

    ocrNumberOfPages string

    Number of pages to OCR, counted from the start of the document (-1 for all pages)

    ocrOutputType OcrOutputType (string)

    OCR engine output format (applies only to Textract table output)

    Possible values: [CSV]
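
The body fields above can be assembled into a JSON payload along the following lines. This is a minimal sketch, not an official client: `build_ocr_body` and its argument names are hypothetical, and the client-side checks simply mirror the values listed on this page. Whether the server rejects other combinations is not stated here.

```python
import json

# Allowed values as listed in the field descriptions on this page.
PARSE_TYPES = {"TEXT", "FORMS", "TABLES"}
OCR_ENGINES = {"TESSERACT", "TEXTRACT"}
OUTPUT_TYPES = {"CSV"}


def build_ocr_body(parse_types=None, ocr_engine=None, number_of_pages=None,
                   add_pdf_text=None, output_type=None):
    """Return a JSON string for the request body (hypothetical helper)."""
    body = {}
    if parse_types is not None:
        unknown = set(parse_types) - PARSE_TYPES
        if unknown:
            raise ValueError(f"unsupported parse types: {sorted(unknown)}")
        body["parseTypes"] = list(parse_types)
    if ocr_engine is not None:
        if ocr_engine not in OCR_ENGINES:
            raise ValueError(f"unsupported OCR engine: {ocr_engine}")
        body["ocrEngine"] = ocr_engine
    if number_of_pages is not None:
        # The page documents ocrNumberOfPages as a string, so send it as one.
        body["ocrNumberOfPages"] = str(number_of_pages)
    if add_pdf_text is not None:
        body["addPdfDetectedCharactersAsText"] = bool(add_pdf_text)
    if output_type is not None:
        if output_type not in OUTPUT_TYPES:
            raise ValueError(f"unsupported output type: {output_type}")
        body["ocrOutputType"] = output_type
    return json.dumps(body)


# OCR every page with Tesseract, extracting plain text only.
print(build_ocr_body(parse_types=["TEXT"], ocr_engine="TESSERACT",
                     number_of_pages=-1))
```

Fields left as `None` are omitted from the payload, since none of the body fields is marked as required.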

Responses

200 OK

Response Headers

  • Access-Control-Allow-Origin

    string

  • Access-Control-Allow-Methods

    string

  • Access-Control-Allow-Headers

    string

Schema

    message string

    OCR processing message
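
A caller might read the 200 response as sketched below. The `ocr_message` helper is hypothetical, and the sample message text is made up for the example; this page documents only that the response schema has a `message` string, and it does not describe error responses.

```python
import json


def ocr_message(status, raw_body):
    """Extract the `message` field from an OCR response (hypothetical helper)."""
    # Only the 200 response is documented here, so treat anything else as a failure.
    if status != 200:
        raise RuntimeError(f"OCR request failed with HTTP status {status}")
    payload = json.loads(raw_body)
    # Fall back to an empty string if the field is absent.
    return payload.get("message", "")


# The message below is an invented sample, not actual API output.
print(ocr_message(200, b'{"message": "sample processing message"}'))
```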
