Get document OCR content

GET /documents/:documentId/ocr

Get a document's optical character recognition (OCR) result, if one exists.

Tesseract is available in all editions; the Textract engine and the tables and forms options are available as an Add-On Module.

Request

Path Parameters

    documentId string required

    Document Identifier

Query Parameters

    siteId string

    Site Identifier

    outputType string

    Possible values: [TEXT, KEY_VALUE, CONTENT_URL, TABLES]

    Output Format Type

    contentUrl string

    Whether to return a "contentUrl"; set the value to 'true' (deprecated)

    text string

    Returns the raw 'text' of the OCR content. For example, AWS Textract returns JSON; setting this parameter to 'true' converts the JSON to text (deprecated)

    shareKey string

    Share Identifier
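
As a rough illustration of how the path and query parameters above combine into a request URL, here is a minimal Python sketch. The base URL `https://api.example.com` and the helper name `build_ocr_url` are hypothetical, and authentication is left out because this page does not describe it. The deprecated `contentUrl` and `text` parameters are intentionally not covered.

```python
from typing import Optional
from urllib.parse import quote, urlencode

# Values listed for outputType in the parameter list above.
OUTPUT_TYPES = {"TEXT", "KEY_VALUE", "CONTENT_URL", "TABLES"}


def build_ocr_url(
    base_url: str,
    document_id: str,
    site_id: Optional[str] = None,
    output_type: Optional[str] = None,
    share_key: Optional[str] = None,
) -> str:
    """Build the GET /documents/:documentId/ocr URL (hypothetical helper)."""
    if output_type is not None and output_type not in OUTPUT_TYPES:
        raise ValueError(f"unsupported outputType: {output_type}")

    # Only send the optional parameters the caller actually set.
    params = {
        "siteId": site_id,
        "outputType": output_type,
        "shareKey": share_key,
    }
    query = urlencode({k: v for k, v in params.items() if v is not None})

    # Percent-encode the identifier so it is safe as a single path segment.
    path = f"/documents/{quote(document_id, safe='')}/ocr"
    url = base_url.rstrip("/") + path
    return f"{url}?{query}" if query else url
```

The resulting string can be passed to any HTTP client, for example `build_ocr_url("https://api.example.com", "abc123", output_type="TEXT")`.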

Responses

200 OK

Response Headers

  • Access-Control-Allow-Origin

    string

  • Access-Control-Allow-Methods

    string

  • Access-Control-Allow-Headers

    string

Schema

    contentUrls string[]

    Presigned S3 URLs for the OCR content

    keyValues object[]

    List of OCR key / values

        key string

        OCR Key

        values string[]

    tables object[]

        headers string[]

        data array[]

    data string

    OCR text result

    ocrEngine string

    The OCR technique used

    ocrStatus string

    The status of the OCR request

    contentType string

    Document Content-Type

    isBase64 boolean

    Is the content Base64-encoded?

    userId string

    User who requested the OCR

    documentId string

    Document Identifier

    insertedDate string

    Inserted Timestamp
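
The response fields above could be read along the following lines. This is only a sketch: the helper names and the sample payload are invented for illustration, and it assumes that `isBase64` describes the `data` field and that each row in a table's `data` lines up positionally with `headers`. Neither assumption is stated on this page.

```python
import base64
from typing import Any, Dict, List


def ocr_text(body: Dict[str, Any]) -> str:
    """Return the OCR text, decoding it first when isBase64 is true.

    Assumption: isBase64 refers to the "data" field.
    """
    data = body.get("data") or ""
    if body.get("isBase64"):
        return base64.b64decode(data).decode("utf-8")
    return data


def key_values(body: Dict[str, Any]) -> Dict[str, List[str]]:
    """Flatten the keyValues list into a {key: values} mapping."""
    return {
        kv["key"]: list(kv.get("values") or [])
        for kv in body.get("keyValues") or []
    }


def table_rows(table: Dict[str, Any]) -> List[Dict[str, str]]:
    """Pair each data row with the table headers.

    Assumption: row cells are in the same order as the headers.
    """
    headers = table.get("headers") or []
    return [dict(zip(headers, row)) for row in table.get("data") or []]


# Made-up payload shaped like the schema above; not real API output.
sample = {
    "data": base64.b64encode(b"Invoice 42").decode("ascii"),
    "isBase64": True,
    "keyValues": [{"key": "Total", "values": ["19.99"]}],
    "tables": [{"headers": ["Item", "Qty"], "data": [["Pen", "2"]]}],
}

text = ocr_text(sample)
pairs = key_values(sample)
rows = table_rows(sample["tables"][0])
```

Using `.get(...) or []` keeps the helpers tolerant of fields that are absent for a given `outputType`, which seems likely since the schema marks none of them as required.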