Get document OCR content
GET /documents/:documentId/ocr
Get a document's optical character recognition (OCR) result, if one exists.
The Tesseract engine is available for all editions; the Textract engine and the tables and forms options are available as an Add-On Module.
Request
Path Parameters
documentId
Document Identifier
Query Parameters
Site Identifier
Output Format Type
Possible values: [TEXT, KEY_VALUE, CONTENT_URL, TABLES]
Whether to return a "contentUrl"; set the value to 'true' (deprecated)
Returns the raw 'text' of the OCR content. For example, AWS Textract returns JSON; setting this parameter to 'true' converts the JSON to text (deprecated)
Share Identifier
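As a rough illustration of how a request URL for this endpoint might be assembled, the sketch below joins a base URL, the `documentId` path parameter, and optional query parameters. The function name `build_ocr_url`, the base URL, and the query parameter name `outputType` are assumptions made for this example; only the path and the four output format values come from the reference above.

```python
from urllib.parse import quote, urlencode

# Output format values listed in the reference above.
OUTPUT_TYPES = {"TEXT", "KEY_VALUE", "CONTENT_URL", "TABLES"}


def build_ocr_url(base_url, document_id, output_type=None, extra_params=None):
    """Build a URL for GET /documents/:documentId/ocr.

    ASSUMPTION: the query parameter name "outputType" is hypothetical;
    check the parameter names in your API reference before relying on it.
    """
    if output_type is not None and output_type not in OUTPUT_TYPES:
        raise ValueError(f"unsupported output type: {output_type}")

    # Copy any caller-supplied query parameters, then add the output type.
    params = dict(extra_params or {})
    if output_type is not None:
        params["outputType"] = output_type

    # Percent-encode the document identifier so it is safe as a path segment.
    url = f"{base_url.rstrip('/')}/documents/{quote(document_id, safe='')}/ocr"
    return f"{url}?{urlencode(params)}" if params else url
```

The resulting string can be passed to any HTTP client together with whatever authorization header your deployment requires.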
Responses
200 OK
Response Headers
Access-Control-Allow-Origin
string
Access-Control-Allow-Methods
string
Access-Control-Allow-Headers
string
application/json
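Schema
contentUrls
string[]
Presigned S3 URLs for the OCR content
keyValues
object[]
List of OCR key / values
  key
  string
  OCR Key
  values
  string[]
tables
object[]
  headers
  string[]
  data
  object[][]
data
string
OCR text result
ocrEngine
string
The OCR technique used
ocrStatus
string
The status of the OCR request
contentType
string
Document Content-Type
isBase64
boolean
Is the content Base64-encoded?
userId
string
User who requested the OCR
documentId
string
Document Identifier
insertedDate
string
Inserted Timestamp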
Example (from schema)
{
  "contentUrls": [
    "string"
  ],
  "keyValues": [
    {
      "key": "string",
      "values": [
        "string"
      ]
    }
  ],
  "tables": [
    {
      "headers": [
        "string"
      ],
      "data": [
        [
          {
            "value": "string"
          }
        ]
      ]
    }
  ],
  "data": "string",
  "ocrEngine": "string",
  "ocrStatus": "string",
  "contentType": "string",
  "isBase64": true,
  "userId": "string",
  "documentId": "string",
  "insertedDate": "string"
}
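A response shaped like the example above could be consumed along these lines. This is a minimal sketch rather than an official client: the helper names `ocr_text` and `table_rows` are hypothetical, and two assumptions are baked in, namely that Base64-encoded `data` decodes to UTF-8 text and that each row in a table's `data` lines up column by column with its `headers`.

```python
import base64


def ocr_text(result):
    """Return the OCR text from a parsed response body."""
    data = result.get("data") or ""
    if result.get("isBase64"):
        # ASSUMPTION: Base64-encoded content decodes to UTF-8 text.
        return base64.b64decode(data).decode("utf-8")
    return data


def table_rows(table):
    """Turn one entry of "tables" into a list of {header: value} dicts."""
    headers = table.get("headers", [])
    # ASSUMPTION: cells in each row align positionally with the headers.
    return [
        {header: cell.get("value") for header, cell in zip(headers, row)}
        for row in table.get("data", [])
    ]
```

For instance, `ocr_text(response_json)` gives the plain text regardless of the `isBase64` flag, and `[table_rows(t) for t in response_json.get("tables", [])]` gives one list of row dictionaries per table.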