Mappings

Overview

The Mapping feature in FormKiQ automatically extracts and organizes document information through custom attribute mappings. A mapping defines how a document's content and metadata should be interpreted and stored as document attributes. This is the basis for metadata extraction in intelligent document processing.

Mapping Configuration

Basic Structure

```json
{
  "mapping": {
    "name": "string",           // Required
    "description": "string",    // Optional
    "attributes": []            // Required
  }
}
```
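Before calling the API, a client can check a request body against this structure. The following is a minimal Python sketch. `check_mapping_body` is a hypothetical helper, not part of FormKiQ, and it only enforces the required and optional markers shown above.

```python
from typing import Any, Dict, List


def check_mapping_body(body: Dict[str, Any]) -> List[str]:
    """Return a list of problems in a mapping request body (empty list if none)."""
    mapping = body.get("mapping")
    if not isinstance(mapping, dict):
        return ['missing top-level "mapping" object']
    problems: List[str] = []
    # "name" is required and must be a non-empty string.
    name = mapping.get("name")
    if not isinstance(name, str) or not name.strip():
        problems.append('"name" is required')
    # "description" is optional, but must be a string when present.
    if "description" in mapping and not isinstance(mapping["description"], str):
        problems.append('"description" must be a string')
    # "attributes" is required and must be a list of attribute objects.
    if not isinstance(mapping.get("attributes"), list):
        problems.append('"attributes" is required and must be a list')
    return problems


print(check_mapping_body({"mapping": {"name": "Set Acme Company", "attributes": []}}))  # []
```

The server performs its own validation; a client-side check like this only catches structural mistakes earlier.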

Attribute Properties

| Property | Required | Description | Type / allowed values |
| --- | --- | --- | --- |
| `attributeKey` | ✓ | Unique identifier for the attribute | String |
| `sourceType` | ✓ | Source of the attribute data | `CONTENT`, `CONTENT_KEY_VALUE`, `METADATA`, `MANUAL` |
| `labelTexts` | ✓ (omitted in the `MANUAL` example) | Array of text patterns to match | String[] |
| `labelMatchingType` | ✓ (omitted in the `MANUAL` example) | Type of pattern matching | `FUZZY`, `EXACT`, `BEGINS_WITH`, `CONTAINS` |
| `defaultValue` | | Fallback value if no match is found | String |
| `defaultValues` | | Array of fallback values | String[] |
| `metadataField` | | Specific metadata field to match | `USERNAME`, `PATH`, `CONTENT_TYPE` |
| `validationRegex` | | Validation pattern for matched values | Regex string |
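The allowed values in this table can also be checked on the client. The Python sketch below is hypothetical: `check_attribute` is not a FormKiQ API. It assumes, based only on the `MANUAL` example on this page, that `MANUAL` attributes do not need `labelTexts` or `labelMatchingType`. Compiling `validationRegex` with Python's `re` only approximates whatever regex engine the server uses.

```python
import re

SOURCE_TYPES = {"CONTENT", "CONTENT_KEY_VALUE", "METADATA", "MANUAL"}
MATCHING_TYPES = {"FUZZY", "EXACT", "BEGINS_WITH", "CONTAINS"}
METADATA_FIELDS = {"USERNAME", "PATH", "CONTENT_TYPE"}


def check_attribute(attr: dict) -> list:
    """Check one entry of "attributes" against the property table (client-side sketch)."""
    problems = []
    if not attr.get("attributeKey"):
        problems.append("attributeKey is required")
    source = attr.get("sourceType")
    if source not in SOURCE_TYPES:
        problems.append(f"unknown sourceType: {source!r}")
    # Assumption: MANUAL attributes skip label matching, as in the MANUAL example.
    if source != "MANUAL":
        labels = attr.get("labelTexts")
        if not isinstance(labels, list) or not labels:
            problems.append("labelTexts must be a non-empty list")
        matching = attr.get("labelMatchingType")
        if matching not in MATCHING_TYPES:
            problems.append(f"unknown labelMatchingType: {matching!r}")
    field = attr.get("metadataField")
    if field is not None and field not in METADATA_FIELDS:
        problems.append(f"unknown metadataField: {field!r}")
    pattern = attr.get("validationRegex")
    if pattern is not None:
        try:
            re.compile(pattern)  # Python syntax only approximates the server's engine
        except re.error as exc:
            problems.append(f"invalid validationRegex: {exc}")
    return problems
```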

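Source Types

Each attribute reads its value from one of four sources: the document text (`CONTENT`), the key/value pairs produced by AWS Textract with the `FORMS` parse type (`CONTENT_KEY_VALUE`), the document metadata (`METADATA`), or a static value defined in the mapping itself (`MANUAL`).

Content-Based Mapping

- Extracts information from the document text (`CONTENT`) or from Textract key/value pairs (`CONTENT_KEY_VALUE`)
- Useful for processing structured documents such as invoices or forms
- Supports multiple label patterns for flexible matching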

Metadata-Based Mapping

- Processes document metadata fields
- Ideal for file-level attributes such as ownership or content type
- Available fields (`metadataField`):
  - `USERNAME`: Document owner/creator
  - `PATH`: Document location/path
  - `CONTENT_TYPE`: Document format
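As an illustration, the Python snippet below builds a `POST /mappings` body for a metadata-based attribute. The mapping name and `attributeKey` are hypothetical. It assumes that `sourceType: METADATA` reads the field named in `metadataField`. `labelTexts` and `labelMatchingType` are included only because the property table marks them as required; this page does not describe how they apply to `METADATA` attributes.

```python
import json

# Hypothetical mapping that stores each document's content type as an attribute.
# Assumption: sourceType METADATA reads the field named in metadataField.
mapping = {
    "mapping": {
        "name": "Content Type Tagger",
        "description": "Stores the document format as an attribute",
        "attributes": [
            {
                "attributeKey": "contentType",
                "sourceType": "METADATA",
                "metadataField": "CONTENT_TYPE",  # or USERNAME / PATH
                # Included only because the property table marks these as required.
                "labelTexts": ["content type"],
                "labelMatchingType": "EXACT",
            }
        ],
    }
}

body = json.dumps(mapping, indent=2)  # JSON request body for POST /mappings
print(body)
```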

Label Matching Types

FUZZY

- Allows approximate (closest) matches
- Tolerates minor typos and variations
- Best for natural-language content

EXACT

- Requires an exact match
- Case-sensitive comparison
- Ideal for standardized formats

BEGINS_WITH

- Matches text that begins with the label
- Useful for prefixed content
- Case-sensitive matching

CONTAINS

- Matches the label as a substring anywhere in the text
- More flexible than exact matching
- Good for embedded information
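To make the differences concrete, here is a rough Python approximation of the four matching types for a single label. This is not FormKiQ's implementation. The fuzzy comparison uses `difflib.SequenceMatcher` with an arbitrary 0.8 threshold. Lower-casing for `FUZZY` is an assumption. The case-sensitivity of `CONTAINS` is not stated above, so this sketch treats it as case-sensitive.

```python
from difflib import SequenceMatcher


def label_matches(label: str, candidate: str, matching_type: str,
                  fuzzy_threshold: float = 0.8) -> bool:
    """Approximate the four labelMatchingType behaviours for one label."""
    if matching_type == "EXACT":
        return candidate == label              # whole string, case-sensitive
    if matching_type == "BEGINS_WITH":
        return candidate.startswith(label)     # case-sensitive prefix
    if matching_type == "CONTAINS":
        return label in candidate              # substring anywhere (case-sensitive here)
    if matching_type == "FUZZY":
        # Similarity ratio in [0, 1]; the threshold is an assumption, not a FormKiQ setting.
        ratio = SequenceMatcher(None, label.lower(), candidate.lower()).ratio()
        return ratio >= fuzzy_threshold
    raise ValueError(f"unknown labelMatchingType: {matching_type}")


print(label_matches("invoice number", "invoice numbr", "FUZZY"))  # True (typo tolerated)
print(label_matches("Customer ID", "customer id", "EXACT"))       # False (case differs)
```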

Practical Examples

The following examples use the `POST /mappings` endpoint to create different kinds of mappings for Intelligent Document Processing (IDP) workflows.
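Any of the request bodies below can be sent with an ordinary HTTP client. The Python sketch uses only the standard library and builds, but does not send, the request. The base URL and the `Authorization` header value are placeholders for your own deployment, not documented values; check the Mapping API Reference for the exact authentication scheme.

```python
import json
import urllib.request

# Placeholders (assumptions): replace with your FormKiQ API endpoint and credentials.
API_URL = "https://example.execute-api.us-east-1.amazonaws.com"
TOKEN = "<your-access-token>"


def build_create_mapping_request(mapping_body: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /mappings request for the given mapping body."""
    return urllib.request.Request(
        url=f"{API_URL}/mappings",
        data=json.dumps(mapping_body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": TOKEN},
        method="POST",
    )


request = build_create_mapping_request({
    "mapping": {
        "name": "Set Acme Company",
        "attributes": [{"attributeKey": "companyName", "sourceType": "MANUAL",
                        "defaultValue": "ACME INC"}],
    }
})
# To send it: urllib.request.urlopen(request)
```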

CONTENT Extraction

This mapping searches the document text for a fuzzy (closest) match to any of the `labelTexts` and looks for a value that matches the `validationRegex`.

```json
{
  "mapping": {
    "name": "Invoice Number Extractor",
    "description": "Extracts standardized invoice numbers",
    "attributes": [{
      "attributeKey": "invoiceNumber",
      "sourceType": "CONTENT",
      "labelTexts": [
        "invoice no",
        "invoice number",
        "invoice #"
      ],
      "labelMatchingType": "FUZZY",
      "validationRegex": "INV-\\d{5}"
    }]
  }
}
```
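This page does not spell out FormKiQ's internal extraction steps. The Python sketch below is a hypothetical, simplified model of what this mapping describes: find a line whose label fuzzily matches one of the `labelTexts`, then accept a value only if it matches `validationRegex`. Splitting each line at the first colon and using a 0.8 similarity threshold are assumptions made for illustration.

```python
import re
from difflib import SequenceMatcher
from typing import List, Optional


def extract_from_text(text: str, label_texts: List[str], validation_regex: str,
                      threshold: float = 0.8) -> Optional[str]:
    """Hypothetical model of CONTENT extraction: fuzzy label match, then regex."""
    pattern = re.compile(validation_regex)
    for line in text.splitlines():
        # Assumption: the label is whatever precedes the first ":" on a line.
        label_part = line.split(":", 1)[0].strip().lower()
        score = max(SequenceMatcher(None, label_part, label.lower()).ratio()
                    for label in label_texts)
        if score >= threshold:
            match = pattern.search(line)
            if match:  # the value must satisfy validationRegex
                return match.group(0)
    return None


invoice_text = "ACME INC\nInvoice No.: INV-00042\nDate: 03/15/2024"
labels = ["invoice no", "invoice number", "invoice #"]
print(extract_from_text(invoice_text, labels, r"INV-\d{5}"))  # INV-00042
```

The JSON string `"INV-\\d{5}"` decodes to the regex `INV-\d{5}`, which is why the Python call uses the raw string `r"INV-\d{5}"`.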

CONTENT_KEY_VALUE Extraction

This mapping uses the key/value pairs that AWS Textract OCR generates with the `FORMS` parse type instead of the raw document text. It does a fuzzy (closest) match of the keys against the `labelTexts` and looks for a value that matches the `validationRegex`.

```json
{
  "mapping": {
    "name": "Invoice Number Extractor",
    "description": "Extracts standardized invoice numbers",
    "attributes": [{
      "attributeKey": "invoiceNumber",
      "sourceType": "CONTENT_KEY_VALUE",
      "labelTexts": [
        "invoice no",
        "invoice number",
        "invoice #"
      ],
      "labelMatchingType": "FUZZY",
      "validationRegex": "INV-\\d{5}"
    }]
  }
}
```
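The same idea applied to key/value pairs can be sketched in Python. This is a hypothetical model, not FormKiQ code. It assumes the Textract `FORMS` output has already been reduced to a plain `{key: value}` dictionary, picks the key with the closest fuzzy match to any label (0.8 threshold assumed), and accepts its value only if it matches `validationRegex`.

```python
import re
from difflib import SequenceMatcher
from typing import Dict, List, Optional


def extract_from_key_values(pairs: Dict[str, str], label_texts: List[str],
                            validation_regex: str,
                            threshold: float = 0.8) -> Optional[str]:
    """Hypothetical model of CONTENT_KEY_VALUE extraction over form key/value pairs."""
    pattern = re.compile(validation_regex)
    best_value, best_score = None, threshold
    for key, value in pairs.items():
        # Fuzzy-compare the form key (without trailing ":") with every label.
        score = max(SequenceMatcher(None, key.strip(" :").lower(), label.lower()).ratio()
                    for label in label_texts)
        if score < best_score:
            continue
        match = pattern.search(value)
        if match:  # only values that pass validationRegex are accepted
            best_value, best_score = match.group(0), score
    return best_value


pairs = {"Invoice Number:": "INV-12345", "Invoice Date:": "03/15/2024", "PO #:": "PO-778"}
labels = ["invoice no", "invoice number", "invoice #"]
print(extract_from_key_values(pairs, labels, r"INV-\d{5}"))  # INV-12345
```

Compared with the plain-text version, the key/value pairs already separate labels from values, so no line-splitting heuristic is needed.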

MANUAL Attribute Value

This mapping can be used in a workflow to set an attribute to a fixed value (`defaultValue`) instead of extracting it from the document.

```json
{
  "mapping": {
    "name": "Set Acme Company",
    "description": "Sets company attribute",
    "attributes": [{
      "attributeKey": "companyName",
      "sourceType": "MANUAL",
      "defaultValue": "ACME INC"
    }]
  }
}
```

## Best Practices

### Label Design
- **Use multiple label variations:**
  ```json
  "labelTexts": [
    "invoice date",
    "date of invoice",
    "invoice dt",
    "billing date"
  ]
  ```

- **Consider regional variations:**
  ```json
  "labelTexts": [
    "zip code",
    "postal code",
    "zip",
    "post code"
  ]
  ```

Validation

- **Validate date formats (MM/DD/YYYY):**
  ```json
  "validationRegex": "(0[1-9]|1[0-2])/(0[1-9]|[12][0-9]|3[01])/\\d{4}"
  ```

- **Validate email addresses:**
  ```json
  "validationRegex": "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}"
  ```

- **Fall back to a default value when no valid match is found:**
  ```json
  "defaultValue": "Not Specified",
  "validationRegex": "\\d{3}-\\d{2}-\\d{4}"
  ```

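The validation patterns above can be tried out locally before they go into a mapping. The Python sketch below tests them and models the fallback to `defaultValue`. `resolve_value` is a hypothetical helper. It assumes that a value failing `validationRegex` is treated as "no match" and that the whole value must match the pattern (`re.fullmatch`). The server may instead search within a longer string.

```python
import re
from typing import Optional

# The JSON strings "\\d" and "\\." decode to the regex tokens \d and \.,
# so the patterns are written here as raw Python strings.
DATE_RE = r"(0[1-9]|1[0-2])/(0[1-9]|[12][0-9]|3[01])/\d{4}"
EMAIL_RE = r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"
SSN_RE = r"\d{3}-\d{2}-\d{4}"


def resolve_value(candidate: Optional[str], validation_regex: str,
                  default_value: Optional[str] = None) -> Optional[str]:
    """Keep the candidate if it fully matches the regex, otherwise use the default."""
    if candidate is not None and re.fullmatch(validation_regex, candidate):
        return candidate
    return default_value


print(resolve_value("03/15/2024", DATE_RE))                        # 03/15/2024
print(resolve_value("3/15/2024", DATE_RE))                         # None (month needs two digits)
print(resolve_value("02/31/2024", DATE_RE))                        # 02/31/2024
print(resolve_value("billing@example.com", EMAIL_RE))              # billing@example.com
print(resolve_value(None, SSN_RE, default_value="Not Specified"))  # Not Specified
```

Note that `02/31/2024` passes. The date pattern checks only the MM/DD/YYYY shape, not whether the date exists.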
Performance Optimization

- **Use exact matching for standardized fields:**
  ```json
  "labelTexts": ["Customer ID"],
  "labelMatchingType": "EXACT"
  ```

- **Balance precision and recall:**
  - For critical information (e.g., invoice numbers), use `EXACT` matching
  - For descriptive fields (e.g., product descriptions), use `CONTAINS` matching

Implementation Strategy

Organize mappings into tiers:
1. Core document attributes (always extracted)
2. Document-type specific attributes (based on classification)
3. Optional enrichment attributes (for additional context)
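One way to plan these tiers is sketched in Python below. All attribute keys and document types are hypothetical examples, and the sketch only decides which attribute keys a document should receive. It does not call FormKiQ or reflect how FormKiQ applies mappings in a workflow.

```python
from typing import Dict, List

# Hypothetical tiers: attribute keys grouped by when they should be extracted.
CORE_ATTRIBUTES: List[str] = ["documentDate", "companyName"]
TYPE_SPECIFIC_ATTRIBUTES: Dict[str, List[str]] = {
    "invoice": ["invoiceNumber", "invoiceDate"],
    "contract": ["effectiveDate"],
}
ENRICHMENT_ATTRIBUTES: List[str] = ["postalCode"]


def attributes_for(document_type: str, enrich: bool = False) -> List[str]:
    """Combine the three tiers into the attribute keys to extract for one document."""
    keys = list(CORE_ATTRIBUTES)                              # tier 1: always extracted
    keys += TYPE_SPECIFIC_ATTRIBUTES.get(document_type, [])   # tier 2: by classification
    if enrich:
        keys += ENRICHMENT_ATTRIBUTES                         # tier 3: optional context
    return keys


print(attributes_for("invoice"))
# ['documentDate', 'companyName', 'invoiceNumber', 'invoiceDate']
```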

API Reference

For complete API documentation, see Mapping API Reference.
