
Mappings

Overview

The Mapping feature in FormKiQ automatically extracts and organizes document information through custom attribute mappings. A mapping defines how a document's content and metadata are interpreted and stored as document attributes (that is, metadata extraction).

Mapping Configuration

Basic Structure

{
  "mapping": {
    "name": "string",        // Required
    "description": "string", // Optional
    "attributes": []         // Required
  }
}

Attribute Properties

| Property | Required | Description | Options |
| --- | --- | --- | --- |
| attributeKey | | Unique identifier for the attribute | String |
| sourceType | | Source of the attribute data | CONTENT, CONTENT_KEY_VALUE, METADATA, MANUAL |
| labelTexts | | Array of text patterns to match | String[] |
| labelMatchingType | | Type of pattern matching | FUZZY, EXACT, BEGINS_WITH, CONTAINS |
| defaultValue | | Fallback value if no match found | String |
| defaultValues | | Array of fallback values | String[] |
| metadataField | | Specific metadata field to match | USERNAME, PATH, CONTENT_TYPE |
| validationRegex | | Validation pattern for matched values | Regex string |
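
The option values in the table lend themselves to a small client-side check before a mapping is submitted. The following Python sketch is only an illustration: `check_attribute` is a hypothetical helper, and treating `attributeKey` and `sourceType` as mandatory is an assumption drawn from the examples in this document, not a statement of the API's actual rules.

```python
# Allowed option values, copied from the Attribute Properties table.
SOURCE_TYPES = {"CONTENT", "CONTENT_KEY_VALUE", "METADATA", "MANUAL"}
LABEL_MATCHING_TYPES = {"FUZZY", "EXACT", "BEGINS_WITH", "CONTAINS"}
METADATA_FIELDS = {"USERNAME", "PATH", "CONTENT_TYPE"}


def check_attribute(attribute: dict) -> list[str]:
    """Return the problems found in one attribute definition (hypothetical helper)."""
    problems = []
    # Assumption: these two properties are always needed; every example sets them.
    for key in ("attributeKey", "sourceType"):
        if not attribute.get(key):
            problems.append(f"missing {key}")
    checks = (
        ("sourceType", SOURCE_TYPES),
        ("labelMatchingType", LABEL_MATCHING_TYPES),
        ("metadataField", METADATA_FIELDS),
    )
    # Enumerated properties must use one of the documented option values.
    for key, allowed in checks:
        value = attribute.get(key)
        if value is not None and value not in allowed:
            problems.append(f"{key} has unsupported value {value!r}")
    return problems
```

An empty list means the definition uses only documented option values; it does not guarantee that the server will accept the mapping.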

Source Types

Content-Based Mapping

  • Extracts information from the document text (CONTENT) or from the key/value pairs that AWS Textract produces with the FORMS parse type (CONTENT_KEY_VALUE). The remaining source types read document metadata (METADATA) or set a static value (MANUAL)
  • Useful for processing structured documents like invoices or forms
  • Supports multiple label patterns for flexible matching

Metadata-Based Mapping

  • Processes document metadata fields
  • Ideal for file-level attributes like ownership or content types
  • Available fields:
    • USERNAME: Document owner/creator
    • PATH: Document location/path
    • CONTENT_TYPE: Document format

Label Matching Types

FUZZY

  • Allows for approximate matches
  • Handles minor typos and variations
  • Best for natural language content

EXACT

  • Requires perfect matches
  • Case-sensitive comparison
  • Ideal for standardized formats

BEGINS_WITH

  • Matches text starting patterns
  • Useful for prefixed content
  • Case-sensitive matching

CONTAINS

  • Finds substrings within content
  • More flexible than exact matching
  • Good for embedded information
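
The four matching types can be compared side by side with a short Python sketch. This is not FormKiQ's implementation: the fuzzy branch uses `difflib.SequenceMatcher` with an arbitrary 0.8 threshold as a stand-in, and the case handling for FUZZY and CONTAINS is an assumption, since only EXACT and BEGINS_WITH are described above as case-sensitive. The function name `label_matches` is hypothetical.

```python
from difflib import SequenceMatcher


def label_matches(label: str, candidate: str, matching_type: str, threshold: float = 0.8) -> bool:
    """Illustrate the four labelMatchingType values on a single candidate string."""
    if matching_type == "EXACT":
        return candidate == label  # case-sensitive, as described above
    if matching_type == "BEGINS_WITH":
        return candidate.startswith(label)  # case-sensitive, as described above
    if matching_type == "CONTAINS":
        return label in candidate  # case handling is an assumption
    if matching_type == "FUZZY":
        # Assumption: similarity ratio on lowercased text with an arbitrary threshold.
        ratio = SequenceMatcher(None, label.lower(), candidate.lower()).ratio()
        return ratio >= threshold
    raise ValueError(f"unknown labelMatchingType: {matching_type}")
```

For instance, `label_matches("Customer ID", "customer id", "EXACT")` is false because of the case difference, while the FUZZY branch tolerates a misspelling such as "Invoice Numbr" for the label "invoice number".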

Practical Example

The following examples use the POST /mappings endpoint to create different kinds of mappings that can be used in workflows for Intelligent Document Processing (IDP).
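
As a rough idea of how a client might submit one of these bodies, the Python sketch below prepares a POST /mappings request with the standard library. The base URL, the token, and the use of an `Authorization` header are placeholders and assumptions; the authentication scheme depends on the deployment. `build_mapping_request` is a hypothetical helper, and the request is built but not sent.

```python
import json
import urllib.request


def build_mapping_request(base_url: str, token: str, mapping: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST /mappings request."""
    body = json.dumps({"mapping": mapping}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/mappings",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: the deployment accepts a token in the Authorization header.
            "Authorization": token,
        },
    )


request = build_mapping_request(
    "https://formkiq.example.com",  # placeholder URL
    "example-token",  # placeholder credential
    {
        "name": "Set Acme Company",
        "description": "Sets company attribute",
        "attributes": [
            {"attributeKey": "companyName", "sourceType": "MANUAL", "defaultValue": "ACME INC"}
        ],
    },
)
# urllib.request.urlopen(request) would send it; omitted so the sketch runs offline.
```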

CONTENT Extraction

This mapping examines the content of the document and does a fuzzy (closest) match for the labelTexts. It looks for a value that matches the validationRegex.

{
  "mapping": {
    "name": "Invoice Number Extractor",
    "description": "Extracts standardized invoice numbers",
    "attributes": [{
      "attributeKey": "invoiceNumber",
      "sourceType": "CONTENT",
      "labelTexts": [
        "invoice no",
        "invoice number",
        "invoice #"
      ],
      "labelMatchingType": "FUZZY",
      "validationRegex": "INV-\\d{5}"
    }]
  }
}
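
To make the label-then-validate behavior concrete, here is a minimal Python sketch of one way such an extraction could work. It is an approximation under stated assumptions, not FormKiQ's algorithm: it scans the text line by line, compares each label with the start of the line using `difflib.SequenceMatcher` (threshold 0.8, chosen arbitrarily), and returns the first substring on that line that matches the regex. `extract_from_content` is a hypothetical name.

```python
import re
from difflib import SequenceMatcher
from typing import Optional


def extract_from_content(
    text: str, label_texts: list[str], validation_regex: str, threshold: float = 0.8
) -> Optional[str]:
    """Return the first regex match on a line whose leading text resembles a label."""
    pattern = re.compile(validation_regex)
    for line in text.splitlines():
        lowered = line.lower()
        for label in label_texts:
            # Compare the label with the same-length prefix of the line.
            prefix = lowered[: len(label)]
            if SequenceMatcher(None, label.lower(), prefix).ratio() >= threshold:
                match = pattern.search(line)
                if match:
                    return match.group(0)
    return None


sample = "ACME INC\nInvoice No: INV-00042\nTotal: $120.00"
labels = ["invoice no", "invoice number", "invoice #"]
invoice_number = extract_from_content(sample, labels, r"INV-\d{5}")
```

Note that the JSON body writes the pattern as `INV-\\d{5}` because the backslash must be escaped in JSON; the regex itself is `INV-\d{5}`.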

CONTENT_KEY_VALUE Extraction

This mapping uses the key/value pairs generated by AWS Textract OCR and performs a fuzzy (closest) match of the keys against the labelTexts. It looks for a value that matches the validationRegex.

{
  "mapping": {
    "name": "Invoice Number Extractor",
    "description": "Extracts standardized invoice numbers",
    "attributes": [{
      "attributeKey": "invoiceNumber",
      "sourceType": "CONTENT_KEY_VALUE",
      "labelTexts": [
        "invoice no",
        "invoice number",
        "invoice #"
      ],
      "labelMatchingType": "FUZZY",
      "validationRegex": "INV-\\d{5}"
    }]
  }
}
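
The key/value variant can be pictured in the same way. In the Python sketch below, the Textract output is represented as a plain dictionary of key text to value text, which is an assumption about its shape; the similarity measure (`difflib.SequenceMatcher`), the 0.6 minimum score, and the full-match validation are likewise illustrative choices. `extract_from_key_values` is a hypothetical name.

```python
import re
from difflib import SequenceMatcher
from typing import Optional


def extract_from_key_values(
    pairs: dict[str, str], label_texts: list[str], validation_regex: str, min_score: float = 0.6
) -> Optional[str]:
    """Pick the value whose key is closest to any label and passes the regex."""
    pattern = re.compile(validation_regex)
    best_value, best_score = None, 0.0
    for key, value in pairs.items():
        # Score a key by its best similarity to any of the labels.
        score = max(
            SequenceMatcher(None, label.lower(), key.lower()).ratio() for label in label_texts
        )
        if score < min_score or score <= best_score:
            continue
        if pattern.fullmatch(value.strip()):
            best_value, best_score = value.strip(), score
    return best_value


pairs = {"Invoice No.": "INV-00042", "Date": "01/15/2024", "PO Number": "PO-12345"}
labels = ["invoice no", "invoice number", "invoice #"]
invoice_number = extract_from_key_values(pairs, labels, r"INV-\d{5}")
```

Because the label is compared with a discrete key rather than searched for in running text, this source type tends to suit form-like documents where Textract can pair each field label with its value.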

MANUAL Attribute Value

This mapping can be used in a workflow to set attributes to a specific value.

{
  "mapping": {
    "name": "Set Acme Company",
    "description": "Sets company attribute",
    "attributes": [{
      "attributeKey": "companyName",
      "sourceType": "MANUAL",
      "defaultValue": "ACME INC"
    }]
  }
}

Best Practices

Label Design

  • Use multiple label variations:

    "labelTexts": [
      "invoice date",
      "date of invoice",
      "invoice dt",
      "billing date"
    ]
  • Consider regional variations:

    "labelTexts": [
      "zip code",
      "postal code",
      "zip",
      "post code"
    ]

Validation

  • Format validation for dates:

    "validationRegex": "(0[1-9]|1[0-2])/(0[1-9]|[12][0-9]|3[01])/\\d{4}"
  • Validation for email addresses:

    "validationRegex": "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}"
  • Default value handling for missing information:

    "defaultValue": "Not Specified",
    "validationRegex": "\\d{3}-\\d{2}-\\d{4}"

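Validation patterns are worth testing locally before they go into a mapping. The Python sketch below loads the fragment from the last bullet as JSON, which also shows that `\\d` in the JSON body becomes `\d` in the actual regex, and applies a fallback to the default value. Whether FormKiQ requires a full match or a partial match is not stated here, so the use of `re.fullmatch` is an assumption, and `validated_or_default` is a hypothetical helper.

```python
import json
import re

# The attribute fragment as it would appear in a JSON request body.
fragment = json.loads(r'''
{
  "defaultValue": "Not Specified",
  "validationRegex": "\\d{3}-\\d{2}-\\d{4}"
}
''')

# The date pattern from the first bullet, written as a Python raw string.
DATE_REGEX = r"(0[1-9]|1[0-2])/(0[1-9]|[12][0-9]|3[01])/\d{4}"


def validated_or_default(candidate, attribute):
    """Return the candidate if it fully matches validationRegex, else defaultValue."""
    if candidate is not None and re.fullmatch(attribute["validationRegex"], candidate):
        return candidate
    return attribute.get("defaultValue")


accepted = validated_or_default("123-45-6789", fragment)
fallback = validated_or_default("12-345", fragment)
```

The date pattern checks the format only: a string such as 02/30/2024 passes even though it is not a real calendar date, so downstream validation may still be needed.
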
Performance Optimization

  • Use exact matching for standardized fields:

    "labelTexts": ["Customer ID"],
    "labelMatchingType": "EXACT"
  • Balance between precision and recall:

    • For critical information (e.g., invoice numbers): Use EXACT matching
    • For descriptive fields (e.g., product descriptions): Use CONTAINS matching

Implementation Strategy

  • Create a tiered approach to mappings:
    1. Core document attributes (always extracted)
    2. Document-type specific attributes (based on classification)
    3. Optional enrichment attributes (for additional context)

API Reference

For complete API documentation, see Mapping API Reference.