Mappings
Overview
The Mapping feature in FormKiQ automatically extracts and organizes document information through custom attribute mappings. A mapping defines how document content and metadata are interpreted and stored as document attributes (that is, metadata extraction).
Mapping Configuration
Basic Structure
{
  "mapping": {
    "name": "string",        // Required
    "description": "string", // Optional
    "attributes": []         // Required
  }
}
Attribute Properties
Property | Required | Description | Options |
---|---|---|---|
attributeKey | ✓ | Unique identifier for the attribute | String |
sourceType | ✓ | Source of the attribute data | CONTENT, CONTENT_KEY_VALUE, METADATA, MANUAL |
labelTexts | ✓ | Array of text patterns to match | String[] |
labelMatchingType | ✓ | Type of pattern matching | FUZZY, EXACT, BEGINS_WITH, CONTAINS |
defaultValue | | Fallback value if no match found | String |
defaultValues | | Array of fallback values | String[] |
metadataField | | Specific metadata field to match | USERNAME, PATH, CONTENT_TYPE |
validationRegex | | Validation pattern for matched values | Regex string |
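The table can be turned into a quick pre-flight check before a mapping is submitted. The following is a minimal Python sketch, not part of FormKiQ: `check_attribute` is a hypothetical helper, and it assumes that `labelTexts` and `labelMatchingType` are only needed for non-MANUAL source types, because the MANUAL example in this document omits them.

```python
# Allowed values copied from the Attribute Properties table.
SOURCE_TYPES = {"CONTENT", "CONTENT_KEY_VALUE", "METADATA", "MANUAL"}
MATCHING_TYPES = {"FUZZY", "EXACT", "BEGINS_WITH", "CONTAINS"}
METADATA_FIELDS = {"USERNAME", "PATH", "CONTENT_TYPE"}


def check_attribute(attribute: dict) -> list[str]:
    """Return a list of problems found in one mapping attribute (hypothetical helper)."""
    problems = []
    if not attribute.get("attributeKey"):
        problems.append("attributeKey is required")
    source_type = attribute.get("sourceType")
    if source_type not in SOURCE_TYPES:
        problems.append(f"sourceType must be one of {sorted(SOURCE_TYPES)}")
    # Assumption: label fields are only enforced for non-MANUAL sources.
    if source_type != "MANUAL":
        if not attribute.get("labelTexts"):
            problems.append("labelTexts is required")
        if attribute.get("labelMatchingType") not in MATCHING_TYPES:
            problems.append(f"labelMatchingType must be one of {sorted(MATCHING_TYPES)}")
    field = attribute.get("metadataField")
    if field is not None and field not in METADATA_FIELDS:
        problems.append(f"metadataField must be one of {sorted(METADATA_FIELDS)}")
    return problems


print(check_attribute({"sourceType": "CONTENT"}))
```

An empty list means the attribute passed these local checks; the API itself remains the authority on what is accepted.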
Source Types
Content-Based Mapping
- Extracts information from one of four sources: the document text (CONTENT), key-value pairs produced by AWS Textract with the FORMS parse type (CONTENT_KEY_VALUE), document metadata (METADATA), or a static value (MANUAL)
- Useful for processing structured documents like invoices or forms
- Supports multiple label patterns for flexible matching
Metadata-Based Mapping
- Processes document metadata fields
- Ideal for file-level attributes like ownership or content types
- Available fields:
  - USERNAME: Document owner/creator
  - PATH: Document location/path
  - CONTENT_TYPE: Document format
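A metadata-based mapping body might look like the following Python sketch, which builds the JSON payload as a dictionary. The mapping name, the attribute key `documentFormat`, and the label value are hypothetical, and the way `labelTexts` is compared against the metadata value is an assumption rather than documented behavior.

```python
import json

# Hypothetical mapping: tags documents whose content type is PDF.
# Assumption: labelTexts are compared against the chosen metadata field.
metadata_mapping = {
    "mapping": {
        "name": "PDF Content Type Tagger",
        "description": "Sets a format attribute based on the document content type",
        "attributes": [
            {
                "attributeKey": "documentFormat",
                "sourceType": "METADATA",
                "metadataField": "CONTENT_TYPE",
                "labelTexts": ["application/pdf"],
                "labelMatchingType": "EXACT",
            }
        ],
    }
}

# Serialize to the JSON text that would be sent as the request body.
payload = json.dumps(metadata_mapping, indent=2)
print(payload)
```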
Label Matching Types
FUZZY
- Allows for approximate matches
- Handles minor typos and variations
- Best for natural language content
EXACT
- Requires perfect matches
- Case-sensitive comparison
- Ideal for standardized formats
BEGINS_WITH
- Matches text starting patterns
- Useful for prefixed content
- Case-sensitive matching
CONTAINS
- Finds substrings within content
- More flexible than exact matching
- Good for embedded information
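To make the differences between the four types concrete, here is a minimal Python sketch. It is not FormKiQ's matching algorithm: `label_matches` is a hypothetical function, FUZZY is approximated with `difflib` similarity, and both the 0.8 threshold and the case-insensitive fuzzy comparison are assumptions.

```python
from difflib import SequenceMatcher


def label_matches(label: str, candidate: str, matching_type: str,
                  fuzzy_threshold: float = 0.8) -> bool:
    """Illustrate how each matching type could compare a label to candidate text."""
    if matching_type == "EXACT":
        return candidate == label            # case-sensitive, whole string
    if matching_type == "BEGINS_WITH":
        return candidate.startswith(label)   # case-sensitive prefix
    if matching_type == "CONTAINS":
        return label in candidate            # substring anywhere
    if matching_type == "FUZZY":
        # Assumed similarity measure; tolerates small typos.
        ratio = SequenceMatcher(None, label.lower(), candidate.lower()).ratio()
        return ratio >= fuzzy_threshold
    raise ValueError(f"unknown matching type: {matching_type}")


print(label_matches("invoice number", "Invoice Numbr", "FUZZY"))
print(label_matches("invoice number", "Invoice Numbr", "EXACT"))
```

Under these assumptions the misspelled label still matches with FUZZY but not with EXACT, which is the trade-off the list above describes.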
Practical Example
The following examples use the POST /mappings endpoint to create different kinds of mappings that could be used in workflows for Intelligent Document Processing (IDP).
CONTENT Extraction
This mapping examines the content of the document and does a fuzzy (closest) match for the labelTexts. It looks for a value that matches the validationRegex.
{
  "mapping": {
    "name": "Invoice Number Extractor",
    "description": "Extracts standardized invoice numbers",
    "attributes": [{
      "attributeKey": "invoiceNumber",
      "sourceType": "CONTENT",
      "labelTexts": [
        "invoice no",
        "invoice number",
        "invoice #"
      ],
      "labelMatchingType": "FUZZY",
      "validationRegex": "INV-\\d{5}"
    }]
  }
}
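Note that the backslash in the validationRegex is doubled because it sits inside a JSON string. The short Python sketch below shows the pattern after JSON decoding and applies it to a made-up line of document text; whether FormKiQ searches within the text or requires a full match is not stated here, so the use of `re.search` is an assumption.

```python
import json
import re

# The attribute fragment as it appears in the JSON request body.
attribute_json = r'{"validationRegex": "INV-\\d{5}"}'

# After JSON decoding, the pattern contains a single backslash: INV-\d{5}
pattern = json.loads(attribute_json)["validationRegex"]

# Made-up document text for illustration only.
sample_text = "Invoice No: INV-48213  Date: 01/15/2024"

match = re.search(pattern, sample_text)
invoice_number = match.group(0) if match else None

print(pattern)
print(invoice_number)
```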
CONTENT_KEY_VALUE Extraction
This mapping uses the key/value pairs generated by AWS Textract OCR and does a fuzzy (closest) match for the labelTexts among those pairs. It looks for a value that matches the validationRegex.
{
  "mapping": {
    "name": "Invoice Number Extractor",
    "description": "Extracts standardized invoice numbers",
    "attributes": [{
      "attributeKey": "invoiceNumber",
      "sourceType": "CONTENT_KEY_VALUE",
      "labelTexts": [
        "invoice no",
        "invoice number",
        "invoice #"
      ],
      "labelMatchingType": "FUZZY",
      "validationRegex": "INV-\\d{5}"
    }]
  }
}
MANUAL Attribute Value
This mapping can be used in a workflow to set attributes to a specific value.
{
  "mapping": {
    "name": "Set Acme Company",
    "description": "Sets company attribute",
    "attributes": [{
      "attributeKey": "companyName",
      "sourceType": "MANUAL",
      "defaultValue": "ACME INC"
    }]
  }
}
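Any of the request bodies above could be submitted with a plain HTTP client. The following Python sketch only builds the request for POST /mappings using the standard library. `BASE_URL` and `AUTH_TOKEN` are placeholders, and the use of an `Authorization` header is an assumption; the real values depend on how the FormKiQ installation is deployed and secured.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # placeholder, not a real FormKiQ endpoint
AUTH_TOKEN = "REPLACE_ME"             # placeholder credential (assumption)


def build_mapping_request(mapping_body: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /mappings request."""
    data = json.dumps(mapping_body).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/mappings",
        data=data,
        method="POST",
        headers={"Content-Type": "application/json", "Authorization": AUTH_TOKEN},
    )


manual_mapping = {
    "mapping": {
        "name": "Set Acme Company",
        "description": "Sets company attribute",
        "attributes": [
            {"attributeKey": "companyName", "sourceType": "MANUAL", "defaultValue": "ACME INC"}
        ],
    }
}

request = build_mapping_request(manual_mapping)
print(request.get_method(), request.full_url)
```

Sending the request would be a matter of passing `request` to `urllib.request.urlopen` once the placeholders are replaced.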
## Best Practices
### Label Design
- **Use multiple label variations:**
```json
"labelTexts": [
"invoice date",
"date of invoice",
"invoice dt",
"billing date"
]
```
- Consider regional variations:
"labelTexts": [
"zip code",
"postal code",
"zip",
"post code"
]
Validation
- Format validation for dates:
"validationRegex": "(0[1-9]|1[0-2])/(0[1-9]|[12][0-9]|3[01])/\\d{4}"
- Validation for email addresses:
"validationRegex": "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}"
- Default value handling for missing information:
"defaultValue": "Not Specified",
"validationRegex": "\\d{3}-\\d{2}-\\d{4}"
Performance Optimization
- Use exact matching for standardized fields:
"labelTexts": ["Customer ID"],
"labelMatchingType": "EXACT"
- Balance between precision and recall:
  - For critical information (e.g., invoice numbers): Use EXACT matching
  - For descriptive fields (e.g., product descriptions): Use CONTAINS matching
Implementation Strategy
- Create a tiered approach to mappings:
  - Core document attributes (always extracted)
  - Document-type specific attributes (based on classification)
  - Optional enrichment attributes (for additional context)
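One way to organize the tiers is to keep a list of mapping names per tier and select from them once a document has been classified. The Python sketch below illustrates that idea only; the tier contents, the document types, the "Core Document Attributes" and "Contract Parties Extractor" names, and the `mappings_for` helper are all hypothetical.

```python
# Hypothetical mapping names grouped by tier.
CORE_MAPPINGS = ["Core Document Attributes"]
TYPE_SPECIFIC_MAPPINGS = {
    "invoice": ["Invoice Number Extractor"],
    "contract": ["Contract Parties Extractor"],
}
ENRICHMENT_MAPPINGS = ["Set Acme Company"]


def mappings_for(document_type: str, include_enrichment: bool = False) -> list[str]:
    """Select mapping names for a classified document, tier by tier."""
    selected = list(CORE_MAPPINGS)                              # always applied
    selected += TYPE_SPECIFIC_MAPPINGS.get(document_type, [])   # by classification
    if include_enrichment:
        selected += ENRICHMENT_MAPPINGS                         # optional extras
    return selected


print(mappings_for("invoice"))
print(mappings_for("unknown", include_enrichment=True))
```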
API Reference
For complete API documentation, see Mapping API Reference.