Migration and Data Import

Overview

FormKiQ supports several migration patterns for moving documents, metadata, attributes, and external references from existing systems into a FormKiQ deployment. The right approach depends on the source system, document volume, metadata complexity, cutover window, and whether content should be copied into FormKiQ-managed storage or referenced from an existing location.

Most production migrations follow the same pattern:

  1. Inventory the source repository.
  2. Map source fields to FormKiQ documents, paths, attributes, tags, schemas, and sites.
  3. Run a small pilot migration.
  4. Perform bulk import in repeatable batches.
  5. Validate counts, content, metadata, and search behavior.
  6. Run a final delta migration and cut over users or applications.
Tip: For large migrations, treat the migration process as an operational project, not only a data transfer. Plan for retries, validation, rollback, cost monitoring, and source-to-target reconciliation before the first production import.

Choose a Migration Method

| Migration method | Best for | Main advantages | Watch for |
| --- | --- | --- | --- |
| API-based migration | Custom source systems, complex transformations, application-led imports, incremental synchronization | Maximum control over mapping, validation, retries, and business logic | Requires custom scripting and careful retry/idempotency design |
| FileSync CLI and CSV import | Structured bulk imports where documents, content locations, and attributes can be prepared as CSV files | Repeatable, batch-oriented, and easier to validate before import | Available for supported commercial installations; CSV quality matters |
| FileSync CLI sync | Local folders or S3 locations that should be synchronized into FormKiQ | Good for file-system style migrations and ongoing sync scenarios | Metadata enrichment may require pre-hook or post-hook logic |
| FKB64 staging bucket import | Smaller documents, generated bundles, and staging-bucket based import workflows | Combines content and metadata into one import object | Base64 adds size overhead and requires strict file structure validation |
| S3 deep links | Content that should remain in an existing S3 location | Avoids copying large content sets into FormKiQ storage | External object permissions, lifecycle, and retention stay outside FormKiQ-managed storage |
| Stack-to-stack migration | Moving data between FormKiQ installations | Useful for environment moves or recovery workflows | Requires DynamoDB and S3 coordination; validate target stack compatibility |

Before You Migrate

Complete a planning pass before writing migration scripts or preparing import files.

| Planning area | Questions to answer |
| --- | --- |
| Source inventory | How many documents exist, how large are they, what file types are present, and how many versions need to move? |
| Metadata | Which fields should become FormKiQ attributes, tags, paths, classifications, or external references? |
| Tenancy | Should documents import into the default site, defined sites, customer sites, departments, or separate FormKiQ instances? |
| Permissions | Which users, groups, folder permissions, or access-control rules must exist before import? |
| Validation | What source-of-truth counts, checksums, sample sets, and reports will prove that the migration succeeded? |
| Processing | Should OCR, full-text indexing, rulesets, workflows, or AI processing run during import or after cutover? |
| Cutover | Will there be a read-only freeze, a final delta load, or a period where source and target run in parallel? |
| Recovery | What backups, exports, rollback steps, and re-run strategy are available if a batch fails? |

Map Source Data to FormKiQ

Define the target data model before importing content.

| Source concept | FormKiQ target |
| --- | --- |
| File or object | Document |
| Folder path | Document path |
| Repository, customer, department, or workspace | Site or separate FormKiQ deployment |
| Source metadata field | Attribute definition and document attribute |
| Keyword or label | Tag or attribute, depending on search and governance needs |
| Document type | Classification or schema |
| External object URL | Deep link |
| Source record ID | Attribute, tag, or migration tracking table |
| Version history | Document versions, if version retention is required |

See Attributes, Schemas, and Multi-Tenant and Multi-Instance Deployments when modeling the target repository, and settle the model before importing production data.
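As a rough illustration of the mapping table above, the sketch below turns one source record into an in-memory description of the target document. The source field names (`id`, `folder`, `filename`, `department`, `keywords`, `doc_type`), the `TargetDocument` class, and the attribute keys are all hypothetical; this is not a FormKiQ request format, only a way to make the mapping explicit and testable before any import runs.

```python
from dataclasses import dataclass, field
from posixpath import join, normpath


@dataclass
class TargetDocument:
    """Hypothetical in-memory description of a document to import."""
    site_id: str
    path: str
    attributes: dict = field(default_factory=dict)
    tags: list = field(default_factory=list)


def map_source_record(record: dict) -> TargetDocument:
    """Map one source record (assumed field names) to a target description."""
    # Folder path + filename -> document path, normalized to forward slashes.
    folder = record.get("folder", "").replace("\\", "/").strip("/")
    path = normpath(join(folder, record["filename"]))
    # Keywords -> tags: trimmed, lower-cased, de-duplicated, sorted.
    keywords = {k.strip().lower() for k in record.get("keywords", []) if k.strip()}
    return TargetDocument(
        # Department -> site; falls back to an assumed "default" site name.
        site_id=record.get("department") or "default",
        path=path,
        attributes={
            # Source record ID kept as an attribute for later reconciliation.
            "sourceRecordId": str(record["id"]),
            "documentType": record.get("doc_type", "unknown"),
        },
        tags=sorted(keywords),
    )
```

Keeping the mapping in one pure function makes it easy to run against a sample export during the pilot and review the results before any API call is made.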

API-Based Migration

API-based migration is the best option when the source repository requires custom extraction, transformation, validation, or incremental synchronization. It is also the best path when an application already owns the migration workflow.

Typical API migration flow:

  1. Extract source document metadata and content references.
  2. Normalize paths, content types, dates, identifiers, and metadata values.
  3. Create or update FormKiQ attribute definitions and schemas.
  4. Create documents or upload document content through the API.
  5. Add attributes, tags, actions, or classifications.
  6. Record the source ID and FormKiQ documentId in a migration tracking table.
  7. Validate the imported document and move to the next batch.
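Step 6 can be sketched with a small SQLite table. The table name, column names, and helper functions below are hypothetical, and SQLite is only one possible store; a DynamoDB table or a spreadsheet export could serve the same purpose.

```python
import sqlite3


def open_tracking_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create (if needed) a table mapping source IDs to FormKiQ documentIds."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS migration_tracking (
               source_id   TEXT PRIMARY KEY,
               document_id TEXT NOT NULL,
               status      TEXT NOT NULL DEFAULT 'imported'
           )"""
    )
    return conn


def record_import(conn: sqlite3.Connection, source_id: str, document_id: str) -> None:
    """Insert or overwrite the mapping; safe to call again on a retried batch."""
    conn.execute(
        "INSERT OR REPLACE INTO migration_tracking (source_id, document_id) "
        "VALUES (?, ?)",
        (source_id, document_id),
    )
    conn.commit()


def lookup_document_id(conn: sqlite3.Connection, source_id: str):
    """Return the recorded documentId, or None if not yet imported."""
    row = conn.execute(
        "SELECT document_id FROM migration_tracking WHERE source_id = ?",
        (source_id,),
    ).fetchone()
    return row[0] if row else None
```

Because `source_id` is the primary key, recording the same source record twice leaves a single row, which keeps reconciliation counts honest after retries.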

Common API operations:

| Operation | Use |
| --- | --- |
| `POST /documents` | Create document records, including inline content or external references. |
| `POST /documents/upload` | Create a presigned upload flow for larger files. |
| `POST /documents/<documentId>/attributes` | Add structured metadata to imported documents. |
| `POST /documents/<documentId>/tags` | Add searchable labels or lightweight grouping values. |
| `POST /documents/<documentId>/reindex` | Reindex documents after content or metadata changes. |

For SDK examples, see Python SDK and TypeScript SDK.

API Migration Design

Design API migration jobs to be idempotent. If a batch is retried, the script should avoid creating duplicate documents or conflicting metadata.

Recommended practices:

  • Store a mapping from each source ID to its FormKiQ documentId.
  • Use deterministic paths or migration tracking attributes where possible.
  • Validate metadata types before calling the API.
  • Retry transient failures with exponential backoff.
  • Log failed source records with enough detail to replay them.
  • Use presigned upload flows for large files instead of loading files into memory.
  • Separate extraction, transformation, upload, and validation so each stage can be rerun.
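Several of these practices can be combined in a minimal sketch: skip records that already have a mapping, retry transient failures with exponential backoff and jitter, and log failures for replay. `TransientError` and `create_document` are hypothetical; `create_document` stands in for whatever call creates the document (for example, a wrapper around `POST /documents`) and is assumed to return the new documentId.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for retryable failures (timeouts, throttling)."""


def with_backoff(func, *, attempts=5, base_delay=0.5, max_delay=30.0, sleep=time.sleep):
    """Call func(), retrying TransientError with exponential backoff and jitter."""
    for attempt in range(attempts):
        try:
            return func()
        except TransientError:
            if attempt == attempts - 1:
                raise  # Out of retries: surface the failure to the caller.
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(delay + random.uniform(0, delay / 2))


def import_batch(records, create_document, mapping, failures):
    """Import each record at most once; `mapping` holds source ID -> documentId."""
    for record in records:
        source_id = record["id"]
        if source_id in mapping:
            continue  # Imported in an earlier run: skip instead of duplicating.
        try:
            mapping[source_id] = with_backoff(lambda: create_document(record))
        except Exception as exc:
            # Keep enough detail to replay this record later.
            failures.append({"source_id": source_id, "error": repr(exc)})
```

Rerunning `import_batch` with the same `mapping` only touches records that have not yet succeeded, which is what makes a failed batch safe to replay.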

FileSync CLI and CSV Import

The FileSync CLI is useful when migration data can be prepared as CSV files or when files can be synchronized from a local directory or S3 location.

The CSV import flow can load:

  • Attribute definitions
  • Document records
  • Document content locations
  • Document attributes

Recommended CSV import order:

  1. Import attribute definitions.
  2. Import documents.
  3. Import document attributes.
  4. Import document content.
  5. Run verification.

For the step-by-step CSV workflow, see Import CSV Data Migration.
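Because CSV quality directly affects the import, a pre-flight check before running the CLI can catch problems early. The sketch below is independent of FileSync: the column names passed in are whatever your prepared files use (the ones in the usage example are hypothetical, not the FileSync CSV specification), and the checks cover only missing columns, empty required values, and duplicate keys.

```python
import csv
import io


def check_csv(text: str, required: list, unique_key: str) -> list:
    """Return a list of problems found in CSV text; an empty list means it passed.

    `unique_key` should be one of the `required` columns.
    """
    reader = csv.DictReader(io.StringIO(text))
    missing = [c for c in required if c not in (reader.fieldnames or [])]
    if missing:
        return [f"missing columns: {', '.join(missing)}"]
    problems, seen = [], set()
    for row in reader:
        line_no = reader.line_num  # Physical line of the row just read.
        for column in required:
            if not (row[column] or "").strip():
                problems.append(f"line {line_no}: empty {column}")
        key = row.get(unique_key)
        if key in seen:
            problems.append(f"line {line_no}: duplicate {unique_key} {key!r}")
        seen.add(key)
    return problems
```

For example, `check_csv(open("documents.csv").read(), ["documentId", "path"], "documentId")` would report rows with a blank path or a repeated ID, assuming a file with those two columns.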

FileSync Hooks

For folder or S3 synchronization, pre-hook and post-hook logic can enrich or validate documents during migration.

Use a pre-hook when metadata needs to be derived before import, such as:

  • Looking up a source system record by filename.
  • Adding tags based on folder path.
  • Attaching metadata from a nearby JSON or CSV file.
  • Selecting actions to run after upload.

Use a post-hook when downstream validation or reporting should happen after a document is imported.

For hook details, see Pre-Hook Option.
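The "tags based on folder path" case can be sketched as a pure function. This shows only the derivation logic; how a pre-hook receives the file path and returns metadata is defined by the Pre-Hook Option documentation, and the `folder1`, `folder2` tag-key scheme here is a hypothetical convention.

```python
from pathlib import PurePosixPath


def tags_from_path(path: str, prefix: str = "folder") -> dict:
    """Derive tag key/value pairs from each folder segment of a file path."""
    # Normalize Windows separators, then take every segment except the filename.
    folders = PurePosixPath(path.replace("\\", "/")).parent.parts
    tags = {}
    for depth, name in enumerate(p for p in folders if p not in ("/", ".")):
        tags[f"{prefix}{depth + 1}"] = name.strip().lower()
    return tags
```

With this convention, `Finance/2023/invoice.pdf` yields `folder1=finance` and `folder2=2023`, while a file at the root yields no tags.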

FKB64 Staging Bucket Import

FKB64 is a FormKiQ Base64 bundle format that combines document content and metadata into a single encoded import file. It is most useful when another process generates FormKiQ-ready import objects and uploads them to the FormKiQ staging bucket.

Use FKB64 when:

  • The import producer can generate strict FormKiQ bundle files.
  • Documents are small enough that Base64 overhead is acceptable.
  • Content and metadata should travel together.
  • A staging bucket workflow is easier than direct API calls.

Avoid FKB64 for very large documents unless the size overhead and processing behavior have been tested. For larger documents, API upload or FileSync content import is usually easier to operate and monitor.
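To estimate the overhead: standard padded Base64 encodes every 3 input bytes as 4 output characters, so the encoded content is roughly 33% larger than the original. The helper below computes the encoded size of the content alone; it does not account for the rest of the bundle (metadata fields and JSON structure), and the function name is illustrative.

```python
import base64


def base64_size(raw_bytes: int) -> int:
    """Bytes of standard padded Base64 output for `raw_bytes` bytes of input."""
    # Each group of up to 3 input bytes becomes exactly 4 output characters.
    return 4 * ((raw_bytes + 2) // 3)


# Cross-check the formula against the standard library for a small input.
sample = b"x" * 10
assert len(base64.b64encode(sample)) == base64_size(len(sample))
```

A 30,000,000-byte document therefore becomes 40,000,000 bytes of Base64 before any bundle metadata is added.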

More information is available in the FKB64 File Specification.

S3 Deep Links

Some migrations should register documents in FormKiQ while leaving content in an existing S3 bucket. This can reduce migration time and storage duplication when the source S3 location will continue to be governed outside of FormKiQ.

Use deep links when:

  • The source bucket already has the required retention, encryption, and access controls.
  • Copying content into FormKiQ-managed storage is not required.
  • Applications need FormKiQ metadata, search, workflow, or governance around externally stored content.

Review Document Storage before choosing this pattern. Deep-linked content depends on the external storage location remaining available and properly secured.

Stack-to-Stack Migration

When moving documents between FormKiQ environments, use the FileSync CLI and AWS CLI workflows designed for FormKiQ-to-FormKiQ data movement.

Common scenarios include:

  • Moving from a test environment to a production environment.
  • Rehydrating a replacement stack from restored data.
  • Copying a subset of documents between isolated deployments.
  • Supporting recovery or rollback plans.

For the existing walkthrough, see DynamoDB Data Migration. Also review Backup and Recovery before moving production data.

Validation and Reconciliation

Build validation into every migration stage. A migration is not complete just because files uploaded successfully; it is complete when users and downstream systems can trust the target repository.

Recommended validation checks:

CheckWhat to verify
Document countSource document count matches FormKiQ import count for the migrated scope.
Content integritySampled or full-set checksums, file sizes, and content types match the source.
MetadataRequired attributes, tags, classifications, and schemas are present and correctly typed.
PathsDocuments appear in the expected virtual paths and site boundaries.
SearchExpected documents are returned by full-text and attribute searches.
ProcessingOCR, full-text, rulesets, workflows, and actions have completed where expected.
FailuresDead-letter queues, import logs, CloudWatch logs, and CLI verification output are reviewed.
User accessRepresentative users can access only the expected documents.

For failed asynchronous processing, review Dead-Letter Queue.
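The document count and content integrity checks can be sketched as a comparison of two manifests, each mapping a source ID to a checksum. How the manifests are produced (a source export on one side, a listing of imported documents on the other) is left open here, and the function names are hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Checksum used on both sides of the comparison."""
    return hashlib.sha256(data).hexdigest()


def reconcile(source: dict, target: dict) -> dict:
    """Compare {source_id: checksum} manifests from the source and the target."""
    missing = sorted(source.keys() - target.keys())      # Never imported.
    unexpected = sorted(target.keys() - source.keys())   # Not in the source scope.
    mismatched = sorted(
        k for k in source.keys() & target.keys() if source[k] != target[k]
    )
    return {
        "source_count": len(source),
        "target_count": len(target),
        "missing": missing,
        "unexpected": unexpected,
        "mismatched": mismatched,
        "ok": not (missing or unexpected or mismatched),
    }
```

Matching counts alone can hide a missing document offset by an unexpected one, which is why the report lists IDs rather than only totals.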

Performance and Scaling

Large imports can stress the source system, migration host, network path, FormKiQ APIs, DynamoDB, S3, OpenSearch, and downstream processing services.

Performance recommendations:

  • Run migration jobs in the same AWS Region as the FormKiQ deployment when possible.
  • Use EC2 for large migrations instead of a local workstation or AWS CloudShell.
  • Start with a small pilot batch and increase concurrency gradually.
  • Monitor API errors, throttling, Lambda errors, SQS depth, and OpenSearch indexing behavior.
  • Use presigned S3 upload or FileSync content import for large binary files.
  • Avoid running expensive actions on every document during the first bulk load unless they are required for cutover.
  • Reindex or run OCR after the content load if that gives better control over cost and throughput.
  • Keep migration batches small enough that a failed batch can be replayed without manual cleanup.

For service-level scaling considerations, see Scaling FormKiQ Components.
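One simple way to keep batches small and replayable is to split the work list into fixed-size chunks and process each chunk as a unit. The helper below is a generic sketch; the right batch size depends on document sizes and observed throttling, and should be tuned during the pilot.

```python
from itertools import islice


def batches(items, size: int):
    """Yield successive lists of at most `size` items from any iterable."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    # islice pulls the next `size` items; an empty list ends the loop.
    while chunk := list(islice(iterator, size)):
        yield chunk
```

Because the input can be any iterable, the same helper works for a list of source IDs held in memory or for rows streamed from a large export.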

Security and Compliance

Migration jobs often handle the same sensitive documents and metadata as the production repository. Apply production security controls to migration infrastructure.

Security checklist:

  • Use least-privilege IAM roles for migration workers.
  • Confirm the target AWS Region satisfies data residency requirements.
  • Encrypt temporary files, staging buckets, and intermediate exports.
  • Avoid writing sensitive document content or metadata values into logs.
  • Restrict access to source exports, CSV files, and migration tracking tables.
  • Clean up temporary buckets, local files, access keys, and staging data after validation.
  • Confirm whether OCR, AI processing, or external integrations are allowed for the migrated dataset.
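To keep sensitive values out of logs, migration scripts can mask known fields before a record is written anywhere. The sketch below handles top-level keys only, and the key names in `SENSITIVE_KEYS` are hypothetical examples that should be replaced with the fields that are sensitive in your dataset.

```python
SENSITIVE_KEYS = frozenset({"content", "ssn", "email"})  # Hypothetical examples.


def redact(record: dict, sensitive=SENSITIVE_KEYS) -> dict:
    """Return a copy of `record` that is safer to log: sensitive values masked."""
    return {
        key: ("<redacted>" if key.lower() in sensitive else value)
        for key, value in record.items()
    }
```

Logging `redact(record)` instead of `record` keeps identifiers available for replaying failures without exposing document content or personal data.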

For broader guidance, see Security and Compliance, Data Residency, and Data Sovereignty.

Cutover and Rollback

Plan cutover before the production import begins.

Recommended cutover approach:

  1. Run a pilot migration in a non-production environment.
  2. Run a larger dry run in a production-like target environment.
  3. Freeze or limit writes in the source system.
  4. Run the final bulk or delta import.
  5. Validate source-to-target counts and representative user workflows.
  6. Switch users or applications to FormKiQ.
  7. Keep the source system read-only until the rollback window closes.
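The delta import in step 4 depends on knowing which source records changed since the last successful run. A minimal sketch, assuming each source record carries a timezone-aware `modified` timestamp (a hypothetical field name) and that the previous run's cutoff time was stored:

```python
from datetime import datetime, timezone


def select_delta(records, last_run: datetime):
    """Return records modified after `last_run`, oldest first."""
    changed = [r for r in records if r["modified"] > last_run]
    return sorted(changed, key=lambda r: r["modified"])


# Example cutoff: the start time of the previous bulk import, stored in UTC.
previous_cutoff = datetime(2024, 1, 2, tzinfo=timezone.utc)
```

Using the start time of the previous run as the cutoff, rather than its end time, avoids missing records that changed while that run was in progress; any overlap is harmless if the import is idempotent.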

Rollback options depend on the migration method and how much the target system was used after cutover. For high-risk migrations, import into a new site or isolated environment first so the migration can be discarded without affecting existing production documents.

Review Backup and Recovery and Updates, Upgrades, and Rollbacks when planning production rollback procedures.

Where to Go Next