SnowConvert AI: Data migration [Preview]¶

Tip

This page is a SnowConvert CLI user-guide overview. For the full, up-to-date reference (Oracle, PostgreSQL, Teradata write_nos/tpt, Extraction strategies, Iceberg targets, Worker generate-config, and workflow YAML/JSON), see SnowConvert AI: Data migration (technical documentation).

Data Migration uses the data-migration skill from the Snowflake AIM Migration Agent. For more details, contact snowconvert-support@snowflake.com.

The Data Migration feature of SnowConvert provides a fault-tolerant, scalable solution for moving data from external sources into Snowflake. This feature is specifically designed for cases where you are moving data from a system you plan to decommission. For replication purposes, other solutions are available that might better fit your use case.

Architecture overview¶

The Data Migration feature uses two main components: an Orchestrator and one or more Workers.

The Orchestrator connects to the Snowflake account. It requires privileges to create and operate the SNOWCONVERT_AI database, where metadata is stored.
One or more Workers connect to both the source system and the Snowflake account. Workers read data from the source system and upload it to a Snowflake stage. Workers pick up tasks created by the Orchestrator and process them in parallel.
Files uploaded to the Snowflake stage are copied into the target tables using a COPY INTO statement submitted and monitored by the Orchestrator.

Deployment options¶

The Orchestrator and Workers can be deployed in multiple ways:

Both on Snowpark Container Services (in the Snowflake account).
Both in the customer’s environment, including custom hardware, virtual machines, or containers.
Orchestrator on Snowpark Container Services and Workers in the customer’s environment, or the other way around.

The following requirements apply to the environment where the Orchestrator and Workers run:

Workers typically require an ODBC driver to connect to the source system.
The Orchestrator must be able to connect to the Snowflake account using a role that has privileges to create the SNOWCONVERT_AI database and create schemas and objects within it.

Prerequisites¶

Before you use Data Migration, make sure the following are in place:

SCAI CLI: The SnowConvert AI (SCAI) CLI must be installed. Use scai data orchestrator setup and scai data worker setup to deploy the Orchestrator and Workers (see Installation).
Snowflake access: Connections for the Orchestrator and Workers in your Snowflake config.toml or connections.toml, using a role that can create the SNOWCONVERT_AI database and its objects. The first time the Orchestrator starts, it creates that database and related resources if they do not exist yet. On later runs, use a role that can administer SNOWCONVERT_AI and its objects; sticking with the same role you used for the initial creation is the simplest way to avoid permission issues.
Source connectivity: For typical source databases, an ODBC driver on the machine where Workers run. Programmatic Access Tokens (PATs) are recommended for Snowflake connections; see Connecting to Snowflake with a PAT.
Hybrid Tables: Hybrid Tables must be enabled and available in your Snowflake account and region for this feature. Review Hybrid tables and Hybrid tables limitations so you understand relevant platform requirements.
Snowpark Container Services (optional): If you deploy the Orchestrator or Workers on Snowflake compute, your account needs SPCS. See the Snowpark Container Services overview. Running both components outside Snowflake does not require SPCS.

Setup¶

Installation¶

Install the SnowConvert AI (SCAI) CLI, then use these commands to set up the Orchestrator and Workers. Cloud deployment requires Snowpark Container Services (SPCS) to be enabled on the Snowflake account.

# Generate Worker TOML from your project (recommended):
scai data worker generate-config .scai/settings/DataExchangeWorkerConfig.toml

# Set up the Orchestrator on SPCS (first time):
scai data orchestrator setup --compute-pool MY_COMPUTE_POOL

# Set up the Worker SPCS service (first time):
scai data worker setup --compute-pool MY_COMPUTE_POOL

Usage¶

To migrate data using this solution, complete the following high-level steps:

Start the Orchestrator.
Start the Workers.
Create a Data Migration Workflow.
Monitor the Data Migration Workflow until completion.

A Data Migration Workflow is essentially an action or goal for the system to complete, such as migrating a specific set of tables with a given configuration. You can submit multiple workflows simultaneously and monitor them. The Orchestrator breaks Data Migration Workflows into smaller tasks, which typically involves splitting a table into partitions before extracting its data and loading it to Snowflake.

Starting the Orchestrator¶

Use these commands to set up and start the Orchestrator. Remember to stop the service when it’s no longer needed.

# Set up the Orchestrator on SPCS (first time):
scai data orchestrator setup --compute-pool MY_COMPUTE_POOL

# Set up with a specific connection:
scai data orchestrator setup --compute-pool MY_COMPUTE_POOL --connection my-snowflake

# Resume a stopped Orchestrator:
scai data orchestrator start

# Stop the Orchestrator:
scai data orchestrator stop

The Orchestrator runs until you stop it. Data Migration Workflows require an active Orchestrator to complete. However, the Orchestrator can be safely stopped at any point and resumed later; ongoing Data Migration Workflows are resumed at that point.

Starting the Workers¶

Start a Worker by running the following commands. See Worker configuration for the configuration file reference.

# Set up Worker SPCS service (first time):
scai data worker setup --compute-pool MY_COMPUTE_POOL

# Set up with a specific connection:
scai data worker setup --compute-pool MY_COMPUTE_POOL --connection my-snowflake

# Resume/start the Worker service:
scai data worker start

# Run a local Worker (optional, for customer-environment deployment):
scai data worker start --local my-worker-config.toml

# Auto-generate local worker config:
scai data worker start --local --auto-config

Workers run until you stop them. Data Migration Workflows require at least one active Worker to complete. However, Workers can be safely stopped at any point and resumed later; ongoing Data Migration Workflows are resumed at that point.

Creating a Data Migration Workflow¶

Generate a Workflow Configuration based on the state of your SCAI project. You can also create one by hand or ask Snowflake CoCo for help. See Workflow configuration reference for the full specification.

# Generate config for all tables:
scai data migrate generate-config

# Filter tables by schema:
scai data migrate generate-config --where "source.schema = 'public'"

# Specific tables by canonical name:
scai data migrate generate-config \
  -o core_tables.yaml \
  --where "source.canonicalName IN ('dbo.orders', 'dbo.customers')"

# Custom output path:
scai data migrate generate-config -o my-config.yaml

Once you have a Workflow Configuration, start a workflow. You can return immediately or wait for completion.

# Start migration (returns immediately):
scai data migrate start --config my-data-migration-config.yaml --connection my-snowflake

# Start migration with default config:
scai data migrate start --connection my-snowflake

# Create workflow and wait for completion:
scai data migrate create-workflow --config my-data-migration-config.yaml --connection my-snowflake --watch

Monitoring a Data Migration Workflow¶

After a workflow starts, check its status with the following commands. The same observability features, including the DATA_MIGRATION_DASHBOARD Streamlit dashboard, are also available.

# Check workflow status:
scai data migrate status DATA_MIGRATION_WORKFLOW_xx_yy_zz

# Watch workflow progress:
scai data migrate status DATA_MIGRATION_WORKFLOW_xx_yy_zz --watch

# List all workflows:
scai data migrate list

# Filter by status:
scai data migrate list --status running

In the SNOWCONVERT_AI.DATA_MIGRATION schema, the following tables and views can be queried to understand the status of one or more workflows:

WORKFLOW: Contains one row per workflow, including start/end time, status, and configuration.
TABLE_PROGRESS_WITH_EXAMPLE_ERROR: Contains one row per table being migrated as part of a workflow. Includes information about how many partitions are in each stage (extraction, loading, completed, or failed), as well as related errors. Can be filtered by WORKFLOW_ID.
DATA_MIGRATION_ERROR: For each partition of a table being migrated, contains the first known error that affected the migration of that partition. Can be filtered by WORKFLOW_ID.

In the same schema, the DATA_MIGRATION_DASHBOARD Streamlit dashboard can be used to monitor workflows. This dashboard presents data from those tables and views.

Redshift UNLOAD¶

For Redshift, it is recommended to use the unload extraction strategy. This works as follows:

Large query results are written directly to an S3 bucket instead of being downloaded to the machine running the Worker.
On the Snowflake side, an external stage is configured to reference the corresponding S3 bucket, so that COPY INTO statements can be executed directly from that stage.

For configuration details, see ExtractionStrategy model.

Incremental synchronization¶

You can migrate tables and then re-migrate them in the future, moving only the data that has changed. For more details, see SynchronizationStrategy model.

Considerations and recommendations¶

Connecting to Snowflake with a PAT¶

Use Programmatic Access Tokens (PATs) for connections used by the Orchestrator and Workers. This ensures there is no need to constantly authenticate through the browser or with an authenticator app. You will need to establish a Network Policy or temporarily bypass the requirement for a Network Policy (this can be done from Snowsight).

Running Orchestrator and Workers on SPCS¶

Use scai data orchestrator setup and scai data worker setup to deploy on Snowpark Container Services (SPCS). These commands create the required SPCS services on the specified compute pool and wait for them to reach a running state.

Keep the following in mind:

It is possible to run only one component (Orchestrator or Workers) in SPCS and the other locally using scai data orchestrator start --local or scai data worker start --local.
Monitor the SPCS service and suspend it when it is not being used (scai data orchestrator stop and scai data worker stop).
Depending on the network configuration of the source system, you might need to configure an External Access Integration so that these services can connect to your source system.

Initial testing¶

Deploy the DDL for the tables you want to migrate before starting data migration. This ensures the target type matches the behavior you want to see in the table and its related views and procedures. You can convert DDL from your source dialect into Snowflake SQL using the code conversion capabilities of SnowConvert AI, Snowflake CoCo, or both.

Note

If you don’t deploy the DDL for the tables before starting data migration, the types will be inferred, which may not be as accurate as required.

For an early test run, use a separate workflow configuration whose tables array lists only the table or small set of tables you want to validate. On each of those entries, set whereClauseCriteria to an SQL-like predicate (as you would in a WHERE clause) so only a subset of rows is migrated, for example a bounded primary-key range or a narrow date range in the source dialect. You can also set a small partitionSize (for example maxRowsPerPartition) to keep partitions tiny during the test. After you confirm connectivity, performance, and results, create your full workflow: remove or relax whereClauseCriteria and use "auto" or your production partitionSize settings.

Managing Workers¶

The time it takes to complete a workflow depends on many variables, but the number of Workers (and threads per Worker) has the greatest impact, as it determines how many extraction tasks can be executed in parallel. Consider the following:

It is not necessary to run two Workers on the same machine. If you want more parallelism on a single machine, increase the thread count instead.
Network bandwidth greatly affects Worker speed and is shared between threads of a Worker.
Even with many Workers and threads running in parallel, the source system might not have enough resources to handle the load.
Keep a low Worker count to avoid overloading your source system.
Consider stopping some or all Workers when the source system is already under heavy load from other operations.

Extraction strategies¶

Each table in a migration workflow can set extraction.strategy (or inherit from defaultTableConfiguration.extraction). For the full field reference, Worker TOML, and Teradata complete configuration, see Extraction strategies and Teradata: Complete configuration reference on the technical documentation page.

Strategy	Platforms	When to use
`regular` (default)	All	Worker pulls data through the source connection, then uploads to the Snowflake internal stage.
`unload`	Redshift only	Redshift UNLOAD to S3; Snowflake loads from an external stage.
`tpt`	Teradata only	TPT EXPORT via `tbuild` on the Worker host.
`write_nos`	Teradata only	Teradata WRITE_NOS to cloud storage; Snowflake external stage load.

regular (default)¶

tables:
  - source:
      databaseName: MY_DB
      schemaName: sales
      tableName: orders
    target:
      databaseName: TARGET_DB
      schemaName: sales
      tableName: orders
    columnNamesToPartitionBy:
      - order_id

Redshift unload¶

tables:
  - source:
      databaseName: dev
      schemaName: public
      tableName: large_events
    target:
      databaseName: TARGET_DB
      schemaName: public
      tableName: large_events
    columnNamesToPartitionBy:
      - event_date
    extraction:
      strategy: unload
      externalStage: TARGET_DB.PUBLIC.S3_EXTERNAL_STAGE

Teradata tpt¶

Requires tbuild on the Worker host. Optional tpt_delimiter and tpt_max_sessions in Worker TOML.

tables:
  - source:
      databaseName: MY_DB
      schemaName: sales
      tableName: large_fact_table
    target:
      databaseName: TARGET_DB
      schemaName: sales
      tableName: large_fact_table
    columnNamesToPartitionBy:
      - sale_date
    extraction:
      strategy: tpt

Teradata write_nos¶

Requires write_nos_* fields in Worker TOML. See the technical Teradata reference for credential modes.

tables:
  - source:
      databaseName: MY_DB
      schemaName: sales
      tableName: very_large_table
    target:
      databaseName: TARGET_DB
      schemaName: sales
      tableName: very_large_table
    columnNamesToPartitionBy:
      - sale_date
    extraction:
      strategy: write_nos
      externalStage: TARGET_DB.SALES.MY_CLOUD_STAGE

Configuration reference¶

Workflow configuration reference¶

The Data Migration Workflow configuration file is a JSON object. The following sections describe its structure and properties.

Note

Names that require quoting (or brackets) must be manually quoted as they would normally be in JSON. For example: tableName: "\"MyCaseSensitiveTable\"".

Top-level object¶

Property	Type	Description
`tables`	`TableConfiguration[]`	An array of table-specific configurations defining which tables to migrate and how.
`defaultTableConfiguration`	`TableConfiguration`	Shared settings that are inherited by all tables in the `tables` array. Table-specific values override these defaults.
`affinity`	`String`	An affinity group string. Ensures that only Orchestrator and Worker instances with a matching affinity process this workflow.

When defaultTableConfiguration is present, each object in tables is merged with those defaults: shared fields apply to every table unless the same field is set again on a specific table entry, in which case the table-level value wins.

TableConfiguration model¶

Defines the settings for migrating a single table.

Property	Type	Description
`source`	`SourceTargetIdentifier`	Identifies the source table.
`target`	`SourceTargetIdentifier`	Identifies the target table in Snowflake.
`columnNamesToPartitionBy`	`String[]`	A list of columns used to partition data during the extraction phase.
`extraction`	`ExtractionStrategy`	Settings to configure how data is extracted from the source database.
`synchronization`	`SynchronizationStrategy`	Settings for incremental synchronization.
`columnTypeMappings`	`ColumnTypeMapping`	Type conversions applied during migration.
`columnNameMappings`	`ColumnNameMapping`	Column renaming mappings.
`primaryKeyColumns`	`String[]`	Primary key columns for the source table. Required for `trackModifications` under the `watermark` synchronization strategy.
`partitionSize`	`PartitionSize`	Configures the target size of each partition during extraction. Defaults to `"auto"`. See Partition size (`partitionSize`).
`whereClauseCriteria`	`String`	An SQL-like filter to select a subset of rows for migration (for example, `"is_deleted = 0"`).

SourceTargetIdentifier model¶

A nested object used within TableConfiguration to specify a database object. For source, use only the properties in the following table. For target, you can also set the optional properties in Additional target properties.

Property	Type	Description
`databaseName`	`String`	The name of the source or target database.
`schemaName`	`String`	The name of the schema containing the table.
`tableName`	`String`	The name of the table to be migrated.

Additional target properties¶

The following optional fields apply only to the target object (not to source).

Property	Type	Required	Description
`tableType`	`String`		`"native"` for a standard Snowflake table (default if omitted) or `"iceberg"` for an Apache Iceberg™ table.
`icebergConfig`	`Object`	For Iceberg targets	Required when `tableType` is `"iceberg"`. Merged with `defaultTableConfiguration.target.icebergConfig` if present; table-level keys override defaults. See Iceberg configuration (`target.icebergConfig`).

Iceberg configuration (`target.icebergConfig`)¶

Used when target.tableType is "iceberg". Account setup (external volumes, catalog integrations, stages, and privileges) follows Snowflake’s Iceberg documentation; see Apache Iceberg™ tables, Create an Iceberg table, and Configure an external volume.

Property	Type	Required	Description
`catalog`	`String`		Default `SNOWFLAKE` for Snowflake-managed Iceberg. Use a catalog integration name for externally cataloged tables (for example AWS Glue).
`externalVolume`	`String`	For `catalog` `SNOWFLAKE`	Snowflake external volume for Iceberg data and metadata.
`baseLocationPrefix`	`String`		Optional path prefix for `BASE_LOCATION` when using Snowflake-managed Iceberg (`catalog` `SNOWFLAKE`).
`catalogTableName`	`String`	For external `catalog`	Fully qualified name of the table in the external catalog (for example `glue_db.my_table`).
`catalogSync`	`String`		Optional catalog integration used to sync Snowflake-managed metadata back to an external catalog.
`sourceDataStage`	`String`		Stage path starting with `@` pointing at existing Parquet files; used for `copy_files`-style loads with Snowflake-managed Iceberg.
`migrationStrategy`	`String`		One of `catalog_link`, `convert_to_managed`, or `copy_files`. When omitted, the Orchestrator infers a strategy from `catalog` and `sourceDataStage`.

Partition size (`partitionSize`)¶

Controls how large each partition should be during extraction. You can use a string or an object.

Form	Description
`"auto"` (default)	The system chooses partition sizes from the source platform, extraction strategy, and table size. Auto mode uses larger partitions for Redshift UNLOAD (S3-friendly large files) and smaller partitions for ODBC-based extraction (SQL Server, Redshift regular), where data flows through the Worker. For very large tables (100+ GB), the maximum number of partitions can increase to allow more parallelism.
`{ "targetSizeMb": N }`	Each partition targets about `N` megabytes of data.
`{ "maxRowsPerPartition": N }`	Each partition contains at most `N` rows, regardless of data size.

When you use the object form, specify only one of targetSizeMb or maxRowsPerPartition.

auto (default):

"partitionSize": "auto"

Target size in MB:

"partitionSize": { "targetSizeMb": 2048 }

Maximum rows per partition:

"partitionSize": { "maxRowsPerPartition": 500000 }

ColumnTypeMapping model¶

A nested object used within TableConfiguration to specify type mappings for a column.

Property	Type	Required	Description
`sourceType`	`String`		The name of the type in the source system.
`targetType`	`String`		The name of the target type in Snowflake.

ColumnNameMapping model¶

A nested object used within TableConfiguration to specify column name mappings.

Property	Type	Required	Description
`sourceName`	`String`		The name of the column in the source system.
`targetName`	`String`		The name of the target column in Snowflake.

ExtractionStrategy model¶

Configures the method for data extraction.

Field	Type	Required	Description
`strategy`	`String` (`"regular"`, `"unload"`, `"write_nos"`, `"tpt"`)		`regular` (default): Worker pulls data over ODBC/BCP/Npgsql. `unload` (Redshift only): UNLOAD to S3. `write_nos` (Teradata only): Teradata WRITE_NOS to cloud storage. `tpt` (Teradata only): TPT EXPORT via `tbuild`.
`externalStage`	`String`	UNLOAD or WRITE_NOS	Snowflake external stage for `unload` or `write_nos`.

Extraction: regular (Default)

"extraction": {
  "strategy": "regular"
}

Extraction: unload (Redshift only)

"extraction": {
  "strategy": "unload",
  "externalStage": "MY_DB.MY_SCHEMA.S3_EXTERNAL_STAGE"
}

Extraction: WRITE_NOS (Teradata only)

"extraction": {
  "strategy": "write_nos",
  "externalStage": "MY_DB.MY_SCHEMA.TD_WRITE_NOS_STAGE"
}

Extraction: TPT (Teradata only)

"extraction": {
  "strategy": "tpt"
}

For workflow YAML examples of each strategy, see Extraction strategies.

SynchronizationStrategy model¶

Configures the approach for incremental data syncing on subsequent runs.

Field	Type	Required	Description
`strategy`	`String` (`"none"`, `"checksum"`, `"watermark"`)		The synchronization method.
`watermarkColumn`	`String`	`watermark` only	Column name to track. Must be monotonically increasing.
`trackModifications`	`Boolean`		If `true`, the system uses the primary key to identify and deduplicate modified rows. Requires `primaryKeyColumns` to be specified in `TableConfiguration`.

Strategy: none (Default)

Performs a full extraction of all partition data on every run. No synchronization metadata is stored.

"synchronization": {
  "strategy": "none"
}

Use when data is small, changes are unpredictable, or guaranteed consistency is needed.

Strategy: checksum

Computes a hash of all column values for each partition on the source. Only changed partitions are cleared and re-extracted in the target.

"synchronization": {
  "strategy": "checksum"
}

Use when you need to detect any change in a partition but lack a reliable monotonic column (for example, dimension tables). Note that this requires a checksum computation on the source for every partition on every run.

Strategy: watermark

Tracks a monotonic column (timestamp, ID, or version) to sync only rows where the watermark value is greater than the maximum observed in the previous sync.

"synchronization": {
  "strategy": "watermark",
  "watermarkColumn": "UPDATED_AT"
}

Use when your table has a reliable monotonic column that increases on insert/update (for example, fact tables or event logs).

Note

Watermark alone can’t currently track deletions. Support for this will be added in the future.

Example workflow: Redshift UNLOAD with Iceberg targets¶

The following workflow excerpt combines Redshift UNLOAD with Iceberg table targets, including Snowflake-managed Iceberg defaults and a per-table external catalog override:

{
  "defaultTableConfiguration": {
    "source": {
      "schemaName": "public",
      "databaseName": "analytics_db"
    },
    "target": {
      "schemaName": "public",
      "databaseName": "TARGET_DB",
      "tableType": "iceberg",
      "icebergConfig": {
        "catalog": "SNOWFLAKE",
        "externalVolume": "my_iceberg_ext_vol",
        "baseLocationPrefix": "migrations/redshift",
        "sourceDataStage": "@TARGET_DB.PUBLIC.ICEBERG_SOURCE_STAGE"
      }
    },
    "extraction": {
      "strategy": "unload",
      "externalStage": "TARGET_DB.PUBLIC.S3_EXTERNAL_STAGE"
    },
    "partitionSize": "auto"
  },
  "tables": [
    {
      "source": { "tableName": "customers" },
      "target": { "tableName": "customers" },
      "columnNamesToPartitionBy": ["customer_id"]
    },
    {
      "source": { "tableName": "events" },
      "target": {
        "tableName": "events",
        "tableType": "iceberg",
        "icebergConfig": {
          "catalog": "my_glue_catalog_integration",
          "externalVolume": "my_iceberg_ext_vol",
          "catalogTableName": "glue_db.events"
        }
      },
      "columnNamesToPartitionBy": ["event_id"]
    }
  ]
}

Affinity¶

By specifying an affinity for a Workflow, you are indicating that you want specific workers to help with the execution of that Workflow. This can be particularly useful in cases in which you want to have some workers extract from one source and other workers extract from a different source. The rules for matching workers with tasks are:

A task without affinity will be picked up by any worker, independently of the worker’s affinity.
A worker without affinity will pick up any task, independently of the task’s affinity.
A task with a given affinity will not be picked up by a worker with different affinity.

Affinity only needs to be a String; its format is defined by the user. For example, all of these are valid: sql-server, DEV_SERVER, my_custom_server, ::blue::.

Worker configuration¶

This file configures the behavior and connections for the Data Exchange Worker. For cloud workflows, set selected_task_source to "snowflake_stored_procedure" and provide [task_source.snowflake_stored_procedure] with connection_name for task-queue stored procedures (separate from [connections.target.snowflake_connection_name] for data sessions).

Section	Property	Type	Description
Top level	`selected_task_source`	`String`	Required. Must be `"snowflake_stored_procedure"`.
`[application]`	`max_parallel_tasks`	`Integer`	The maximum number of tasks the Worker will process in parallel using threads.
`[application]`	`task_fetch_interval`	`Integer`	The interval in seconds between attempts to fetch new tasks from the Orchestrator.
`[application]`	`affinity`	`String`	A user-defined affinity for the worker.
`[application]`	`snowflake_database_for_metadata`	`String`	Optional. Database where the Orchestrator created the task metadata objects (default `SNOWCONVERT_AI`). Must match the Orchestrator’s `CUSTOM_SNOWFLAKE_DATABASE_FOR_METADATA` if you override it.
`[application]`	`snowflake_schema_for_data_migration_metadata`	`String`	Optional. Schema for data migration task metadata (default `DATA_MIGRATION`). Must match the Orchestrator’s `CUSTOM_SNOWFLAKE_SCHEMA_FOR_DATA_MIGRATION_METADATA` if you override it.
`[connections.source.*]`	N/A	`Object`	Configuration for source systems. Workers typically require an ODBC driver to connect to the source system.
`[connections.target.snowflake_connection_name]`	`connection_name`	`String`	The name of the connection entry in the `~/.snowflake/config.toml` file to use.

An example configuration file looks like this:

selected_task_source = "snowflake_stored_procedure"

[task_source.snowflake_stored_procedure]
connection_name = "connection_name"

[application]
max_parallel_tasks = 4
task_fetch_interval = 30
# Optional: only if the Orchestrator uses CUSTOM_SNOWFLAKE_* overrides for metadata location
# snowflake_database_for_metadata = "SNOWCONVERT_AI"
# snowflake_schema_for_data_migration_metadata = "DATA_MIGRATION"

# SQL Server connection (standard authentication)
[connections.source.sqlserver]
username = "username"
password = "password"
database = "database_name"
host = "127.0.0.1"
port = 1433

# Amazon Redshift connection (IAM authentication for provisioned cluster)
[connections.source.redshift]
username = "demo-user"
database = "snowconvert_demo"
auth_method = "iam-provisioned-cluster"
cluster_id = "migrations-aws"
region = "us-west-2"
access_key_id = "your-access-key-id"
secret_access_key = "your-secret-access-key"

# Amazon Redshift connection (standard authentication)
# [connections.source.redshift]
# username = "myuser"
# password = "mypassword"
# database = "mydatabase"
# host = "my-cluster.abcdef123456.us-west-2.redshift.amazonaws.com"
# port = 5439
# auth_method = "standard"

# Snowflake target connection
[connections.target.snowflake_connection_name]
connection_name = "connection_name"

Note

Only one source connection is needed.

Source connection configuration examples¶

The following examples show the three main source connection configurations:

1. SQL Server (Standard Authentication)

[connections.source.sqlserver]
username = "username"
password = "password"
database = "database_name"
host = "127.0.0.1"
port = 1433

2. Amazon Redshift (IAM Authentication)

[connections.source.redshift-iam]
username = "demo-user"
database = "demo_db"
auth_method = "iam-provisioned-cluster"
cluster_id = "my-aws-cluster"
region = "us-west-2"
access_key_id = "your-access-key-id"
secret_access_key = "your-secret-access-key"
# Optional fields for UNLOAD strategy
# unload_s3_bucket = "my-migrations-bucket"
# unload_iam_role_arn = "arn:aws:iam::123456789012:role/MyRole"

3. Amazon Redshift (Standard Authentication)

[connections.source.redshift-standard]
username = "myuser"
password = "mypassword"
database = "mydatabase"
host = "my-cluster.abcdef123456.us-west-2.redshift.amazonaws.com"
port = 5439
auth_method = "standard"
# Optional fields for UNLOAD strategy
# unload_s3_bucket = "my-migrations-bucket"
# unload_iam_role_arn = "arn:aws:iam::123456789012:role/MyRole"

Platform-specific details¶

Migrate Amazon Redshift data¶

In order to use the UNLOAD strategy for extraction of Amazon Redshift data, it will be necessary to set up multiple resources. This strategy enables the data to flow directly from Amazon Redshift into an S3 bucket and for Snowflake to execute COPY INTO operations directly from there (by creating an external stage that is mapped to that S3 bucket). This is faster than having the workers download the data and then upload it to a Snowflake stage.

Create a stage integration to S3¶

If you don’t have an existing stage configured, you need to create a Snowflake external stage that integrates with your S3 bucket. You can create the stage using the following SQL command in Snowflake:

CREATE OR REPLACE STAGE <stage_name>
  URL = 's3china://<your_bucket_name>/<path>/'
  STORAGE_INTEGRATION = <your_storage_integration>
  FILE_FORMAT = (TYPE = 'PARQUET');

Alternatively, if you’re using AWS credentials directly:

CREATE OR REPLACE STAGE <stage_name>
  URL = 's3china://<your_bucket_name>/<path>/'
  CREDENTIALS = (AWS_KEY_ID = '<your_aws_key_id>' AWS_SECRET_KEY = '<your_aws_secret_key>')
  FILE_FORMAT = (TYPE = 'PARQUET');

Replace the placeholders:

<stage_name>: Your desired stage name (for example, my_redshift_stage)
<your_bucket_name>: Your S3 bucket name
<path>: Optional path within the bucket
<your_storage_integration>: Your Snowflake storage integration name (recommended method)
<your_aws_key_id> and <your_aws_secret_key>: Your AWS IAM user credentials (if not using storage integration)

Note

Using a Snowflake storage integration is the recommended approach for better security and credential management. For more information about creating storage integrations, see the Snowflake documentation.

Verify stage integration¶

After setting up your stage, verify that the integration is working correctly before proceeding with data migration. You can verify the stage integration by running the following command in Snowflake:

LIST @<stage_name>;

This command should execute successfully without errors. If the stage is newly created and empty, it may return no results, which is expected.

To perform a more thorough verification, you can test the stage by uploading a test file:

PUT file:///<local_test_file_path> @<stage_name>;
LIST @<stage_name>;

If the commands execute successfully and you can see the uploaded file, your stage integration is configured correctly.

Note

You can also verify stage permissions by checking the stage description:

DESCRIBE STAGE <stage_name>;

This displays the stage configuration, including the URL, credentials type, and file format settings.

Migrate Teradata data¶

Teradata supports regular (default), tpt, and write_nos extraction strategies. Each strategy has different Worker TOML and workflow YAML requirements.

Strategy	Summary
`regular`	Partitioned SQL through the Worker (ODBC or auto-TPT) to the internal Snowflake stage.
`tpt`	Explicit TPT EXPORT; requires `tbuild` on the Worker host.
`write_nos`	Teradata writes Parquet to cloud storage; requires *`write_nos_`** TOML fields and an external stage.

For YAML examples, Worker connection parameters, WRITE_NOS credential modes, sync strategies, type mappings, and validation notes, see Teradata: Complete configuration reference and Extraction strategies on the technical documentation page.

SnowConvert AI: Data migration [Preview]¶

Architecture overview¶

Deployment options¶

Prerequisites¶

Setup¶

Installation¶

Usage¶

Starting the Orchestrator¶

Starting the Workers¶

Creating a Data Migration Workflow¶

Monitoring a Data Migration Workflow¶

Redshift UNLOAD¶

Incremental synchronization¶

Considerations and recommendations¶

Connecting to Snowflake with a PAT¶

Running Orchestrator and Workers on SPCS¶

Initial testing¶

Managing Workers¶

Extraction strategies¶

regular (default)¶

Redshift unload¶

Teradata tpt¶

Teradata write_nos¶

Configuration reference¶

Workflow configuration reference¶

Top-level object¶

TableConfiguration model¶

SourceTargetIdentifier model¶

Additional target properties¶

Iceberg configuration (target.icebergConfig)¶

Partition size (partitionSize)¶

ColumnTypeMapping model¶

ColumnNameMapping model¶

ExtractionStrategy model¶

SynchronizationStrategy model¶

Example workflow: Redshift UNLOAD with Iceberg targets¶

Affinity¶

Worker configuration¶

Source connection configuration examples¶

Platform-specific details¶

Migrate Amazon Redshift data¶

Create a stage integration to S3¶

Verify stage integration¶

Migrate Teradata data¶

Iceberg configuration (`target.icebergConfig`)¶

Partition size (`partitionSize`)¶