SnowConvert AI: Data migration [Preview]

Data Migration uses the data-migration skill from the migration-plugin. For more details, contact snowconvert-support@snowflake.com.

The Data Migration feature of SnowConvert provides a fault-tolerant, scalable solution for moving data from external sources into Snowflake. This feature is specifically designed for cases where you are moving data from a system you plan to decommission. For replication purposes, other solutions are available that might better fit your use case.

Architecture overview

The Data Migration feature uses two main components: an Orchestrator and one or more Workers.

  • The Orchestrator connects to the Snowflake account. It requires privileges to create and operate the SNOWCONVERT_AI database, where metadata is stored.
  • One or more Workers connect to both the source system and the Snowflake account. Workers read data from the source system and upload it to a Snowflake stage. Workers pick up tasks created by the Orchestrator and process them in parallel.
  • Files uploaded to the Snowflake stage are copied into the target tables using a COPY INTO statement submitted and monitored by the Orchestrator.
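Conceptually, the load step that the Orchestrator submits resembles the following. This is a sketch for orientation only: the target table name, stage path, and file format options are illustrative and are generated by the Orchestrator at run time.

-- Illustrative only; the Orchestrator builds the real statement and paths
COPY INTO TARGET_DB.PUBLIC.CUSTOMERS
  FROM @SNOWCONVERT_AI.DATA_MIGRATION.MIGRATION_STAGE/customers/
  FILE_FORMAT = (TYPE = PARQUET)
  MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;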

Deployment options

The Orchestrator and Workers can be deployed in multiple ways:

  • Both on Snowpark Container Services (in the Snowflake account).
  • Both in the customer’s environment, including custom hardware, virtual machines, or containers.
  • Orchestrator on Snowpark Container Services and Workers in the customer’s environment, or the other way around.

The following requirements apply to the environment where the Orchestrator and Workers run:

  • Use the Snowflake Migration Agent with Cortex Code, or the SnowConvert AI CLI (scai) from an initialized SnowConvert AI project. The skill and CLI install and manage the worker and orchestrator runtimes and their dependencies when needed.
  • Workers typically require an ODBC driver to connect to the source system (unless the source supports a pure-Python driver, such as Teradata with teradatasql).
  • The Orchestrator must be able to connect to the Snowflake account using a role that has privileges to create the SNOWCONVERT_AI database and create schemas and objects within it.

Prerequisites

Before you use Data Migration, make sure the following are in place:

  • SnowConvert AI project and skill: Cortex Code with the Snowflake Migration Agent enabled, or the SnowConvert AI CLI (scai) installed and a migration project on disk. The skill installs scai, connection tooling, and worker dependencies on first use—see the skill documentation for prerequisites.
  • Snowflake access: Connections for the Orchestrator and Workers in your Snowflake config.toml or connections.toml, using a role that can create the SNOWCONVERT_AI database and its objects. The first time the Orchestrator starts, it creates that database and related resources if they do not exist yet. On later runs, use a role that can administer SNOWCONVERT_AI and its objects; sticking with the same role you used for the initial creation is the simplest way to avoid permission issues.
  • Source connectivity: For typical source databases, an ODBC driver on the machine where Workers run. Programmatic Access Tokens (PATs) are recommended for Snowflake connections; see Connecting to Snowflake with a PAT.
  • Hybrid Tables: Hybrid Tables must be enabled and available in your Snowflake account and region for this feature. Review Hybrid tables and Hybrid tables limitations so you understand relevant platform requirements.
  • Snowpark Container Services (optional): If you deploy the Orchestrator or Workers on Snowflake compute, your account needs SPCS. See the Snowpark Container Services overview. Running both components outside Snowflake does not require SPCS.

Source-platform specifics

The Worker connects to your source using the TOML [connections.source.*] sections described in Worker configuration. The platform notes below cover connectivity, extraction style, and target options that are unique to each platform.

Teradata

The migration workflow’s extraction.strategy chooses among regular (default: partitioned SQL through the Worker to Parquet, then the internal migration stage), write_nos (Teradata WRITE_NOS writes Parquet to S3, Azure Blob, or GCS; Snowflake loads via an external stage you align to that bucket), and tpt (the Worker runs TPT EXPORT with tbuild, converts delimited output to Parquet, then uses the same internal stage path as regular).

SQL Server

  • Driver: Install a supported Microsoft ODBC Driver for SQL Server on the Worker host (the Worker auto-detects a suitable driver when none is set). You can pin a driver with odbc_driver in TOML.
  • Authentication: Use SQL authentication (username / password) or Windows integrated security (use_windows_auth = true on Windows Workers).
  • Encryption: Optional encrypt and trust_server_certificate keys follow ODBC Driver 17 vs 18 defaults; set them explicitly for hardened or lab environments.
  • Bulk copy: Optional use_bcp = true enables BCP-style bulk extraction when supported, for faster pulls from SQL Server.
  • Extraction path: Data is read through the Worker using regular extraction (ODBC/BCP). Size partitions with partitionSize and columnNamesToPartitionBy in the workflow YAML (see the sketch below).
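For example, a table entry in the workflow YAML might size SQL Server partitions as follows. This is a sketch only: the database, schema, table, and column names are placeholders, and it assumes the YAML mirrors the JSON object model in the Workflow configuration reference below.

tables:
  - source:
      databaseName: AdventureWorks
      schemaName: dbo
      tableName: SalesOrderDetail
    target:
      databaseName: TARGET_DB
      schemaName: DBO
      tableName: SALESORDERDETAIL
    columnNamesToPartitionBy:
      - SalesOrderID
    partitionSize:
      maxRowsPerPartition: 500000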

Setup

Installation

Use the Snowflake Migration Agent in Cortex Code (recommended) so the agent can drive setup, or install the SnowConvert AI CLI (scai) and work from a SnowConvert AI project directory. The skill registers the migration plugin and ensures scai and dependencies are available.

  • Installing the CLI (via the skill or your platform installer) does not require Snowpark Container Services (SPCS). SPCS is required only when you deploy the Orchestrator (or Workers) on Snowflake compute using container services.

Usage

To migrate data using this solution, complete the following high-level steps:

  1. Start the Orchestrator.
  2. Start the Workers.
  3. Create a Data Migration Workflow.
  4. Monitor the Data Migration Workflow until completion.

A Data Migration Workflow is essentially an action or goal for the system to complete, such as migrating a specific set of tables with a given configuration. You can submit multiple workflows simultaneously and monitor them. The Orchestrator breaks Data Migration Workflows into smaller tasks, which typically involves splitting a table into partitions before extracting its data and loading it to Snowflake.

You drive orchestration, workers, and workflows through the scai data … commands below. In Cortex Code, the data-migration skill wraps the same flows: ask the agent to run cloud data migration steps, or follow the Snowflake Migration Agent guide for guided prompts.

Using SCAI CLI

Cloud data commands are grouped under scai data using nested verbs (migrate, worker, orchestrator, …). The following deprecated top-level names remain available and print a short warning pointing to the new path; prefer the nested commands for new scripts and documentation.

Deprecated (still runs) → Preferred command:

  • scai data setup-cloud-migration → scai data orchestrator setup
  • scai data start-cloud-worker → scai data worker start
  • scai data generate-cloud-migration-config → scai data migrate generate-config
  • scai data cloud-migrate → scai data migrate create-workflow
  • scai data cloud-migrate-status → scai data migrate status
  • scai data cloud-list-migrations → scai data migrate list

Global options such as --json (structured stdout for automation), --log-debug, --no-auto-update, and connection overrides (-c / --connection, --warehouse, --role) apply wherever the CLI exposes them; run scai data <branch> <command> --help for the exact surface.
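For example, the monitoring commands shown later can be combined with these global options (connection and role names are placeholders; check each command's --help to confirm which flags it exposes):

# Structured output for automation, with an explicit connection and role:
scai data migrate list --json --connection my-snowflake --role MIGRATION_ADMIN

# Verbose logging without the auto-update check:
scai data migrate status DATA_MIGRATION_WORKFLOW_xx_yy_zz --log-debug --no-auto-update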

Starting the Orchestrator on SPCS

Use this command to create or resume the Orchestrator service on Snowpark Container Services. Stop (or suspend) the service when it is no longer needed; the CLI reminds you to run scai data orchestrator stop after workflows that start the service.

# Default Snowflake connection from your environment / project:
scai data orchestrator setup --compute-pool MY_COMPUTE_POOL

# Named Snowflake connection:
scai data orchestrator setup --compute-pool MY_COMPUTE_POOL --connection MY_SNOWFLAKE_CONNECTION

# Optional custom container image (advanced):
# scai data orchestrator setup --compute-pool MY_COMPUTE_POOL --custom-image /DB/SCHEMA/REPO/IMAGE:tag

scai data orchestrator setup --help
# Suspend the service (default):
scai data orchestrator stop

# Drop the service instead of suspending:
scai data orchestrator stop --drop

scai data orchestrator stop --help

Starting the Workers

Start a Worker with a TOML configuration file. See Worker configuration. You can emit a starter template (with placeholders or pre-filled source details when a project is present) using --auto-config.

# Start with a configuration file (positional argument):
scai data worker start my-worker-config.toml

# Same, with an explicit Snowflake connection profile:
scai data worker start my-worker-config.toml --connection my-snowflake

# Generate a template only (default path ~/.snowflake/scai/dew_configuration.toml, or pass a path):
scai data worker start --auto-config
scai data worker start --auto-config /path/to/my-worker-config.toml

scai data worker start --help

scai data worker start installs and runs the Data Exchange Worker in the foreground (the worker process managed by the CLI). Foreground worker start does not support --json; use --auto-config with --json if you need JSON output for template generation only.

Generating a workflow configuration file

From an SCAI project directory, generate a YAML workflow configuration (default file name data-migration-config.yaml in the current directory). You can author YAML by hand as long as it matches the workflow object model described in Workflow configuration reference.

scai data migrate generate-config

scai data migrate generate-config --where "source.schema = 'public'"

scai data migrate generate-config -o my-config.yaml

# Optional workflow affinity (must match Workers that should pick up the workflow):
scai data migrate generate-config --affinity my-team

scai data migrate generate-config --help

Creating a Data Migration Workflow (and optional local Orchestrator / Worker)

scai data migrate create-workflow submits a workflow to Snowflake. Options combine Snowflake connection overrides, optional SPCS --start-service, and optional local Orchestrator / Worker processes.

  • --config <path>: Workflow YAML file (.yaml or .yml). If omitted, defaults to data-migration-config.yaml in the current directory. The SCAI CLI does not accept JSON workflow files on this command.
  • -c / --connection: Snowflake connection name from config.toml / connections.toml.
  • --warehouse, --role: Optional session overrides.
  • --watch: Wait until the workflow reaches a terminal state.
  • --start-service: Start or resume the SPCS data service before creating the workflow. Requires -p / --compute-pool.
  • -p / --compute-pool: Compute pool name (used with --start-service).
  • --custom-image: Optional full image path instead of the default resolved image.
  • --start-worker: Start a local Data Exchange Worker (implies --watch). Uses the project’s default source dialect and resolved Snowflake connection.
  • --start-orchestrator: Start a local Orchestrator (implies --watch). Cannot be combined with --start-service.

# Create workflow and return immediately:
scai data migrate create-workflow --config my-data-migration-config.yaml --connection my-snowflake

# Create and wait until completion:
scai data migrate create-workflow --config my-data-migration-config.yaml --connection my-snowflake --watch

# Start SPCS service, then create workflow:
scai data migrate create-workflow --config my-data-migration-config.yaml --start-service \
  -p MY_COMPUTE_POOL --connection my-snowflake

scai data migrate create-workflow --help

Local “all-in-one” run from a project (migrate start)

scai data migrate start is a narrow wrapper intended for local runs from an SCAI project: if the config file is missing and you did not pass --no-auto-config, it generates a default workflow config, then delegates to create-workflow with --start-worker, --start-orchestrator, and --watch forced on (no --start-service / compute pool on this path). Use --no-pip-upgrade to skip upgrading bundled worker/orchestrator dependencies when the CLI prepares the local environment.

scai data migrate start
scai data migrate start --config my-data-migration-config.yaml --connection my-snowflake
scai data migrate start --help

Monitoring and listing workflows

# One-shot status:
scai data migrate status DATA_MIGRATION_WORKFLOW_xx_yy_zz

# Watch until completion:
scai data migrate status DATA_MIGRATION_WORKFLOW_xx_yy_zz --watch

scai data migrate status --help
# Recent workflows (default limit 20):
scai data migrate list

scai data migrate list --status running --connection my-snowflake
scai data migrate list --limit 50

scai data migrate list --help

The same observability features in Snowflake—including the DATA_MIGRATION_DASHBOARD Streamlit dashboard—apply when you use scai or the migration skill.

Workflow lifecycle and Snowflake observability

Each workflow goes through different stages throughout its lifecycle:

  • Pending: No tasks have been created for this workflow yet.
  • Executing: Tasks have been created for this workflow and there are still tasks that have not reached a terminal state (COMPLETED or FAILED).
  • Completed: All tasks have reached a terminal state (COMPLETED or FAILED).

In the SNOWCONVERT_AI.DATA_MIGRATION schema, the following tables and views can be queried to understand the status of one or more workflows:

  • WORKFLOW: Contains one row per workflow, including start/end time, status, and configuration.
  • TABLE_PROGRESS_WITH_EXAMPLE_ERROR: Contains one row per table being migrated as part of a workflow. Includes information about how many partitions are in each stage (extraction, loading, completed, or failed), as well as related errors. Can be filtered by WORKFLOW_ID.
  • DATA_MIGRATION_ERROR: For each partition of a table being migrated, contains the first known error that affected the migration of that partition. Can be filtered by WORKFLOW_ID.

In the same schema, the DATA_MIGRATION_DASHBOARD Streamlit dashboard can be used to monitor workflows. This dashboard presents data from those tables and views.
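If you prefer SQL over the dashboard, you can query the same objects directly. A sketch (the WORKFLOW_ID value is a placeholder; check column names other than WORKFLOW_ID against the actual view definitions):

-- Recent workflows and their status
SELECT * FROM SNOWCONVERT_AI.DATA_MIGRATION.WORKFLOW;

-- Per-table progress for one workflow
SELECT *
FROM SNOWCONVERT_AI.DATA_MIGRATION.TABLE_PROGRESS_WITH_EXAMPLE_ERROR
WHERE WORKFLOW_ID = 'DATA_MIGRATION_WORKFLOW_xx_yy_zz';

-- First known error per partition for the same workflow
SELECT *
FROM SNOWCONVERT_AI.DATA_MIGRATION.DATA_MIGRATION_ERROR
WHERE WORKFLOW_ID = 'DATA_MIGRATION_WORKFLOW_xx_yy_zz';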

Workflow configuration for scai data migrate create-workflow uses YAML (.yaml / .yml) aligned with your project; supported source platforms in that configuration include SQL Server, Amazon Redshift, Oracle, and Teradata. The full object model is documented in Workflow configuration reference.

Redshift UNLOAD

For Redshift, it is recommended to use the unload extraction strategy. This works as follows:

  • Large query results are written directly to an S3 bucket instead of being downloaded to the machine running the Worker.
  • On the Snowflake side, an external stage is configured to reference the corresponding S3 bucket, so that COPY INTO statements can be executed directly from that stage.

For configuration details, see ExtractionStrategy model.

Incremental synchronization

You can migrate tables and then re-migrate them in the future, moving only the data that has changed. For more details, see SynchronizationStrategy model.

Considerations and recommendations

Connecting to Snowflake with a PAT

It is recommended to use Programmatic Access Tokens (PAT) for connections used by the Orchestrator and Workers. This ensures there is no need to constantly authenticate through the browser or with an authenticator app. You will need to establish a Network Policy or temporarily bypass the requirement for a Network Policy (this can be done from Snowsight).
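A minimal connections.toml entry for such a connection might look like the following sketch. The account, user, role, and warehouse values are placeholders, and it assumes you supply the PAT as the password value of the connection profile.

[my-snowflake]
account = "myorg-myaccount"
user = "MIGRATION_USER"
# Programmatic Access Token used in place of a password
password = "<your-PAT>"
role = "MIGRATION_ADMIN"
warehouse = "MIGRATION_WH"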

Running Orchestrator and Workers on SPCS

To leverage Snowflake compute for these tasks:

  1. Prepare Docker images that bundle the Orchestrator and Worker runtimes and configuration expected by your SnowConvert AI release (see Snowflake guidance for SPCS service images).
  2. Push those Docker images to an Image Repository in Snowflake.
  3. Execute the Orchestrator and/or Worker images using Snowpark Container Services.

Keep the following in mind:

  • It is recommended to execute them as Services, not Jobs.
  • It is possible to run only one component (Orchestrator or Workers) in SPCS and the other on a different platform.
  • It is a good practice to monitor the SPCS service and suspend it when it is not being used.
  • Depending on the network configuration of the source system, you may need to configure an External Access Integration so that these services can connect to your source system.
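If your SPCS services need network access to the source system, the required objects look roughly like this sketch (the host, port, and object names are placeholders; adjust them to your network setup and reference the integration from the service that runs the Worker or Orchestrator):

CREATE NETWORK RULE SOURCE_DB_EGRESS_RULE
  MODE = EGRESS
  TYPE = HOST_PORT
  VALUE_LIST = ('source-db.example.com:1433');

CREATE EXTERNAL ACCESS INTEGRATION SOURCE_DB_ACCESS_INTEGRATION
  ALLOWED_NETWORK_RULES = (SOURCE_DB_EGRESS_RULE)
  ENABLED = TRUE;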

Initial testing

It is recommended to deploy the DDL for the tables you want to migrate before starting data migration. This ensures the target column types match the behavior you expect from the table and its related views and procedures. You can convert DDL from your source dialect into Snowflake SQL using the code conversion capabilities of SnowConvert AI and/or Cortex Code.

Note

If you don’t deploy the DDL for the tables before starting data migration, the types will be inferred, which may not be as accurate as required.

For an early test run, use a separate workflow configuration whose tables array lists only the table or small set of tables you want to validate. On each of those entries, set whereClauseCriteria to an SQL-like predicate (as you would in a WHERE clause) so only a subset of rows is migrated, for example a bounded primary-key range or a narrow date range in the source dialect. You can also set a small partitionSize (for example maxRowsPerPartition) to keep partitions tiny during the test. After you confirm connectivity, performance, and results, create your full workflow: remove or relax whereClauseCriteria and use "auto" or your production partitionSize settings.
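A test workflow entry might look like the following sketch (table names, the predicate, and the row limit are placeholders; it assumes the YAML mirrors the JSON object model in the reference below):

tables:
  - source:
      databaseName: analytics_db
      schemaName: public
      tableName: orders
    target:
      databaseName: TARGET_DB
      schemaName: PUBLIC
      tableName: ORDERS
    whereClauseCriteria: "order_date >= '2024-01-01' AND order_date < '2024-02-01'"
    partitionSize:
      maxRowsPerPartition: 100000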

Managing Workers

The time it takes to complete a workflow depends on many variables, but the number of Workers (and threads per Worker) has the greatest impact, as it determines how many extraction tasks can be executed in parallel. Consider the following:

  • Running more than one Worker on the same machine is unnecessary; if you want more parallelism on a single machine, increase the thread count instead.
  • Network bandwidth greatly affects Worker speed and is shared between threads of a Worker.
  • Even with many Workers and threads running in parallel, the source system might not have enough resources to handle the load.
  • Keep a low Worker count to avoid overloading your source system.
  • Consider stopping some or all Workers when the source system is already under heavy load from other operations.

Configuration reference

Workflow configuration reference

The Data Migration Workflow configuration follows the object model described in the following sections. Examples in this reference use JSON notation; when you submit a workflow with scai data migrate create-workflow, express the same structure in YAML (that command accepts .yaml/.yml files only).

Note

Names that require quoting (or brackets) must be manually quoted as they would normally be in JSON. For example: tableName: "\"MyCaseSensitiveTable\"".

Top-level object

  • tables (TableConfiguration[]): An array of table-specific configurations defining which tables to migrate and how.
  • defaultTableConfiguration (TableConfiguration): Shared settings that are inherited by all tables in the tables array. Table-specific values override these defaults.
  • affinity (String): An affinity group string. Ensures that only Orchestrator and Worker instances with a matching affinity process this workflow.

When defaultTableConfiguration is present, each object in tables is merged with those defaults: shared fields apply to every table unless the same field is set again on a specific table entry, in which case the table-level value wins.
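A minimal illustration of this merge (all names are placeholders): the defaults supply the source and target database and schema, the first table inherits everything, and the second table overrides partitionSize.

{
  "defaultTableConfiguration": {
    "source": { "databaseName": "analytics_db", "schemaName": "public" },
    "target": { "databaseName": "TARGET_DB", "schemaName": "PUBLIC" },
    "partitionSize": "auto"
  },
  "tables": [
    { "source": { "tableName": "customers" }, "target": { "tableName": "CUSTOMERS" } },
    {
      "source": { "tableName": "events" },
      "target": { "tableName": "EVENTS" },
      "partitionSize": { "targetSizeMb": 512 }
    }
  ]
}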

TableConfiguration model

Defines the settings for migrating a single table.

  • source (SourceTargetIdentifier): Identifies the source table.
  • target (SourceTargetIdentifier): Identifies the target table in Snowflake.
  • columnNamesToPartitionBy (String[]): A list of columns used to partition data during the extraction phase.
  • extraction (ExtractionStrategy): Settings to configure how data is extracted from the source database.
  • synchronization (SynchronizationStrategy): Settings for incremental synchronization.
  • columnTypeMappings (ColumnTypeMapping): Type conversions applied during migration.
  • columnNameMappings (ColumnNameMapping): Column renaming mappings.
  • primaryKeyColumns (String[]): Primary key columns for the source table. Required for trackModifications under the watermark synchronization strategy.
  • partitionSize (PartitionSize): Configures the target size of each partition during extraction. Defaults to "auto". See Partition size (partitionSize).
  • whereClauseCriteria (String): An SQL-like filter to select a subset of rows for migration (for example, "is_deleted = 0").

SourceTargetIdentifier model

A nested object used within TableConfiguration to specify a database object. For source, use only the properties in the following table. For target, you can also set the optional properties in Additional target properties.

  • databaseName (String): The name of the source or target database.
  • schemaName (String): The name of the schema containing the table.
  • tableName (String): The name of the table to be migrated.

Additional target properties

The following optional fields apply only to the target object (not to source).

  • tableType (String): "native" for a standard Snowflake table (default if omitted) or "iceberg" for an Apache Iceberg™ table.
  • icebergConfig (Object; for Iceberg targets): Required when tableType is "iceberg". Merged with defaultTableConfiguration.target.icebergConfig if present; table-level keys override defaults. See Iceberg configuration (target.icebergConfig).

Iceberg configuration (target.icebergConfig)

Used when target.tableType is "iceberg". Account setup (external volumes, catalog integrations, stages, and privileges) follows Snowflake’s Iceberg documentation; see Apache Iceberg™ tables, Create an Iceberg table, and Configure an external volume.

  • catalog (String): Default SNOWFLAKE for Snowflake-managed Iceberg. Use a catalog integration name for externally cataloged tables (for example AWS Glue).
  • externalVolume (String; required when catalog is SNOWFLAKE): Snowflake external volume for Iceberg data and metadata.
  • baseLocationPrefix (String): Optional path prefix for BASE_LOCATION when using Snowflake-managed Iceberg (catalog SNOWFLAKE).
  • catalogTableName (String; required for an external catalog): Fully qualified name of the table in the external catalog (for example glue_db.my_table).
  • catalogSync (String): Optional catalog integration used to sync Snowflake-managed metadata back to an external catalog.
  • sourceDataStage (String): Stage path starting with @ pointing at existing Parquet files; used for copy_files-style loads with Snowflake-managed Iceberg.
  • migrationStrategy (String): One of catalog_link, convert_to_managed, or copy_files. When omitted, the Orchestrator infers a strategy from catalog and sourceDataStage.

Partition size (partitionSize)

Controls how large each partition should be during extraction. You can use a string or an object.

  • "auto" (default): The system chooses partition sizes from the source platform, extraction strategy, and table size. Auto mode uses larger partitions for Redshift UNLOAD (S3-friendly large files) and a Teradata-specific smaller profile for Teradata regular (data streams through the Worker). SQL Server, Redshift regular, and similar ODBC-style paths use their own auto profiles. Teradata write_nos and tpt currently use the generic auto fallback (moderate MB targets) until dedicated profiles exist; set an explicit partitionSize if you need different bounds. For very large tables (100+ GB), the maximum number of partitions can increase to allow more parallelism.
  • { "targetSizeMb": N }: Each partition targets about N megabytes of data.
  • { "maxRowsPerPartition": N }: Each partition contains at most N rows, regardless of data size.

When you use the object form, specify only one of targetSizeMb or maxRowsPerPartition.

auto (default):

"partitionSize": "auto"

Target size in MB:

"partitionSize": { "targetSizeMb": 2048 }

Maximum rows per partition:

"partitionSize": { "maxRowsPerPartition": 500000 }

ColumnTypeMapping model

A nested object used within TableConfiguration to specify type mappings for a column.

  • sourceType (String): The name of the type in the source system.
  • targetType (String): The name of the target type in Snowflake.

ColumnNameMapping model

A nested object used within TableConfiguration to specify column name mappings.

  • sourceName (String): The name of the column in the source system.
  • targetName (String): The name of the target column in Snowflake.

ExtractionStrategy model

Configures the method for data extraction.

  • strategy (String; one of "regular", "unload", "write_nos", "tpt"): "regular" (default): Worker pulls data over the normal source connection (SQL/BCP patterns by platform). "unload" (Redshift only): Redshift UNLOAD writes Parquet to S3; Snowflake loads from the stage. "write_nos" (Teradata only): Teradata WRITE_NOS writes Parquet to cloud storage aligned with the external stage. "tpt" (Teradata only): Worker runs TPT EXPORT via tbuild, converts to Parquet, then uploads.
  • externalStage (String; required when strategy is "unload" or "write_nos"): Snowflake external stage whose URL matches the bucket or container used for UNLOAD or WRITE_NOS. Not required for "regular" or "tpt". Optional externalStageUrl and externalStageStorageIntegration may be used together to auto-create a stage (see orchestrator configuration reference).

Extraction: regular (default)

"extraction": {
  "strategy": "regular"
}

Extraction: unload (Redshift only)

"extraction": {
  "strategy": "unload",
  "externalStage": "MY_DB.MY_SCHEMA.S3_EXTERNAL_STAGE"
}

Extraction: WRITE_NOS (Teradata only)

"extraction": {
  "strategy": "write_nos",
  "externalStage": "MY_DB.MY_SCHEMA.TD_WRITE_NOS_STAGE"
}

Extraction: TPT (Teradata only)

"extraction": {
  "strategy": "tpt"
}

SynchronizationStrategy model

Configures the approach for incremental data syncing on subsequent runs.

  • strategy (String; one of "none", "checksum", "watermark"): The synchronization method.
  • watermarkColumn (String; watermark only): Column name to track. Must be monotonically increasing.
  • trackModifications (Boolean): If true, the system uses the primary key to identify and deduplicate modified rows. Requires primaryKeyColumns to be specified in TableConfiguration.

Strategy: none (Default)

Performs a full extraction of all partition data on every run. No synchronization metadata is stored.

"synchronization": {
  "strategy": "none"
}

Use when data is small, changes are unpredictable, or guaranteed consistency is needed.

Strategy: checksum

Computes a hash of all column values for each partition on the source. Only changed partitions are cleared and re-extracted in the target.

"synchronization": {
  "strategy": "checksum"
}

Use when you need to detect any change in a partition but lack a reliable monotonic column (for example, dimension tables). Note that this requires a checksum computation on the source for every partition on every run.

Strategy: watermark

Tracks a monotonic column (timestamp, ID, or version) to sync only rows where the watermark value is greater than the maximum observed in the previous sync.

"synchronization": {
  "strategy": "watermark",
  "watermarkColumn": "UPDATED_AT"
}

Use when your table has a reliable monotonic column that increases on insert/update (for example, fact tables or event logs).
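If modified rows must also be detected and deduplicated, combine the watermark strategy with trackModifications and primaryKeyColumns. A sketch using the fields documented above (column names are placeholders):

"primaryKeyColumns": ["ORDER_ID"],
"synchronization": {
  "strategy": "watermark",
  "watermarkColumn": "UPDATED_AT",
  "trackModifications": true
}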

Note

Watermark alone can’t currently track deletions. Support for this will be added in the future.

Example workflow: Redshift UNLOAD with Iceberg targets

The following workflow excerpt combines Redshift UNLOAD with Iceberg table targets, including Snowflake-managed Iceberg defaults and a per-table external catalog override:

{
  "defaultTableConfiguration": {
    "source": {
      "schemaName": "public",
      "databaseName": "analytics_db"
    },
    "target": {
      "schemaName": "public",
      "databaseName": "TARGET_DB",
      "tableType": "iceberg",
      "icebergConfig": {
        "catalog": "SNOWFLAKE",
        "externalVolume": "my_iceberg_ext_vol",
        "baseLocationPrefix": "migrations/redshift",
        "sourceDataStage": "@TARGET_DB.PUBLIC.ICEBERG_SOURCE_STAGE"
      }
    },
    "extraction": {
      "strategy": "unload",
      "externalStage": "TARGET_DB.PUBLIC.S3_EXTERNAL_STAGE"
    },
    "partitionSize": "auto"
  },
  "tables": [
    {
      "source": { "tableName": "customers" },
      "target": { "tableName": "customers" },
      "columnNamesToPartitionBy": ["customer_id"]
    },
    {
      "source": { "tableName": "events" },
      "target": {
        "tableName": "events",
        "tableType": "iceberg",
        "icebergConfig": {
          "catalog": "my_glue_catalog_integration",
          "externalVolume": "my_iceberg_ext_vol",
          "catalogTableName": "glue_db.events"
        }
      },
      "columnNamesToPartitionBy": ["event_id"]
    }
  ]
}

Affinity

Specifying an affinity for a Workflow indicates that you want specific Workers to handle its execution. This is particularly useful when some Workers should extract from one source and other Workers from a different source. The rules for matching Workers with tasks are:

  • A task without affinity will be picked up by any worker, independently of the worker’s affinity.
  • A worker without affinity will pick up any task, independently of the task’s affinity.
  • A task with a given affinity will not be picked up by a worker with different affinity.

Affinity only needs to be a String; its format is defined by the user. For example, all of these are valid: sql-server, DEV_SERVER, my_custom_server, ::blue::.
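For example, to dedicate a set of Workers to one source, use the same string on both sides: pass --affinity sql-server to scai data migrate generate-config (or set affinity in the workflow file), and set the matching value in the [application] section of each of those Workers' TOML configuration (sql-server here is just one of the example values above):

[application]
affinity = "sql-server"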

Worker configuration

This file configures the behavior and connections for the Data Exchange Worker. For cloud data migration and validation, set selected_task_source to "snowflake_stored_procedure" and provide a matching [task_source.snowflake_stored_procedure] section. That section’s connection_name is the Snowflake profile used to call the task-queue stored procedures (PULL_TASKS, COMPLETE_TASK, FAIL_TASK, and related objects). It is separate from [connections.target.snowflake_connection_name], which controls JDBC/ODBC sessions the Worker opens for data movement and validation queries—typically you set both to the same logical Snowflake account connection name.

  • selected_task_source (top level, String): Required for cloud workflows. Use "snowflake_stored_procedure". (Other values such as "api" exist for non–Snowflake-queue deployments.)
  • connection_name in [task_source.snowflake_stored_procedure] (String): Required when selected_task_source is snowflake_stored_procedure. Snowflake connection name, or the special value "@SPCS_CONNECTION" when running inside Snowpark Container Services with injected credentials.
  • max_parallel_tasks in [application] (Integer): Maximum parallel tasks (threads).
  • task_fetch_interval in [application] (Integer): Seconds between idle polls for new tasks.
  • lease_refresh_interval in [application] (Integer): Optional. Seconds between task lease renewals (default 120 when unset in code defaults).
  • affinity in [application] (String): Optional. User-defined Worker affinity; must align with workflow affinity when you use affinity routing.
  • snowflake_database_for_metadata in [application] (String): Optional. Database for task-queue objects (default SNOWCONVERT_AI). Must match CUSTOM_SNOWFLAKE_DATABASE_FOR_METADATA on the Orchestrator if overridden.
  • snowflake_schema_for_data_migration_metadata in [application] (String): Optional. Schema for queue procedures (default DATA_MIGRATION). Must match CUSTOM_SNOWFLAKE_SCHEMA_FOR_DATA_MIGRATION_METADATA on the Orchestrator if overridden.
  • local_results_directory in [application] (String): Optional. Base directory for exported Parquet/CSV before upload (default ~/.data_exchange_agent/result_data; paths are expanded to absolute at load).
  • [connections.source.*] (one table per engine, Object): Source database connection. Typically one source section is active.
  • connection_name in [connections.target.snowflake_connection_name] (String): Snowflake profile used for source/target data sessions (loads, validation, and so on).

An example configuration file for cloud migration:

selected_task_source = "snowflake_stored_procedure"

[task_source.snowflake_stored_procedure]
connection_name = "connection_name"

[application]
max_parallel_tasks = 4
task_fetch_interval = 30
# lease_refresh_interval = 120
# Optional: only if the Orchestrator uses CUSTOM_SNOWFLAKE_* overrides for metadata location
# snowflake_database_for_metadata = "SNOWCONVERT_AI"
# snowflake_schema_for_data_migration_metadata = "DATA_MIGRATION"

# SQL Server connection (standard authentication)
[connections.source.sqlserver]
username = "username"
password = "password"
database = "database_name"
host = "127.0.0.1"
port = 1433

# Amazon Redshift — define only the profile that matches your migration source (examples; keep one block):
# [connections.source.redshift]
# username = "demo-user"
# database = "snowconvert_demo"
# auth_method = "iam-provisioned-cluster"
# cluster_id = "migrations-aws"
# region = "us-west-2"
# access_key_id = "your-access-key-id"
# secret_access_key = "your-secret-access-key"
#
# [connections.source.redshift]
# username = "myuser"
# password = "mypassword"
# database = "mydatabase"
# host = "my-cluster.abcdef123456.us-west-2.redshift.amazonaws.com"
# port = 5439
# auth_method = "standard"

# Snowflake target connection (data sessions)
[connections.target.snowflake_connection_name]
connection_name = "connection_name"

Note

Only one source connection is needed for a given Worker process. You may define multiple [connections.source.*] tables; the Worker uses the section that matches your migration/validation workload’s source engine.

Source connection configuration examples

The following examples show common source connection shapes. Table keys under [connections.source.<key>] are arbitrary identifiers; use names that match how you reference the source in your environment.

1. SQL Server (standard authentication)

[connections.source.sqlserver]
username = "username"
password = "password"
database = "database_name"
host = "127.0.0.1"
port = 1433

2. Amazon Redshift (IAM authentication)

[connections.source.redshift]
username = "demo-user"
database = "demo_db"
auth_method = "iam-provisioned-cluster"
cluster_id = "my-aws-cluster"
region = "us-west-2"
access_key_id = "your-access-key-id"
secret_access_key = "your-secret-access-key"
# Optional fields for UNLOAD strategy
# unload_s3_bucket = "my-migrations-bucket"
# unload_iam_role_arn = "arn:aws:iam::123456789012:role/MyRole"

3. Amazon Redshift (standard authentication)

[connections.source.redshift]
username = "myuser"
password = "mypassword"
database = "mydatabase"
host = "my-cluster.abcdef123456.us-west-2.redshift.amazonaws.com"
port = 5439
auth_method = "standard"
# Optional fields for UNLOAD strategy
# unload_s3_bucket = "my-migrations-bucket"
# unload_iam_role_arn = "arn:aws:iam::123456789012:role/MyRole"

4. Teradata (base connection)

Use teradatasql when the Worker environment provides it; otherwise set odbc_driver to the exact Teradata ODBC driver name. For extraction.strategy: "write_nos", add write_nos_location_scheme, write_nos_location_host, write_nos_location_container, and one credential mode under the same [connections.source.teradata] table; see the Teradata notes under Source-platform specifics. For "tpt", install Teradata Tools and Utilities so tbuild is on PATH, or set TPT_TBUILD_EXECUTABLE; tpt_delimiter and tpt_max_sessions are optional.

[connections.source.teradata]
host = "your-teradata-host.example.com"
port = 1025
database = "tpcds"
username = "your_username"
password = "your_password"
# odbc_driver = "Teradata Database ODBC Driver 17.20"  # when not using teradatasql
# dbc_name = "TDPID_ALIAS"
# authentication = "LDAP"
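A sketch of the additional write_nos keys under the same table (values are angle-bracket placeholders; the credential key names depend on your cloud provider, so check the Teradata notes under Source-platform specifics):

# Additional keys when extraction.strategy is "write_nos":
# write_nos_location_scheme = "<s3 | az | gs>"
# write_nos_location_host = "<storage-endpoint-host>"
# write_nos_location_container = "<bucket-or-container>"
# plus one credential mode for that location (key names vary by provider)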

Platform-specific Details

Additional setup that applies only to certain sources. For connection and extraction options shared across platforms, see Source-platform specifics.

Migrate Amazon Redshift data

To use the UNLOAD strategy for extracting Amazon Redshift data, you must set up several resources. This strategy lets data flow directly from Amazon Redshift into an S3 bucket so that Snowflake can run COPY INTO operations straight from that bucket (through an external stage mapped to it). This is faster than having the Workers download the data and then upload it to a Snowflake stage.

Create a stage integration to S3

If you don’t have an existing stage configured, you need to create a Snowflake external stage that integrates with your S3 bucket. You can create the stage using the following SQL command in Snowflake:

CREATE OR REPLACE STAGE <stage_name>
  URL = 's3://<your_bucket_name>/<path>/'
  STORAGE_INTEGRATION = <your_storage_integration>
  FILE_FORMAT = (TYPE = 'PARQUET');

Alternatively, if you’re using AWS credentials directly:

CREATE OR REPLACE STAGE <stage_name>
  URL = 's3://<your_bucket_name>/<path>/'
  CREDENTIALS = (AWS_KEY_ID = '<your_aws_key_id>' AWS_SECRET_KEY = '<your_aws_secret_key>')
  FILE_FORMAT = (TYPE = 'PARQUET');

Replace the placeholders:

  • <stage_name>: Your desired stage name (for example, my_redshift_stage)
  • <your_bucket_name>: Your S3 bucket name
  • <path>: Optional path within the bucket
  • <your_storage_integration>: Your Snowflake storage integration name (recommended method)
  • <your_aws_key_id> and <your_aws_secret_key>: Your AWS IAM user credentials (if not using storage integration)

Note

Using a Snowflake storage integration is the recommended approach for better security and credential management. For more information about creating storage integrations, see the Snowflake documentation.

Verify stage integration

After setting up your stage, verify that the integration is working correctly before proceeding with data migration. You can verify the stage integration by running the following command in Snowflake:

LIST @<stage_name>;

This command should execute successfully without errors. If the stage is newly created and empty, it may return no results, which is expected.

To perform a more thorough verification, you can test the stage by uploading a test file:

PUT file:///<local_test_file_path> @<stage_name>;
LIST @<stage_name>;

If the commands execute successfully and you can see the uploaded file, your stage integration is configured correctly.

Note

You can also verify stage permissions by checking the stage description:

DESCRIBE STAGE <stage_name>;

This displays the stage configuration, including the URL, credentials type, and file format settings.