Testing

In large-scale database migrations, the testing phase—comprising unit, integration, and user acceptance testing—routinely consumes 3–4x the duration of the actual code conversion. This “Testing Bottleneck” is primarily driven by the manual effort required to verify logic parity across heterogeneous environments. The Snowflake Migration Agent includes an automated testing workflow designed to verify functional equivalence between legacy systems and Snowflake.


SQL Testing

After the conversion and assessment steps have completed, the Snowflake Migration Agent enters the migration phase and deploys objects to Snowflake. As each object is deployed, the testing process checks it against the source database to verify accuracy.

Example prompts to invoke SQL testing:

> Migrate my objects to Snowflake
> Continue with migration plan
> Deploy and test my stored procedures
> Generate test cases for GetInventoryReport

As part of the process, it enters a testing loop that will:

  • Generate a series of unit tests for converted objects
  • Pre-apply known rules to the converted SQL files
  • Deploy the object to Snowflake
  • Run a test validation to compare Snowflake output against already captured baselines

If the test fails, the plugin will:

  • Diagnose the issues
  • Fix the SQL
  • Enter the deploy/validate loop again

This will repeat until the tests pass or escalation criteria are hit.
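The validate step in the loop above amounts to a row-level parity check between the captured baseline and the Snowflake output. A minimal sketch, assuming the baseline and result for a procedure are materialized as same-shaped tables in a `VALIDATION` schema (the table names here are hypothetical):

```sql
-- Hypothetical parity check for one converted procedure: the symmetric
-- difference between the captured baseline and the Snowflake output.
-- Zero rows returned means the outputs match (PASS).
(
  SELECT * FROM VALIDATION.BASELINE_GETINVENTORYREPORT
  MINUS
  SELECT * FROM VALIDATION.RESULT_GETINVENTORYREPORT
)
UNION ALL
(
  SELECT * FROM VALIDATION.RESULT_GETINVENTORYREPORT
  MINUS
  SELECT * FROM VALIDATION.BASELINE_GETINVENTORYREPORT
);
```

A symmetric difference catches both missing and extra rows, which a one-directional `MINUS` alone would not.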

Strategies for Test Generation

| Generation Strategy | How It Works |
| --- | --- |
| Query Logs | Uses execution logs to extract actual `EXEC` calls to mimic the source system. |
| AI-Assisted Swarm | Used when the source has no logs. Spawns parallel agents that probe the source DB to generate test cases. |
| Synthetic | Used when no source data is available. Analyzes source SQL branches and paths to generate two-sided test artifacts with synthetic `INSERT` data. |

Execution Flow

  1. Capture baselines from the source system and save them to a Snowflake stage
  2. Validate against Snowflake after deployment, recording PASS/FAIL/ERROR
  3. Update the registry when tests pass

The plugin will first look for a VALIDATION schema. If one is not found, it will create one.
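The schema setup can be expressed as idempotent DDL; the stage name below is an assumption, since the text only says baselines are saved to a Snowflake stage:

```sql
-- Idempotent setup: reuse the VALIDATION schema if it already exists.
CREATE SCHEMA IF NOT EXISTS VALIDATION;
-- Hypothetical internal stage for the captured baselines (step 1 above).
CREATE STAGE IF NOT EXISTS VALIDATION.BASELINES;
```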

ETL Testing

ETL testing takes a different approach. For procs/functions, you capture “what the source actually returns” and match it. For ETL, you derive “what the source should do” from its definition file and write assertions the converted code must satisfy.

Example prompts to invoke ETL testing:

# SSIS
> Migrate my SSIS packages to Snowflake
> Generate dbt tests for my SSIS data flows
> Fix and test the Data_Flow_Task dbt project

# Informatica
> Migrate my Informatica workflows to Snowflake
> Generate dbt tests for my Informatica mappings
> Test the orchestration for my Informatica workflow

The migration plugin converts ETL packages (SSIS and Informatica) into two artifacts:

1. dbt projects (for the data flow / transformation logic)
2. Snowflake Task orchestration SQL (for the control flow)
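As a sketch of the second artifact, a converted control flow might become a small Task DAG like the following; the task, warehouse, and procedure names are hypothetical, and the schedule is illustrative:

```sql
-- Hypothetical control-flow DAG: a scheduled root task runs the
-- converted data flow, then a dependent task runs the cleanup step.
CREATE OR REPLACE TASK LOAD_ROOT
  WAREHOUSE = MIGRATION_WH
  SCHEDULE = 'USING CRON 0 2 * * * UTC'
AS
  CALL ETL.DATA_FLOW_TASK();

CREATE OR REPLACE TASK LOAD_CLEANUP
  WAREHOUSE = MIGRATION_WH
  AFTER LOAD_ROOT
AS
  CALL ETL.ARCHIVE_STAGING();

-- Dependent tasks must be resumed before the root task.
ALTER TASK LOAD_CLEANUP RESUME;
ALTER TASK LOAD_ROOT RESUME;
```

The `AFTER` clause encodes what was a precedence constraint (SSIS) or WorkflowLink (Informatica) in the source package.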

| Platform | Source File | Orchestration Concept | Transformation Concept |
| --- | --- | --- | --- |
| SSIS | `.dtsx` (XML) | Control flow (Execute SQL Tasks, Script Tasks, precedence constraints) | Data flow tasks (pipeline components) |
| Informatica | `.xml` (PowerCenter export) | Workflows (Tasks, WorkflowLinks, Worklets) | Mappings (Transformations, Connectors, Instances) |

dbt Data Flow Testing

Instead of baseline capture from a source database, the plugin generates dbt-native tests using an Arrange-Act-Assert pattern:

Arrange — Synthetic seed CSVs derived from the source definition file:

  • 3-5 rows per source table
  • Rows designed to exercise every downstream transform condition
  • Edge cases (NULLs, zeros, boundary dates)
  • Stored in .migrate-etl-package/tests/dbt/<project>/seeds/

Where the seed data comes from depends on platform:

| Platform | Column Definitions From | Transform Conditions From |
| --- | --- | --- |
| SSIS | Pipeline component definitions in `DTS:Executable` elements | `path` elements, Conditional Split expressions |
| Informatica | `TRANSFORMFIELD` elements on `TRANSFORMATION` nodes | `EXPRESSION` attribute on Filter/Router ports; session-level SQL overrides |
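As an illustration, a seed for a hypothetical Orders source table might look like this, with rows chosen to exercise a Conditional Split on `order_amount` plus NULL and boundary cases (all column names and values here are invented):

```csv
order_id,customer_id,order_amount,order_date
1,101,0,2024-01-01
2,102,999.99,2024-06-15
3,103,1000,2024-06-15
4,104,5000,2024-12-31
5,105,,2024-02-29
```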

Act — dbt seed → dbt run (builds models against seed data)

Assert — Two types of dbt tests:

  • Schema tests (not_null, unique, accepted_values, relationships)
  • Singular assertion tests (SQL queries that return rows on failure)
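A singular assertion test is a plain SQL file that fails when it returns rows. A minimal sketch, assuming a hypothetical model `stg_orders_high_value` produced by a Conditional Split branch for orders above 1000:

```sql
-- Singular dbt test (hypothetical model and threshold): returns rows,
-- and therefore fails, if the high-value branch leaked rows that the
-- split condition should have filtered out.
SELECT *
FROM {{ ref('stg_orders_high_value') }}
WHERE order_amount <= 1000
```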

Informatica introduces additional assertion patterns not present in SSIS:

  • Lookup cache modes — static lookups assert LEFT JOIN behavior; dynamic lookups assert MERGE upsert behavior
  • Aggregator modes — standard (GROUP BY) vs. incremental (window functions with QUALIFY)
  • Router DEFAULT group — seed rows that match no group condition to verify catch-all routing
  • Mapping variable branching — separate test files per $$variable value, run with --vars
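For the incremental aggregator mode, a generated model might select the latest row per key with a window function plus Snowflake's `QUALIFY`, rather than a `GROUP BY`; the model and column names below are hypothetical:

```sql
-- Incremental-aggregator pattern: latest running total per customer,
-- selected with a window function and QUALIFY instead of GROUP BY.
SELECT customer_id, order_ts, running_total
FROM {{ ref('agg_orders_incremental') }}
QUALIFY ROW_NUMBER() OVER (
  PARTITION BY customer_id
  ORDER BY order_ts DESC
) = 1
```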

Key differences from proc/function testing:

  • Source definition file is the source of truth, not a live source database execution
  • Tests are generated BEFORE fixes — failures are expected as a baseline
  • Tests run entirely in Snowflake via dbt test — no source DB connection needed
  • The dbt-fixer agent then fixes models until the pre-written tests pass

Orchestration (Control Flow) Testing

For the Snowflake Task DAG that replaces the ETL control flow, there’s a separate orchestration-test-gen skill that:

  • Generates tests for each control flow element
  • Executes tests as SQL assertions directly against Snowflake
  • Verifies that the orchestration stored procedures produce correct side effects (table modifications, variable updates)
  • Uses clone-based isolation for elements that modify data

Platform-specific control flow elements tested:

| SSIS | Informatica |
| --- | --- |
| Execute SQL Tasks | Session tasks (mapping execution) |
| Script Tasks | Command tasks (shell/OS scripts) |
| Variable logic | Assignment tasks, `$$variable` propagation |
| Precedence constraints | WorkflowLink conditions (`$Task.Status = SUCCEEDED`) |
| Nested containers | Worklet expansion into parent DAG |
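Clone-based isolation can be sketched as follows, under the assumption that the converted procedure accepts its target table as a parameter so the test can point it at a zero-copy clone (the procedure signature, table, and column here are hypothetical):

```sql
-- Hypothetical clone-isolated assertion for a control-flow element
-- that modifies data: run the converted Execute SQL Task against a
-- zero-copy clone, check the side effect, then discard the clone.
CREATE OR REPLACE TABLE ETL.ORDERS_TEST CLONE ETL.ORDERS;

CALL ETL.EXECUTE_SQL_TASK_1('ETL.ORDERS_TEST');

-- Assertion: the task should have marked at least one row processed.
SELECT COUNT(*) AS rows_processed
FROM ETL.ORDERS_TEST
WHERE status = 'PROCESSED';

DROP TABLE ETL.ORDERS_TEST;
```

Because the clone is zero-copy, the assertion runs against realistic data without mutating the real table.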

Summary of Testing Methods

| Aspect | Procs & Functions | ETL (SSIS / Informatica) |
| --- | --- | --- |
| Test engine | `scai test capture` + `validate` | `dbt test` + direct SQL assertions |
| “Source of Truth” | Live source DB output | Source definition file (`.dtsx` or `.xml`) |
| Source DB needed | Yes | No |
| Test timing | Tests generated, baselines captured, THEN fix loop | Tests generated BEFORE fixes (failures = baseline) |
| Data | Real data or synthetic via branch analysis | Synthetic seeds from dataflow analysis |
| Artifacts | YAML test files | CSV seeds + SQL test files + schema YAML |