Testing¶
In large-scale database migrations, the testing phase (unit, integration, and user acceptance testing) routinely consumes 3–4x the duration of the actual code conversion. This "Testing Bottleneck" is driven primarily by the manual effort required to verify logic parity across heterogeneous environments. The Snowflake Migration Agent provides an automated solution for verifying functional equivalence between legacy systems and Snowflake.
SQL Testing¶
After the conversion and assessment steps complete, the Snowflake Migration Agent enters the migration phase and deploys objects to Snowflake. As each object is deployed, the testing process checks it against the source database to verify accuracy.
As part of the process, it enters a testing loop that will:
- Generate a series of unit tests for converted objects
- Pre-apply known rules to the converted SQL files
- Deploy the object to Snowflake
- Run a test validation to compare Snowflake output against already captured baselines
If the test fails, the plugin will:
- Diagnose the issues
- Fix the SQL
- Enter the deploy/validate loop again
This loop repeats until the tests pass or the escalation criteria are met.
Strategies for Test Generation¶
| Generation Strategy | How It Works |
|---|---|
| Query Logs | Uses execution logs to extract actual EXEC calls to mimic the source system. |
| AI-Assisted Swarm | Used when source has no logs. Spawns parallel agents that probe the source DB to generate test cases. |
| Synthetic | Used when no source data is available. Analyzes source SQL branches and code paths to generate two-sided test artifacts with synthetic INSERT data. |
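A minimal sketch of the Query Logs strategy, assuming a simple line-oriented log format (the format, procedure names, and helper are invented for illustration):

```python
import re

# Sketch of the Query Logs strategy: pull actual EXEC calls out of a source
# system's execution log so they can be replayed as unit tests.
# The log format below is invented for illustration.

LOG = """\
2024-01-05 10:02:11 EXEC dbo.usp_LoadOrders @BatchId = 42
2024-01-05 10:02:14 SELECT COUNT(*) FROM dbo.Orders
2024-01-05 10:03:01 EXEC dbo.usp_ApplyDiscounts @Region = 'EU'
"""

EXEC_RE = re.compile(r"EXEC\s+\S+.*$", re.MULTILINE)

def extract_exec_calls(log_text):
    """Return the distinct EXEC statements found in the log, in order."""
    seen, calls = set(), []
    for stmt in (m.strip() for m in EXEC_RE.findall(log_text)):
        if stmt not in seen:
            seen.add(stmt)
            calls.append(stmt)
    return calls

calls = extract_exec_calls(LOG)
# Each extracted call becomes one replayable unit test against the source.
```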
Execution Flow¶
- Capture baselines from the source system and save them to a Snowflake stage
- Validate against Snowflake after deployment, capturing PASS/FAIL/ERROR for each test
- Update the registry when tests pass
The plugin will first look for a VALIDATION schema. If one is not found, it will create one.
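The schema check reduces to one idempotent statement. A sketch, assuming a hypothetical `execute` callable and database name:

```python
# Sketch of the VALIDATION-schema check. `execute` stands in for a real
# Snowflake cursor; the database name is an assumption for illustration.

def ensure_validation_schema(execute, database="MIGRATION_DB"):
    """Create the VALIDATION schema if it does not already exist."""
    # IF NOT EXISTS makes the call idempotent, matching the
    # "look for it, create it if missing" behavior described above.
    execute(f"CREATE SCHEMA IF NOT EXISTS {database}.VALIDATION")

issued = []
ensure_validation_schema(issued.append)
```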
ETL Testing¶
ETL testing takes a different approach. For procs/functions, you capture “what the source actually returns” and match it. For ETL, you derive “what the source should do” from its definition file and write assertions the converted code must satisfy.
The migration plugin converts ETL packages (SSIS and Informatica) into two artifacts:
1. dbt projects (for the data flow / transformation logic)
2. Snowflake Task orchestration SQL (for the control flow)
| Platform | Source File | Orchestration Concept | Transformation Concept |
|---|---|---|---|
| SSIS | .dtsx (XML) | Control flow (Execute SQL Tasks, Script Tasks, precedence constraints) | Data flow tasks (pipeline components) |
| Informatica | .xml (PowerCenter export) | Workflows (Tasks, WorkflowLinks, Worklets) | Mappings (Transformations, Connectors, Instances) |
dbt Data Flow Testing¶
Instead of baseline capture from a source database, the plugin generates dbt-native tests using an Arrange-Act-Assert pattern:
Arrange — Synthetic seed CSVs derived from the source definition file:
- 3-5 rows per source table
- Rows designed to exercise every downstream transform condition
- Edge cases (NULLs, zeros, boundary dates)
- Stored in .migrate-etl-package/tests/dbt/<project>/seeds/
Where the seed data comes from depends on platform:
| Platform | Column Definitions From | Transform Conditions From |
|---|---|---|
| SSIS | Pipeline component definitions in DTS:Executable elements | path elements, Conditional Split expressions |
| Informatica | TRANSFORMFIELD elements on TRANSFORMATION nodes | EXPRESSION attribute on Filter/Router ports; session-level SQL overrides |
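Putting the Arrange step together, here is a sketch of seed generation for a hypothetical `amount > 100` Conditional Split. The column names, the condition, and the helpers are illustrative, not the plugin's actual output:

```python
import csv
import io

# Sketch of the Arrange step: emit a small synthetic seed CSV whose rows
# exercise both branches of a hypothetical `amount > 100` condition plus
# the usual edge cases (NULL, zero, boundary date).

def build_seed_rows():
    return [
        {"order_id": 1, "amount": "250.00", "order_date": "2024-06-15"},  # > 100 branch
        {"order_id": 2, "amount": "50.00",  "order_date": "2024-06-16"},  # <= 100 branch
        {"order_id": 3, "amount": "",       "order_date": "2024-06-17"},  # NULL amount
        {"order_id": 4, "amount": "0.00",   "order_date": "2024-06-18"},  # zero
        {"order_id": 5, "amount": "100.00", "order_date": "9999-12-31"},  # boundary
    ]

def write_seed_csv(rows):
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["order_id", "amount", "order_date"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

seed_csv = write_seed_csv(build_seed_rows())
```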
Act — dbt seed → dbt run (builds models against seed data)
Assert — Two types of dbt tests:
- Schema tests (not_null, unique, accepted_values, relationships)
- Singular assertion tests (SQL queries that return rows on failure)
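For illustration, the schema tests for a hypothetical `stg_orders` model might look like this in dbt's schema YAML (the model, columns, and accepted values are invented, not taken from the migration output); a singular test would be a standalone SQL file that selects the offending rows:

```yaml
# Hypothetical dbt schema tests for an illustrative stg_orders model.
models:
  - name: stg_orders
    columns:
      - name: order_id
        tests:
          - not_null
          - unique
      - name: status
        tests:
          - accepted_values:
              values: ['NEW', 'SHIPPED', 'CANCELLED']
```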
Informatica introduces additional assertion patterns not present in SSIS:
- Lookup cache modes — static lookups assert LEFT JOIN behavior; dynamic lookups assert MERGE upsert behavior
- Aggregator modes — standard (GROUP BY) vs. incremental (window functions with QUALIFY)
- Router DEFAULT group — seed rows that match no group condition to verify catch-all routing
- Mapping variable branching — separate test files per $$variable value, run with --vars
Key differences from proc/function testing:
- Source definition file is the source of truth, not a live source database execution
- Tests are generated BEFORE fixes — failures are expected as a baseline
- Tests run entirely in Snowflake via dbt test — no source DB connection needed
- The dbt-fixer agent then fixes models until the pre-written tests pass
Orchestration (Control Flow) Testing¶
For the Snowflake Task DAG that replaces the ETL control flow, there’s a separate orchestration-test-gen skill that:
- Generates tests for each control flow element
- Executes tests as SQL assertions directly against Snowflake
- Verifies that the orchestration stored procedures produce correct side effects (table modifications, variable updates)
- Uses clone-based isolation for elements that modify data
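Clone-based isolation leans on Snowflake's zero-copy CLONE. A sketch, with a stand-in `execute` callable and illustrative table name:

```python
# Sketch of clone-based isolation for a control-flow test that modifies data.
# Zero-copy cloning is a real Snowflake feature; the table name, test SQL,
# and `execute` stand-in are illustrative.

def run_isolated(execute, table, test_sql):
    clone = f"{table}__TEST_CLONE"
    execute(f"CREATE OR REPLACE TABLE {clone} CLONE {table}")  # zero-copy clone
    try:
        # Redirect the side-effecting test at the clone, not the original.
        execute(test_sql.replace(table, clone))
    finally:
        execute(f"DROP TABLE IF EXISTS {clone}")  # original is never touched

issued = []
run_isolated(issued.append, "ORDERS", "DELETE FROM ORDERS WHERE 1=1")
```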
Platform-specific control flow elements tested:
| SSIS | Informatica |
|---|---|
| Execute SQL Tasks | Session tasks (mapping execution) |
| Script Tasks | Command tasks (shell/OS scripts) |
| Variable logic | Assignment tasks, $$variable propagation |
| Precedence constraints | WorkflowLink conditions ($Task.Status = SUCCEEDED) |
| Nested containers | Worklet expansion into parent DAG |
Summary of Testing Methods¶
| Aspect | Procs & Functions | ETL (SSIS / Informatica) |
|---|---|---|
| Test engine | scai test capture + validate | dbt test + direct SQL assertions |
| “Source of Truth” | Live source DB output | Source definition file (.dtsx or .xml) |
| Source DB needed | Yes | No |
| Test timing | Tests generated, baselines captured, THEN fix loop | Tests generated BEFORE fixes (failures = baseline) |
| Data | Real data or synthetic via branch analysis | Synthetic seeds from dataflow analysis |
| Artifacts | YAML test files | CSV seeds + SQL test files + schema YAML |