Testing

In large-scale database migrations, the testing phase—comprising unit, integration, and user acceptance testing—routinely consumes 3–4x the duration of the actual code conversion. This “Testing Bottleneck” is primarily driven by the manual effort required to verify logic parity across heterogeneous environments. The Snowflake Migration Agent includes an automated testing workflow designed to verify functional equivalence between legacy systems and Snowflake.


SQL Testing

After the conversion and assessment steps have completed, the Snowflake Migration Agent enters the migration phase and deploys objects to Snowflake. As each object is deployed, the testing process checks it against the source database to verify accuracy.

Example prompts to invoke SQL testing:

> Migrate my objects to Snowflake
> Continue with migration plan
> Deploy and test my stored procedures
> Generate test cases for GetInventoryReport

As part of the process, it enters a testing loop that will:

  • Generate a series of unit tests for converted objects
  • Pre-apply known rules to the converted SQL files
  • Deploy the object to Snowflake
  • Run a test validation to compare Snowflake output against already captured baselines

If the test fails, the plugin will:

  • Diagnose the issues
  • Fix the SQL
  • Enter the deploy/validate loop again

This will repeat until the tests pass or escalation criteria are hit.
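The validate step in the loop above amounts to a row-level parity check between the captured baseline and the Snowflake output. A minimal sketch, assuming the baseline and result for a procedure are materialized as same-shaped tables in a `VALIDATION` schema (the table names here are hypothetical):

```sql
-- Hypothetical parity check for one converted procedure: the symmetric
-- difference between the captured baseline and the Snowflake output.
-- Zero rows returned means the outputs match (PASS).
(
  SELECT * FROM VALIDATION.BASELINE_GETINVENTORYREPORT
  MINUS
  SELECT * FROM VALIDATION.RESULT_GETINVENTORYREPORT
)
UNION ALL
(
  SELECT * FROM VALIDATION.RESULT_GETINVENTORYREPORT
  MINUS
  SELECT * FROM VALIDATION.BASELINE_GETINVENTORYREPORT
);
```

A symmetric difference catches both missing and extra rows, which a one-directional `MINUS` alone would not.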

Strategies for Test Generation

| Generation Strategy | How It Works |
| --- | --- |
| Query Logs | Uses execution logs to extract actual `EXEC` calls to mimic the source system. |
| AI-Assisted Swarm | Used when the source has no logs. Spawns parallel agents that probe the source DB to generate test cases. |
| Synthetic | Used when no source data is available. Analyzes source SQL branches and paths to generate two-sided test artifacts with synthetic `INSERT` data. |

Execution Flow

  1. Capture baselines from the source system and save them to a Snowflake stage
  2. Validate against Snowflake after deployment, recording PASS/FAIL/ERROR
  3. Update the registry when tests pass

The plugin will first look for a VALIDATION schema. If one is not found, it will create one.
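The schema setup can be expressed as idempotent DDL; the stage name below is an assumption, since the text only says baselines are saved to a Snowflake stage:

```sql
-- Idempotent setup: reuse the VALIDATION schema if it already exists.
CREATE SCHEMA IF NOT EXISTS VALIDATION;
-- Hypothetical internal stage for the captured baselines (step 1 above).
CREATE STAGE IF NOT EXISTS VALIDATION.BASELINES;
```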

ETL Testing

ETL testing takes a different approach. For procs/functions, you capture “what the source actually returns” and match it. For ETL, you derive “what the source should do” from its definition file and write assertions the converted code must satisfy.

Example prompts to invoke ETL testing:

# SSIS
> Migrate my SSIS packages to Snowflake
> Generate dbt tests for my SSIS data flows
> Fix and test the Data_Flow_Task dbt project

# Informatica
> Migrate my Informatica workflows to Snowflake
> Generate dbt tests for my Informatica mappings
> Test the orchestration for my Informatica workflow

The migration plugin converts ETL packages (SSIS and Informatica) into two artifacts:

1. dbt projects (for the data flow / transformation logic)
2. Snowflake Task orchestration SQL (for the control flow)
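As a sketch of the second artifact, a converted control flow might become a small Task DAG like the following; the task, warehouse, and procedure names are hypothetical, and the schedule is illustrative:

```sql
-- Hypothetical control-flow DAG: a scheduled root task runs the
-- converted data flow, then a dependent task runs the cleanup step.
CREATE OR REPLACE TASK LOAD_ROOT
  WAREHOUSE = MIGRATION_WH
  SCHEDULE = 'USING CRON 0 2 * * * UTC'
AS
  CALL ETL.DATA_FLOW_TASK();

CREATE OR REPLACE TASK LOAD_CLEANUP
  WAREHOUSE = MIGRATION_WH
  AFTER LOAD_ROOT
AS
  CALL ETL.ARCHIVE_STAGING();

-- Dependent tasks must be resumed before the root task.
ALTER TASK LOAD_CLEANUP RESUME;
ALTER TASK LOAD_ROOT RESUME;
```

The `AFTER` clause encodes what was a precedence constraint (SSIS) or WorkflowLink (Informatica) in the source package.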

| Platform | Source File | Orchestration Concept | Transformation Concept |
| --- | --- | --- | --- |
| SSIS | `.dtsx` (XML) | Control flow (Execute SQL Tasks, Script Tasks, precedence constraints) | Data flow tasks (pipeline components) |
| Informatica | `.xml` (PowerCenter export) | Workflows (Tasks, WorkflowLinks, Worklets) | Mappings (Transformations, Connectors, Instances) |

dbt Data Flow Testing

Instead of baseline capture from a source database, the plugin generates dbt-native tests using an Arrange-Act-Assert pattern:

Arrange — Synthetic seed CSVs derived from the source definition file:

  • 3-5 rows per source table
  • Rows designed to exercise every downstream transform condition
  • Edge cases (NULLs, zeros, boundary dates)
  • Stored in .migrate-etl-package/tests/dbt/<project>/seeds/

Where the seed data comes from depends on platform:

| Platform | Column Definitions From | Transform Conditions From |
| --- | --- | --- |
| SSIS | Pipeline component definitions in `DTS:Executable` elements | `path` elements, Conditional Split expressions |
| Informatica | `TRANSFORMFIELD` elements on `TRANSFORMATION` nodes | `EXPRESSION` attribute on Filter/Router ports; session-level SQL overrides |
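As an illustration, a seed for a hypothetical Orders source table might look like this, with rows chosen to exercise a Conditional Split on `order_amount` plus NULL and boundary cases (all column names and values here are invented):

```csv
order_id,customer_id,order_amount,order_date
1,101,0,2024-01-01
2,102,999.99,2024-06-15
3,103,1000,2024-06-15
4,104,5000,2024-12-31
5,105,,2024-02-29
```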

Act — dbt seed → dbt run (builds models against seed data)

Assert — Two types of dbt tests:

  • Schema tests (not_null, unique, accepted_values, relationships)
  • Singular assertion tests (SQL queries that return rows on failure)
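A singular assertion test is a plain SQL file that fails when it returns rows. A minimal sketch, assuming a hypothetical model `stg_orders_high_value` produced by a Conditional Split branch for orders above 1000:

```sql
-- Singular dbt test (hypothetical model and threshold): returns rows,
-- and therefore fails, if the high-value branch leaked rows that the
-- split condition should have filtered out.
SELECT *
FROM {{ ref('stg_orders_high_value') }}
WHERE order_amount <= 1000
```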

Informatica introduces additional assertion patterns not present in SSIS:

  • Lookup cache modes — static lookups assert LEFT JOIN behavior; dynamic lookups assert MERGE upsert behavior
  • Aggregator modes — standard (GROUP BY) vs. incremental (window functions with QUALIFY)
  • Router DEFAULT group — seed rows that match no group condition to verify catch-all routing
  • Mapping variable branching — separate test files per $$variable value, run with --vars
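For the incremental aggregator mode, a generated model might select the latest row per key with a window function plus Snowflake's `QUALIFY`, rather than a `GROUP BY`; the model and column names below are hypothetical:

```sql
-- Incremental-aggregator pattern: latest running total per customer,
-- selected with a window function and QUALIFY instead of GROUP BY.
SELECT customer_id, order_ts, running_total
FROM {{ ref('agg_orders_incremental') }}
QUALIFY ROW_NUMBER() OVER (
  PARTITION BY customer_id
  ORDER BY order_ts DESC
) = 1
```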

Key differences from proc/function testing:

  • Source definition file is the source of truth, not a live source database execution
  • Tests are generated BEFORE fixes — failures are expected as a baseline
  • Tests run entirely in Snowflake via dbt test — no source DB connection needed
  • The dbt-fixer agent then fixes models until the pre-written tests pass

Orchestration (Control Flow) Testing

For the Snowflake Task DAG that replaces the ETL control flow, there’s a separate orchestration-test-gen skill that:

  • Generates tests for each control flow element
  • Executes tests as SQL assertions directly against Snowflake
  • Verifies that the orchestration stored procedures produce correct side effects (table modifications, variable updates)
  • Uses clone-based isolation for elements that modify data

Platform-specific control flow elements tested:

| SSIS | Informatica |
| --- | --- |
| Execute SQL Tasks | Session tasks (mapping execution) |
| Script Tasks | Command tasks (shell/OS scripts) |
| Variable logic | Assignment tasks, `$$variable` propagation |
| Precedence constraints | WorkflowLink conditions (`$Task.Status = SUCCEEDED`) |
| Nested containers | Worklet expansion into parent DAG |
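Clone-based isolation can be sketched as follows, under the assumption that the converted procedure accepts its target table as a parameter so the test can point it at a zero-copy clone (the procedure signature, table, and column here are hypothetical):

```sql
-- Hypothetical clone-isolated assertion for a control-flow element
-- that modifies data: run the converted Execute SQL Task against a
-- zero-copy clone, check the side effect, then discard the clone.
CREATE OR REPLACE TABLE ETL.ORDERS_TEST CLONE ETL.ORDERS;

CALL ETL.EXECUTE_SQL_TASK_1('ETL.ORDERS_TEST');

-- Assertion: the task should have marked at least one row processed.
SELECT COUNT(*) AS rows_processed
FROM ETL.ORDERS_TEST
WHERE status = 'PROCESSED';

DROP TABLE ETL.ORDERS_TEST;
```

Because the clone is zero-copy, the assertion runs against realistic data without mutating the real table.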

Summary of Testing Methods

| Aspect | Procs & Functions | ETL (SSIS / Informatica) |
| --- | --- | --- |
| Test engine | `scai test capture` + `validate` | `dbt test` + direct SQL assertions |
| “Source of Truth” | Live source DB output | Source definition file (`.dtsx` or `.xml`) |
| Source DB needed | Yes | No |
| Test timing | Tests generated, baselines captured, THEN fix loop | Tests generated BEFORE fixes (failures = baseline) |
| Data | Real data or synthetic via branch analysis | Synthetic seeds from dataflow analysis |
| Artifacts | YAML test files | CSV seeds + SQL test files + schema YAML |