Understanding primary keys in dynamic tables¶
Snowflake can use primary keys in dynamic tables and dynamic iceberg tables to track row-level changes more efficiently and to enable incremental refresh downstream of full refresh dynamic tables. Instead of relying on change-tracking columns, Snowflake uses primary keys as stable row identifiers to compute the minimal set of changes between refreshes.
This is especially useful in the following scenarios:
Base tables are periodically rewritten through INSERT OVERWRITE rather than updated in place, which normally prevents Snowflake from detecting what changed between versions.
The pipeline reads from an externally-managed Apache Iceberg™ v2 table, which preceed row lineage.
Some dynamic tables must use full refresh mode because of unsupported incremental constructs, but downstream tables would benefit from incremental processing.
Types of primary keys in dynamic tables¶
Snowflake supports two types of primary key use cases for dynamic tables: 1) row-level change tracking and 2) derivation of a unique key for the dynamic table itself.
Primary key row-level lineage-based change tracking¶
When a base table has a primary key constraint with the RELY property set, Snowflake
uses that key for row-level change tracking in downstream dynamic tables. This is particularly
useful when the base table is periodically rewritten through INSERT OVERWRITE, which normally
prevents change tracking across table versions.
With a reliable primary key, Snowflake identifies which rows changed between refreshes by comparing primary key values instead of relying on internal change-tracking columns. This enables incremental processing even when the underlying data is fully replaced.
To set the RELY property on a base table primary key:
Unique key derivation¶
Snowflake can automatically derive a reliable unique key from the query definition of a dynamic table. For example, the following SQL constructs produce derived unique keys:
GROUP BY: The grouping columns form a unique key because each group produces exactly one output row.
QUALIFY ROW_NUMBER() = 1: The partition-by columns form a unique key because the filter keeps exactly one row per partition.
Reliable base table primary keys: The primary key of the base table is used as the unique key of the dynamic table.
Snowflake registers derived primary keys as unique constraints on the dynamic table. Because these constraints come from the query structure, they’re fully reliable without additional validation.
To check whether a dynamic table has a derived primary key, run:
When to use primary keys¶
Primary keys are useful in the following scenarios:
- Improve change tracking for INSERT OVERWRITE workloads
When a base table is periodically rewritten through INSERT OVERWRITE, Snowflake can’t use standard change-tracking columns to detect what changed. A primary key lets Snowflake compare rows by key value and process only the actual changes, avoiding a full recomputation of the dynamic table.
- Enable incremental refresh downstream of full refresh dynamic tables
Normally, a dynamic table in incremental refresh mode can’t be downstream of a dynamic table in full refresh mode. When the upstream full refresh dynamic table has a system-derived unique key, Snowflake can compute the changes between full refreshes, allowing downstream tables to refresh incrementally. This removes a major blocker for incremental pipelines.
- Reduce change propagation in pipelines
Primary keys enable value-based change reduction at each stage of a pipeline. Snowflake can filter out rows where the primary key exists in both the old and new versions with identical values, reducing the volume of changes that propagate to downstream tables.
Key behaviors¶
Opt-in for downstream incremental refresh: To use incremental refresh on a dynamic table that reads from a full refresh dynamic table with a derived unique key, you must explicitly set
REFRESH_MODE = INCREMENTALon the downstream table. SettingREFRESH_MODE = AUTOcontinues to resolve to FULL.Verify primary key-based change tracking support: Use
SHOW UNIQUE KEYS IN <dt_name>to check whether a dynamic table has a derived unique key. Alternatively, create a downstream dynamic table withREFRESH_MODE = INCREMENTALand check whether the creation succeeds.Masking policies: Masking policies that obfuscate primary key columns prevent Snowflake from using those keys for change tracking. In this case, Snowflake falls back to standard change-tracking columns.
Next steps¶
For examples and implementation guidance, see Use primary keys to optimize dynamic table pipelines.
For the full syntax of CREATE DYNAMIC TABLE, see CREATE DYNAMIC TABLE.