Metadata and retention for Apache Iceberg™ tables

Snowflake handles metadata for Apache Iceberg™ tables according to the type of catalog you use (Snowflake or external).

Note

Specifying the default minimum number of snapshots with the history.expire.min-snapshots-to-keep table property (https://iceberg.apache.org/docs/1.2.1/configuration/#table-behavior-properties) is not supported for any type of Iceberg table.

Tables that use Snowflake as the catalog

Snowflake manages the metadata life cycle for this table type, and deletes old metadata, manifest lists, and manifest files based on the retention period for the table data and snapshots.

To set the retention period for table data and snapshots, set the DATA_RETENTION_TIME_IN_DAYS parameter at the account, database, schema, or table level.
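
For example, the following statements set a 7-day retention period at the table level and at the schema level; the object names are placeholders:

  ALTER ICEBERG TABLE my_iceberg_table SET DATA_RETENTION_TIME_IN_DAYS = 7;

  ALTER SCHEMA my_db.my_schema SET DATA_RETENTION_TIME_IN_DAYS = 7;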

Creation

Snowflake generates metadata for version 2 of the Apache Iceberg specification on a periodic basis, and writes the metadata to files on your external volume. Each new metadata file contains all DML or DDL changes since the last Snowflake-generated metadata file was created.

You can also create metadata on demand by using the SYSTEM$GET_ICEBERG_TABLE_INFORMATION function. For instructions, see Generate snapshots of DML changes.
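
For example, the following call returns information about the table, including the location of the latest Snowflake-generated metadata file; the table name is a placeholder:

  SELECT SYSTEM$GET_ICEBERG_TABLE_INFORMATION('my_db.my_schema.my_iceberg_table');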

For information about locating metadata files, see Data and metadata directories.

Viewing metadata creation history

To access a full history of metadata generation attempts, view the query history for your account and filter the results. Search for the SYSTEM$GET_ICEBERG_TABLE_INFORMATION function name in the SQL text.

Snowflake internally uses the same SYSTEM$GET_ICEBERG_TABLE_INFORMATION function to generate table metadata. Attempts made by Snowflake appear under the user called SYSTEM in the query history. The STATUS column in the query history indicates whether metadata was successfully generated.
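
For example, assuming access to the SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY view, a query similar to the following sketch lists the metadata generation attempts from the last 7 days:

  SELECT query_text, user_name, execution_status, start_time
    FROM snowflake.account_usage.query_history
    WHERE query_text ILIKE '%SYSTEM$GET_ICEBERG_TABLE_INFORMATION%'
      AND user_name = 'SYSTEM'
      AND start_time >= DATEADD('day', -7, CURRENT_TIMESTAMP())
    ORDER BY start_time DESC;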

For viewing options, see Monitor query activity with Query History.

Deletion

Snowflake deletes Iceberg metadata from your external cloud storage in the following cases:

  • After you drop a table.

  • When the Iceberg metadata refers to snapshots or table data that has expired.

Deletion doesn’t occur immediately after the data retention period expires. As a result, metadata storage might incur costs with your cloud storage provider for longer than a table’s lifetime.

Warning

Snowflake does not support Fail-safe for Snowflake-managed Iceberg tables, because the table data is in external cloud storage that you manage. To protect Iceberg table data, you need to configure data protection and recovery with your cloud provider.

After dropping a table

When you drop a table, you can use the UNDROP ICEBERG TABLE command to restore it within the data retention period.
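
For example (the table name is a placeholder):

  DROP ICEBERG TABLE my_iceberg_table;

  -- Restore the table while it is still within the data retention period
  UNDROP ICEBERG TABLE my_iceberg_table;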

When the retention period expires, Snowflake deletes table metadata and snapshots that it has written from your external volume location. Deletion occurs asynchronously and can take a few days to complete after the retention period has passed.

Note

For converted tables, Snowflake deletes only metadata that was generated after table conversion.

After snapshots expire

Snowflake deletes Iceberg metadata files related to expired snapshots after the data retention period passes. Deletion usually occurs 7-14 days after a snapshot expires.

Only previous table snapshots can expire. Snowflake never deletes metadata files that represent the latest (current) state of a table from your external cloud storage.

Tables that use an external catalog

For tables that use an external catalog, Snowflake uses the value of the DATA_RETENTION_TIME_IN_DAYS parameter to set a retention period for Snowflake Time Travel and undropping the table. When the retention period expires, Snowflake does not delete the Iceberg metadata or snapshots from your external cloud storage.

Snowflake sets DATA_RETENTION_TIME_IN_DAYS at the table level to the smaller of the following values:

  • The history.expire.max-snapshot-age-ms value in the current metadata file. Snowflake converts the value to days (rounding down).

  • The following value, depending on your Snowflake account edition:

    • Standard Edition: 1 day.

    • Enterprise Edition or higher: 5 days.

You can’t manually change the value of DATA_RETENTION_TIME_IN_DAYS in Snowflake. To change the value, you must update history.expire.max-snapshot-age-ms in your metadata file and then refresh the table.
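
For example, on an Enterprise Edition account, if the current metadata file sets history.expire.max-snapshot-age-ms to 259200000 (3 days), Snowflake sets the table-level retention to MIN(3, 5) = 3 days. The following sketch refreshes a table after the metadata file changes and then checks the resulting parameter value; the table name is a placeholder:

  -- Pick up the updated metadata file
  ALTER ICEBERG TABLE my_external_iceberg_table REFRESH;

  -- Check the resulting retention period
  SHOW PARAMETERS LIKE 'DATA_RETENTION_TIME_IN_DAYS' IN TABLE my_external_iceberg_table;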

You can use the following table functions to retrieve information about the files registered to an externally managed Iceberg table or the most recent snapshot refresh history:
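
As a sketch, assuming that the Information Schema table functions ICEBERG_TABLE_FILES and ICEBERG_TABLE_SNAPSHOT_REFRESH_HISTORY are the functions referenced here (treat the function and argument names as assumptions):

  -- Files currently registered to the externally managed table (assumed function name)
  SELECT *
    FROM TABLE(my_db.INFORMATION_SCHEMA.ICEBERG_TABLE_FILES(TABLE_NAME => 'my_external_iceberg_table'));

  -- Most recent snapshot refresh history (assumed function name)
  SELECT *
    FROM TABLE(my_db.INFORMATION_SCHEMA.ICEBERG_TABLE_SNAPSHOT_REFRESH_HISTORY(TABLE_NAME => 'my_external_iceberg_table'));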

Delta-based tables

Note

If you want to use metadata writes for Delta-based Iceberg tables, the 2025_01 behavior change bundle must not be disabled in your account.

For Iceberg tables created from Delta table files, Snowflake automatically writes Iceberg metadata to your external storage if you configure your external volume with write access (see ALLOW_WRITES). For more information about the write location, see Data and metadata directories.

To prevent Snowflake from writing Iceberg metadata, you can set the ALLOW_WRITES parameter to FALSE on your external volume as long as no Snowflake-managed Iceberg tables use the same external volume.
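
For example, assuming that no Snowflake-managed Iceberg tables use the same external volume (the volume name is a placeholder):

  ALTER EXTERNAL VOLUME my_delta_volume SET ALLOW_WRITES = FALSE;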

Iceberg partitioning

“Hidden” partitioning (https://iceberg.apache.org/docs/latest/partitioning/#icebergs-hidden-partitioning) for Apache Iceberg™ is metadata-based and adaptable. Iceberg produces partition values based on transforms that you define when you create a table. When they read from a partitioned table, Iceberg engines use the partition values defined in your table metadata to efficiently identify relevant data.

Snowflake supports the following partitioning use cases:

To create a partitioned Iceberg table, include the PARTITION BY clause with one or more partition transforms (https://iceberg.apache.org/spec/#partition-transforms) in your regular CREATE ICEBERG TABLE statement. For an example, see Create an Iceberg table in a catalog-linked database.
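
To illustrate the syntax, the following sketch creates a Snowflake-managed table that is partitioned by month; the table name, column names, external volume, and base location are placeholders, and the exact clause placement might differ in your environment:

  CREATE ICEBERG TABLE my_events (
      id INT,
      event_name STRING,
      event_ts TIMESTAMP_NTZ)
    PARTITION BY (MONTH(event_ts))
    CATALOG = 'SNOWFLAKE'
    EXTERNAL_VOLUME = 'my_ext_vol'
    BASE_LOCATION = 'my_events/';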

Partitioning support matrix

The following matrix shows which features and actions are supported for three types of partitioned Iceberg tables: Snowflake-managed, externally managed in a catalog-linked database (CLD), and externally managed outside of a CLD. The matrix also indicates compliance with version 2 of the Apache Iceberg specification.

Note

  • Version 3 of the Apache Iceberg specification isn’t supported when using partitioning.

  • CLD stands for catalog-linked database.

  • COPY commands with the ON_ERROR = ABORT_STATEMENT option (COPY INTO <table>): Limited support. See Usage notes.

  • CREATE ICEBERG TABLE … AS SELECT (CTAS)

  • Cloning: See usage notes.

  • CREATE ICEBERG TABLE … LIKE: See usage notes.

  • Deletion vectors: N/A.

  • Clustering

  • TARGET_FILE_SIZE: Partitioned files are written with a 16 MB file size.

  • Partition evolution: Limited support. Partition evolution is supported when it is performed with an external engine.

  • Partition transforms: For the supported partition transforms, see partition transforms (https://iceberg.apache.org/spec/#partition-transforms).

  • Positional deletes: N/A.

  • Snowpipe

  • Snowpipe Streaming

  • Sorting within partitions

Partitioning considerations

Consider the following before you use partitioned writes for Iceberg tables:

  • If you use an external engine to add, drop, or replace a partition field in an externally managed table, Snowflake writes data according to the latest partition specification.

  • Snowflake uses a target file size of 16 MB for partitioned tables. If the default file size is larger, Snowflake automatically decreases the target size to 16 MB when you write to a partitioned table.

  • The GET_DDL function doesn’t include the PARTITION BY clause in its output.

  • The sum of the sizes of the outputs for all partition transforms can’t exceed 1024 bytes for a single row.

  • Because partition evolution isn't supported for Snowflake-managed tables, to change the partitioning of a Snowflake-managed table, you must drop the table and create a new one with the partition specification that you want.

  • The DAY(), MONTH(), and YEAR() partition transform parameters, which you specify within the PARTITION BY clause, are part of the Iceberg specification. For data that spans multiple days, months, or years, the transform produces one partition per calendar day, month, or year. For example, when the DAY() transform is used on a timestamp column that holds 2 months of data, 61 partitions are created.

    In contrast, the DAY(), MONTH(), and YEAR() functions in Snowflake are part of the SQL standard. These functions extract the corresponding day, month, or year part from a date or timestamp. For example, when the DAY() function is used on a timestamp column that holds multiple months of data, the function returns a day of the month ranging from 1 to 31. The example after this list illustrates the difference.
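
The following sketch contrasts the two; the table, column, and volume names are placeholders:

  -- Iceberg partition transform: creates one partition for each calendar day in event_ts
  CREATE ICEBERG TABLE my_daily_events (
      id INT,
      event_ts TIMESTAMP_NTZ)
    PARTITION BY (DAY(event_ts))
    CATALOG = 'SNOWFLAKE'
    EXTERNAL_VOLUME = 'my_ext_vol'
    BASE_LOCATION = 'my_daily_events/';

  -- SQL function: extracts the day of the month (1-31) from each timestamp value
  SELECT DAY(event_ts) FROM my_daily_events;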

Time travel

With Snowflake Time Travel, you can use Snowflake to query historical data for a table.

You can also use a third-party compute engine to perform time travel queries on Snowflake-managed tables when you Sync a Snowflake-managed table with Snowflake Open Catalog or use the Snowflake Catalog SDK.

You can query any snapshots that were committed within the data retention period. To specify the data retention period, set the DATA_RETENTION_TIME_IN_DAYS object parameter.
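
For example, assuming the requested point in time falls within the retention period (the table name and timestamp are placeholders):

  -- Query the table as it was one hour ago
  SELECT * FROM my_iceberg_table AT(OFFSET => -3600);

  -- Query the table as of a specific timestamp
  SELECT * FROM my_iceberg_table AT(TIMESTAMP => '2025-06-01 12:00:00'::TIMESTAMP_LTZ);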

When you delete table data or drop a table, Snowflake deletes objects after the table retention period expires. This might incur costs with your cloud storage provider for longer than the table’s lifetime.
