Write support for externally managed Apache Iceberg™ tables

Write support for externally managed Apache Iceberg™ tables lets you perform write operations on tables managed by an external Iceberg REST catalog. The Iceberg table in Snowflake is linked to a table in your remote catalog. When you make changes to the table in Snowflake, Snowflake commits the same changes to your remote catalog.

This preview feature expands interoperability between Snowflake and third-party systems, allowing you to use Snowflake for data engineering workloads with Iceberg even when you use an external Iceberg catalog.

Key use cases include the following:

  • Building complex data engineering pipelines with Iceberg tables: Writing to Iceberg tables in external catalogs from Snowflake allows you to use Snowpark or Snowflake SQL to build complex pipelines that ingest, transform, and process data for Iceberg tables. You can query the data by using Snowflake or other engines. Similarly, you can use Snowflake partner tools to build your Iceberg data engineering pipelines.

  • Making your data available to the Iceberg ecosystem: The ability to write to Iceberg tables in external catalogs lets you make your data available to the Iceberg ecosystem. You can query data that’s already in Snowflake and write it to Iceberg tables. To keep your Iceberg tables in sync with your Snowflake tables, you can use operations like INSERT INTO … SELECT FROM (see the sketch after this list) to do the following:

    • Copy existing data from a standard Snowflake table into an Iceberg table.

    • Insert data with Snowflake streams.
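
    For example, a minimal sketch of the copy pattern (the table and column names here are placeholders, not objects from the examples later in this topic):

    INSERT INTO my_iceberg_table (id, name)
      SELECT id, name
      FROM my_snowflake_table;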

Considerations

Consider the following when you use write support for externally managed Iceberg tables:

  • Snowflake provides Data Definition Language (DDL) and Data Manipulation Language (DML) commands for externally managed tables. However, you configure metadata and data retention using your external catalog and the tools provided by your external storage provider. For more information, see Tables that use an external catalog.

    For writes, Snowflake ensures that changes are committed to your remote catalog before updating the table in Snowflake.

  • If you use a catalog-linked database, you can use the CREATE ICEBERG TABLE syntax with column definitions to create a table in Snowflake and in your remote catalog. If you use a standard Snowflake database (not linked to a catalog), you must first create a table in your remote catalog. After that, you can use the CREATE ICEBERG TABLE (Iceberg REST catalog) syntax to create an Iceberg table in Snowflake and write to it.

  • For the AWS Glue Data Catalog: Dropping an externally managed table through Snowflake does not delete the underlying table files. This behavior is specific to the AWS Glue Data Catalog implementation.

  • If you participated in private preview for this feature, position row-level deletes might be enabled by default in your account. To check, run the following command:

    SHOW PARAMETERS LIKE 'ENABLE_ICEBERG_MERGE_ON_READ' IN ACCOUNT;
    

    If the parameter is visible in your account (regardless of its value), position deletes are enabled. To turn off position deletes, set the ENABLE_ICEBERG_MERGE_ON_READ parameter to FALSE at the table, schema, or database level.
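
    For example, the following statement is a sketch for turning off position deletes at the table level (it assumes a table named my_iceberg_table):

    ALTER ICEBERG TABLE my_iceberg_table SET ENABLE_ICEBERG_MERGE_ON_READ = FALSE;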

  • Writing to externally managed tables with the following Iceberg data types isn’t supported:

    • uuid

    • fixed(L)

  • The following features aren’t currently supported when you use Snowflake to write to externally managed Iceberg tables:

    • Catalog-vended credentials.

    • Server-side encryption (SSE) for GCS or Azure external volumes.

    • Multi-statement transactions. Snowflake supports autocommit transactions only.

    • Conversion to Snowflake-managed tables.

    • External Iceberg catalogs that don’t conform to the Iceberg REST protocol.

    • Row-level deletes (merge-on-read).

    • Using the OR REPLACE option when creating a table.

    • Using the CREATE TABLE … AS SELECT syntax if you use AWS Glue as your remote catalog.

  • For creating schemas in a catalog-linked database, be aware of the following:

    • The CREATE SCHEMA command creates a corresponding namespace in your remote catalog only when you use a catalog-linked database.

    • The ALTER and CLONE options aren’t supported.

    • Delimiters aren’t supported for schema names. Only alphanumeric schema names are supported.

Workflow

Use the workflow in this section to get started with this feature:

  1. Configure an external volume

  2. Create a catalog integration

  3. Choose from the following options:

    • Create a catalog-linked database. With this option, you can write to auto-discovered Iceberg tables in your catalog, or use the catalog-linked database to create additional Iceberg tables.

    • Create an Iceberg table (in a catalog-linked database or a standard Snowflake database). With this option, you must first create a table in your remote catalog before you create an externally managed Iceberg table in Snowflake.

After you complete these steps, you can perform write operations on your Iceberg tables.

Configuring an external volume for writes to externally managed tables

To enable writes to externally managed tables, you must use an external volume configured with both read and write permissions for the Snowflake service principal (the same permissions required for Snowflake-managed tables).

Specify the storage location (STORAGE_BASE_URL) where your Iceberg REST catalog writes Iceberg data and metadata.

For example, the following statement creates an external volume for S3 with encryption that allows write access to a bucket named my-iceberg-tables:

CREATE OR REPLACE EXTERNAL VOLUME my_external_volume
  STORAGE_LOCATIONS =
    (
      (
        NAME = 'my-s3-us-west-2'
        STORAGE_PROVIDER = 'S3'
        STORAGE_BASE_URL = 's3://my-iceberg-tables/'
        STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/my-write-access-role'
        ENCRYPTION = ( TYPE = 'AWS_SSE_S3' )
      )
    )
  ALLOW_WRITES = TRUE;
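
After you create the external volume, you can confirm its configuration (including the active storage location and whether writes are allowed) before you link tables to it:

DESC EXTERNAL VOLUME my_external_volume;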

For complete instructions, see the topic on configuring an external volume for your cloud storage provider.

For more information about the required permissions, see Granting Snowflake access to your storage.

Configuring a catalog integration

For writing to externally managed Iceberg tables, you must configure a catalog integration to connect Snowflake to your remote catalog.

Your remote catalog must comply with the open source Apache Iceberg REST OpenAPI specification (https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml), such as Open Catalog or the AWS Glue Iceberg REST catalog (https://docs.aws.amazon.com/glue/latest/dg/connect-glu-iceberg-rest.html).

To create a catalog integration, see the instructions for your remote catalog provider.

Create a catalog-linked database

Snowflake supports creating writable externally managed tables in a catalog-linked database, which is a Snowflake database that you sync with an external Iceberg REST catalog. You can also write to Iceberg tables that Snowflake automatically discovers in your remote catalog. For more information, see Use a catalog-linked database for Apache Iceberg™ tables.

Note

Alternatively, you can create writable externally managed Iceberg tables in a standard Snowflake database.

The following example uses the CREATE DATABASE (catalog-linked) command to create a catalog-linked database that uses an external volume:

CREATE DATABASE my_catalog_linked_db
  LINKED_CATALOG = (
    CATALOG = 'glue_rest_catalog_int'
  ),
  EXTERNAL_VOLUME = 'my_external_vol';
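
After the database is created, you can list the Iceberg tables that Snowflake auto-discovers in your remote catalog. For example, assuming the database name from the previous statement:

SHOW ICEBERG TABLES IN DATABASE my_catalog_linked_db;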

Use CREATE SCHEMA to create namespaces in your external catalog

To create a namespace for organizing Iceberg tables in your external catalog, you can use the CREATE SCHEMA command with a catalog-linked database. The command creates a namespace in your linked Iceberg REST catalog and a corresponding schema in your Snowflake database.

CREATE SCHEMA my_namespace;

Note

Schema names must be alphanumeric and can’t include delimiters.

DROP SCHEMA

You can also use the DROP SCHEMA command to simultaneously drop a schema from your catalog-linked database and its corresponding namespace from your remote catalog.

DROP SCHEMA my_namespace;

Create an Iceberg table

How you create an externally managed Iceberg table that you can write to from Snowflake depends on the kind of database you use:

Create an Iceberg table in a catalog-linked database

To create a table in Snowflake and in your external catalog at the same time, use the CREATE ICEBERG TABLE (catalog-linked database) command.

The following example creates a writable Iceberg table by using the previously created external volume and catalog integration for AWS Glue REST.

USE DATABASE my_catalog_linked_db;

USE SCHEMA my_namespace;

CREATE ICEBERG TABLE my_iceberg_table (
  first_name string,
  last_name string,
  amount int,
  create_date date
);

When you run the command, Snowflake creates a new Iceberg table in your remote catalog and a linked, writable externally managed table in Snowflake.

Create an Iceberg table in a standard Snowflake database

If you use a standard Snowflake database, you must first create a table in your remote catalog. For example, you might use Spark to write an Iceberg table to Open Catalog.

After you create the table in your remote catalog, use the CREATE ICEBERG TABLE (Iceberg REST catalog) command to create an Iceberg table object in Snowflake. For the CATALOG_TABLE_NAME, specify the name of the table as it appears in your remote catalog.

For example:

CREATE ICEBERG TABLE my_iceberg_table
  EXTERNAL_VOLUME = 'my_external_volume'
  CATALOG = 'my_rest_catalog_integration'
  CATALOG_TABLE_NAME = 'my_remote_table_name';

When you run the command, Snowflake creates a writable externally managed table in Snowflake that is linked to the existing table in your remote catalog.

Writing to externally managed Iceberg tables

You can use DML commands, including INSERT, UPDATE, DELETE, MERGE, and COPY INTO <table>, for externally managed Iceberg tables, as shown in the examples later in this topic.

You can also use the Snowpark API to process Iceberg tables.

Examples

You can use the following basic examples to get started with writing to Iceberg tables.

INSERT

Use INSERT to insert values into an Iceberg table:

INSERT INTO my_iceberg_table VALUES (1, 'a');
INSERT INTO my_iceberg_table VALUES (2, 'b');
INSERT INTO my_iceberg_table VALUES (3, 'c');

UPDATE

Use UPDATE to update the values in an Iceberg table:

UPDATE my_iceberg_table
  SET a = 10
  WHERE b = 'b';

DELETE

Use DELETE to remove values from an Iceberg table:

DELETE FROM my_iceberg_table
  WHERE b = 'a';

MERGE

Use MERGE on an Iceberg table:

MERGE INTO my_iceberg_table USING my_snowflake_table
  ON my_iceberg_table.a = my_snowflake_table.a
  WHEN MATCHED THEN
      UPDATE SET my_iceberg_table.b = my_snowflake_table.b
  WHEN NOT MATCHED THEN
      INSERT VALUES (my_snowflake_table.a, my_snowflake_table.b);

COPY INTO <table>

Use COPY INTO <table> to load data from staged files into an Iceberg table. In the following example, LOAD_MODE = ADD_FILES_COPY copies the staged Parquet files to the table location and registers them in the table metadata without rewriting them, and PURGE = TRUE removes the files from the stage after a successful load.

COPY INTO customer_iceberg_ingest
  FROM @my_parquet_stage
  FILE_FORMAT = 'my_parquet_format'
  LOAD_MODE = ADD_FILES_COPY
  PURGE = TRUE
  MATCH_BY_COLUMN_NAME = CASE_SENSITIVE;

For more information, see Load data into Apache Iceberg™ tables.

Change Data Capture using streams

A table stream tracks changes made to rows in a source table for Change Data Capture (CDC). The source table can be a standard Snowflake table, a Snowflake-managed Iceberg table, or an externally managed Iceberg table. You can insert the changes into an externally managed Iceberg table by using an INSERT INTO … SELECT FROM command.

Note

If your source table is an externally managed Iceberg table, you must use INSERT_ONLY = TRUE when you create the stream.
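
For example, the following sketch creates an insert-only stream on a hypothetical externally managed source table named my_external_iceberg_table:

CREATE OR REPLACE STREAM my_insert_only_stream
  ON TABLE my_external_iceberg_table
  INSERT_ONLY = TRUE;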

CREATE OR REPLACE STREAM my_stream ON TABLE my_snowflake_table;

//...

INSERT INTO my_iceberg_table(id,name)
  SELECT id, name
  FROM my_stream;

Using Snowpark

Create a function to copy data into an Iceberg table from a Snowflake table by using Snowpark Python.

def copy_into_iceberg(session):
    try:
        # Read the source data from a standard Snowflake table.
        df = session.table("my_snowflake_table")

        # Append the rows to the existing externally managed Iceberg table.
        df.write.mode("append").save_as_table("my_iceberg_table")

    except Exception as e:
        print(f"Error copying data into my_iceberg_table: {e}")
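
To call the function, you need a Snowpark session. The following sketch assumes you supply your own connection parameters:

from snowflake.snowpark import Session

# Placeholder connection details; replace them with your account settings.
connection_parameters = {
    "account": "<account_identifier>",
    "user": "<user>",
    "password": "<password>",
    "database": "my_catalog_linked_db",
    "schema": "my_namespace",
}

session = Session.builder.configs(connection_parameters).create()
copy_into_iceberg(session)
session.close()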

Troubleshooting

If an issue occurs when Snowflake attempts to commit table changes to your external catalog, Snowflake returns one of the following error messages.

Error

004185=SQL Execution Error: Failed while committing transaction to external catalog. Error:''{0}''

Or:

004185=SQL Execution Error: Failed while committing transaction to external catalog with unresolvable commit conflicts. Error:''{0}''

Cause

A commit to the external catalog failed, where {0} is the exception returned by the external catalog (if available); otherwise, Snowflake reports Exception unavailable as the cause. The error message includes with unresolvable commit conflicts if Snowflake encountered an unresolvable commit conflict while attempting to commit a transaction to the external catalog.

Error

004500=SQL Execution Error: Cannot verify the status of transaction from external catalog. The statement ''{0}'' with transaction id {1} may or may not have committed to external catalog. Error:''{2}''

Cause

A commit request to the external catalog received no response, so Snowflake can’t verify whether the transaction was committed. The message includes the exception returned by the external catalog (if available); otherwise, Snowflake reports Exception unavailable as the cause.
