Write support for externally managed Apache Iceberg™ tables¶
Write support for externally managed Apache Iceberg™ tables lets you perform write operations on tables managed by an external Iceberg REST catalog. The Iceberg table in Snowflake is linked to a table in your remote catalog. When you make changes to the table in Snowflake, Snowflake commits the same changes to your remote catalog.
This preview feature expands interoperability between Snowflake and third-party systems, allowing you to use Snowflake for data engineering workloads with Iceberg even when you use an external Iceberg catalog.
Key use cases include the following:
Building complex data engineering pipelines with Iceberg tables: Writing to Iceberg tables in external catalogs from Snowflake allows you to use Snowpark or Snowflake SQL to build complex pipelines that ingest, transform, and process data for Iceberg tables. You can query the data by using Snowflake or other engines. Similarly, you can use Snowflake partner tools to build your Iceberg data engineering pipelines.
Making your data available to the Iceberg ecosystem: Writing to Iceberg tables in external catalogs makes your data available to the rest of the Iceberg ecosystem. You can query data that’s already in Snowflake and write it to Iceberg tables. To keep your Iceberg tables in sync with your Snowflake tables, you can use operations like INSERT INTO … SELECT FROM (see the example after this list) to do the following:
Copy existing data from a standard Snowflake table into an Iceberg table.
Insert data with Snowflake streams.
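For example, the following statement is a minimal sketch that assumes a hypothetical standard table named my_orders and a matching externally managed Iceberg table named my_orders_iceberg, and copies the current day’s rows into the Iceberg table:
INSERT INTO my_orders_iceberg (order_id, amount, order_date)
  SELECT order_id, amount, order_date
  FROM my_orders
  WHERE order_date = CURRENT_DATE;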
Considerations¶
Consider the following when you use write support for externally managed Iceberg tables:
Snowflake provides Data Definition Language (DDL) and Data Manipulation Language (DML) commands for externally managed tables. However, you configure metadata and data retention using your external catalog and the tools provided by your external storage provider. For more information, see Tables that use an external catalog.
For writes, Snowflake ensures that changes are committed to your remote catalog before updating the table in Snowflake.
If you use a catalog-linked database, you can use the CREATE ICEBERG TABLE syntax with column definitions to create a table in Snowflake and in your remote catalog. If you use a standard Snowflake database (not linked to a catalog), you must first create a table in your remote catalog. After that, you can use the CREATE ICEBERG TABLE (Iceberg REST catalog) syntax to create an Iceberg table in Snowflake and write to it.
For the AWS Glue Data Catalog: Dropping an externally managed table through Snowflake does not delete the underlying table files. This behavior is specific to the AWS Glue Data Catalog implementation.
If you participated in private preview for this feature, position row-level deletes might be enabled by default in your account. To check, run the following command:
SHOW PARAMETERS LIKE 'ENABLE_ICEBERG_MERGE_ON_READ' IN ACCOUNT;
If the parameter is visible in your account (regardless of its value), position deletes are enabled. To turn off position deletes, set the ENABLE_ICEBERG_MERGE_ON_READ parameter to FALSE at the table, schema, or database level.
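For example, the following statement is a sketch of disabling position deletes at the database level, assuming a hypothetical database named my_db and the standard Snowflake syntax for setting object-level parameters:
ALTER DATABASE my_db SET ENABLE_ICEBERG_MERGE_ON_READ = FALSE;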
Writing to externally managed tables with the following Iceberg data types isn’t supported:
uuid
fixed(L)
The following features aren’t currently supported when you use Snowflake to write to externally managed Iceberg tables:
Catalog-vended credentials.
Server-side encryption (SSE) for GCS or Azure external volumes.
Multi-statement transactions. Snowflake supports autocommit transactions only.
Conversion to Snowflake-managed tables.
External Iceberg catalogs that don’t conform to the Iceberg REST protocol.
Row-level deletes (merge-on-read).
Using the OR REPLACE option when creating a table.
Using the CREATE TABLE … AS SELECT syntax if you use AWS Glue as your remote catalog.
When you create schemas in a catalog-linked database, be aware of the following:
The CREATE SCHEMA command creates a corresponding namespace in your remote catalog only when you use a catalog-linked database.
The ALTER and CLONE options aren’t supported.
Delimiters aren’t supported for schema names. Only alphanumeric schema names are supported.
Workflow¶
Use the workflow in this section to get started with this feature:
Choose from the following options:
Create a catalog-linked database. With this option, you can write to auto-discovered Iceberg tables in your catalog, or use the catalog-linked database to create additional Iceberg tables.
Create an Iceberg table (in a catalog-linked database or a standard Snowflake database). With this option, you must first create a table in your remote catalog before you create an externally managed Iceberg table in Snowflake.
After you complete these steps, you can perform write operations on your Iceberg tables.
Configuring an external volume for writes to externally managed tables¶
To enable writes to externally managed tables, you must use an external volume configured with both read and write permissions for the Snowflake service principal (the same permissions required for Snowflake-managed tables).
Specify the storage location (STORAGE_BASE_URL) where your Iceberg REST catalog writes Iceberg data and metadata.
For example, the following statement creates an external volume for S3 with encryption that allows write access to a bucket named my-iceberg-tables:
CREATE OR REPLACE EXTERNAL VOLUME my_external_volume
STORAGE_LOCATIONS =
(
(
NAME = 'my-s3-us-west-2'
STORAGE_PROVIDER = 'S3'
STORAGE_BASE_URL = 's3://my-iceberg-tables/'
STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/my-write-access-role'
ENCRYPTION = ( TYPE = 'AWS_SSE_S3' )
)
)
ALLOW_WRITES = TRUE;
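After you create the external volume, you can describe it to verify the configuration and to retrieve the values (such as the IAM user that Snowflake creates for your account) that you need when granting Snowflake access to your bucket:
DESC EXTERNAL VOLUME my_external_volume;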
For complete instructions, see the following topics:
For more information about the required permissions, see Granting Snowflake access to your storage.
Configuring a catalog integration¶
For writing to externally managed Iceberg tables, you must configure a catalog integration to connect Snowflake to your remote catalog.
Your remote catalog must comply with the open source Apache Iceberg REST OpenAPI specification (https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml), such as Open Catalog or the AWS Glue Iceberg REST catalog (https://docs.aws.amazon.com/glue/latest/dg/connect-glu-iceberg-rest.html).
To create a catalog integration, see the instructions in the following topics:
Configure a catalog integration for AWS Glue Iceberg REST. Use the permissions in the read and write example IAM policy.
Note
If you currently use a catalog integration for AWS Glue, you must create a new REST catalog integration for the AWS Glue Iceberg REST endpoint.
Configure a catalog integration for Apache Iceberg™ REST catalogs
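For illustration, the following statement sketches a REST catalog integration for the AWS Glue Iceberg REST endpoint. The namespace, catalog name (your AWS account ID), role ARN, and region are hypothetical placeholders; adapt them to your environment and see the topics above for authoritative syntax:
CREATE CATALOG INTEGRATION glue_rest_catalog_int
  CATALOG_SOURCE = ICEBERG_REST
  TABLE_FORMAT = ICEBERG
  CATALOG_NAMESPACE = 'my_namespace'
  REST_CONFIG = (
    CATALOG_URI = 'https://glue.us-west-2.amazonaws.com/iceberg'
    CATALOG_API_TYPE = AWS_GLUE
    CATALOG_NAME = '123456789012'
  )
  REST_AUTHENTICATION = (
    TYPE = SIGV4
    SIGV4_IAM_ROLE = 'arn:aws:iam::123456789012:role/my-glue-access-role'
    SIGV4_SIGNING_REGION = 'us-west-2'
  )
  ENABLED = TRUE;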
Create a catalog-linked database¶
Snowflake supports creating writable externally managed tables in a catalog-linked database, which is a Snowflake database that you sync with an external Iceberg REST catalog. You can also write to Iceberg tables that Snowflake automatically discovers in your remote catalog. For more information, see Use a catalog-linked database for Apache Iceberg™ tables.
Note
Alternatively, you can create writable externally managed Iceberg tables in a standard Snowflake database.
The following example uses the CREATE DATABASE (catalog-linked) command to create a catalog-linked database that uses an external volume:
CREATE DATABASE my_catalog_linked_db
LINKED_CATALOG = (
CATALOG = 'glue_rest_catalog_int'
),
EXTERNAL_VOLUME = 'my_external_vol';
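After the database syncs with your remote catalog, you can list the Iceberg tables that Snowflake discovers in it:
SHOW ICEBERG TABLES IN DATABASE my_catalog_linked_db;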
Use CREATE SCHEMA to create namespaces in your external catalog¶
To create a namespace for organizing Iceberg tables in your external catalog, you can use the CREATE SCHEMA command with a catalog-linked database. The command creates a namespace in your linked Iceberg REST catalog and a corresponding schema in your Snowflake database.
CREATE SCHEMA my_namespace;
Note
Schema names must be alphanumeric and can’t include delimiters.
DROP SCHEMA¶
You can also use the DROP SCHEMA command to simultaneously drop a schema from your catalog-linked database and its corresponding namespace from your remote catalog.
DROP SCHEMA my_namespace;
Create an Iceberg table¶
How you create an externally managed Iceberg table that you can write to from Snowflake depends on the kind of database you use:
If you use a catalog-linked database, you can use the CREATE ICEBERG TABLE (catalog-linked database) to create a table and register it in your remote catalog. For instructions, see Create an Iceberg table in a catalog-linked database.
If you use a standard Snowflake database (not linked to a catalog), you must first create a table in your remote catalog. Then, you can use the CREATE ICEBERG TABLE (Iceberg REST catalog) syntax to create an Iceberg table in Snowflake and write to it. For instructions, see Create an Iceberg table in a standard Snowflake database.
Create an Iceberg table in a catalog-linked database¶
To create a table in Snowflake and in your external catalog at the same time, use the CREATE ICEBERG TABLE (catalog-linked database) command.
The following example creates a writable Iceberg table by using the previously created external volume and catalog integration for AWS Glue REST.
USE DATABASE my_catalog_linked_db;
USE SCHEMA my_namespace;
CREATE ICEBERG TABLE my_iceberg_table (
first_name string,
last_name string,
amount int,
create_date date
);
When you run the command, Snowflake creates a new Iceberg table in your remote catalog and a linked, writable externally managed table in Snowflake.
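You can write to the new table right away. For example (a minimal sketch with made-up values that match the column definitions above):
INSERT INTO my_iceberg_table
  VALUES ('Ada', 'Lovelace', 100, '2025-06-01');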
Create an Iceberg table in a standard Snowflake database¶
If you use a standard Snowflake database, you must first create a table in your remote catalog. For example, you might use Spark to write an Iceberg table to Open Catalog.
After you create the table in your remote catalog, use the CREATE ICEBERG TABLE (Iceberg REST catalog) command to create an Iceberg table object in Snowflake. For the CATALOG_TABLE_NAME, specify the name of the table as it appears in your remote catalog.
For example:
CREATE ICEBERG TABLE my_iceberg_table
EXTERNAL_VOLUME = 'my_external_volume'
CATALOG = 'my_rest_catalog_integration'
CATALOG_TABLE_NAME = 'my_remote_table_name';
When you run the command, Snowflake creates a writable externally managed table in Snowflake that is linked to the existing table in your remote catalog.
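As a quick sanity check that the link works, you can query the new table from Snowflake:
SELECT COUNT(*) FROM my_iceberg_table;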
Writing to externally managed Iceberg tables¶
You can use the following DML commands for externally managed Iceberg tables:
INSERT
UPDATE
DELETE
MERGE
COPY INTO <table>. For more information, see Load data into Apache Iceberg™ tables.
You can also use the Snowpark API to process Iceberg tables.
Examples¶
You can use the following basic examples to get started with writing to Iceberg tables.
INSERT¶
Use INSERT to insert values into an Iceberg table:
INSERT INTO my_iceberg_table VALUES (1, 'a');
INSERT INTO my_iceberg_table VALUES (2, 'b');
INSERT INTO my_iceberg_table VALUES (3, 'c');
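Because Snowflake supports only autocommit transactions for these tables, each of the INSERT statements above commits separately to your remote catalog. To commit several rows atomically in one statement, you can use a multi-row insert:
INSERT INTO my_iceberg_table VALUES (1, 'a'), (2, 'b'), (3, 'c');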
UPDATE¶
Use UPDATE to update the values in an Iceberg table:
UPDATE my_iceberg_table
SET a = 10
WHERE b = 'b';
DELETE¶
Use DELETE to remove values from an Iceberg table:
DELETE FROM my_iceberg_table
WHERE b = 'a';
MERGE¶
Use MERGE on an Iceberg table:
MERGE INTO my_iceberg_table USING my_snowflake_table
ON my_iceberg_table.a = my_snowflake_table.a
WHEN MATCHED THEN
UPDATE SET my_iceberg_table.b = my_snowflake_table.b
WHEN NOT MATCHED THEN
INSERT VALUES (my_snowflake_table.a, my_snowflake_table.b);
COPY INTO <table>¶
Use COPY INTO <table> to load data into an Iceberg table.
COPY INTO customer_iceberg_ingest
FROM @my_parquet_stage
FILE_FORMAT = 'my_parquet_format'
LOAD_MODE = ADD_FILES_COPY
PURGE = TRUE
MATCH_BY_COLUMN_NAME = CASE_SENSITIVE;
For more information, see Load data into Apache Iceberg™ tables.
Change Data Capture using streams¶
A table stream tracks changes made to rows in a source table for Change Data Capture (CDC). The source table can be a standard Snowflake table, a Snowflake-managed Iceberg table, or an externally managed Iceberg table. You can insert the changes into an externally managed Iceberg table using the INSERT INTO… SELECT FROM… command.
Note
If your source table is an externally managed Iceberg table, you must use INSERT_ONLY = TRUE when you create the stream.
CREATE OR REPLACE STREAM my_stream ON TABLE my_snowflake_table;
//...
INSERT INTO my_iceberg_table(id,name)
SELECT id, name
FROM my_stream;
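If the source table is itself an externally managed Iceberg table, the stream creation might instead look like the following sketch (the source table name is hypothetical):
CREATE OR REPLACE STREAM my_iceberg_stream ON TABLE my_external_iceberg_table INSERT_ONLY = TRUE;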
Using Snowpark¶
Create a function to copy data into an Iceberg table from a Snowflake table by using Snowpark Python.
def copy_into_iceberg(session):
    try:
        # Read the source table into a Snowpark DataFrame and append
        # its rows to the existing externally managed Iceberg table.
        df = session.table("my_snowflake_table")
        df.write.mode("append").save_as_table("my_iceberg_table")
    except Exception as e:
        print(f"Error copying data into my_iceberg_table: {e}")
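You might then call the function with an active Snowpark session; for example (the connection parameters are placeholders for your own values):
from snowflake.snowpark import Session

connection_parameters = {
    "account": "<account_identifier>",
    "user": "<username>",
    "password": "<password>",
}
session = Session.builder.configs(connection_parameters).create()

copy_into_iceberg(session)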
Troubleshooting¶
If an issue occurs when Snowflake attempts to commit table changes to your external catalog, Snowflake returns one of the following error messages.
| Error | Cause |
| --- | --- |
| 004185=SQL Execution Error: Failed while committing transaction to external catalog. Error:''{0}'' Or: 004185=SQL Execution Error: Failed while committing transaction to external catalog with unresolvable commit conflicts. Error:''{0}'' | A commit to the external catalog failed, where {0} is the error returned by the external catalog. |
| 004500=SQL Execution Error: Cannot verify the status of transaction from external catalog. The statement ''{0}'' with transaction id {1} may or may not have committed to external catalog. Error:''{2}'' | The commit request received no response from the external catalog, so the statement may or may not have committed. The message includes the exception returned by the external catalog, if available. |