Sync a Snowflake-managed table with Snowflake Open Catalog¶
To query a Snowflake-managed Apache Iceberg™ table using a third-party engine such as Apache Spark™, you can sync the table with Snowflake Open Catalog.
This topic covers how to sync a Snowflake-managed Iceberg table with Snowflake Open Catalog using a catalog integration in Snowflake and an external catalog in Open Catalog.
Step 1: Set a BASE_LOCATION_PREFIX¶
Snowflake writes the files for each Iceberg table under a directory that includes a dynamically generated string (random ID).
To ensure that Open Catalog can see all of the Snowflake-managed tables that you sync, we recommend that you use a
BASE_LOCATION_PREFIX (such as my-open-catalog-tables
) at the account, database, or schema level, and
omit the BASE_LOCATION parameter in your CREATE ICEBERG TABLE statements. Doing so organizes the files for all Iceberg tables
that you create in the account, database, or schema under a known directory with the same name as the prefix. For more information, see
Data and metadata directories for Snowflake-managed tables <label-tables_iceberg_configure_external_volume_base_location>
.
The following statement sets a BASE_LOCATION_PREFIX for a schema named open_catalog
:
ALTER SCHEMA open_catalog
SET BASE_LOCATION_PREFIX = 'my-open-catalog-tables';
Step 2: Create an external volume¶
If you don’t have one already, start by creating an external volume in Snowflake that provides access to the cloud storage location where you want to store your table data and metadata.
Note
Don’t include the BASE_LOCATION_PREFIX in the path that you specify for the STORAGE_BASE_URL.
Complete the instructions for your cloud storage service:
Step 3: Configure Open Catalog resources¶
Next, complete the steps in this section to create an external catalog and service connection in your Open Catalog account.
Follow the instructions in Create a catalog to create an external catalog in your Open Catalog account. Make sure that the following settings for the external catalog are configured:
The External toggle is enabled.
The Default base location combines the
STORAGE_BASE_URL
for the external volume you created in Step 2: Create an external volume and theBASE_LOCATION_PREFIX
that you set for the schema; for examplehttps://<storage_base_url>/<base_url_prefix>/
.
Open Catalog syncs your Snowflake-managed tables to this external catalog.
If you don’t already have a service connection for Snowflake, follow the instructions in Configure a service connection to create a connection for the Snowflake engine in your Open Catalog account.
Configure a catalog role for your external catalog with privileges that allow access to your external catalog. For instructions, see Grant privileges to a catalog.
The catalog role must have the following privileges on the catalog:
TABLE_CREATE
TABLE_WRITE_PROPERTIES
TABLE_DROP
NAMESPACE_CREATE
NAMESPACE_DROP
You can either grant each of these privileges to the catalog role, or grant the CATALOG_MANAGE_CONTENT privilege, which includes these privileges. For more information, see Catalog privileges for Snowflake Open Catalog.
Attach the catalog role to the principal role for your service connection. This lets the service connection access the catalog. For instructions, see Grant a catalog role to a principal role.
Step 4: Create a catalog integration for Open Catalog¶
Create a catalog integration for Open Catalog by using the CREATE CATALOG INTEGRATION (Snowflake Open Catalog) command.
For CATALOG_NAME, specify the name of the external catalog that you configured in your Open Catalog account. Snowflake syncs the table and its parent
namespace in Snowflake to this external catalog in Open Catalog. For example, if you have a db1.public.table1
Iceberg table registered in
Snowflake and you specify catalog1
in the catalog integration, Snowflake syncs the table with Open Catalog with the following fully
qualified name: catalog1.db1.public.table1
.
To troubleshoot issues with creating a catalog integration, see You can’t create a catalog integration for Open Catalog.
CREATE OR REPLACE CATALOG INTEGRATION my_open_catalog_int
CATALOG_SOURCE = POLARIS
TABLE_FORMAT = ICEBERG
REST_CONFIG = (
CATALOG_URI = 'https://<orgname>-<my-snowflake-open-catalog-account-name>.snowflakecomputing.cn/polaris/api/catalog'
CATALOG_NAME = 'myOpenCatalogExternalCatalogName'
)
REST_AUTHENTICATION = (
TYPE = OAUTH
OAUTH_CLIENT_ID = 'myClientId'
OAUTH_CLIENT_SECRET = 'myClientSecret'
OAUTH_ALLOWED_SCOPES = ('PRINCIPAL_ROLE:ALL')
)
ENABLED = TRUE;
Note
You can use this catalog integration to sync one or more Snowflake-managed tables.
Step 5: Set up catalog sync¶
For Snowflake to sync Snowflake-managed Iceberg tables to Open Catalog, you must specify the external catalog in Open Catalog that Snowflake should sync the tables to. To configure this, you set the CATALOG_SYNC parameter to the name of a catalog integration for Open Catalog.
Set CATALOG_SYNC at the database level¶
This example sets the CATALOG_SYNC parameter at the database level. After you run these statements, Snowflake syncs all Snowflake-managed Iceberg tables in
the db1
database to the external catalog in Open Catalog that you specified for the my_open_catalog_int
catalog integration.
For more information, see the ALTER DATABASE command.
ALTER DATABASE db1 SET CATALOG_SYNC = 'my_open_catalog_int';
You can also set CATALOG_SYNC at the database level when you create a database. For example:
CREATE DATABASE db2
CATALOG_SYNC = 'my_open_catalog_int';
For more information, see CREATE DATABASE.
Set CATALOG_SYNC at the schema level¶
This example sets the CATALOG_SYNC parameter at the schema level. After you run these statements, Snowflake syncs all Snowflake-managed Iceberg tables in the
public
schema to the external catalog in Open Catalog that you specified for the my_open_catalog_int
catalog integration. For more
information, see the ALTER SCHEMA command.
ALTER SCHEMA public SET CATALOG_SYNC = 'my_open_catalog_int';
You can also set CATALOG_SYNC at the schema level when you create a schema. For example:
CREATE SCHEMA schema1
CATALOG_SYNC = 'my_open_catalog_int';
For more information, see CREATE SCHEMA.
Note
You can also do the following:
Set CATALOG_SYNC at the account or table level.
Override CATALOG_SYNC at different levels. For example, you can set CATALOG_SYNC at the database level but then override its value for the
myschema
schema within the database. As a result, the Snowflake-managed Iceberg tables in themyschema
schema sync to a different external catalog in Open Catalog than the other Snowflake-managed Iceberg tables in the database.
For more information, see CATALOG_SYNC and Parameter hierarchy and types.
To see the name of the catalog integration for Open Catalog that a Snowflake-managed Iceberg table syncs to, run the SHOW ICEBERG TABLES command and see the
catalog_sync_name
column in the output.
Step 6: Create a Snowflake-managed table¶
Create a Snowflake-managed Iceberg table by using the CREATE ICEBERG TABLE (Snowflake as the Iceberg catalog) command.
Important
To ensure that access privileges in Open Catalog are enforced correctly on the table, make sure the table meets certain conditions before creating it. These conditions relate to the directory structure hierarchy for the catalog. For these conditions and instructions on how to meet them, see the note in Organize catalog content in the Snowflake Open Catalog documentation.
USE SCHEMA open_catalog;
CREATE OR REPLACE ICEBERG TABLE my_iceberg_table (col1 INT)
CATALOG = 'SNOWFLAKE'
EXTERNAL_VOLUME = 'my_external_volume';
For the BASE_LOCATION_PREFIX (my-open-catalog-tables
) and table name (my_iceberg_table
) used in the previous example statements,
Snowflake writes the table files to the following paths:
STORAGE_BASE_URL/my-open-catalog-tables/my_iceberg_table.randomId/data/
STORAGE_BASE_URL/my-open-catalog-tables/my_iceberg_table.randomId/metadata/
When you modify the table in Snowflake, the changes are automatically synchronized with the external catalog in your Open Catalog account. Other engines such as Apache Spark™ can query the table by connecting to Open Catalog.
To troubleshoot issues with creating a Snowflake-managed table, see You can’t create a Snowflake-managed table.