CREATE STORAGE INTEGRATION¶
在账户中创建新的存储集成或替换现有集成。
存储集成是一个 Snowflake 对象,用于存储为外部云存储生成的 Identity and Access Management (IAM) 实体,以及一组可选的允许或阻止的存储位置(Amazon S3、Google Cloud Storage 或 Microsoft Azure)。组织中的云提供商管理员会将存储位置的权限授予生成的实体。此选项可让用户在创建暂存区或加载、卸载数据时无需提供凭据。
单个存储集成可以支持多个外部暂存区。暂存区定义中的 URL 必须与为 STORAGE_ALLOWED_LOCATIONS 参数指定的存储位置一致。
Note
-
If your cloud storage is located on a different cloud platform from your Snowflake account, the storage location must be in the public cloud and not a virtual private environment.
Snowflake charges a per-byte fee when you unload data from Snowflake into an external stage in a different region or different cloud provider. For details, see the pricing page.
-
Accessing cloud storage in a government region using a storage integration is limited to Snowflake accounts hosted in the same government region.
Similarly, if you need to access cloud storage in a region in China, you can use a storage integration only from a Snowflake account hosted in the same region in China.
In these cases, use the CREDENTIALS parameter in the CREATE STAGE command (rather than using a storage integration) to provide the credentials for authentication.
- See also:
ALTER STORAGE INTEGRATION , DROP INTEGRATION , SHOW INTEGRATIONS
语法
其中:
必填参数
name字符串,指定集成的标识符(即名称);在账户中必须是唯一的。
In addition, the identifier must start with an alphabetic character and cannot contain spaces or special characters unless the entire identifier string is enclosed in double quotes (e.g.
"My object"). Identifiers enclosed in double quotes are also case-sensitive.For more details, see Identifier requirements.
TYPE = { EXTERNAL_STAGE | POSTGRES_EXTERNAL_STORAGE | POSTGRES_INTERNAL_STORAGE }指定集成类型:
-
EXTERNAL_STAGE: Creates an interface between Snowflake and an external cloud storage location. -
POSTGRES_EXTERNAL_STORAGE: Creates a storage integration for use with Snowflake Postgres. Only one storage location is allowed for this type of integration. -
POSTGRES_INTERNAL_STORAGE: Creates a storage integration for use with the managed storage associated with a Snowflake Postgres instance. For full syntax, see CREATE STORAGE INTEGRATION (Postgres Internal Storage).
-
ENABLED = { TRUE | FALSE }指定此存储集成是否可在暂存区中使用。
TRUEallows users to create new stages that reference this integration. Existing stages that reference this integration function normally.FALSEprevents users from creating new stages that reference this integration. Existing stages that reference this integration cannot access the storage location in the stage definition.
该值不区分大小写。
The default is
TRUE.STORAGE_ALLOWED_LOCATIONS = ( 'cloud_specific_url' )Explicitly limits external stages that use the integration to reference one or more storage locations (i.e. S3 bucket, GCS bucket, or Azure container). Supports a comma-separated list of URLs for existing buckets and, optionally, paths used to store data files for loading/unloading. Alternatively supports the
*wildcard, meaning “allow access to all buckets and/or paths”.Amazon S3
STORAGE_ALLOWED_LOCATIONS = ( 'protocol://bucket/path/' [ , 'protocol://bucket/path/' ... ] )-
protocolis one of the following:s3refers to S3 storage in public AWS regions outside of China.s3chinarefers to S3 storage in public AWS regions in China.s3govrefers to S3 storage in government regions.
-
bucketis the name of an S3 bucket that stores your data files (e.g.mybucket). -
pathis an optional case-sensitive path for files in the cloud storage location (i.e. files have names that begin with a common string) that limits access to a set of files. Paths are alternatively called prefixes or folders by different cloud storage services.
Google Cloud Storage
STORAGE_ALLOWED_LOCATIONS = ( 'gcs://bucket/path/' [ , 'gcs://bucket/path/' ... ] )bucketis the name of a GCS bucket that stores your data files (e.g.mybucket).pathis an optional case-sensitive path for files in the cloud storage location (i.e. files have names that begin with a common string) that limits access to a set of files. Paths are alternatively called prefixes or folders by different cloud storage services.
Microsoft Azure
STORAGE_ALLOWED_LOCATIONS = ( 'azure://account.blob.core.windows.net/container/path/' [ , 'azure://account.blob.core.windows.net/container/path/' ... ] )accountis the name of the Azure storage account (e.g.myaccount). Use theblob.core.windows.netendpoint for all supported types of Azure blob storage accounts, including Data Lake Storage Gen2.containeris the name of a Azure blob storage container that stores your data files (e.g.mycontainer).pathis an optional case-sensitive path for files in the cloud storage location (i.e. files have names that begin with a common string) that limits access to a set of files. Paths are alternatively called prefixes or folders by different cloud storage services.
Microsoft Fabric OneLake
STORAGE_ALLOWED_LOCATIONS = ( 'azure://onelake.blob.fabric.microsoft.com/workspace_id/item_id/Files/path/' [ , ... ] )onelake.blob.fabric.microsoft.comis the global service root for OneLake. This single endpoint automatically routes requests to the correct geographical region where your data resides.workspace_idis the unique 128-bit GUID of the Fabric Workspace; for example,aab1c234-567d-8901-234e-fgh56789ij.item_idis the unique GUID of the specific Fabric item, such as a Lakehouse or Warehouse.Filesis the mandatory path segment for Lakehouse items. This segment points to the unmanaged section of the lake where you store raw data such as CSV, Parquet, or JSON.pathis an optional case-sensitive path to a specific folder or file prefix. Although optional, providing a path is recommended when loading specific datasets to improve performance and prevent accidental processing of unrelated files.
-
可选参数
STORAGE_BLOCKED_LOCATIONS = ( 'cloud_specific_url' )Explicitly prohibits external stages that use the integration from referencing one or more storage locations (i.e. S3 buckets or GCS buckets). Supports a comma-separated list of URLs for existing storage locations and, optionally, paths used to store data files for loading/unloading. Commonly used when STORAGE_ALLOWED_LOCATIONS is set to the
*wildcard, allowing access to all buckets in your account except for blocked storage locations and, optionally, paths.Note
Make sure to enclose only individual cloud storage location URLs in quotes. If you enclose the entire
STORAGE_BLOCKED_LOCATIONSvalue in quotes, the value is invalid. As a result, theSTORAGE_BLOCKED_LOCATIONSparameter setting is ignored when users create stages that reference the storage integration.Amazon S3
STORAGE_BLOCKED_LOCATIONS = ( 'protocol://bucket/path/' [ , 'protocol://bucket/path/' ... ] )-
protocolis one of the following:s3refers to S3 storage in public AWS regions outside of China.s3chinarefers to S3 storage in public AWS regions in China.s3govrefers to S3 storage in government regions.
-
bucketis the name of an S3 bucket that stores your data files (e.g.mybucket). -
pathis an optional path (or directory) in the bucket that further limits access to the data files.
Google Cloud Storage
STORAGE_BLOCKED_LOCATIONS = ( 'gcs://bucket/path/' [ , 'gcs://bucket/path/' ... ] )bucketis the name of a GCS bucket that stores your data files (e.g.mybucket).pathis an optional path (or directory) in the bucket that further limits access to the data files.
Microsoft Azure
STORAGE_BLOCKED_LOCATIONS = ( 'azure://account.blob.core.windows.net/container/path/' [ , 'azure://account.blob.core.windows.net/container/path/' ... ] )accountis the name of the Azure storage account (e.g.myaccount).containeris the name of a Azure blob storage container that stores your data files (e.g.mycontainer).pathis an optional path (or directory) in the bucket that further limits access to the data files.
Microsoft Fabric OneLake
STORAGE_BLOCKED_LOCATIONS = ( 'azure://onelake.blob.fabric.microsoft.com/workspace_id/item_id/Files/path/' [ , ... ] )workspace_idis the unique 128-bit GUID of the Fabric Workspace.item_idis the unique GUID of the specific Fabric item.Filesis the path segment for Lakehouse items.pathis an optional path that further limits the blocked location.
-
COMMENT = 'string_literal'字符串(字面量),用于指定集成注释。
默认:无值
Cloud provider parameters (cloudProviderParams)¶
Amazon S3
STORAGE_PROVIDER = '{ S3 | S3CHINA | S3GOV }'指定存储数据文件的云存储提供商:
'S3': S3 storage in public AWS regions outside of China.'S3CHINA': S3 storage in public AWS regions in China.'S3GOV': S3 storage in AWS government regions.STORAGE_AWS_ROLE_ARN = 'iam_role'Specifies the Amazon Resource Name (ARN) of the AWS identity and access management (IAM) role that grants privileges on the S3 bucket containing your data files. For more information, see Configuring secure access to Amazon S3.
STORAGE_AWS_EXTERNAL_ID = 'external_id'Optionally specifies an external ID that Snowflake uses to establish a trust relationship with AWS. You must specify the same external ID in the trust policy of the IAM role that you configured for this storage integration. For more information, see How to use an external ID when granting access to your AWS resources to a third party (https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-user_externalid.html).
如果未为此参数指定值,当您创建存储集成时 Snowflake 将自动生成外部 ID。
STORAGE_AWS_OBJECT_ACL = 'bucket-owner-full-control'启用对 AWS 访问控制列表 (ACLs) 的支持,以授予桶拥有者完全控制权。通过卸载的表数据在 Amazon S3 桶中创建的文件由 AWS Identity and Access Management (IAM) 角色所拥有。ACLs 支持以下用例:配置一个 AWS 账户中的 IAM 角色,以访问一个或多个其他 AWS 账户中的 S3 桶。若没有 ACL 支持,桶所有者账户中的用户无法访问使用存储集成卸载到外部 (S3) 暂存区的数据文件。
When users unload Snowflake table data to data files in an S3 stage using COPY INTO <location>, the unload operation applies an ACL to the unloaded data files. The data files apply the
"s3:x-amz-acl":"bucket-owner-full-control"privilege to the files, granting the S3 bucket owner full control over them.USE_PRIVATELINK_ENDPOINT = { TRUE | FALSE }Specifies whether to use outbound private connectivity to harden your security posture. For information about using this parameter, see Private connectivity to external stages for Amazon Web Services.
Google Cloud Storage
STORAGE_PROVIDER = 'GCS'指定存储数据文件的云存储提供商。
Microsoft Azure
STORAGE_PROVIDER = 'AZURE'指定存储数据文件的云存储提供商。
AZURE_TENANT_ID = 'tenant_id'为允许和阻止的存储账户所属的 Office 365 租户指定 ID。存储集成只能对一个租户进行身份验证,因此允许和阻止的存储位置必须引用同属于此租户的存储账户。
To find your tenant ID, log into the Azure portal and click Azure Active Directory » Properties. The tenant ID is displayed in the Tenant ID field.
USE_PRIVATELINK_ENDPOINT = { TRUE | FALSE }Specifies whether to use outbound private connectivity to harden your security posture. For information about using this parameter, see Private connectivity to external stages and Snowpipe automation for Microsoft Azure.
Microsoft Fabric OneLake
STORAGE_PROVIDER = 'AZURE'Specifies the cloud storage provider. Use
'AZURE'for Microsoft Fabric OneLake storage.AZURE_TENANT_ID = 'tenant_id'指定 Fabric 工作区所属的 Microsoft Entra ID(前称 Azure Active Directory)租户的 ID。
Note
Microsoft Fabric OneLake 存储位置不支持私有连接端点 (USE_PRIVATELINK_ENDPOINT)。
访问控制要求
A role used to execute this operation must have the following privileges at a minimum:
| 权限 | 对象 | 备注 |
|---|---|---|
| CREATE INTEGRATION | 账户 | Only the ACCOUNTADMIN role has this privilege by default. The privilege can be granted to additional roles as needed. |
For instructions on creating a custom role with a specified set of privileges, see Creating custom roles.
For general information about roles and privilege grants for performing SQL actions on securable objects, see Overview of Access Control.
使用说明
Caution
重新创建存储集成(使用 CREATE OR REPLACE STORAGE INTEGRATION)会导致存储集成与引用它的任何暂存区之间的关联断开。原因在于,暂存区使用隐藏的 ID(而非存储集成的名称)链接到存储集成。在后台, CREATE OR REPLACE 语法会删除对象,并使用其他隐藏 ID 重新创建对象。
If you must recreate a storage integration after it has been linked to one or more stages, you must reestablish the association between
each stage and the storage integration by executing ALTER STAGE stage_name SET STORAGE_INTEGRATION =
storage_integration_name, where:
stage_nameis the name of the stage.storage_integration_nameis the name of the storage integration.
-
关于元数据:
Attention
Customers should ensure that no personal data (other than for a User object), sensitive data, export-controlled data, or other regulated data is entered as metadata when using the Snowflake service. For more information, see Metadata fields in Snowflake.
- The OR REPLACE and IF NOT EXISTS clauses are mutually exclusive. They can’t both be used in the same statement.
-
CREATE OR REPLACE <object> statements are atomic. That is, when an object is replaced, the old object is deleted and the new object is created in a single transaction.
示例
以下示例创建一个集成,该集成显式限制使用该集成引用两个桶和路径之一的外部暂存区:
Amazon S3
If the S3 storage is in a public AWS region in China, use
'S3CHINA'for the STORAGE_PROVIDER parameter ands3china://protocol in STORAGE_ALLOWED_LOCATIONS.
Google Cloud Storage
Microsoft Azure
The following example creates an integration that allows external stages that use the integration to reference any bucket and path in your account except for those that are explicitly blocked:
Amazon S3
If the S3 storage is in a public AWS region in China, use
'S3CHINA'for the STORAGE_PROVIDER parameter ands3china://protocol in STORAGE_BLOCKED_LOCATIONS.
Google Cloud Storage
Microsoft Azure