Option 1: Configure a Snowflake storage integration to access Amazon S3

本主题介绍如何使用存储集成来允许 Snowflake 从外部(即 S3)暂存区中引用的 Amazon S3 桶读取数据,并向其中写入数据。集成是已命名的第一类 Snowflake 对象,无需传递显式云提供商凭据(如密钥或访问令牌)。集成对象存储 AWS Identity and Access Management (IAM) 用户 ID。您组织中的管理员在 AWS 账户向集成授予 IAM 用户权限。

集成还可以列出桶(和可选路径),以限制用户在创建使用集成的外部暂存区时可以指定的位置。

Note

  • Completing the instructions in this section requires permissions in AWS to create and manage IAM policies and roles. If you are not an AWS administrator, ask your AWS administrator to perform these tasks.
  • Access to S3 storage in government regions using a storage integration is limited to Snowflake accounts hosted on AWS in the same government region.
  • Confirm that Snowflake supports the AWS region that your storage is hosted in. For more information, see Supported cloud regions.

下图显示了 S3 暂存区的集成流程:

Amazon S3 Stage Integration Flow
  1. 外部(即 S3)暂存区在其定义中引用了存储集成对象。
  2. Snowflake 会自动将存储集成与为您的账户创建的 S3 IAM 用户相关联。Snowflake 会创建一个 IAM 用户,供 Snowflake 账户中的所有 S3 存储集成引用。
  3. 您组织中的 AWS 管理员向 IAM 用户授予权限,以访问暂存区定义中引用的桶。请注意,许多外部暂存区对象可以引用不同的桶和路径,并使用相同的存储集成进行身份验证。

当用户从暂存区加载或卸载数据时,Snowflake 会在允许或拒绝访问之前验证桶上授予 IAM 用户的权限。

Configure secure access to cloud storage

本部分介绍如何配置 Snowflake 存储集成对象,以将云存储的身份验证责任委托给 Snowflake Identity and Access Management (IAM) 实体。

第 1 步:配置 S3 桶的访问权限

AWS 访问控制要求

Snowflake 需要对 S3 桶和文件夹拥有以下权限才能访问该文件夹(以及子文件夹)中的文件:

  • s3:GetBucketLocation
  • s3:GetObject
  • s3:GetObjectVersion
  • s3:ListBucket

Note

需要以下额外的权限才能执行其他 SQL 操作:

权限SQL 操作
s3:PutObject将文件卸载到桶中。
s3:DeleteObjectEither automatically purge files from the stage after a successful load or execute REMOVE statements to manually remove files.

作为最佳实践,Snowflake 建议创建一个供 Snowflake 访问 S3 桶的 IAM 策略。然后,您可以将策略附加到角色,并使用由 AWS 为角色生成的安全凭据来访问桶中的文件。

Create an IAM policy

以下分步说明介绍如何在 AWS 管理控制台中配置 Snowflake 的访问权限,以便您可以使用 S3 桶来加载和卸载数据:

  1. 登录 AWS 管理控制台。
  2. From the home dashboard, search for and select IAM.
  3. From the left-hand navigation pane, select Account settings.
  4. Under Security Token Service (STS) in the Endpoints list, find the Snowflake region where your account is located. If the STS status is inactive, move the toggle to Active.
  5. From the left-hand navigation pane, select Policies.
  6. Select Create Policy.
  7. For Policy editor, select JSON.
  8. 添加允许 Snowflake 访问 S3 桶和文件夹的策略文档。

以下策略(采用 JSON 格式)为 Snowflake 提供使用单个桶和文件夹路径加载或卸载数据所需的权限。您还可以使用 PURGE 复制选项来清除数据文件。

将文本复制并粘贴到策略编辑器中:

Note

  • Make sure to replace bucket and prefix with your actual bucket name and folder path prefix.
  • The Amazon Resource Names (ARN) for buckets in government regions have a arn:aws-us-gov:s3::: prefix.
  • The ARN for buckets in public AWS regions in China have a arn:aws-cn:s3::: prefix.
  • If you’re using an S3 access point, specify the access point ARN instead of a bucket ARN. For more information, see Configuring IAM policies for using access points (https://docs.aws.amazon.com/AmazonS3/latest/userguide/access-points-policies.html).

{
 "Version": "2012-10-17",
 "Statement": [
     {
         "Effect": "Allow",
         "Action": [
           "s3:PutObject",
           "s3:GetObject",
           "s3:GetObjectVersion",
           "s3:DeleteObject",
           "s3:DeleteObjectVersion"
         ],
         "Resource": "arn:aws:s3:::<bucket>/<prefix>/*"
     },
     {
         "Effect": "Allow",
         "Action": [
             "s3:ListBucket",
             "s3:GetBucketLocation"
         ],
         "Resource": "arn:aws:s3:::<bucket>",
         "Condition": {
             "StringLike": {
                 "s3:prefix": [
                     "<prefix>/*"
                 ]
             }
         }
     }
 ]
}

Note

Setting the "s3:prefix": condition to either ["*"] or ["<path>/*"] grants access to all prefixes in the specified bucket or path in the bucket, respectively.

注意 AWS 策略支持各种不同的安全用例。

The following policy provides Snowflake with the required permissions to load data from a single read-only bucket and folder path. The policy includes the s3:GetBucketLocation, s3:GetObject, s3:GetObjectVersion, and s3:ListBucket permissions:

替代策略:从只读 S3 桶加载

{
 "Version": "2012-10-17",
 "Statement": [
     {
         "Effect": "Allow",
         "Action": [
           "s3:GetObject",
           "s3:GetObjectVersion"
         ],
         "Resource": "arn:aws:s3:::<bucket>/<prefix>/*"
     },
     {
         "Effect": "Allow",
         "Action": [
             "s3:ListBucket",
             "s3:GetBucketLocation"
         ],
         "Resource": "arn:aws:s3:::<bucket>",
         "Condition": {
             "StringLike": {
                 "s3:prefix": [
                     "<prefix>/*"
                 ]
             }
         }
     }
 ]
}
  1. Select Next.
  2. Enter a Policy name (for example, snowflake_access) and an optional Description.
  3. Select Create policy.

Step 2: Create the IAM role in AWS

To configure access permissions for Snowflake in the AWS Management Console, do the following:

  1. From the left-hand navigation pane in the Identity and Access Management (IAM) Dashboard, select Roles.
  2. Select Create role.
  3. Select AWS account as the trusted entity type.
  4. Select Another AWS account
Select trusted entity page in AWS Management Console
  1. In the Account ID field, enter your own AWS account ID temporarily. Later, you modify the trust relationship and grant access to Snowflake.

  2. Select the Require external ID option. An external ID is used to grant access to your AWS resources (such as S3 buckets) to a third party like Snowflake.

    Enter a placeholder ID such as 0000. In a later step, you will modify the trust relationship for your IAM role and specify the external ID for your storage integration.

  3. Select Next.

  4. Select the policy you created in Step 1: Configure access permissions for the S3 bucket_ (in this topic).

  5. Select Next.

Review Page in AWS Management Console
  1. Enter a name and description for the role, then select Create role.

You have now created an IAM policy for a bucket, created an IAM role, and attached the policy to the role.

  1. On the role summary page, locate and record the Role ARN value. In the next step, you will create a Snowflake integration that references this role.

Note

Snowflake caches the temporary credentials for a period that cannot exceed the 60-minute expiration time. If you revoke access from Snowflake, users might be able to list files and access data from the cloud storage location until the cache expires.

Step 3: Create a cloud storage integration in Snowflake

Create a storage integration using the CREATE STORAGE INTEGRATION command. A storage integration is a Snowflake object that stores a generated identity and access management (IAM) user for your S3 cloud storage, along with an optional set of allowed or blocked storage locations (that is, buckets). Cloud provider administrators in your organization grant permissions on the storage locations to the generated user. This option allows users to avoid supplying credentials when creating stages or loading data.

A single storage integration can support multiple external (that is, S3) stages. The URL in the stage definition must align with the S3 buckets (and optional paths) specified for the STORAGE_ALLOWED_LOCATIONS parameter.

Note

Only account administrators (users with the ACCOUNTADMIN role) or a role with the global CREATE INTEGRATION privilege can execute this SQL command.

CREATE STORAGE INTEGRATION <integration_name>
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'S3'
  ENABLED = TRUE
  STORAGE_AWS_ROLE_ARN = '<iam_role>'
  STORAGE_ALLOWED_LOCATIONS = ('<protocol>://<bucket>/<path>/', '<protocol>://<bucket>/<path>/')
  [ STORAGE_BLOCKED_LOCATIONS = ('<protocol>://<bucket>/<path>/', '<protocol>://<bucket>/<path>/') ]

Where:

  • integration_name is the name of the new integration.

  • iam_role is the Amazon Resource Name (ARN) of the role you created in Step 2: Create the IAM role in AWS (in this topic).

  • protocol is one of the following:

    • s3 refers to S3 storage in public AWS regions outside of China.
    • s3china refers to S3 storage in public AWS regions in China.
    • s3gov refers to S3 storage in government regions.
  • bucket is the name of a S3 bucket that stores your data files (for example, mybucket). The required STORAGE_ALLOWED_LOCATIONS parameter and optional STORAGE_BLOCKED_LOCATIONS parameter restrict or block access to these buckets, respectively, when stages that reference this integration are created or modified.

  • path is an optional path that can be used to provide granular control over objects in the bucket.

The following example creates an integration that allows access to all buckets in the account but blocks access to the defined sensitivedata folders.

Additional external stages that also use this integration can reference the allowed buckets and paths:

CREATE STORAGE INTEGRATION s3_int
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'S3'
  ENABLED = TRUE
  STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::001234567890:role/myrole'
  STORAGE_ALLOWED_LOCATIONS = ('*')
  STORAGE_BLOCKED_LOCATIONS = ('s3china://mybucket1/mypath1/sensitivedata/', 's3china://mybucket2/mypath2/sensitivedata/');

Note

Optionally, use the STORAGE_AWS_EXTERNAL_ID parameter to specify your own external ID. You might choose this option to use the same external ID across multiple external volumes and/or storage integrations.

Step 4: Retrieve the AWS IAM user for your Snowflake account

  1. To retrieve the ARN for the IAM user that was created automatically for your Snowflake account, use the DESCRIBE INTEGRATION.

    DESC INTEGRATION <integration_name>;

    Where:

    For example:

    DESC INTEGRATION s3_int;
    +---------------------------+---------------+--------------------------------------------------------------------------------+------------------+
    | property                  | property_type | property_value                                                                 | property_default |
    +---------------------------+---------------+--------------------------------------------------------------------------------+------------------|
    | ENABLED                   | Boolean       | true                                                                           | false            |
    | STORAGE_ALLOWED_LOCATIONS | List          | s3china://mybucket1/mypath1/,s3china://mybucket2/mypath2/                                | []               |
    | STORAGE_BLOCKED_LOCATIONS | List          | s3china://mybucket1/mypath1/sensitivedata/,s3china://mybucket2/mypath2/sensitivedata/    | []               |
    | STORAGE_AWS_IAM_USER_ARN  | String        | arn:aws:iam::123456789001:user/abc1-b-self1234                                 |                  |
    | STORAGE_AWS_ROLE_ARN      | String        | arn:aws:iam::001234567890:role/myrole                                          |                  |
    | STORAGE_AWS_EXTERNAL_ID   | String        | MYACCOUNT_SFCRole=2_a123456/s0aBCDEfGHIJklmNoPq=                               |                  |
    +---------------------------+---------------+--------------------------------------------------------------------------------+------------------+
  2. Record the values for the following properties:

PropertyDescription
STORAGE_AWS_IAM_USER_ARNThe AWS IAM user created for your Snowflake account; for example, arn:aws:iam::123456789001:user/abc1-b-self1234. Snowflake provisions a single IAM user for your entire Snowflake account. All S3 storage integrations in your account use that IAM user.
STORAGE_AWS_EXTERNAL_IDThe external ID that Snowflake uses to establish a trust relationship with AWS. If you didn’t specify an external ID (STORAGE_AWS_EXTERNAL_ID) when you created the storage integration, Snowflake generates an ID for you to use.

You provide these values in the next section.

Step 5: Grant the IAM user permissions to access bucket objects

The following step-by-step instructions describe how to configure IAM access permissions for Snowflake in your AWS Management Console so that you can use a S3 bucket to load and unload data:

  1. Sign in to the AWS Management Console.

  2. Select IAM.

  3. From the left-hand navigation pane, select Roles.

  4. Select the role you created in Step 2: Create the IAM role in AWS (in this topic).

  5. Select the Trust relationships tab.

  6. Select Edit trust policy.

  7. Modify the policy document with the DESC STORAGE INTEGRATION output values you recorded in Step 4: Retrieve the AWS IAM user for your Snowflake account (in this topic):

    Policy document for IAM role

    {
      "Version": "2012-10-17",
      "Statement": [
     {
       "Sid": "",
       "Effect": "Allow",
       "Principal": {
         "AWS": "<snowflake_user_arn>"
       },
       "Action": "sts:AssumeRole",
       "Condition": {
         "StringEquals": {
           "sts:ExternalId": "<snowflake_external_id>"
         }
       }
     }
      ]
    }

    Where:

  • snowflake_user_arn is the STORAGE_AWS_IAM_USER_ARN value you recorded.
  • snowflake_external_id is the STORAGE_AWS_EXTERNAL_ID value you recorded.

In this example, the snowflake_external_id value is MYACCOUNT_SFCRole=2_a123456/s0aBCDEfGHIJklmNoPq=.

Note

For security reasons, if you create a new storage integration (or recreate an existing storage integration using the CREATE OR REPLACE STORAGE INTEGRATION syntax) without specifying an external ID, the new integration has a different external ID and can’t resolve the trust relationship unless you update the trust policy.

  1. Select Update policy to save your changes.

Note

Snowflake caches the temporary credentials for a period that cannot exceed the 60-minute expiration time. If you revoke access from Snowflake, users might be able to list files and load data from the cloud storage location until the cache expires.

Note

You can use the SYSTEM$VALIDATE_STORAGE_INTEGRATION function to validate the configuration for your storage integration.

第 6 步:创建外部暂存区

Create an external (i.e. S3) stage that references the storage integration you created in Step 3: Create a Cloud Storage Integration in Snowflake (in this topic).

Note

创建使用存储集成的暂存区需要具有以下权限的角色:架构的 CREATE STAGE 权限以及存储集成的 USAGE 权限。例如:

GRANT CREATE STAGE ON SCHEMA public TO ROLE myrole;

GRANT USAGE ON INTEGRATION s3_int TO ROLE myrole;

Create the stage using the CREATE STAGE command.

For example, set mydb.public as the current database and schema for the user session, and then create a stage named my_s3_stage. In this example, the stage references the S3 bucket and path mybucket1/path1, which are supported by the integration. The stage also references a named file format object called my_csv_format:

USE SCHEMA mydb.public;

CREATE STAGE my_s3_stage
  STORAGE_INTEGRATION = s3_int
  URL = 's3china://bucket1/path1/'
  FILE_FORMAT = my_csv_format;

Note

  • 暂存区所有者(即具有暂存区 OWNERSHIP 权限的角色)必须拥有存储集成的 USAGE 权限。

  • Append a forward slash (/) to the URL value to filter to the specified folder path. If the forward slash is omitted, all files and folders starting with the prefix for the specified path are included.

    Note that the forward slash is required to access and retrieve unstructured data files in the stage.

  • To load or unload data from or to a stage that uses an integration, a role must have the USAGE privilege on the stage. It is not necessary to also have the USAGE privilege on the storage integration.

  • STORAGE_INTEGRATION 参数与其他暂存区参数分开处理,例如 FILE_FORMAT。无论用于访问 S3 桶的集成如何,对这些其他参数的支持都是相同的。

Next: AWS data file encryption