Using block storage volumes with services

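For containerized applications, Snowflake supports the following storage volume types: Snowflake internal stage, local storage, memory storage volumes, and block storage volumes.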

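Specifying block storage in a service specification

To create a service (including a job service) that uses block storage, provide the necessary configuration in the service specification as follows:

Step 1: Define block storage volumes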

Specify the spec.volumes field to define the block storage volumes to create.

spec:
  containers:
  ...
  volumes:
    - name: <name>
      source: block
      size: <size-in-Gi>
      blockConfig:                             # optional
        initialContents:
          fromSnapshot: <snapshot-name>
        iops: <number-of-operations>
        throughput: <MiB-per-second>
        encryption: SNOWFLAKE_SSE | SNOWFLAKE_FULL

The following fields are required:

  • name: Name of the volume.
  • source: Type of the volume. For a block storage volume, the value is block.
  • size: Storage capacity of the block storage volume. The value must be an integer with the Gi unit suffix. For example, 5Gi means 5*1024*1024*1024 bytes. The supported size ranges depend on the cloud provider:
    • 1Gi to 65536Gi for AWS.
    • 1Gi to 16384Gi for Azure.
    • 4Gi to 16384Gi for Google Cloud.
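The size rules above can be checked before you submit a specification. The following is a minimal Python sketch, not a Snowflake API: the function name `parse_block_volume_size` and the provider keys `aws`, `azure`, and `gcp` are hypothetical, and the ranges are copied from the list above.

```python
import re

# Supported block storage volume sizes in Gi, per cloud provider,
# copied from the list above. The provider keys are hypothetical labels.
SIZE_RANGES_GI = {
    "aws": (1, 65536),
    "azure": (1, 16384),
    "gcp": (4, 16384),
}

# An integer followed by the Gi unit suffix, for example "5Gi".
_SIZE_PATTERN = re.compile(r"([0-9]+)Gi")


def parse_block_volume_size(size: str, provider: str) -> int:
    """Validate a volumes[].size value and return the capacity in bytes."""
    match = _SIZE_PATTERN.fullmatch(size)
    if match is None:
        raise ValueError(f"size must be an integer with the Gi suffix, got {size!r}")
    gi = int(match.group(1))
    low, high = SIZE_RANGES_GI[provider]
    if not low <= gi <= high:
        raise ValueError(f"{size} is outside the {low}Gi-{high}Gi range for {provider}")
    # 1Gi = 1024 * 1024 * 1024 bytes.
    return gi * 1024 ** 3


if __name__ == "__main__":
    print(parse_block_volume_size("5Gi", "aws"))  # prints 5368709120
```

Non-integer values such as 1.5Gi and other suffixes such as 5GB are rejected, matching the rule that the size must be an integer with the Gi suffix.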

The following fields are optional:

  • blockConfig.initialContents.fromSnapshot: Specifies a previously taken snapshot of another volume to initialize the block volume with. The snapshot name can be a fully qualified object identifier, such as TUTORIAL_DB.DATA_SCHEMA.MY_SNAPSHOT. A name that is not fully qualified is resolved relative to the database and schema of the service. For example, if you created your service in TUTORIAL_DB.DATA_SCHEMA, then fromSnapshot: MY_SNAPSHOT is equivalent to fromSnapshot: TUTORIAL_DB.DATA_SCHEMA.MY_SNAPSHOT.

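Note the following:

  • The snapshot must be in the CREATED state before it can be used to create a volume; otherwise, service creation fails.
  • The snapshot's encryption type must match the encryption type of the volume being created.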

Use the DESCRIBE SNAPSHOT command to get the snapshot’s status and encryption type.

  • blockConfig.iops: Specifies the supported peak number of input/output operations per second. Note that the data size per operation is capped at 256 KiB.

    • For AWS: The supported range is 3000-80000, with a default of 3000.
    • For Azure: The supported range is 3000-80000, with a default of 3000.
    • For Google Cloud:
      • Google Cloud CPU instances: The supported range is 2000-160000, with the following defaults:

        • 2000 IOPS for a 4 Gi disk size
        • 2500 IOPS for a 5 Gi disk size
        • 3000 IOPS for all other disk sizes
      • Google Cloud GPU instances: Snowflake recommends specifying only throughput. blockConfig.iops must be 16 * blockConfig.throughput for GPU instances in Google Cloud.

  • blockConfig.throughput: Specifies the peak throughput, in MiB/second, to provision for the volume.

    • For AWS: The supported range is 125-2000, with a default of 125.
    • For Azure: The supported range is 125-1200, with a default of 125.
    • For Google Cloud:
      • Google Cloud CPU instances: The supported range is 140-2400, with a default of 140.
      • Google Cloud GPU instances: The supported range is 400-1,200,000. The default is 400, or 0.12 MiB/second per GiB of volume size, whichever is higher.
  • blockConfig.encryption: Specifies the encryption type of the volume: SNOWFLAKE_SSE or SNOWFLAKE_FULL. For more information, see Encryption support.
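The ranges and defaults above can be collected into lookup tables to check a blockConfig before creating a service. This Python sketch is illustrative only: the target labels (`aws`, `azure`, `gcp_cpu`, `gcp_gpu`) and the function names are hypothetical, and it checks only the outer ranges listed here, not the size-dependent caps described later on this page.

```python
# Outer ranges for blockConfig.iops and blockConfig.throughput (MiB/second),
# copied from the list above. The target labels are hypothetical.
IOPS_RANGES = {
    "aws": (3000, 80000),
    "azure": (3000, 80000),
    "gcp_cpu": (2000, 160000),
}
THROUGHPUT_RANGES = {
    "aws": (125, 2000),
    "azure": (125, 1200),
    "gcp_cpu": (140, 2400),
    "gcp_gpu": (400, 1_200_000),
}


def default_throughput(target: str, size_gi: int) -> float:
    """Throughput used when blockConfig.throughput is omitted."""
    if target == "gcp_gpu":
        # 400 MiB/second, or 0.12 MiB/second per GiB of volume size if that is higher.
        return max(400, 0.12 * size_gi)
    # For the other targets the default equals the lower bound of the range.
    return THROUGHPUT_RANGES[target][0]


def default_iops(target: str, size_gi: int) -> float:
    """IOPS used when blockConfig.iops is omitted."""
    if target in ("aws", "azure"):
        return 3000
    if target == "gcp_cpu":
        # 2000 for a 4 Gi disk, 2500 for a 5 Gi disk, 3000 for all other sizes.
        return {4: 2000, 5: 2500}.get(size_gi, 3000)
    if target == "gcp_gpu":
        # On Google Cloud GPU instances, iops is always 16 * throughput.
        return 16 * default_throughput(target, size_gi)
    raise ValueError(f"unknown target: {target!r}")


def validate_block_config(target: str, iops: float, throughput: float) -> None:
    """Raise ValueError if iops or throughput is outside the documented range."""
    t_low, t_high = THROUGHPUT_RANGES[target]
    if not t_low <= throughput <= t_high:
        raise ValueError(f"throughput {throughput} outside {t_low}-{t_high} MiB/second")
    if target == "gcp_gpu":
        if iops != 16 * throughput:
            raise ValueError("on Google Cloud GPU instances, iops must be 16 * throughput")
        return
    i_low, i_high = IOPS_RANGES[target]
    if not i_low <= iops <= i_high:
        raise ValueError(f"iops {iops} outside {i_low}-{i_high}")
```

Keeping the GPU rule as a derived value (16 * throughput) rather than a separate range mirrors the recommendation to specify only throughput on Google Cloud GPU instances.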

Step 2: Specify where to mount the volume in containers

After you define a block storage volume by adding the spec.volumes field, use the spec.containers.volumeMounts field to describe where to mount the volume in your application containers, as shown in the following example:

spec:
  containers:
  - name: ...
    image: ...
    volumeMounts:
    - name: <volume-name>
      mountPath: <absolute_directory_path>
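Each volumeMounts entry refers to a volume by name, so its name is expected to match an entry in spec.volumes, and mountPath is an absolute directory path. As a hedged sketch, the following Python function checks those two rules on a specification that has already been loaded into a dictionary (for example, with a YAML parser of your choice). The function name `check_volume_mounts` is hypothetical, and the dictionary mirrors the first example under Examples.

```python
import posixpath


def check_volume_mounts(spec: dict) -> list:
    """Return a list of problems with volumeMounts; an empty list means none were found."""
    body = spec["spec"]
    defined = {volume["name"] for volume in body.get("volumes", [])}
    problems = []
    for container in body.get("containers", []):
        for mount in container.get("volumeMounts", []):
            # Every mount must refer to a volume defined in spec.volumes.
            if mount["name"] not in defined:
                problems.append(f"{container['name']}: no volume named {mount['name']!r}")
            # mountPath must be an absolute directory path.
            if not posixpath.isabs(mount["mountPath"]):
                problems.append(f"{container['name']}: mountPath {mount['mountPath']!r} is not absolute")
    return problems


# The block storage parts of the 10Gi example, expressed as a dictionary.
example_spec = {
    "spec": {
        "containers": [
            {
                "name": "echo",
                "image": "/tutorial_db/data_schema/tutorial_repository/my_echo_service_image:latest",
                "volumeMounts": [{"name": "block-vol", "mountPath": "/opt/block/path"}],
            }
        ],
        "volumes": [{"name": "block-vol", "source": "block", "size": "10Gi"}],
    }
}

if __name__ == "__main__":
    print(check_volume_mounts(example_spec))  # prints []
```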

Examples

  • Create a service with a 10Gi block storage volume. The volume is mounted at /opt/block/path in the main container.

    CREATE SERVICE my_service
    IN COMPUTE POOL tutorial_compute_pool
    FROM SPECIFICATION $$
    spec:
      containers:
      - name: echo
        image: /tutorial_db/data_schema/tutorial_repository/my_echo_service_image:latest
        volumeMounts:
        - name: block-vol
          mountPath: /opt/block/path
        readinessProbe:
          port: 8080
          path: /healthcheck
      endpoints:
      - name: echoendpoint
        port: 8080
        public: true
      volumes:
      - name: block-vol
        source: block
        size: 10Gi
    $$;

  • Create a service with a block storage volume that is initialized from a snapshot.

    CREATE SERVICE new_service
      IN COMPUTE POOL tutorial_compute_pool
      FROM SPECIFICATION $$
    spec:
      containers:
      - name: echo
        image: /tutorial_db/data_schema/tutorial_repository/my_echo_service_image:tutorial
        volumeMounts:
        - name: vol-from-snapshot
          mountPath: /opt/block/path
        readinessProbe:
          port: 8080
          path: /healthcheck
      endpoints:
      - name: echoendpoint
        port: 8080
        public: true
      volumes:
      - name: vol-from-snapshot
        source: block
        size: 50Gi
        blockConfig:
          initialContents:
            fromSnapshot: BACKUP_DB.SNAPSHOTS.MY_SNAPSHOT
    $$;

For step-by-step instructions, see Tutorial 5: Create a service with a block storage volume mounted.

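About IOPS and throughput

If your service's IO performance doesn't meet expectations and the service is constrained by block volume IOPS or throughput, consider increasing IOPS or throughput. In the current implementation, any such change requires you to re-create the service.

You can review the following platform metrics to identify whether your service is bottlenecked on block storage: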

  • container.cpu.usage
  • volume.read.iops
  • volume.write.iops
  • volume.read.throughput
  • volume.write.throughput
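One way to read these metrics is to compare observed block storage activity with what you provisioned. The Python sketch below is a heuristic, not a Snowflake feature: it assumes you have already collected the volume.read.* and volume.write.* values (in operations per second and MiB/second) and that reads and writes count against the same provisioned limits. The 90% threshold and the function names are hypothetical.

```python
def storage_utilization(read_iops, write_iops, read_mib_s, write_mib_s,
                        provisioned_iops, provisioned_mib_s):
    """Fraction of the provisioned IOPS and throughput currently in use.

    Assumes reads and writes count against the same provisioned limits.
    """
    return {
        "iops": (read_iops + write_iops) / provisioned_iops,
        "throughput": (read_mib_s + write_mib_s) / provisioned_mib_s,
    }


def likely_bottlenecks(utilization, threshold=0.9):
    """Names of the limits used at or above the (hypothetical) threshold."""
    return sorted(name for name, used in utilization.items() if used >= threshold)


if __name__ == "__main__":
    # Hypothetical observations for a volume with the default 3000 IOPS and 125 MiB/second.
    usage = storage_utilization(2000, 900, 50, 20, 3000, 125)
    print(likely_bottlenecks(usage))  # prints ['iops']
```

If neither limit is close to saturation while container.cpu.usage is high, the bottleneck is more likely elsewhere than in block storage.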

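Depending on your cloud provider, note the following:

  • Configuring IOPS and throughput for AWS:

    • The maximum IOPS that can be configured is 500 IOPS per GiB of volume size, up to a maximum of 80,000 IOPS. For example, the maximum IOPS of a 10 GiB volume is 500 * 10 = 5000. Accordingly, the maximum of 80,000 IOPS can only be configured if your volume is 160 GiB or larger.
    • The maximum throughput that can be configured is 1 MiB/second for every 4 IOPS, up to a maximum of 2000 MiB/second. For example, with the default 3000 IOPS you can configure throughput of up to 750 MiB/second (3000/4=750).
  • Configuring IOPS and throughput for Azure:

    • For volumes larger than 6 GiB, the supported IOPS increases by 500 for each GiB above 6 GiB, starting from 3000 IOPS, up to a maximum of 80,000 IOPS. For example, the maximum IOPS of a 10 GiB volume is 500 * 4 + 3000 = 5000. Accordingly, the maximum of 80,000 IOPS can only be configured if your volume is 160 GiB or larger.
    • For volumes larger than 6 GiB, the maximum throughput that can be configured is 0.25 MiB/second per IOPS, up to a maximum of 1200 MiB/second. For example, with the default 3000 IOPS you can configure throughput of up to 750 MiB/second (3000*0.25=750).
  • Configuring IOPS and throughput for Google Cloud:

    • For CPU instances:

      • IOPS can be configured at up to 500 IOPS per Gi of volume size, to a maximum of 160,000 IOPS. For example, a 10 Gi volume can reach a maximum of 5,000 IOPS (500 IOPS * 10 Gi). To reach the maximum of 160,000 IOPS, the volume size must be 320 Gi or larger.
      • The maximum throughput that can be configured is 1 MiB/second for every 4 IOPS, up to 2400 MiB/second. For example, 3000 IOPS supports throughput of up to 750 MiB/second (3000 / 4 = 750).
    • For GPU instances:

      • IOPS can't be set independently of throughput; IOPS is calculated as 16 times the throughput value, so specifying throughput automatically determines IOPS. Configuring IOPS is not recommended for disks used with GPU instances.
      • Throughput has a minimum: it must be at least 400 MiB/s or 0.12 MiB/s per GiB of volume size, whichever is higher.
      • The maximum throughput is 1600 MiB/s per GiB of volume size, up to 1,200,000 MiB/s. For example, a 10 GiB volume can reach a maximum throughput of 16,000 MiB/s (1600 * 10). The 1,200,000 MiB/s cap can only be reached if the volume is 750 GiB or larger.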
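The per-provider rules above can be written as small formulas, which makes it easy to check whether a requested iops or throughput value is achievable for a given volume size. This Python sketch transcribes the rules from this section; the function names and provider labels are hypothetical, and two edge cases that the rules don't spell out (small AWS volumes and Azure volumes of 6 GiB or less) are marked as assumptions in the comments.

```python
def max_iops(provider: str, size_gib: int) -> int:
    """Largest blockConfig.iops value for a volume of size_gib GiB."""
    if provider == "aws":
        # 500 IOPS per GiB, capped at 80,000.
        # Assumption: the 3000 IOPS default is always allowed, so it acts as a floor.
        return max(3000, min(500 * size_gib, 80_000))
    if provider == "azure":
        # 3000 IOPS, plus 500 for each GiB above 6 GiB, capped at 80,000.
        return min(3000 + 500 * max(0, size_gib - 6), 80_000)
    if provider == "gcp_cpu":
        # 500 IOPS per GiB, capped at 160,000.
        return min(500 * size_gib, 160_000)
    raise ValueError(f"no independent IOPS limit for {provider!r}")


def max_throughput(provider: str, size_gib: int, iops: int) -> float:
    """Largest blockConfig.throughput (MiB/second) for the given size and IOPS."""
    if provider == "aws":
        # 1 MiB/second for every 4 IOPS, capped at 2000.
        return min(iops / 4, 2000)
    if provider == "azure":
        # Assumption: volumes of 6 GiB or less stay at the 125 MiB/second default.
        if size_gib <= 6:
            return 125
        # 0.25 MiB/second per IOPS, capped at 1200.
        return min(iops * 0.25, 1200)
    if provider == "gcp_cpu":
        # 1 MiB/second for every 4 IOPS, capped at 2400.
        return min(iops / 4, 2400)
    raise ValueError(f"use gpu_throughput_bounds for {provider!r}")


def gpu_throughput_bounds(size_gib: int) -> tuple:
    """(minimum, maximum) throughput in MiB/second on Google Cloud GPU instances."""
    low = max(400, 0.12 * size_gib)
    high = min(1600 * size_gib, 1_200_000)
    return low, high


if __name__ == "__main__":
    print(max_iops("aws", 10))              # prints 5000
    print(max_throughput("aws", 10, 3000))  # prints 750.0
    print(gpu_throughput_bounds(10))        # prints (400, 16000)
```

On Google Cloud GPU instances, IOPS then follows as 16 times whichever throughput you choose within these bounds.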

Snapshot on delete

Running any of the following commands deletes the block volumes associated with the service:

  • DROP SERVICE <service-name> FORCE
  • ALTER COMPUTE POOL <compute-pool-name> STOP ALL
  • ALTER SERVICE <service-name> RESTORE VOLUME <volume-name> FROM SNAPSHOT

The snapshotOnDelete option defaults to true for services and false for jobs. When the value is true, Snowflake takes a snapshot of the volume before deletion to protect you from accidental data loss. You set this option in the blockConfig section of the service specification.

Unlike other snapshots, these snapshots are automatically deleted after a retention period. The retention period defaults to 7 days and can be configured using the snapshotDeleteAfter field.

Snowflake assigns a snapshot name in this format: SYS_BACKUP_ON_DELETE<string>_<timestamp>.
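The defaults described above can be summarized in a few lines of Python. This sketch is illustrative: the function names are hypothetical, the retention period is modeled as a whole number of days (the value format of snapshotDeleteAfter is not described here), and the name check looks only at the documented SYS_BACKUP_ON_DELETE prefix.

```python
from datetime import datetime, timedelta
from typing import Optional


def snapshot_on_delete(is_job: bool, configured: Optional[bool] = None) -> bool:
    """Effective snapshotOnDelete: an explicit setting wins; otherwise true for services, false for jobs."""
    if configured is not None:
        return configured
    return not is_job


def is_snapshot_on_delete_backup(name: str) -> bool:
    """True if the name starts like SYS_BACKUP_ON_DELETE<string>_<timestamp>."""
    return name.startswith("SYS_BACKUP_ON_DELETE")


def expected_deletion_time(taken_at: datetime, retention_days: int = 7) -> datetime:
    """When an automatic snapshot is expected to be removed (7-day default)."""
    return taken_at + timedelta(days=retention_days)


if __name__ == "__main__":
    print(snapshot_on_delete(is_job=False))  # prints True
    print(snapshot_on_delete(is_job=True))   # prints False
    print(expected_deletion_time(datetime(2024, 1, 1)))  # prints 2024-01-08 00:00:00
```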

Access control requirements

If you want to use an existing snapshot (fromSnapshot is in the specification) to initialize the volume, the service’s owner role must have the USAGE privilege on the snapshot.

The service's owner role must also have the USAGE privilege on the database and schema that contain the snapshot.
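Because a fromSnapshot name that is not fully qualified is resolved in the service's database and schema, you can work out which objects need USAGE before creating the service. A Python sketch, assuming simple unquoted identifiers that contain no dots; the function names are hypothetical, and two-part names are deliberately rejected because this page doesn't describe how they resolve.

```python
def resolve_snapshot_name(name: str, service_db: str, service_schema: str) -> str:
    """Fully qualify a fromSnapshot value relative to the service's database and schema."""
    parts = name.split(".")
    if len(parts) == 3:
        return name
    if len(parts) == 1:
        return f"{service_db}.{service_schema}.{name}"
    raise ValueError(f"this sketch handles only 1- or 3-part names, got {name!r}")


def required_usage_privileges(name: str, service_db: str, service_schema: str) -> list:
    """(object type, object name) pairs on which the service owner role needs USAGE."""
    db, schema, snapshot = resolve_snapshot_name(name, service_db, service_schema).split(".")
    return [
        ("DATABASE", db),
        ("SCHEMA", f"{db}.{schema}"),
        ("SNAPSHOT", f"{db}.{schema}.{snapshot}"),
    ]


if __name__ == "__main__":
    for object_type, object_name in required_usage_privileges("MY_SNAPSHOT", "TUTORIAL_DB", "DATA_SCHEMA"):
        print(object_type, object_name)
```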

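Managing snapshots

You can take snapshots of block storage volumes and use the backups later, as follows:

  • Restore an existing block storage volume from a snapshot backup.
  • Use a snapshot backup as seed data to initialize a new block storage volume when you create a new service.

Before taking a snapshot, make sure that all updates are flushed to disk.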

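Snowflake provides the following commands to create and manage snapshots:

  • CREATE SNAPSHOT
  • ALTER SNAPSHOT
  • DROP SNAPSHOT
  • DESCRIBE SNAPSHOT
  • SHOW SNAPSHOTS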

In addition, to restore a snapshot on an existing block storage volume, you can execute the ALTER SERVICE … RESTORE VOLUME command. Note that you need to suspend the service before you can restore a snapshot. After restoring a volume, the service is automatically resumed.
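The preconditions for the two ways of using a snapshot, stated in different places on this page, can be collected into one check. In this Python sketch the function name and the use labels are hypothetical, CREATED is the snapshot state named on this page, and SUSPENDED is assumed to be the service status string reported for a suspended service.

```python
from typing import Optional


def snapshot_use_problems(use: str, snapshot_status: str, snapshot_encryption: str,
                          volume_encryption: str, service_status: Optional[str] = None) -> list:
    """List reasons a snapshot can't be used; an empty list means no problems were found.

    use is "initial_contents" (fromSnapshot in a new specification) or
    "restore" (ALTER SERVICE … RESTORE VOLUME on an existing volume).
    """
    problems = []
    # Both uses require matching encryption types.
    if snapshot_encryption != volume_encryption:
        problems.append("snapshot and volume encryption types must match")
    if use == "initial_contents":
        if snapshot_status != "CREATED":
            problems.append("snapshot must be in the CREATED state")
    elif use == "restore":
        # The service must be suspended first; it resumes automatically after the restore.
        if service_status != "SUSPENDED":
            problems.append("suspend the service before restoring")
    else:
        raise ValueError(f"unknown use: {use!r}")
    return problems
```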

Block storage costs

For more information, see the Snowflake Service Consumption Table.

When block storage volumes are used with a job service, Snowflake stops charging for block storage after the job service is dropped by the user or cleaned up by Snowflake after it completes.

After a snapshot is dropped, you continue to be billed for it through the configured data retention period. The default data retention period is 1 day.

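Encryption support

Block storage volumes and snapshots support the same two encryption modes that are used for other Snowflake-managed storage:

  • SNOWFLAKE_SSE: Server-side encryption only. This is the default for customers who have not enabled Tri-Secret Secure on their Snowflake account.

    Snowflake uses the cloud service provider (CSP) encryption for block storage volumes and snapshots.

  • SNOWFLAKE_FULL: Host and server-side encryption. This is the default for customers who have enabled Tri-Secret Secure on their Snowflake account.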

    Data is first encrypted at the client (Snowpark Container Services host) before being sent to a CSP for storage. Each volume is encrypted with a unique volume key. The same key is used for encrypting snapshots that you create from that volume.

    Because Snowflake performs additional encryption of the data, using SNOWFLAKE_FULL volumes has an impact on performance and resource usage. Snowflake uses the encryption mechanisms provided by the Linux kernel, so the impact should not be significant. Any impact is likely workload-specific, so we recommend that you identify the bottlenecks of your service or job and then increase the volume throughput or use a more powerful server.

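Block storage volumes and snapshots in Snowflake don't support key rotation or rekeying. To change the encryption key of a volume, create a new volume and copy the data from a snapshot.

For customers who have enabled Tri-Secret Secure on their account, note that when access to the customer-managed key is revoked, the volume data remains available to services that are currently running with the volume. We recommend that you shut down these services when you revoke access to the customer-managed key so that the data is no longer available. In addition, after you revoke the key, services with encrypted volumes can't start.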

Volume snapshots retain the encryption type of their source volume. For example, a snapshot of a SNOWFLAKE_SSE volume also uses SNOWFLAKE_SSE encryption. When a snapshot is used as the initial content of a volume or with the ALTER SERVICE … RESTORE VOLUME command, its encryption type must match the volume’s encryption type. Otherwise, the command fails.

You can require the SNOWFLAKE_FULL encryption type for all Snowpark Container Services block-storage volumes and snapshots in the account by setting the ENABLE_SPCS_BLOCK_STORAGE_SNOWFLAKE_FULL_ENCRYPTION_ENFORCEMENT parameter to TRUE for the account.

When this parameter is enabled, creating block storage volumes and snapshots with the SNOWFLAKE_SSE encryption type is prohibited.
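The encryption rules in this section reduce to a default that depends on Tri-Secret Secure, snapshots that inherit their source volume's type, and an account-level switch that rules out SNOWFLAKE_SSE. A minimal Python sketch; the function names are hypothetical, and the `enforce_full` flag stands in for the ENABLE_SPCS_BLOCK_STORAGE_SNOWFLAKE_FULL_ENCRYPTION_ENFORCEMENT parameter rather than reading it from Snowflake.

```python
SSE = "SNOWFLAKE_SSE"
FULL = "SNOWFLAKE_FULL"


def default_encryption(tri_secret_secure: bool) -> str:
    """Default encryption type: SNOWFLAKE_FULL with Tri-Secret Secure, SNOWFLAKE_SSE without."""
    return FULL if tri_secret_secure else SSE


def check_allowed(encryption: str, enforce_full: bool) -> None:
    """Raise ValueError if a new volume or snapshot with this encryption type would be rejected."""
    if encryption not in (SSE, FULL):
        raise ValueError(f"unknown encryption type: {encryption!r}")
    if enforce_full and encryption == SSE:
        raise ValueError("SNOWFLAKE_SSE is prohibited while the enforcement parameter is TRUE")


def snapshot_encryption(volume_encryption: str) -> str:
    """A snapshot keeps the encryption type of its source volume."""
    return volume_encryption
```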

Example

For an example, see Tutorial. The tutorial provides step-by-step instructions to create a service with a block storage volume mounted.

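Guidelines and limitations

The following limitations apply to services that use block storage volumes:

  • General limits. If you run into issues with any of these limits, contact your account representative.

    • The maximum number of block storage volumes per service is 3.

    • The maximum number of block storage volumes per Snowflake account is 100.

    • The following table lists the maximum number of block storage volumes that can be mounted on each compute pool node, depending on the node's instance type. Snowflake ensures that service instances that use block storage volumes are placed in accordance with these limits. This can cause a service to remain in the PENDING state while it waits for additional resources.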

      Instance family                             AWS limit   Azure limit   GCP limit
      CPU_X64_XS                                  22          3             14
      CPU_X64_S                                   22          8             14
      CPU_X64_M                                   22          16            14
      CPU_X64_SL                                  27          31            14
      CPU_X64_L                                   22          32            14
      HIGHMEM_X64_S                               22          16            14
      HIGHMEM_X64_M                               22          32            14
      HIGHMEM_X64_SL                              n/a         32            14
      HIGHMEM_X64_L                               22          n/a           n/a
      GPU_NV_S (AWS only)                         22          n/a           n/a
      GPU_NV_M (AWS only)                         21          n/a           n/a
      GPU_NV_L (AWS only)                         14          n/a           n/a
      GPU_NV_XS (Azure only)                      n/a         8             n/a
      GPU_NV_SM (Azure only)                      n/a         32            n/a
      GPU_NV_2M (Azure only)                      n/a         32            n/a
      GPU_NV_3M (Azure only)                      n/a         16            n/a
      GPU_NV_SL (Azure only)                      n/a         32            n/a
      GPU_GCP_NV_L4_1_24G (Google Cloud only)     n/a         n/a           14
      GPU_GCP_NV_L4_4_24G (Google Cloud only)     n/a         n/a           14
      GPU_GCP_NV_A100_8_40G (Google Cloud only)   n/a         n/a           14

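    • The maximum number of snapshots allowed per Snowflake account is 100.

  • Services that use block storage volumes must have the same minimum and maximum number of instances.

  • After a service is created, the following restrictions apply:

    • You can't use the ALTER SERVICE … SET … command when the service uses block storage volumes.
    • You can't change the size, iops, throughput, or encryption fields of block storage volumes.
    • You can't add new block storage volumes or remove existing ones.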
    • Block storage volumes are preserved if a service is upgraded, or suspended and resumed. When a service is suspended, you continue to pay for the volume because it is preserved. After you upgrade or resume a service, Snowflake attaches each block storage volume to the same service instance ID as before.
    • Block storage volumes are deleted if the service is dropped. To preserve data in the volumes, take snapshots of the volumes. You can use the snapshots later to initialize new volumes.
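The count limits above can be checked when planning a service. This Python sketch uses a subset of the per-node table (add the rows you need) and hypothetical function names. It interprets the per-service limit as the number of volumes in the specification, assumes each service instance gets its own copy of every volume (so account usage grows with the instance count), and its per-node estimate considers only the volume limit, not CPU, memory, or GPU capacity.

```python
from typing import Dict, List, Optional

MAX_VOLUMES_PER_SERVICE = 3
MAX_VOLUMES_PER_ACCOUNT = 100

# Subset of the per-node table above; None means the instance family is n/a on that provider.
NODE_VOLUME_LIMITS: Dict[str, Dict[str, Optional[int]]] = {
    "CPU_X64_XS": {"aws": 22, "azure": 3, "gcp": 14},
    "CPU_X64_M": {"aws": 22, "azure": 16, "gcp": 14},
    "HIGHMEM_X64_SL": {"aws": None, "azure": 32, "gcp": 14},
    "GPU_NV_L": {"aws": 14, "azure": None, "gcp": None},
}


def service_plan_problems(volumes_in_spec: int, min_instances: int, max_instances: int,
                          account_volumes_in_use: int = 0) -> List[str]:
    """Check a planned service against the general limits; an empty list means none were hit."""
    problems = []
    if volumes_in_spec > MAX_VOLUMES_PER_SERVICE:
        problems.append(f"at most {MAX_VOLUMES_PER_SERVICE} block storage volumes per service")
    if min_instances != max_instances:
        problems.append("minimum and maximum instance counts must be equal")
    # Assumption: every instance gets its own copy of each volume.
    if account_volumes_in_use + volumes_in_spec * max_instances > MAX_VOLUMES_PER_ACCOUNT:
        problems.append(f"would exceed {MAX_VOLUMES_PER_ACCOUNT} volumes per account")
    return problems


def max_instances_per_node(instance_family: str, provider: str, volumes_in_spec: int) -> int:
    """Upper bound on instances per node from the volume limit alone."""
    limit = NODE_VOLUME_LIMITS[instance_family][provider]
    if limit is None:
        raise ValueError(f"{instance_family} is not available on {provider}")
    return limit // volumes_in_spec
```

For example, on Azure a CPU_X64_XS node can hold only 3 volumes, so a service with 2 volumes per instance fits at most one instance per node, which is one way a service can end up waiting in the PENDING state.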