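Using Snowflake stage volumes with services¶
Snowflake supports various storage volume types for your application containers, including internal stage, local storage, memory storage, and block storage volumes. This section explains how to configure volumes and volume mounts for internal stages.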
Stage volumes provide your service with access to internal stages by using FileSystem APIs, simplifying your code compared to using a Snowflake driver and GET and PUT SQL commands.
Stage volumes can also be more performant for scenarios with streaming reads or writes of large data files. Stage volumes work best when the application uses the stage volume as a convenient alternative API for stages, rather than expecting a fully capable file system.
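Because a stage volume is reached through ordinary file system calls, application code can read a staged file the same way it reads a local one. The following Python sketch streams a file front to back in large chunks and hashes it. The chunk size and the path in the usage note are illustrative assumptions, not documented values.

```python
import hashlib
from pathlib import Path

# 8 MiB is an illustrative chunk size, not a documented recommendation.
CHUNK_SIZE = 8 * 1024 * 1024


def sha256_of_file(path: Path, chunk_size: int = CHUNK_SIZE) -> str:
    """Hash a file by reading it sequentially in large chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Sequential reads only: no seek() calls, one large read per iteration.
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()
```

For example, `sha256_of_file(Path("/opt/models/model.bin"))` would hash a file under a hypothetical mount path `/opt/models`, with no Snowflake driver or GET command involved.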
There are currently two implementations of stage volumes: the generally available version and a newer version that is currently in preview.
About the new stage volume implementation¶
The generally available stage volume implementation uses a shared cache for reads and writes. Although this works well for some scenarios, you can't control whether data is read from the cache or directly from the stage, which might not be suitable for all applications. Additionally, routing reads and writes through the cache can introduce performance overhead.
The new stage volume implementation, currently in preview, uses only limited in-memory caching. This provides more predictable behavior and significantly faster throughput. This version will eventually replace the current generally available implementation. Snowflake recommends evaluating this preview version unless your workload requires random writes or file appends, which aren't currently supported.
Keep in mind the following additional considerations when you use the new stage volume implementation:
It is optimized for large, sequential reads and writes, providing strong performance for these access patterns. For best results, read and write data in large, contiguous chunks.
Random read performance might be lower with the new implementation because of the removal of caching. However, without a local disk cache, consistency across volumes is improved. File changes are written directly to the backing stage when the file is closed, with no local disk buffering.
Reads always return the latest data, making this configuration better for sharing data between services.
Random writes and file appends aren't supported. Attempting these operations results in an error.
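Since appends aren't supported by the new implementation, one possible workaround is to write each batch of output to a new file instead of extending an existing one. The following Python sketch illustrates that pattern. The `part-NNNNNN.txt` naming scheme, the helper names, and the single-writer assumption are all hypothetical choices made for illustration.

```python
from pathlib import Path


def write_part(directory: Path, records: list[str]) -> Path:
    """Write one batch of records to a new, sequentially numbered part file."""
    directory.mkdir(parents=True, exist_ok=True)
    # Number the next part after the existing ones.
    # Assumption: only one writer creates part files in this directory.
    next_index = len(list(directory.glob("part-*.txt")))
    part_path = directory / f"part-{next_index:06d}.txt"
    # Mode "w" creates a fresh file; nothing is opened in append mode.
    with open(part_path, "w", encoding="utf-8") as f:
        f.write("".join(f"{record}\n" for record in records))
    return part_path


def read_all_parts(directory: Path) -> list[str]:
    """Read every part file in name order and return all records."""
    records: list[str] = []
    for part in sorted(directory.glob("part-*.txt")):
        with open(part, "r", encoding="utf-8") as f:
            records.extend(f.read().splitlines())
    return records
```

Each call to `write_part` produces one whole file written in a single sequential pass, which also matches the access pattern the new implementation is optimized for.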
Specify a Snowflake stage volume in a service specification¶
To create a service where service containers use a Snowflake stage volume, specify the required settings in the service specification as shown in the following steps:
To specify the stage volume, use the spec.volumes field.
Use the generally available version of the stage volume implementation, as shown in the following example:
volumes:
  - name: <name>
    source: <stage_name>
The following fields are required:
name: The name of the volume.
source: The Snowflake internal stage or folder on the stage to mount, for example @my_stage, @my_stage/folder. Quotes must surround this value.
Use the new preview version of the stage volume implementation, as shown in the following example:
volumes:
  - name: <name>
    source: stage
    stageConfig:
      name: <stage_name>
The following fields are required:
name: The name of the volume.
source: Identifies the type of the volume (stage).
stageConfig: Identifies the Snowflake internal stage or folder on a stage to mount, for example @my_stage, @my_stage/folder, or @my_db.my_schema.my_stage/folder/nestedfolder. Double quotes must surround this value.
Use the spec.containers.volumeMounts field to describe where to mount the stage volume in your application containers, as shown in the following example:
volumeMounts:
  - name: <name>
    mountPath: <absolute_directory_path>
The information you provide in this field is consistent across all supported storage volume types and applies to both implementations of stage volumes.
Example¶
This example illustrates the difference between mounting a Snowflake stage by using the existing and the new stage volume implementations. In the example service specification, the app container mounts an internal stage @model_stage by using both implementations.
The models-legacy stage volume configuration directs Snowflake to use the generally available implementation of the stage volume.
The models-new stage volume configuration specifies the stageConfig field, which directs Snowflake to use the preview implementation of the stage volume.
spec:
  containers:
  - name: app
    image: <image1-name>
    volumeMounts:
    - name: models-legacy
      mountPath: /opt/model-legacy
    - name: models-new
      mountPath: /opt/model-new
  volumes:
  - name: models-legacy
    source: "@model_stage"
  - name: models-new
    source: stage
    stageConfig:
      name: "@model_stage"
The volumeMounts field specifies where inside the container to mount the stage volume. This specification remains the same for both stage volume implementations.
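A specification like the one above can be sanity-checked before it is submitted. The following Python sketch holds the example specification as a plain dictionary and checks two things: that every volume mount names a declared volume, and that every mountPath is absolute. It assumes that a mount's name must match a volume's name, as the shared <name> placeholder suggests. The `check_volume_mounts` helper is hypothetical and not part of any Snowflake tooling.

```python
from pathlib import PurePosixPath

# The example specification, expressed as a plain Python dictionary.
SPEC = {
    "spec": {
        "containers": [
            {
                "name": "app",
                "image": "<image1-name>",
                "volumeMounts": [
                    {"name": "models-legacy", "mountPath": "/opt/model-legacy"},
                    {"name": "models-new", "mountPath": "/opt/model-new"},
                ],
            }
        ],
        "volumes": [
            {"name": "models-legacy", "source": "@model_stage"},
            {
                "name": "models-new",
                "source": "stage",
                "stageConfig": {"name": "@model_stage"},
            },
        ],
    }
}


def check_volume_mounts(spec: dict) -> list[str]:
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    body = spec.get("spec", {})
    declared = {volume["name"] for volume in body.get("volumes", [])}
    for container in body.get("containers", []):
        for mount in container.get("volumeMounts", []):
            if mount["name"] not in declared:
                problems.append(
                    f"{container['name']}: unknown volume {mount['name']!r}"
                )
            if not PurePosixPath(mount["mountPath"]).is_absolute():
                problems.append(
                    f"{container['name']}: mountPath {mount['mountPath']!r} is not absolute"
                )
    return problems
```

Calling `check_volume_mounts(SPEC)` on the example returns an empty list, because both mounts reference declared volumes and use absolute paths.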
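Access control requirements¶
The service's owner role is the role that is used to create the service. It is also the role that the service uses when interacting with Snowflake. The owner role determines the permissions granted to the application containers for accessing a mounted stage. The owner role must have the READ privilege on the stage.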
If the owner role doesn't have the WRITE privilege on a stage, the mount for that stage is read-only. That is, the containers can only read the files from the stage. The owner role needs the WRITE privilege on a stage for the stage mount to support both read and write.
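An application that runs under different owner roles might want to detect at startup whether a mount is read-only. The following Python sketch probes a directory by creating and deleting a small file. It assumes that a read-only stage mount reports write attempts as an `OSError`. The source doesn't describe the exact error behavior, so treat this as an assumption to verify. The probe file name is hypothetical.

```python
import os
import uuid
from pathlib import Path


def mount_is_writable(mount_dir: Path) -> bool:
    """Try to create and delete a small probe file under mount_dir."""
    probe = mount_dir / f".write-probe-{uuid.uuid4().hex}"
    try:
        with open(probe, "wb") as f:
            f.write(b"probe")
        # Reaching this point means open, write, and close all succeeded.
    except OSError:
        return False
    try:
        os.remove(probe)
    except OSError:
        pass  # Best-effort cleanup; the probe result is still valid.
    return True
```

Note that the probe briefly creates a file on the stage, so use it only where a stray file is acceptable if cleanup fails.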
Guidelines when using stage volumes¶
This section provides you with guidelines to follow when you implement application code in which containers use stage volumes.
Common guidelines for both implementations of stage volumes¶
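Stage mounts are optimized for sequential reads and writes.
Stage mount I/O operations might have higher latency than I/O operations on the container's file system and block storage volumes. You should always check the status code of I/O operations to make sure they succeed.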
Stage mounts upload file updates asynchronously. Changes to files on a stage mount are only guaranteed to be persisted to the stage after the file descriptor is successfully closed or flushed. There might be a delay before the changes to files on a stage mount become visible to other containers and Snowflake.
Each directory in a mounted stage should contain fewer than 100,000 files. Expect readdir latency to increase with the number of files in the directory.
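The guidance about checking I/O status and closing files can be illustrated with low-level calls, where every step either returns a value to inspect or raises an error. The following Python sketch is one way to do this. The `write_all` helper is hypothetical. In Python a failed system call surfaces as an `OSError`, so letting that exception propagate is this sketch's way of checking the status code.

```python
import os


def write_all(path: str, data: bytes) -> None:
    """Write data with low-level calls; raise OSError if any step fails."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested, so continue
            # from wherever the previous call stopped.
            written = os.write(fd, view)
            view = view[written:]
    finally:
        # Changes are only guaranteed to be persisted after a successful
        # close, so an error raised by close() is allowed to propagate.
        os.close(fd)
```

A caller would wrap `write_all` in `try`/`except OSError` and treat the file as not persisted if any exception is raised, including one raised by the close step.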
Guidelines when using the generally available version of the stage volume implementation¶
Avoid writing to multiple files concurrently in a stage mount.
Stage mount isn't a network file system. Don't use stage mounts for multi-client coordination.
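Don't open multiple handles to the same file concurrently. Use an open file handle for either read or write operations, but not a mix of both. To read from a file after writing to it, close the file, and then reopen it before reading.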
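The handle guidelines above can be sketched in Python as follows: each handle is used for reading only or writing only, the writer is closed before a reader is opened, and files are written one after another rather than concurrently. The function names are hypothetical.

```python
from pathlib import Path


def write_then_read(path: Path, payload: bytes) -> bytes:
    """Write a file, close it, then reopen it with a separate read handle."""
    # Write-only handle; it is closed when the with-block exits.
    with open(path, "wb") as writer:
        writer.write(payload)
    # A fresh read-only handle, opened only after the writer is closed.
    with open(path, "rb") as reader:
        return reader.read()


def write_files_one_at_a_time(directory: Path, files: dict[str, bytes]) -> None:
    """Write several files sequentially, never holding two open at once."""
    for name, payload in files.items():
        with open(directory / name, "wb") as f:
            f.write(payload)
```

Avoiding modes such as `"r+"` or `"w+"` keeps each handle to a single direction, which is the pattern these guidelines ask for.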
Guidelines when using the new preview version of the stage volume implementation¶
Concurrent writes to the same file from multiple stage mounts (that is, the same stage volume mounted on different containers) aren't recommended.
The absence of a local disk cache improves consistency across mounts. File changes are flushed directly to the backing stage upon closing the file, with no local disk buffering. Reads always return the latest data, making the new stage mount better for sharing data between services.
Read and write data in large, contiguous chunks for optimal performance. Compared to the generally available stage volume implementation, small reads and writes carry a performance penalty that can offset the performance gains from the new implementation.
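When an application naturally produces many small records, one way to follow the large-chunk guidance is to let a user-space buffer merge them before they reach the mount. The following Python sketch uses the `buffering` argument of the built-in `open`. The 16 MiB buffer size is an illustrative assumption, and whether it is a good value for a given workload would need measurement.

```python
from pathlib import Path
from typing import Iterable

# Illustrative buffer size; not a documented recommendation.
BUFFER_SIZE = 16 * 1024 * 1024


def write_records(
    path: Path, records: Iterable[bytes], buffer_size: int = BUFFER_SIZE
) -> int:
    """Write many small records through one large write buffer."""
    total = 0
    # In binary mode, buffering=N creates a buffered writer with an N-byte
    # buffer, so small write() calls are merged in memory before they are
    # passed on to the file system.
    with open(path, "wb", buffering=buffer_size) as f:
        for record in records:
            f.write(record)
            total += len(record)
    return total
```

The function returns the number of bytes written, which a caller can compare against the expected size as a basic check.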
Limitations when using stage volumes¶
This section describes limitations you should be aware of when you implement application code in which containers use stage volumes. If you encounter any issues with these limits, contact your account representative.
Common limitations for both implementations of stage volumes¶
You can only mount a stage or a subdirectory in a stage; for example, @my_stage, @my_stage/folder. You can't mount a single file in a stage; for example, @my_stage/folder/file.
External stages aren't supported. Only Snowflake internal stages are supported.
A maximum of 5 stage volumes is allowed per service. For more information, see spec.volumes.
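A maximum of 8 stage volumes is supported per compute pool node. Snowflake manages the per-node stage mount limit in a way similar to how it manages memory, CPU, and GPU. Starting new service instances might cause Snowflake to start new nodes when existing nodes can't support the requested stage mounts.
Stage mounts aren't fully POSIX-compliant file systems. For example:
File and directory renames aren't atomic.
Hard links aren't supported.
The Linux kernel subsystem inode notify (inotify) that monitors changes to file systems doesn't work on stage mounts.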
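Because inotify doesn't work on stage mounts, an application that needs to notice new or changed files has to poll instead. The following Python sketch compares two directory snapshots taken with `os.scandir`. It assumes that file size and modification time reported by the mount change when a file changes, which the source doesn't confirm. The helper names are hypothetical.

```python
import os
from pathlib import Path

# Maps file name to (size in bytes, modification time in nanoseconds).
Snapshot = dict[str, tuple[int, int]]


def take_snapshot(directory: Path) -> Snapshot:
    """Record size and mtime for every regular file directly in directory."""
    snapshot: Snapshot = {}
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file():
                info = entry.stat()
                snapshot[entry.name] = (info.st_size, info.st_mtime_ns)
    return snapshot


def diff_snapshots(
    old: Snapshot, new: Snapshot
) -> tuple[list[str], list[str], list[str]]:
    """Return sorted lists of (added, removed, changed) file names."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(
        name for name in new.keys() & old.keys() if new[name] != old[name]
    )
    return added, removed, changed
```

Each snapshot lists the directory, so the readdir latency noted earlier applies: keeping directories small and the polling interval moderate limits the cost.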
Limitations when using the generally available version of the stage volume implementation¶
The stage volume capabilities vary depending on the cloud platform for your Snowflake account:
Accounts on AWS support internal stages with both SNOWFLAKE_FULL and SNOWFLAKE_SSE stage encryption. For more information, see Internal stage parameters.
Accounts on Azure currently support internal stages with SNOWFLAKE_SSE encryption. When you run CREATE STAGE, use the ENCRYPTION parameter to specify the encryption type:
CREATE STAGE my_stage ENCRYPTION = (TYPE = 'SNOWFLAKE_SSE');
Accounts on Google Cloud aren't supported.
Concurrent writes to the same file from multiple stage mounts (that is, the same stage volume mounted on different containers) aren't supported.
Limitations when using the new preview version of the stage volume implementation¶
Random writes and file appends aren't supported.
Each mounted stage requires 512 MB of memory. This means that the instance size limits the number of stage volumes that you can use. Mounting the volume on multiple containers doesn't increase memory consumption.
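The memory requirement can be turned into a quick estimate when planning a service. The following Python sketch counts each distinct volume once, however many containers mount it, and multiplies by 512 MB. It assumes one stage per volume. The container and volume names in the usage note are hypothetical.

```python
# Per mounted stage, from the limitation stated above.
MEMORY_PER_STAGE_MB = 512


def stage_mount_memory_mb(mounts: list[tuple[str, str]]) -> int:
    """Estimate stage mount memory from (container, volume) pairs.

    A volume mounted by several containers is counted once.
    """
    distinct_volumes = {volume for _container, volume in mounts}
    return len(distinct_volumes) * MEMORY_PER_STAGE_MB
```

For example, `stage_mount_memory_mb([("app", "models"), ("worker", "models"), ("app", "data")])` returns 1024, because only two distinct volumes are mounted.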
