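Snowpark Container Services: Working with compute pools¶

A compute pool is a collection of one or more virtual machine (VM) nodes on which Snowflake runs your Snowpark Container Services services (including job services). You create a compute pool using the CREATE COMPUTE POOL command. You then specify the compute pool when you create a service or execute a job service.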

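Creating a compute pool¶

A compute pool is an account-level construct, analogous to a Snowflake virtual warehouse. The naming scope of a compute pool is the account; that is, an account can't contain multiple compute pools with the same name.

Creating a compute pool requires at least the following information:

The machine type to provision for the compute pool nodes (referred to as the instance family)
The minimum number of nodes with which to launch the compute pool
The maximum number of nodes to which the compute pool can scale (Snowflake manages the scaling)

If you expect heavy load or sudden bursts of activity for the services running in the compute pool, you can set the minimum number of nodes to a value greater than 1. This approach ensures that additional nodes are readily available when needed, instead of waiting for autoscaling to start them.

Setting a maximum node limit prevents Snowflake autoscaling from adding an unexpectedly large number of nodes to your compute pool. This is important in situations such as unexpected load spikes or bugs in your code, which might otherwise cause Snowflake to allocate more compute pool nodes than you originally planned.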
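As a minimal sketch of the sizing guidance above, the following statement keeps two nodes provisioned at all times and caps autoscaling at five nodes. The pool name bursty_pool and the node counts are illustrative assumptions, not recommendations; choose values that match your own load profile.

```sql
-- Hypothetical pool name and node counts, chosen only for illustration.
CREATE COMPUTE POOL bursty_pool
  MIN_NODES = 2                 -- two nodes stay provisioned, ready for bursts
  MAX_NODES = 5                 -- autoscaling never goes beyond five nodes
  INSTANCE_FAMILY = CPU_X64_S;
```

A higher MIN_NODES trades cost for readiness, while MAX_NODES acts as a ceiling on how far autoscaling can grow the pool.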

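Create a compute pool using Snowsight or SQL:

Snowsight:

In the navigation menu, select Compute » Compute Pools.
At the bottom of the navigation bar, select your username, and then switch to the ACCOUNTADMIN role or any role that is allowed to create compute pools.
Select + Compute Pool.
In the New compute pool UI, specify the required information (the compute pool name, instance family, and node limits).
Select Create Compute Pool.

SQL:

Execute the CREATE COMPUTE POOL command.

For example, the following command creates a one-node compute pool: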

CREATE COMPUTE POOL tutorial_compute_pool
  MIN_NODES = 1
  MAX_NODES = 1
  INSTANCE_FAMILY = CPU_X64_XS;

The instance family identifies the type of machine you want to provision for compute pool nodes. Specifying an instance family when creating a compute pool is similar to specifying a warehouse size (XSMALL, SMALL, MEDIUM, LARGE, and so on) when creating a warehouse. The table in the "Available instance families (machine types) for compute pool nodes" section lists the available machine types. You can also use the SHOW COMPUTE POOL INSTANCE FAMILIES command to get the list of available instance families.
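The command named above can be run as is. The rows it returns depend on your cloud platform and region, so no sample output is shown here.

```sql
-- Lists the instance families available to the current account.
SHOW COMPUTE POOL INSTANCE FAMILIES;
```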

Compute pool placement¶

A placement group is a fault-isolation domain within a Snowflake region, similar to an availability zone (AZ) in AWS or Azure. You can optionally specify which placement group to provision compute pool nodes in by using the placement_group parameter in the CREATE COMPUTE POOL statement.

If placement_group is not specified, Snowflake places compute pool nodes based on availability, which might span multiple placement groups.

If you choose to specify a placement_group, you have two options:

Specify a specific placement group: When you specify placement_group, Snowflake provisions all nodes for that pool from the specified placement group. You should set placement_group to a specific placement group in the following situations:
- You need reduced cross-node latency and lower communication costs for highly interactive, tightly coupled services.
- You are building a highly available service and you choose to deploy the same code across multiple services, each one running on a separate compute pool that is assigned to a distinct placement group.
The following guidelines apply when you set a specific placement group for a compute pool:
- Instance family availability varies by placement group and region. Smaller regions might offer fewer placement group options, especially for GPU families. Call the SYSTEM$GET_INSTANCE_FAMILY_PLACEMENT_GROUPS system function to list the placement groups available for a specific instance family in your region.
- Placement group names are consistent within an account across different instance families. Different Snowflake accounts might observe different names for the same underlying placement groups.
- When you configure a placement group for a compute pool, it restricts Snowpark Container Services' flexibility to optimize node placement. This restriction can increase the likelihood of insufficient-capacity errors and lengthen startup times during peak demand.
- You can alter a placement group only if the compute pool is fully suspended and your services don't use block storage.
Specify DISTRIBUTED: When you set placement_group to DISTRIBUTED, Snowflake attempts to distribute nodes for that compute pool across all available placement groups. You should set placement_group to DISTRIBUTED if you want to maintain healthy fault tolerance across multiple placement groups. When compute pool nodes are distributed across multiple placement groups, you don't lose all the nodes if one placement group goes down.

The following behaviors apply when you set placement_group to DISTRIBUTED for a compute pool:
- Node distribution: Snowflake uses an equal-partition strategy to spread nodes across all available placement groups in a region. If a specific placement group encounters insufficient capacity errors, nodes are provisioned in other placement groups with available capacity, which can result in an uneven distribution.
- Service instances distribution: When there is more than one service instance, Snowflake attempts to evenly distribute the instances across placement groups. Sometimes even distribution can't be achieved because of constraints, such as capacity limitations.
- Outage behavior: In the current implementation, if a placement group fails, Snowflake doesn't automatically fail over nodes to healthy placement groups. You should overprovision your service instances (N+1) so that nodes in the remaining placement groups can handle the traffic load during an outage. In the event of placement group outage, Snowflake takes the following actions:
  - Stops placing new service instances in the impacted placement group.
  - Routes ingress traffic to service instances in the healthy placement groups.
  - Recreates service instances in the impacted placement group on the healthy placement groups.

Note

In smaller Snowflake regions, some instance types might not be available across multiple placement groups, which can reduce the compute pool's resilience to placement group failures.
After a placement group recovers, Snowflake doesn't automatically move service instances back to it; the system gradually rebalances during node upgrades or routine service maintenance.
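The following is a hedged sketch of how the two placement options might be written. This page names the placement_group parameter and the DISTRIBUTED value but doesn't show the value syntax, so the quoted-string form, the pool names, the node counts, and the placeholder <placement_group_name> are all assumptions; check the CREATE COMPUTE POOL reference for the exact form before using it.

```sql
-- Assumption: PLACEMENT_GROUP accepts a quoted string value.

-- Spread nodes across all available placement groups for fault tolerance.
CREATE COMPUTE POOL ha_pool
  MIN_NODES = 3
  MAX_NODES = 6
  INSTANCE_FAMILY = CPU_X64_S
  PLACEMENT_GROUP = 'DISTRIBUTED';

-- Pin all nodes to one placement group for lower cross-node latency.
-- Replace the placeholder with a placement group name valid in your account.
CREATE COMPUTE POOL low_latency_pool
  MIN_NODES = 2
  MAX_NODES = 2
  INSTANCE_FAMILY = CPU_X64_S
  PLACEMENT_GROUP = '<placement_group_name>';
```

With the distributed pool, overprovisioning service instances (N+1) remains your responsibility, because nodes aren't automatically failed over when a placement group goes down.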

Available instance families (machine types) for compute pool nodes¶

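INSTANCE_FAMILY (see the Snowflake Service Consumption Table)	vCPU	Memory (GiB)	Storage (GB)	Bandwidth limit (Gbps)	GPU	GPU memory per GPU (GB)	Node limit	Description
CPU_X64_XS	1	6	100	Up to 12.5	n/a	n/a	150	Smallest instance available for Snowpark Containers. Ideal for cost savings and getting started.
CPU_X64_S	3	13	100	Up to 12.5	n/a	n/a	150	Ideal for hosting multiple services/jobs while saving cost.
CPU_X64_M	6	28	100	Up to 12.5	n/a	n/a	150	Ideal for a full-stack application or multiple services.
CPU_X64_SL (except China)	124	654	100	Up to 12.5	n/a	n/a	150	For applications that require an unusually large number of CPUs, memory, and storage.
CPU_X64_L	28	116	100	12.5	n/a	n/a	150	For applications that require an unusually large number of CPUs, memory, and storage.
HIGHMEM_X64_S	6	58	100	AWS and GCP: up to 12.5; Azure: 8	n/a	n/a	150	For memory-intensive applications.
HIGHMEM_X64_M	28	AWS: 240; Azure and GCP: 244	100	AWS: 12.5; Azure and GCP: 16	n/a	n/a	150	For hosting multiple memory-intensive applications on a single machine.
HIGHMEM_X64_SL (Azure and GCP, except GCP Dammam region)	92	654	100	32	n/a	n/a	20	Largest Azure or GCP high-memory machine available for processing large in-memory data.
HIGHMEM_X64_L (AWS only)	124	984	100	50	n/a	n/a	150	Largest AWS high-memory machine available for processing large in-memory data.
GPU_NV_S (AWS only, except Singapore, Switzerland North, Paris, and Osaka regions)	6	27	300 (NVMe)	Up to 10	1 NVIDIA A10G	24	150	Our smallest NVIDIA GPU size available for getting started with Snowpark Containers.
GPU_NV_M (AWS only, except gov regions, Singapore, Switzerland North, Paris, and Osaka regions)	44	178	3.4 TB (NVMe)	40	4 NVIDIA A10G	24	10	Optimized for intensive GPU usage scenarios such as computer vision or LLMs/VLMs.
GPU_NV_L (AWS only, available only in AWS US West and US East non-gov regions by request; limited availability might be possible in other regions upon request)	92	1112	6.8 TB (NVMe)	400	8 NVIDIA A100	40	On request	Largest GPU instance for specialized and advanced GPU cases such as LLMs and clustering.
GPU_NV_XS (Azure only, except Switzerland North, UAE North, Central US, and UK South regions)	3	26	100	8	1 NVIDIA T4	16	10	Our smallest Azure NVIDIA GPU size available for getting started with Snowpark Containers.
GPU_NV_SM (Azure only, except Central US region)	32	424	100	40	1 NVIDIA A10	24	10	Smallest Azure NVIDIA GPU size available for getting started with Snowpark Containers.
GPU_NV_2M (Azure only, except Central US region)	68	858	100	80	2 NVIDIA A10	24	5	Optimized for intensive GPU usage scenarios such as computer vision or LLMs/VLMs.
GPU_NV_3M (Azure only, except Central US, North Europe, and UAE North regions)	44	424	100	40	2 NVIDIA A100	80	On request	Optimized for memory-intensive GPU usage scenarios such as computer vision or LLMs/VLMs.
GPU_NV_SL (Azure only, except Central US, North Europe, and UAE North regions)	92	858	100	80	4 NVIDIA A100	80	On request	Largest GPU instance for specialized and advanced GPU cases such as LLMs and clustering.
GPU_GCP_NV_L4_1_24G (Google Cloud only)	6	28	300	Up to 16	1 NVIDIA L4	24	10	Our smallest NVIDIA GPU size available for getting started with Snowpark Containers.
GPU_GCP_NV_L4_4_24G (Google Cloud only)	44	178	1200	Up to 50	4 NVIDIA L4	24	10	GPU usage scenarios such as computer vision or LLMs.
GPU_GCP_NV_A100_8_40G (Google Cloud only; available only on request in the GCP US Central 1 and Europe West 4 regions)	92	654	2500	Up to 100	8 NVIDIA A100	40	On request	Optimized for memory-intensive GPU usage scenarios such as computer vision or LLMs/VLMs.

For information about available instance families, see CREATE COMPUTE POOL.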

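Autoscaling of compute pool nodes¶

After you create a compute pool, Snowflake launches the minimum number of nodes and automatically creates additional nodes, up to the maximum allowed. This is called autoscaling. New nodes are allocated when the running nodes can't take on any additional workload. For example, suppose that two service instances are running on two nodes in your compute pool. If you execute another service in the same compute pool, the additional resource requirements might cause Snowflake to start an additional node.

However, if no services run on a node for a certain duration, Snowflake automatically removes the node, ensuring that the compute pool still maintains the required minimum number of nodes after the removal.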

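Managing compute pools¶

You can manage a compute pool using Snowsight or SQL.

In Snowsight, select the more options (...) menu next to the compute pool name, and then choose the desired action from the menu. This section describes the SQL commands that you use to manage a compute pool.

Snowpark Container Services provides the following commands to manage compute pools:

Monitoring: Use the SHOW COMPUTE POOLS command to get information about compute pools.
Operating: Use the ALTER COMPUTE POOL command to change the state of a compute pool.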
```
ALTER COMPUTE POOL <name> { SUSPEND | RESUME | STOP ALL }
```
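When you suspend a compute pool, Snowflake suspends all services except job services. Job services continue to run until they reach a terminal state (DONE or FAILED), after which the compute pool nodes are released.

A suspended compute pool must be resumed before you can start a new service. If the compute pool is configured to auto-resume (the AUTO_RESUME property is set to TRUE), Snowflake automatically resumes the pool when a service is submitted to it. Otherwise, you need to run the ALTER COMPUTE POOL command to resume the compute pool manually.
Modifying: Use the ALTER COMPUTE POOL command to change compute pool properties.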
```
ALTER COMPUTE POOL <name> SET propertiesToAlter = <value>
propertiesToAlter := { MIN_NODES | MAX_NODES | AUTO_RESUME | AUTO_SUSPEND_SECS | PLACEMENT_GROUP | INSTANCE_FAMILY | TAG | COMMENT }
```
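When you decrease MAX_NODES, note the following potential effects:
- Snowflake might need to terminate one or more service instances and restart them on other available nodes in the compute pool. If MAX_NODES is set too low, Snowflake might be unable to schedule certain service instances.
- If a terminated node was running a job service, the job execution fails. Snowflake doesn't restart the job service.

  Example:
  ALTER COMPUTE POOL my_pool SET MIN_NODES = 2 MAX_NODES = 2;
Removing: Use the DROP COMPUTE POOL command to remove a compute pool.
Example:
DROP COMPUTE POOL <name>

Before you drop a compute pool, you must stop all running services.
Listing compute pools and viewing properties: Use the SHOW COMPUTE POOLS and DESCRIBE COMPUTE POOL commands. For examples, see Show compute pools.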
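The state-changing commands in the list above can be sketched as follows. The pool name my_pool is the illustrative name used elsewhere on this page; the sequence is only one plausible order of operations, not a required workflow.

```sql
-- Suspend the pool. Job services keep running until they reach DONE or FAILED.
ALTER COMPUTE POOL my_pool SUSPEND;

-- Resume the pool manually before starting a new service.
ALTER COMPUTE POOL my_pool RESUME;

-- Alternatively, let Snowflake resume the pool when a service is submitted.
ALTER COMPUTE POOL my_pool SET AUTO_RESUME = TRUE;

-- Stop all services on the pool, which is required before dropping it.
ALTER COMPUTE POOL my_pool STOP ALL;
DROP COMPUTE POOL my_pool;
```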

About the target_nodes compute pool property¶

This section explains the target_nodes property with examples. The target_nodes property indicates the number of nodes that Snowflake is targeting for your compute pool. If active_nodes isn't equal to target_nodes, Snowflake autoscales the cluster to add or remove nodes.

Several properties relate to the number of nodes in a compute pool: min_nodes, max_nodes, active_nodes, idle_nodes, and target_nodes. For more information about these properties, see DESC COMPUTE POOL and SHOW COMPUTE POOLS.

The following examples demonstrate how to interpret the values in the target_nodes column.

Example 1¶

Suppose in a CREATE COMPUTE POOL command, you specify MIN_NODES=1 and MAX_NODES=3.

While Snowflake is provisioning a node, initially the value in the active_nodes and idle_nodes columns is 0, and the value in the target_nodes column is 1. (The value in the target_nodes column is the same as the value that you specified for the MIN_NODES parameter.) This indicates that there should be one node in the compute pool that Snowflake is provisioning.

After Snowflake provisions one node, the value in the idle_nodes column is 1 (assuming that there are no services running). The value in the target_nodes column is still 1, indicating there should be one node in the compute pool.

Example 2¶

Snowflake might try to add a node to an existing compute pool due to autoscaling or changes to the minimum number of nodes (through ALTER COMPUTE POOL ... SET MIN_NODES).

While Snowflake is provisioning a node, the value in the state column is resizing. To determine how many nodes Snowflake is adding, check the value in the target_nodes column.

For example, suppose that the value in the active_nodes column is 1, the value in the idle_nodes column is 0, and you resize the compute pool by updating the MIN_NODES property from 1 to 2. In this case, the value in the target_nodes column is 2 (the number of nodes that should be in the compute pool). From this, you can infer that Snowflake is provisioning one additional node.
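One way to read these columns side by side is to post-process the SHOW output, as in the sketch below. It assumes the pool is named my_pool and that the SHOW output exposes lowercase column names matching the property names discussed in this section; the SELECT must run immediately after the SHOW command in the same session.

```sql
-- Step 1: list the pool (my_pool is an illustrative name).
SHOW COMPUTE POOLS LIKE 'my_pool';

-- Step 2: pick out the node-related columns from the previous result.
-- Column names are assumed to match the properties described above.
SELECT "name", "state", "min_nodes", "max_nodes",
       "active_nodes", "idle_nodes", "target_nodes"
FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()));
```

If target_nodes is larger than the number of nodes currently provisioned, the difference is the number of nodes Snowflake is still adding.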

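Compute pool lifecycle¶

A compute pool can be in any of the following states:

IDLE: The compute pool has the desired number of virtual machine (VM) nodes, but no services are scheduled. In this state, autoscaling shrinks the compute pool to the minimum size because of the lack of activity.
ACTIVE: The compute pool has at least one service running or scheduled to run on it. The pool can grow (up to the maximum number of nodes) or shrink (down to the minimum number of nodes) in response to load or user actions.
SUSPENDED: The compute pool currently contains no running virtual machine nodes, but if the AUTO_RESUME compute pool property is set to TRUE, the pool automatically resumes when a service is scheduled.

The following states are transient:

STARTING: When you create or resume a compute pool, the pool enters the STARTING state until at least one node is provisioned.
STOPPING: When you suspend a compute pool (using ALTER COMPUTE POOL), the pool enters the STOPPING state until Snowflake has released all of the nodes in the pool. When you suspend a compute pool, Snowflake suspends all services except job services. Job services continue to run until they reach a terminal state (DONE or FAILED), after which the compute pool nodes are released.
RESIZING: When you create a compute pool, it initially enters the STARTING state. After one node is provisioned, the pool enters the RESIZING state until the minimum number of nodes (as specified in CREATE COMPUTE POOL) is provisioned. When you alter a compute pool (ALTER COMPUTE POOL) and update the minimum and maximum node values, the pool enters the RESIZING state until the minimum number of nodes is provisioned. Note that autoscaling of a compute pool also puts it in the RESIZING state.

For information about the costs incurred during the different states of the compute pool lifecycle, see Compute pool cost.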

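Compute pool privileges¶

When you work with compute pools, the following privilege model applies:

To create a compute pool in an account, the current role needs the CREATE COMPUTE POOL privilege on the account. If you create a compute pool, then as the owner you have the OWNERSHIP privilege, which grants full control over that compute pool. Having the OWNERSHIP privilege on one compute pool doesn't imply any privileges on other compute pools.

For a compute pool, the following privileges (capabilities) are supported:


Privilege	Usage
MODIFY	Enables altering any compute pool properties, including changing the size.
MONITOR	Enables viewing compute pool usage, including describing compute pool properties. Enables access to the monitoring endpoint exposed by the compute pool.
OPERATE	Enables changing the state of the compute pool (suspend, resume). In addition, enables stopping any scheduled services (including job services).
USAGE	Enables creating services in the compute pool. Note that when a compute pool is in a suspended state and its AUTO_RESUME property is set to true, a role with the USAGE privilege on the compute pool can implicitly trigger the compute pool to resume when the role starts or resumes a service, even if the role doesn't have the OPERATE privilege.
OWNERSHIP	Grants full control over the compute pool. Only a single role can hold this privilege on a specific object at a time. Enables access to the monitoring endpoint exposed by the compute pool.
ALL [ PRIVILEGES ]	Grants all privileges on the compute pool, except OWNERSHIP.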
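As a brief sketch of how the privileges in the table might be granted, the statements below give one role the ability to create services and view usage, and another role the ability to suspend and resume the pool. The pool name my_pool and the role names app_dev_role and ops_role are hypothetical.

```sql
-- Hypothetical roles; substitute roles that exist in your account.

-- Allow a developer role to create services in the pool and monitor it.
GRANT USAGE, MONITOR ON COMPUTE POOL my_pool TO ROLE app_dev_role;

-- Allow an operations role to suspend and resume the pool.
GRANT OPERATE ON COMPUTE POOL my_pool TO ROLE ops_role;
```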

Compute pool maintenance¶

As part of routine internal-infrastructure maintenance, Snowflake regularly updates compute pool nodes to ensure optimal performance and security. This includes operating system upgrades, driver enhancements, and security fixes. Maintenance involves replacing outdated nodes with updated ones every few weeks, with each node active for up to a month.

Maintenance window¶

In general, scheduled maintenance occurs every Saturday from 8 PM to Sunday at 8 AM, and every Sunday from 8 PM to Monday at 8 AM. For early access accounts, maintenance takes place daily starting at 11 PM and can last up to 6 hours.

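Service disruption¶

During maintenance, Snowflake automatically recreates on new nodes the service instances that were running on old compute pool nodes. Snowflake uses a rolling approach to recreate service instances.

If a service has only one instance, a service disruption occurs while Snowflake recreates the instance.
For a service with multiple instances, Snowflake recreates the service instances on upgraded nodes in phases. No more than 50% of the service instances are replaced at a time. Note that this can leave fewer instances available than the MIN_INSTANCES requested for the service. If the number of available instances drops below MIN_READY_INSTANCES, the service transitions from the READY state to the PENDING state, causing a service disruption. Therefore, to avoid service disruption, consider setting MIN_READY_INSTANCES to less than 50% of MIN_INSTANCES.

Ongoing job services are interrupted, and customers must restart them after the maintenance is complete.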
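To make the 50% guidance concrete, consider a service that requests four instances. During a rolling replacement, at most two instances are replaced at once, so two remain available. Setting MIN_READY_INSTANCES to 1, which is less than 50% of 4, keeps the service in the READY state throughout. The sketch below assumes a service named my_service and assumes that these properties can be set through ALTER SERVICE; confirm the property list in the ALTER SERVICE reference.

```sql
-- Hypothetical service name. With 4 instances, a rolling replacement of up to
-- 50% leaves 2 available, which stays above MIN_READY_INSTANCES = 1.
ALTER SERVICE my_service SET
  MIN_INSTANCES = 4
  MAX_INSTANCES = 4
  MIN_READY_INSTANCES = 1;
```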

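Attention

Service disruptions during maintenance windows or critical updates aren't covered by the service levels set out in Snowflake's Support Policy and Service Level Agreement.

Best practices to minimize downtime¶

Run multiple service instances: Having multiple instances minimizes service disruption during maintenance and helps ensure high availability.
Store application state in persistent storage: Store data and stateful objects in persistent storage, such as block storage, Snowflake stages, or Snowflake tables.
Catch the SIGTERM signal: When terminating a service instance, Snowflake first sends a SIGTERM signal to each service container (see Terminating a service). In the signal handler, the container code can save the service state before the service instance is shut down or restarted.
Design highly available services to run in a degraded state during maintenance: To remain available during maintenance, your service must be able to tolerate only 50% of its instances running.
Provide a readiness probe: If you don't provide a readiness probe, Snowflake assumes that a service instance is ready as soon as its code starts executing. Typically, a container needs some time to finish initializing before it can handle requests. Provide a readiness probe in your service configuration to explicitly tell Snowflake when a service instance is ready to handle requests.
Monitor the maintenance schedule: Avoid scheduling critical tasks during the maintenance window.
Avoid scheduling job services to run during the maintenance window: Snowflake might cancel running jobs during the maintenance window.
Back up or checkpoint regularly: Regularly back up or checkpoint application state to persistent storage, such as block storage, Snowflake stages, or Snowflake tables.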

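How services are scheduled on a compute pool¶

When you create a service, you might choose to run multiple instances to manage incoming load. When scheduling service instances on compute pool nodes, Snowflake follows these general guidelines:

All containers in a service instance always run on a single compute pool node. That is, a service instance never spans multiple nodes.
When you run multiple service instances, Snowflake might run them on the same node or on different nodes within the compute pool. In making this decision, Snowflake considers any hard resource requirements (such as memory and GPU) listed in the service specification file (see the containers.resources field).

For example, suppose that each node in your compute pool provides 8 GB of memory. If your service specification requires 6 GB of memory and you choose to run two instances when creating the service, Snowflake can't run both instances on the same node. In this case, Snowflake schedules each instance on a separate node within the compute pool to meet the memory requirement.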
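The scenario above could be expressed roughly as follows. This is a sketch under stated assumptions: the service name, pool name, container name, and image path are hypothetical, and the 8 GB node size comes from the example rather than from any particular instance family.

```sql
-- Hypothetical names and image path, for illustration only.
-- Each instance requests 6 GB, so two instances can't share an 8 GB node.
CREATE SERVICE my_service
  IN COMPUTE POOL my_pool
  FROM SPECIFICATION $$
spec:
  containers:
  - name: main
    image: /my_db/my_schema/my_repo/my_image:latest
    resources:
      requests:
        memory: 6G
$$
  MIN_INSTANCES = 2
  MAX_INSTANCES = 2;
```

For both instances to be scheduled in this scenario, the compute pool needs a MAX_NODES value of at least 2.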

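Note

Snowflake supports stage mounts for use by application containers. A Snowflake internal stage is one of the supported storage volume types.

For optimal performance, Snowflake now limits the total number of stage volume mounts on each compute pool node to eight, regardless of whether the volumes belong to the same service instance, the same service, or different services.

After a node reaches the limit, Snowflake doesn't use that node to start new service instances that use stage volumes. If all nodes in the compute pool have reached the limit, Snowflake can't start the service instance. In this case, when you execute the SHOW SERVICE CONTAINERS IN SERVICE command, Snowflake returns the PENDING status with the message "Unschedulable due to insufficient resources".

To accommodate the per-node stage mount limit, in some cases you can increase the maximum number of nodes requested for the compute pool. This makes more nodes available for Snowflake to start service instances.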
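A possible way to diagnose and work around the stage mount limit is sketched below. The service name my_service, the pool name my_pool, and the new MAX_NODES value are illustrative assumptions.

```sql
-- Check whether containers are stuck in PENDING with an
-- "Unschedulable due to insufficient resources" message.
SHOW SERVICE CONTAINERS IN SERVICE my_service;

-- If the per-node stage mount limit is the cause, allow the pool to add nodes.
-- The value 4 is illustrative; it must stay within your account's node limits.
ALTER COMPUTE POOL my_pool SET MAX_NODES = 4;
```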

System compute pools¶

Every Snowflake account includes two system compute pools: one CPU-based and one GPU-based exclusively for the following workloads:

Notebooks
Streamlit apps (CPU only)
Model serving
ML jobs

With system compute pools, you can run these workloads immediately, no compute pool setup required.

The system compute pools have the following default configuration:

Compute pool name: SYSTEM_COMPUTE_POOL_GPU
- Instance family: Depending on the region of your Snowflake account (AWS or Microsoft Azure region), Snowflake uses the following GPU instance family for this compute pool.
  - In Azure, GPU_NV_SM.
  - In AWS, GPU_NV_S.
  Note that SYSTEM_COMPUTE_POOL_GPU isn't supported in the following regions:
  - In AWS: Singapore, Switzerland North, Paris, and Osaka.
  - In Azure: Central US.
  - Google Cloud: GPU compute pool isn't available.
- Default configuration:
  - MIN_NODES=1
  - MAX_NODES=50
  - INITIALLY_SUSPENDED=true
  - AUTO_SUSPEND_SECS=600
Compute pool name: SYSTEM_COMPUTE_POOL_CPU
- Instance family: CPU_X64_S
- Default configuration:
  - MIN_NODES=1
  - MAX_NODES=150
  - INITIALLY_SUSPENDED=true
  - AUTO_SUSPEND_SECS=259200

Note the following:

Compute pools are initially in a suspended state and only begin incurring costs when a supported Snowflake workload starts using them.
For the CPU system compute pool, Snowflake keeps one idle node in the pool at no cost to you whenever the pool is active, so that new workloads can start quickly. The following details apply:
- The idle node is visible in the idle_nodes column of SHOW COMPUTE POOLS and DESCRIBE COMPUTE POOL output.
- The idle node counts against the compute pool's MAX_NODES limit and the per-account node limit.
- Snowflake covers the cost of one idle node. It doesn't appear in your billing.
- When a workload starts on the idle node, that node is billed to you normally and Snowflake provisions a new idle node to replace it.
- This behavior isn't configurable. Contact Snowflake Support (https://community.snowflake.com/s/article/How-To-Submit-a-Support-Case-in-Snowflake-Lodge) if you have questions about this behavior.
If no workloads are running, the GPU compute pool is automatically suspended after 10 minutes and the CPU compute pool is automatically suspended after 3 days. To modify the auto-suspension policy for system compute pools, use the ALTER COMPUTE POOL SET AUTO_SUSPEND_SECS command.
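For instance, the auto-suspension policy could be changed as sketched below. The new values (5 minutes for the GPU pool and 1 hour for the CPU pool) are illustrative assumptions, not recommended settings.

```sql
USE ROLE ACCOUNTADMIN;

-- Suspend the GPU system pool after 5 idle minutes instead of the default 10.
ALTER COMPUTE POOL SYSTEM_COMPUTE_POOL_GPU SET AUTO_SUSPEND_SECS = 300;

-- Suspend the CPU system pool after 1 idle hour instead of the default 3 days.
ALTER COMPUTE POOL SYSTEM_COMPUTE_POOL_CPU SET AUTO_SUSPEND_SECS = 3600;
```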

Managing the system compute pools¶

In a Snowflake account, the ACCOUNTADMIN role owns these system compute pools. Administrators have full control over the compute pools, including modifying their properties, suspending operations, and monitoring consumption. The ACCOUNTADMIN role can delete the compute pool. For example:

USE ROLE ACCOUNTADMIN;
ALTER COMPUTE POOL SYSTEM_COMPUTE_POOL_CPU STOP ALL;
DROP COMPUTE POOL SYSTEM_COMPUTE_POOL_CPU;

By default, the USAGE permission on system compute pools is granted to the PUBLIC role, allowing all roles in the account to use them. However, the ACCOUNTADMIN can modify these privileges to restrict access if necessary.

To restrict access to system compute pools to specific roles in your account, use the ACCOUNTADMIN role to revoke the USAGE permission from the PUBLIC role and grant it to the desired role(s). For example:

USE ROLE ACCOUNTADMIN;
REVOKE USAGE ON COMPUTE POOL SYSTEM_COMPUTE_POOL_CPU FROM ROLE PUBLIC;
GRANT USAGE ON COMPUTE POOL SYSTEM_COMPUTE_POOL_CPU TO ROLE <role-name>;

System compute pools can be associated with budgets for cost management.

Configuring your own preferred compute pools for Streamlit apps¶

When you create a container-runtime Streamlit app and don't specify a COMPUTE_POOL, Snowflake uses the compute pool specified by the DEFAULT_STREAMLIT_COMPUTE_POOL parameter. Snowflake sets this parameter to SYSTEM_COMPUTE_POOL_CPU for new accounts, so Streamlit apps run on the system compute pool by default. To use a different compute pool, set this account-level parameter.

When DEFAULT_STREAMLIT_COMPUTE_POOL is set, the compute pool selector is not shown in the Snowsight creation dialog. The app is created on the default compute pool automatically. To use a different pool, change it after creation using App Settings or ALTER STREAMLIT. See Changing the compute pool.

The following example configures my_pool as the default compute pool for Streamlit apps:

ALTER ACCOUNT SET DEFAULT_STREAMLIT_COMPUTE_POOL='my_pool';

To restore the compute pool selector in the Snowsight creation dialog, unset the parameter:

ALTER ACCOUNT UNSET DEFAULT_STREAMLIT_COMPUTE_POOL;

Use the following command to check the current compute pool preference configured in your account for Streamlit apps:

SHOW PARAMETERS LIKE 'DEFAULT_STREAMLIT_COMPUTE_POOL' IN ACCOUNT;

For more information, see SHOW PARAMETERS.

Configuring your own preferred compute pools for Notebooks¶

By default, Notebook services run in system compute pools. If you don't want to use the Snowflake-provisioned compute pools, you can choose other compute pools in your account for Notebooks. To override the Snowflake-provisioned compute pools, set the DEFAULT_NOTEBOOK_COMPUTE_POOL_CPU and DEFAULT_NOTEBOOK_COMPUTE_POOL_GPU parameters. Note that this changes your Snowsight experience: when you create a Notebook in Snowsight, the compute pool you configure using these parameters appears as the first preference in the UI. The following example commands set these parameters:

Configure my_pool as the account-level preferred compute pool for notebooks that use the GPU runtime.
```
ALTER ACCOUNT SET DEFAULT_NOTEBOOK_COMPUTE_POOL_GPU='my_pool';
```
Configure my_pool as the preferred compute pool for notebooks created in the database my_db.
```
ALTER DATABASE my_db SET DEFAULT_NOTEBOOK_COMPUTE_POOL_GPU='my_pool';
```
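Configure my_pool as the preferred compute pool for notebooks created in the schema my_db.my_schema.
ALTER SCHEMA my_db.my_schema SET DEFAULT_NOTEBOOK_COMPUTE_POOL_GPU='my_pool';

Use the following commands to check the GPU compute pool preference currently configured in your account for running notebooks: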

SHOW PARAMETERS LIKE 'DEFAULT_NOTEBOOK_COMPUTE_POOL_GPU' IN ACCOUNT;

SHOW PARAMETERS LIKE 'DEFAULT_NOTEBOOK_COMPUTE_POOL_GPU' IN DATABASE my_db;

SHOW PARAMETERS LIKE 'DEFAULT_NOTEBOOK_COMPUTE_POOL_GPU' IN SCHEMA my_db.my_schema;

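For more information, see SHOW PARAMETERS.

Guidelines and limitations¶

CREATE COMPUTE POOL privilege: If you can't create a compute pool under the current role, ask your account administrator to grant the privilege. For example: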
```
GRANT CREATE COMPUTE POOL ON ACCOUNT TO ROLE <role_name> [WITH GRANT OPTION];
```
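For more information, see GRANT <privileges> ... TO ROLE.
Per-account limit on compute pool nodes.
- The maximum number of nodes you can create in your account (regardless of the number of compute pools) is 150.
- The maximum number of nodes per compute pool is 50.
In addition, there is a limit on the number of nodes allowed for each instance family (see the Node limit column in the instance family table). If you see an error message such as: Requested number of nodes <#> exceeds the node limit for the account, you have encountered one of these limits. For more information, contact your account representative.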