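Container Runtime for ML

Overview

Container Runtime for ML is a set of preconfigured, customizable environments built for machine learning on Snowpark Container Services. The environments cover interactive experimentation and batch ML workloads such as model training, hyperparameter tuning, batch inference, and fine-tuning, and they include the most popular machine learning and deep learning frameworks. Used together with Snowflake Notebooks, they provide an end-to-end ML experience.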

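Execution environment

Container Runtime for ML provides an environment populated with packages and libraries that support a wide variety of ML development tasks inside Snowflake. In addition to the pre-installed packages, you can import packages from external sources, such as the public PyPI repository, or from internally hosted package repositories that provide a list of packages approved for use inside your organization.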
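
As an illustration of installing from either kind of source, the following sketch builds a pip command for the interpreter that is running the notebook. The helper names and the internal index URL are hypothetical; only pip's standard `--index-url` option is relied on.

```python
import subprocess
import sys


def pip_install_command(packages, index_url=None):
    """Build a pip command for the current interpreter.

    index_url=None leaves pip on its default index (public PyPI).
    """
    cmd = [sys.executable, "-m", "pip", "install"]
    if index_url is not None:
        # Point pip at an internally hosted repository instead of PyPI.
        cmd += ["--index-url", index_url]
    return cmd + list(packages)


def pip_install(packages, index_url=None):
    """Run the install; raises CalledProcessError if pip fails."""
    subprocess.check_call(pip_install_command(packages, index_url))


# Hypothetical usage (the URL is a placeholder, not a real repository):
# pip_install(["some-package"], index_url="https://pypi.example.internal/simple")
```

Building the command separately from running it makes it easy to review or log what will be installed before anything changes in the environment.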

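Your custom Python ML workloads and the supported training APIs execute within Snowpark Container Services, which can run on CPU or GPU compute pools. When you use the Snowflake ML APIs, Container Runtime for ML distributes the processing across the available resources.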

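Distributed processing

The Snowflake ML modeling and data loading APIs are built on top of Snowflake ML's distributed processing framework, which maximizes resource utilization by making full use of the available compute power. By default, the framework uses all GPUs on multi-GPU nodes, which offers significant performance improvements over open source packages and reduces overall runtime.

Machine learning workloads, including data loading, execute in a Snowflake-managed compute environment. The framework scales resources dynamically based on the requirements of the task at hand, such as training a model or loading data. The resources for each task, including GPU and memory allocation, can be configured through the provided APIs.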

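Optimized data loading

Container Runtime provides a set of data connector APIs that connect Snowflake data sources (including tables, DataFrames, and Datasets) to popular ML frameworks (such as PyTorch and TensorFlow), taking full advantage of multiple cores or GPUs. Once loaded, data can be processed with open source packages or with any of the Snowflake ML APIs, including the distributed versions described below. These APIs are found in the snowflake.ml.data namespace.

The snowflake.ml.data.data_connector.DataConnector class connects Snowpark DataFrames or Snowflake ML Datasets to TensorFlow or PyTorch datasets or to Pandas DataFrames. Instantiate a connector using one of the following class methods:

DataConnector.from_dataframe: Accepts a Snowpark DataFrame.

DataConnector.from_dataset: Accepts a Snowflake ML Dataset, specified by name and version.

DataConnector.from_sources: Accepts a list of sources, each of which can be a DataFrame or a Dataset.

After you have instantiated the connector (calling the instance, for example, data_connector), call one of the following methods to produce the desired kind of output.

data_connector.to_tf_dataset: Returns a TensorFlow Dataset suitable for use with TensorFlow.
data_connector.to_torch_dataset: Returns a PyTorch Dataset suitable for use with PyTorch.

For more information about these APIs, see the Snowflake ML API reference.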
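
The following is a minimal sketch of the connector workflow described above, not a definitive implementation. The helper names are hypothetical, and the sketch assumes that to_torch_dataset accepts a batch_size keyword and yields already-batched items; check the API reference for the signature in your installed version.

```python
def connector_for_table(session, table_name):
    """Create a DataConnector for a Snowflake table (hypothetical helper)."""
    # Imported lazily so this module also loads outside Container Runtime.
    from snowflake.ml.data.data_connector import DataConnector

    return DataConnector.from_dataframe(session.table(table_name))


def first_torch_batch(data_connector, batch_size=1024):
    """Return the first batch from the connector's PyTorch dataset.

    Assumption: to_torch_dataset takes a batch_size keyword argument and
    the returned dataset is iterable.
    """
    dataset = data_connector.to_torch_dataset(batch_size=batch_size)
    return next(iter(dataset))


# Hypothetical usage inside a notebook with an active session:
# connector = connector_for_table(session, "my_dataset")
# batch = first_torch_batch(connector, batch_size=256)
```

Inspecting a single batch before starting a training loop is a cheap way to confirm column names and tensor shapes.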

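Building with open source

The base CPU and GPU images come pre-populated with popular ML packages, and you can install additional libraries using pip. This lets you use familiar and innovative open source frameworks within Snowflake Notebooks without moving data out of Snowflake. To scale processing, you can combine Snowflake's distributed APIs for data loading, training, and hyperparameter optimization with the familiar APIs of popular OSS packages; the interfaces differ only by small changes that allow for scaling configurations.

The following code demonstrates creating an XGBoost classifier using these APIs: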

from snowflake.snowpark.context import get_active_session
from snowflake.ml.data.data_connector import DataConnector
import pandas as pd
import xgboost as xgb
from sklearn.model_selection import train_test_split

session = get_active_session()

# Use the DataConnector API to pull in large data efficiently
df = session.table("my_dataset")
pandas_df = DataConnector.from_dataframe(df).to_pandas()

# Build with open source

# Select features and label from the pandas DataFrame loaded above
X = pandas_df[['feature1', 'feature2']]
y = pandas_df['label']

# Split data into test and train in memory
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15, random_state=34)

# Train in memory
model = xgb.XGBClassifier()
model.fit(X_train, y_train)

# Predict
y_pred = model.predict(X_test)


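Optimized training

Container Runtime for ML offers a set of distributed training APIs, including distributed versions of LightGBM, PyTorch, and XGBoost, that take full advantage of the resources available in the container environment. These are found in the snowflake.ml.modeling.distributors namespace. The APIs of the distributed classes are similar to those of the standard versions.

For more information about these APIs, see the API reference.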

XGBoost

The primary XGBoost class is snowflake.ml.modeling.distributors.xgboost.XGBEstimator. Related classes include:

snowflake.ml.modeling.distributors.xgboost.XGBScalingConfig

For an example of working with this API, see the XGBoost on GPU (https://github.com/Snowflake-Labs/sfguide-getting-started-with-container-runtime-apis/blob/main/XGBoost_on_GPU_Quickstart.ipynb) example notebook in the Snowflake ML Container Runtime GitHub repository.
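
The sketch below shows roughly how the two classes fit together. It is an illustration under assumptions rather than the reference usage: the helper name is hypothetical, and the constructor and fit keyword arguments (n_estimators, objective, scaling_config, use_gpu, input_cols, label_col) follow the pattern of the example notebook as best recalled, so verify them against the API reference.

```python
def train_distributed_xgb(data_connector, input_cols, label_col, use_gpu=True):
    """Fit a distributed XGBoost model from a DataConnector (hypothetical helper)."""
    # Imported lazily so this module also loads outside Container Runtime.
    from snowflake.ml.modeling.distributors.xgboost import (
        XGBEstimator,
        XGBScalingConfig,
    )

    estimator = XGBEstimator(
        n_estimators=100,
        objective="binary:logistic",
        # Assumption: use_gpu switches training between CPU and GPU workers.
        scaling_config=XGBScalingConfig(use_gpu=use_gpu),
    )
    # Assumption: fit takes the connector plus feature and label column names.
    return estimator.fit(data_connector, input_cols=input_cols, label_col=label_col)


# Hypothetical usage inside a notebook:
# connector = DataConnector.from_dataframe(session.table("my_dataset"))
# model = train_distributed_xgb(connector, ["FEATURE1", "FEATURE2"], "LABEL")
```

Compared with the in-memory example earlier on this page, the data stays behind the connector instead of being materialized as a single pandas DataFrame.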

LightGBM

The primary LightGBM class is snowflake.ml.modeling.distributors.lightgbm.LightGBMEstimator. Related classes include:

snowflake.ml.modeling.distributors.lightgbm.LightGBMScalingConfig

For an example of working with this API, see the LightGBM on GPU (https://github.com/Snowflake-Labs/sfguide-getting-started-with-container-runtime-apis/blob/main/LightGBM_on_GPU_Quickstart.ipynb) example notebook in the Snowflake ML Container Runtime GitHub repository.

PyTorch

The primary PyTorch class is snowflake.ml.modeling.distributors.pytorch.PyTorchDistributor. Related classes and functions include:

snowflake.ml.modeling.distributors.pytorch.WorkerResourceConfig
snowflake.ml.modeling.distributors.pytorch.PyTorchScalingConfig
snowflake.ml.modeling.distributors.pytorch.Context
snowflake.ml.modeling.distributors.pytorch.get_context

For an example of working with this API, see the PyTorch on GPU (https://github.com/Snowflake-Labs/sfguide-getting-started-with-container-runtime-apis/blob/main/PyTorch_on_GPU_Quickstart.ipynb) example notebook in the Snowflake ML Container Runtime GitHub repository.

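Snowflake ML modeling APIs

When Snowflake ML's modeling APIs are used in a Notebook, all execution happens in the container runtime rather than in the query warehouse. The exception is the snowflake.ml.modeling.preprocessing APIs, which execute in the query warehouse.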
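
As a rough illustration, a modeling API call in a notebook might look like the sketch below. The helper name, the column names, and the PREDICTION output column are hypothetical; the estimator is the snowflake.ml.modeling wrapper around XGBoost, which takes column names rather than arrays.

```python
def fit_and_predict(train_df, test_df, feature_cols, label_col):
    """Fit a Snowflake ML XGBClassifier and score a second DataFrame.

    train_df and test_df are expected to be Snowpark DataFrames.
    """
    # Imported lazily so this module also loads outside Container Runtime.
    from snowflake.ml.modeling.xgboost import XGBClassifier

    model = XGBClassifier(
        input_cols=feature_cols,
        label_cols=[label_col],
        output_cols=["PREDICTION"],  # hypothetical output column name
    )
    model.fit(train_df)
    return model.predict(test_df)


# Hypothetical usage inside a notebook:
# scored = fit_and_predict(train_df, test_df, ["FEATURE1", "FEATURE2"], "LABEL")
```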

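Limitations

In Container Runtime for ML, the Snowflake ML modeling APIs support only the predict, predict_proba, and predict_log_proba inference methods. Other methods run in the query warehouse.
The Snowflake ML modeling APIs support only sklearn-compatible pipelines on Container Runtime for ML.
The Snowflake ML modeling APIs do not support preprocessing or metrics on Container Runtime for ML. These APIs run in the query warehouse.
The fit, predict, and score methods execute in Container Runtime for ML. Other Snowflake ML methods run in the query warehouse.
sample_weight_cols is not supported for XGBoost or LightGBM models.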

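Container Runtime image specifications

When you create a notebook to run on Container Runtime, you can choose between the CPU and GPU image types. Both images come pre-installed with popular ML frameworks such as scikit-learn and PyTorch. You can also use Snowpark ML and everything that is included with it.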
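
To confirm what a running image actually provides, you can query the installed package metadata from inside the notebook. The sketch below uses only the Python standard library; the helper names are hypothetical.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def installed_versions(packages):
    """Map each requested package name to its installed version (or None)."""
    return {name: installed_version(name) for name in packages}


# Hypothetical usage: compare against the version tables on this page.
# print(installed_versions(["xgboost", "lightgbm", "torch", "scikit-learn"]))
```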

Full list for the CPU v1 image

This table lists all Python packages pre-installed on the CPU v1 image:

Package	Version
absl-py	1.4.0
aiobotocore	2.7.0
aiohttp	3.9.5
aiohttp-cors	0.7.0
aioitertools	0.12.0
aiosignal	1.2.0
aiosignal	1.3.1
altair	5.4.1
annotated-types	0.6.0
anyio	3.5.0
appdirs	1.4.4
arviz	0.17.1
asn1crypto	1.5.1
asttokens	2.0.5
async-timeout	4.0.3
atpublic	4.0
attrs	23.1.0
attrs	23.2.0
backoff	2.2.1
bayesian-optimization	1.5.1
blinker	1.6.2
botocore	1.31.64
bottleneck	1.3.7
brotli	1.0.9
cachetools	5.3.3
causalpy	0.4.0
certifi	2024.8.30
cffi	1.16.0
charset-normalizer	3.3.2
click	8.1.7
clikit	0.6.2
cloudpickle	2.2.1
cmdstanpy	1.2.4
colorama	0.4.6
colorful	0.5.4
cons	0.4.6
contourpy	1.2.0
crashtest	0.3.1
cryptography	42.0.8
cycler	0.11.0
datasets	2.16.1
decorator	5.1.1
deprecated	1.2.13
dill	0.3.7
distlib	0.3.8
etuples	0.3.9
evaluate	0.4.2
exceptiongroup	1.2.0
executing	0.8.3
filelock	3.13.1
flask	3.0.3
fonttools	4.51.0
frozenlist	1.4.0
frozenlist	1.4.1
fsspec	2023.10.0
gitdb	4.0.7
gitpython	3.1.41
gmpy2	2.1.2
google-api-core	2.19.1
google-auth	2.29.0
googleapis-common-protos	1.63.2
graphviz	0.20.1
grpcio	1.66.1
grpcio-tools	1.62.3
gunicorn	22.0.0
h5netcdf	1.2.0
h5py	3.11.0
holidays	0.57
httpstan	4.13.0
huggingface-hub	0.24.6
idna	3.6
idna	3.7
importlib-metadata	6.11.0
importlib-resources	6.4.5
ipython	8.27.0
itsdangerous	2.2.0
jedi	0.19.1
jinja2	3.1.4
jmespath	1.0.1
joblib	1.4.2
jsonschema	4.19.2
jsonschema-specifications	2023.7.1
kiwisolver	1.4.4
lightgbm	3.3.5
lightgbm-ray	0.1.9
logical-unification	0.4.6
markdown-it-py	2.2.0
markupsafe	2.1.3
marshmallow	3.22.0
matplotlib	3.8.4
matplotlib-inline	0.1.6
mdurl	0.1.0
minikanren	1.0.3
mkl-fft	1.3.10
mkl-random	1.2.7
mkl-service	2.4.0
mlruntimes-client	0.2.0
mlruntimes-service	0.2.0
modin	0.31.0
mpmath	1.3.0
msgpack	1.0.3
multidict	6.0.4
multidict	6.0.5
multipledispatch	0.6.0
multiprocess	0.70.15
narwhals	1.8.4
networkx	3.3
nltk	3.9.1
numexpr	2.8.7
numpy	1.24.3
opencensus	0.11.3
opencensus-context	0.1.3
opencv-python	4.10.0.84
opentelemetry-api	1.23.0
opentelemetry-exporter-otlp-proto-common	1.23.0
opentelemetry-exporter-otlp-proto-grpc	1.25.0
opentelemetry-proto	1.23.0
opentelemetry-sdk	1.23.0
opentelemetry-semantic-conventions	0.44b0
packaging	23.1
pandas	2.2.3
parso	0.8.3
pastel	0.2.1
patsy	0.5.6
pexpect	4.8.0
pillow	9.5.0
pip	24.2
platformdirs	2.6.2
plotly	5.22.0
ply	3.11
prometheus-client	0.20.0
prompt-toolkit	3.0.43
prophet	1.1.5
proto-plus	1.24.0
protobuf	4.24.4
psutil	5.9.0
ptyprocess	0.7.0
pure-eval	0.2.2
pyarrow	15.0.0
pyarrow-hotfix	0.6
pyasn1	0.4.8
pyasn1-modules	0.2.8
pycparser	2.21
pydantic	2.8.2
pydantic-core	2.20.1
pydeck	0.9.1
pygments	2.15.1
pyjwt	2.8.0
pylev	1.4.0
pymc	5.16.1
pympler	1.1
pyopenssl	24.2.1
pyparsing	3.0.9
pyqt5	5.15.10
pyqt5-sip	12.13.0
pysimdjson	6.0.2
pysocks	1.7.1
pystan	3.10.0
pytensor	2.13.1
pytensor	2.23.0
python-dateutil	2.8.3+snowflake1
pytimeparse	1.1.8
pytz	2024.1
pytz-deprecation-shim	0.1.0.post0
pyyaml	6.0.1
ray	2.10.0
referencing	0.30.2
regex	2024.7.24
requests	2.32.3
retrying	1.3.4
rich	13.7.1
rpds-py	0.10.6
rsa	4.7.2
s3fs	2023.10.0
safetensors	0.4.4
scikit-learn	1.3.0
scipy	1.13.1
seaborn	0.13.2
setproctitle	1.2.2
setuptools	70.0.0
sip	6.7.12
six	1.16.0
smart-open	5.2.1
smmap	4.0.0
sniffio	1.3.0
snowbooks	1.46.0
snowflake	0.12.1
snowflake-connector-python	3.12.0
snowflake-core	0.12.1
snowflake-legacy	0.12.1
snowflake-ml-python	1.6.2
snowflake-snowpark-python	1.18.0
snowflake-telemetry-python	0.5.0
sortedcontainers	2.4.0
sqlparse	0.5.1
stack-data	0.2.0
stanio	0.5.1
statsmodels	0.14.2
streamlit	1.26.0
sympy	1.13.2
tenacity	8.2.3
tensorboardx	2.6.2.2
threadpoolctl	3.5.0
tokenizers	0.15.1
toml	0.10.2
tomli	2.0.1
tomlkit	0.11.1
toolz	0.12.0
torch	2.3.0
tornado	6.4.1
tqdm	4.66.4
traitlets	5.14.3
transformers	4.36.0
typing-extensions	4.12.2
tzdata	2024.2
tzlocal	4.3.1
unicodedata2	15.1.0
urllib3	2.0.7
validators	0.34.0
virtualenv	20.17.1
watchdog	5.0.3
wcwidth	0.2.5
webargs	8.6.0
werkzeug	3.0.3
wheel	0.43.0
wrapt	1.14.1
xarray	2023.6.0
xarray-einstats	0.6.0
xgboost	1.7.6
xgboost-ray	0.1.19
xxhash	2.0.2
yarl	1.11.0
yarl	1.9.4
zipp	3.17.0

Full list for the GPU v1 image

This table lists all Python packages pre-installed on the GPU v1 image:

Package	Version
absl-py	1.4.0
accelerate	0.34.2
aiobotocore	2.7.0
aiohttp	3.9.5
aiohttp-cors	0.7.0
aioitertools	0.12.0
aiosignal	1.2.0
aiosignal	1.3.1
altair	5.4.1
annotated-types	0.6.0
anyio	3.5.0
appdirs	1.4.4
arviz	0.17.1
asn1crypto	1.5.1
asttokens	2.0.5
async-timeout	4.0.3
atpublic	4.0
attrs	23.1.0
attrs	23.2.0
backoff	2.2.1
bayesian-optimization	1.5.1
blinker	1.6.2
botocore	1.31.64
bottleneck	1.3.7
brotli	1.0.9
cachetools	5.3.3
causalpy	0.4.0
certifi	2024.8.30
cffi	1.16.0
charset-normalizer	3.3.2
click	8.1.7
clikit	0.6.2
cloudpickle	2.0.0
cmake	3.30.3
cmdstanpy	1.2.4
colorama	0.4.6
colorful	0.5.4
cons	0.4.6
contourpy	1.2.0
crashtest	0.3.1
cryptography	42.0.8
cycler	0.11.0
datasets	2.16.1
decorator	5.1.1
deprecated	1.2.13
dill	0.3.7
diskcache	5.6.3
distlib	0.3.8
distro	1.9.0
etuples	0.3.9
evaluate	0.4.2
exceptiongroup	1.2.0
executing	0.8.3
fastapi	0.115.0
filelock	3.13.1
flask	3.0.3
fonttools	4.51.0
frozenlist	1.4.0
frozenlist	1.4.1
fsspec	2023.10.0
gitdb	4.0.7
gitpython	3.1.41
gmpy2	2.1.2
google-api-core	2.19.1
google-auth	2.29.0
googleapis-common-protos	1.63.2
graphviz	0.20.1
grpcio	1.66.1
grpcio-tools	1.62.3
gunicorn	22.0.0
h11	0.14.0
h5netcdf	1.2.0
h5py	3.11.0
holidays	0.57
httpcore	1.0.5
httpstan	4.13.0
httptools	0.6.1
httpx	0.27.2
huggingface-hub	0.24.6
idna	3.6
idna	3.7
importlib-metadata	6.11.0
importlib-resources	6.4.5
interegular	0.3.3
ipython	8.27.0
itsdangerous	2.2.0
jedi	0.19.1
jinja2	3.1.4
jiter	0.5.0
jmespath	1.0.1
joblib	1.4.2
jsonschema	4.19.2
jsonschema-specifications	2023.7.1
kiwisolver	1.4.4
lark	1.2.2
lightgbm	4.5.0
lightgbm-ray	0.1.9
llvmlite	0.43.0
lm-format-enforcer	0.10.3
logical-unification	0.4.6
markdown-it-py	2.2.0
markupsafe	2.1.3
marshmallow	3.22.0
matplotlib	3.8.4
matplotlib-inline	0.1.6
mdurl	0.1.0
minikanren	1.0.3
mkl-fft	1.3.10
mkl-random	1.2.7
mkl-service	2.4.0
mlruntimes-client	0.2.0
mlruntimes-service	0.2.0
modin	0.31.0
mpmath	1.3.0
msgpack	1.0.3
multidict	6.0.4
multidict	6.0.5
multipledispatch	0.6.0
multiprocess	0.70.15
narwhals	1.8.4
nest-asyncio	1.6.0
networkx	3.3
ninja	1.11.1.1
nltk	3.9.1
numba	0.60.0
numexpr	2.8.7
numpy	1.24.3
nvidia-cublas-cu12	12.1.3.1
nvidia-cuda-cupti-cu12	12.1.105
nvidia-cuda-nvrtc-cu12	12.1.105
nvidia-cuda-runtime-cu12	12.1.105
nvidia-cudnn-cu12	8.9.2.26
nvidia-cufft-cu12	11.0.2.54
nvidia-curand-cu12	10.3.2.106
nvidia-cusolver-cu12	11.4.5.107
nvidia-cusparse-cu12	12.1.0.106
nvidia-ml-py	12.560.30
nvidia-nccl-cu12	2.20.5
nvidia-nvjitlink-cu12	12.6.68
nvidia-nvtx-cu12	12.1.105
openai	1.50.1
opencensus	0.11.3
opencensus-context	0.1.3
opencv-python	4.10.0.84
opentelemetry-api	1.23.0
opentelemetry-exporter-otlp-proto-common	1.23.0
opentelemetry-exporter-otlp-proto-grpc	1.25.0
opentelemetry-proto	1.23.0
opentelemetry-sdk	1.23.0
opentelemetry-semantic-conventions	0.44b0
outlines	0.0.46
packaging	23.1
pandas	2.2.3
parso	0.8.3
pastel	0.2.1
patsy	0.5.6
peft	0.5.0
pexpect	4.8.0
pillow	9.5.0
pip	24.2
platformdirs	2.6.2
plotly	5.22.0
ply	3.11
prometheus-client	0.20.0
prometheus-fastapi-instrumentator	7.0.0
prompt-toolkit	3.0.43
prophet	1.1.5
proto-plus	1.24.0
protobuf	4.24.4
psutil	5.9.0
ptyprocess	0.7.0
pure-eval	0.2.2
py-cpuinfo	9.0.0
pyairports	2.1.1
pyarrow	15.0.0
pyarrow-hotfix	0.6
pyasn1	0.4.8
pyasn1-modules	0.2.8
pycountry	24.6.1
pycparser	2.21
pydantic	2.8.2
pydantic-core	2.20.1
pydeck	0.9.1
pygments	2.15.1
pyjwt	2.8.0
pylev	1.4.0
pymc	5.16.1
pympler	1.1
pyopenssl	24.2.1
pyparsing	3.0.9
pyqt5	5.15.10
pyqt5-sip	12.13.0
pysimdjson	6.0.2
pysocks	1.7.1
pystan	3.10.0
pytensor	2.13.1
pytensor	2.23.0
python-dateutil	2.8.3+snowflake1
python-dotenv	1.0.1
pytimeparse	1.1.8
pytz	2024.1
pytz-deprecation-shim	0.1.0.post0
pyyaml	6.0.1
pyzmq	26.2.0
ray	2.10.0
referencing	0.30.2
regex	2024.7.24
requests	2.32.3
retrying	1.3.4
rich	13.7.1
rpds-py	0.10.6
rsa	4.7.2
s3fs	2023.10.0
safetensors	0.4.4
scikit-learn	1.3.0
scipy	1.9.3
seaborn	0.13.2
sentencepiece	0.1.99
setproctitle	1.2.2
setuptools	70.0.0
sip	6.7.12
six	1.16.0
smart-open	5.2.1
smmap	4.0.0
sniffio	1.3.0
snowbooks	1.46.0
snowflake	0.12.1
snowflake-connector-python	3.12.0
snowflake-core	0.12.1
snowflake-legacy	0.12.1
snowflake-ml-python	1.6.2
snowflake-snowpark-python	1.18.0
snowflake-telemetry-python	0.5.0
sortedcontainers	2.4.0
sqlparse	0.5.1
stack-data	0.2.0
stanio	0.5.1
starlette	0.38.6
statsmodels	0.14.2
streamlit	1.26.0
sympy	1.13.2
tenacity	8.2.3
tensorboardx	2.6.2.2
threadpoolctl	3.5.0
tiktoken	0.7.0
tokenizers	0.20.0
toml	0.10.2
tomli	2.0.1
tomlkit	0.11.1
toolz	0.12.0
torch	2.3.1
torchvision	0.18.1
tornado	6.4.1
tqdm	4.66.4
traitlets	5.14.3
transformers	4.45.1
triton	2.3.1
typing-extensions	4.12.2
tzdata	2024.2
tzlocal	4.3.1
unicodedata2	15.1.0
urllib3	2.0.7
uvicorn	0.31.0
uvloop	0.20.0
validators	0.34.0
virtualenv	20.17.1
vllm	0.5.3.post1
vllm-flash-attn	2.5.9.post1
watchdog	5.0.3
watchfiles	0.24.0
wcwidth	0.2.5
webargs	8.6.0
websockets	13.1
werkzeug	3.0.3
wheel	0.43.0
wrapt	1.14.1
xarray	2023.6.0
xarray-einstats	0.6.0
xformers	0.0.27
xgboost	1.7.6
xgboost-ray	0.1.19
xxhash	2.0.2
yarl	1.11.0
yarl	1.9.4
zipp	3.17.0
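
The two images share most packages, but several versions differ, which can matter when moving a notebook between CPU and GPU. The sketch below compares a small subset of versions copied from the two tables above; the dictionary names and the helper are hypothetical, and the subset is illustrative rather than complete.

```python
# Subset of versions copied from the CPU v1 and GPU v1 tables on this page.
CPU_V1 = {
    "cloudpickle": "2.2.1",
    "lightgbm": "3.3.5",
    "numpy": "1.24.3",
    "scipy": "1.13.1",
    "torch": "2.3.0",
    "transformers": "4.36.0",
    "xgboost": "1.7.6",
}
GPU_V1 = {
    "cloudpickle": "2.0.0",
    "lightgbm": "4.5.0",
    "numpy": "1.24.3",
    "scipy": "1.9.3",
    "torch": "2.3.1",
    "transformers": "4.45.1",
    "vllm": "0.5.3.post1",
    "xgboost": "1.7.6",
}


def diff_versions(first, second):
    """Return {package: (version_in_first, version_in_second)} where they differ.

    A package missing from one image is reported with None on that side.
    """
    differences = {}
    for name in sorted(set(first) | set(second)):
        a, b = first.get(name), second.get(name)
        if a != b:
            differences[name] = (a, b)
    return differences


for package, (cpu, gpu) in diff_versions(CPU_V1, GPU_V1).items():
    print(f"{package}: CPU v1 = {cpu}, GPU v1 = {gpu}")
```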

Next steps

To try a notebook that uses Container Runtime for ML, see Notebooks on Container Runtime for ML.