Bring your own model types via serialized files

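The model registry supports logging built-in model types directly in the registry. Snowflake also provides a way to log other model types with snowflake.ml.model.custom_model.CustomModel. Serializable models that were trained with external tools or obtained from open source repositories can be used with CustomModel.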

This guide explains how to:

  • Create a custom model.

  • Create a model context with files and model objects.

  • Include additional code with your model using code_paths.

  • Log the custom model to the Snowflake Model Registry.

  • Deploy the model for inference.

Note

This quickstart (https://quickstarts.snowflake.com/guide/deploying_custom_models_to_snowflake_model_registry/) provides an example of logging a custom PyCaret model.

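Defining model context by keyword arguments

snowflake.ml.model.custom_model.ModelContext can be instantiated with user-defined keyword arguments. Each value can be either a string file path or an instance of a supported model type. The files and serialized models are packaged with the model so that they are available to the model's inference logic.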

Using in-memory model objects

When working with built-in model types, the recommended approach is to pass in-memory model objects directly to the ModelContext. This allows Snowflake ML to handle serialization automatically.

import pandas as pd
from snowflake.ml.model import custom_model

# Initialize ModelContext with an in-memory model object
# my_model can be any supported model type (e.g., sklearn, xgboost, lightgbm, and others)
model_context = custom_model.ModelContext(
    my_model=my_model,
)

# Define a custom model class that utilizes the context
class ExampleBringYourOwnModel(custom_model.CustomModel):
    def __init__(self, context: custom_model.ModelContext) -> None:
        super().__init__(context)

    @custom_model.inference_api
    def predict(self, input: pd.DataFrame) -> pd.DataFrame:
        # Use the model with key 'my_model' from the context to make predictions
        model_output = self.context['my_model'].predict(input)
        return pd.DataFrame({'output': model_output})

# Instantiate the custom model with the model context. This instance can be logged in the model registry.
my_model = ExampleBringYourOwnModel(model_context)

Note

In your custom model class, always access model objects through the model context. For example, use self.model = self.context['my_model'] instead of directly assigning self.model = model (where model is an in-memory model object). Accessing the model directly captures a second copy of the model in a closure, which results in significantly larger model files during serialization.

Using serialized files

For models or data that are stored in serialized files like Python pickles or JSON, you can provide file paths to your ModelContext. Files can be serialized models, configuration files, or files containing parameters. This is useful when working with pre-trained models saved to disk or configuration data.

import pickle
import pandas as pd
from snowflake.ml.model import custom_model

# Initialize ModelContext with a file path
# my_file_path is a local pickle file path
model_context = custom_model.ModelContext(
    my_file_path='/path/to/file.pkl',
)

# Define a custom model class that loads the pickled object
class ExampleBringYourOwnModel(custom_model.CustomModel):
    def __init__(self, context: custom_model.ModelContext) -> None:
        super().__init__(context)

        # Use 'my_file_path' key from the context to load the pickled object
        with open(self.context['my_file_path'], 'rb') as f:
            self.obj = pickle.load(f)

    @custom_model.inference_api
    def predict(self, input: pd.DataFrame) -> pd.DataFrame:
        # Use the loaded object to make predictions
        model_output = self.obj.predict(input)
        return pd.DataFrame({'output': model_output})

# Instantiate the custom model with the model context. This instance can be logged in the model registry.
my_model = ExampleBringYourOwnModel(model_context)
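
To try the file-based pattern without a trained model at hand, the following sketch writes a pickle and a JSON file to a temporary directory and reads them back the way a custom model's __init__ would. The file names, the keys weights_file and config_file, the stored values, and both helper functions are illustrative assumptions and not part of the Snowflake API. The returned string paths are the kind of values you would pass as keyword arguments to ModelContext.

```python
import json
import pickle
import tempfile
from pathlib import Path


def write_artifacts(directory: Path) -> dict:
    """Write a pickled parameter object and a JSON config; return their paths."""
    weights_path = directory / "weights.pkl"
    config_path = directory / "config.json"
    with open(weights_path, "wb") as f:
        pickle.dump({"coef": [0.5, -1.0], "intercept": 0.25}, f)
    with open(config_path, "w", encoding="utf-8") as f:
        json.dump({"threshold": 0.5}, f)
    # String paths like these are what ModelContext would receive as keyword arguments.
    return {"weights_file": str(weights_path), "config_file": str(config_path)}


def load_artifacts(paths: dict) -> tuple:
    """Load both files, mirroring what __init__ does with self.context[...]."""
    with open(paths["weights_file"], "rb") as f:
        weights = pickle.load(f)
    with open(paths["config_file"], encoding="utf-8") as f:
        config = json.load(f)
    return weights, config


with tempfile.TemporaryDirectory() as tmp:
    artifact_paths = write_artifacts(Path(tmp))
    weights, config = load_artifacts(artifact_paths)

print(weights["coef"], config["threshold"])
```

Only unpickle files from sources you trust, because loading a pickle can execute arbitrary code.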

Important

When you combine a supported model type (such as XGBoost) with unsupported models or data, you don't need to serialize the supported model yourself. Set the supported model object directly in the context (e.g., base_model = my_xgb_model) and it is serialized automatically.

Testing and logging a custom model

You can test a custom model by running it locally.

my_model = ExampleBringYourOwnModel(model_context)
output_df = my_model.predict(input_df)

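When the model works as intended, log it to the Snowflake Model Registry. As shown in the next code example, provide conda_dependencies (or pip_requirements) to specify the libraries that the model class needs. Provide sample_input_data (a pandas or Snowpark DataFrame) to infer the input signature for the model. Alternatively, provide a model signature.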

from snowflake.ml.registry import Registry

# sp_session is an existing Snowpark session
reg = Registry(session=sp_session, database_name="ML", schema_name="REGISTRY")
mv = reg.log_model(
    my_model,
    model_name="my_custom_model",
    version_name="v1",
    conda_dependencies=["scikit-learn"],
    comment="My Custom ML Model",
    sample_input_data=train_features,
)
# Run inference with the logged model version
output_df = mv.run(input_df)

Including additional code with code_paths

Use the code_paths parameter in Registry.log_model to package Python code, such as helper modules, utilities, and configuration files, with your model. You can import this code just as you would locally.

You can either provide string paths to copy files or directories, or CodePath objects. The objects provide more control over which subdirectories or files are included, and the import paths that will be used by the model.

Using string paths

Pass a list of string paths to include files or directories. The last component of each path becomes the importable module name.

mv = reg.log_model(
    my_model,
    model_name="my_model",
    version_name="v1",
    code_paths=["src/mymodule"],  # import with: import mymodule
)
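
You can check the naming rule locally before logging. This sketch builds a throwaway src/mymodule package in a temporary directory and imports it by the last path component, which is the name the rule above says the packaged code is imported under. The directory layout and the greet function are made up for illustration, and nothing here calls into Snowflake.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout: <tmp>/src/mymodule/__init__.py
    code_path = Path(tmp) / "src" / "mymodule"
    code_path.mkdir(parents=True)
    (code_path / "__init__.py").write_text(
        "def greet():\n    return 'hello from mymodule'\n", encoding="utf-8"
    )

    # The last path component is the name you would import.
    module_name = code_path.name

    # Make the parent directory importable, as it is when you run from src/ locally.
    sys.path.insert(0, str(code_path.parent))
    try:
        module = importlib.import_module(module_name)
        message = module.greet()
    finally:
        sys.path.remove(str(code_path.parent))

print(module_name, "->", message)
```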

Using CodePath with filter

Use the CodePath class when you want to package only part of a directory tree or control the import paths used by your model.

from snowflake.ml.model import CodePath

A CodePath has two parameters:

  • root: A directory or file path.

  • filter (optional): A relative path under root that selects a subdirectory or file.

When filter is provided, the source is root/filter, and the filter value determines the import path. For example, filter="utils" allows you to import utils, and filter="pkg/subpkg" allows you to import pkg.subpkg.

Example: Given this project structure:

my_project/src/
├── utils/
│   └── preprocessing.py
├── models/
│   └── classifier.py
└── tests/          # Not needed for inference

To package only utils/ and models/, excluding tests/:

mv = reg.log_model(
    my_model,
    model_name="my_model",
    version_name="v1",
    code_paths=[
        CodePath("my_project/src/", filter="utils/"),
        CodePath("my_project/src/", filter="models/"),
    ],
)

You can also filter a single file:

code_paths=[
    CodePath("my_project/src/", filter="utils/preprocessing.py"),
]
# Import with: import utils.preprocessing
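
The mapping from a filter value to an import path can be restated as a small function. The helper below is hypothetical: it only encodes the rule and examples described in this section (path separators become dots, and a trailing .py is dropped for single files) and does not call into snowflake-ml-python.

```python
from pathlib import PurePosixPath


def import_path_for_filter(filter_value: str) -> str:
    """Return the dotted import path implied by a CodePath filter value.

    Hypothetical helper that restates the documented rule; it is not
    part of the Snowflake API.
    """
    path = PurePosixPath(filter_value)  # a trailing slash is ignored
    if path.suffix == ".py":
        path = path.with_suffix("")  # utils/preprocessing.py -> utils/preprocessing
    return ".".join(path.parts)


for value in ["utils", "models/", "pkg/subpkg", "utils/preprocessing.py"]:
    print(f"filter={value!r}: import {import_path_for_filter(value)}")
```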

Example: Logging a PyCaret model

The following example uses PyCaret to log a custom model type. PyCaret is a low-code, high-efficiency third-party package that Snowflake doesn't support natively. You can bring your own model types using similar methods.

Step 1: Define the model context

Before you log your model, define the model context. The model context refers to your own custom model type. The following example specifies the path to the serialized (pickled) model using the context's model_file attribute. You can choose any name for the attribute as long as the name is not used for anything else.

from snowflake.ml.model import custom_model

pycaret_model_context = custom_model.ModelContext(
    model_file='pycaret_best_model.pkl',
)

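Step 2: Create a custom model class

Define a custom model class to log a model type that has no native support. In this example, a PyCaretModel class, derived from CustomModel, is defined so that the model can be logged in the registry.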

import pandas as pd
from pycaret.classification import load_model, predict_model
from snowflake.ml.model import custom_model

class PyCaretModel(custom_model.CustomModel):
    def __init__(self, context: custom_model.ModelContext) -> None:
        super().__init__(context)
        model_dir = self.context["model_file"][:-4]  # Remove '.pkl' suffix
        self.model = load_model(model_dir, verbose=False)
        self.model.memory = '/tmp/'  # Update memory directory

    @custom_model.inference_api
    def predict(self, X: pd.DataFrame) -> pd.DataFrame:
        model_output = predict_model(self.model, data=X)
        return pd.DataFrame({
            "prediction_label": model_output['prediction_label'],
            "prediction_score": model_output['prediction_score']
        })
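
PyCaret's load_model takes the model name without the .pkl extension, which is why the constructor slices off the last four characters. Slicing silently truncates a path that does not end in .pkl. A slightly more defensive variant, sketched here with a hypothetical helper name, removes the suffix only when it is present.

```python
def strip_pkl_suffix(path: str) -> str:
    """Return path without a trailing '.pkl'; leave other paths unchanged."""
    suffix = ".pkl"
    return path[: -len(suffix)] if path.endswith(suffix) else path


print(strip_pkl_suffix("pycaret_best_model.pkl"))
print(strip_pkl_suffix("/models/best"))
```

In the class above, the slice could then be written as strip_pkl_suffix(self.context["model_file"]).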

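Note

As shown above, set the model's memory directory to /tmp/. Snowflake's warehouse nodes have restricted directory access. /tmp is always writable and is a safe choice when the model needs a place to write files. This might not be necessary for other types of models.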
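
If you test the same class on a machine where /tmp might not exist (for example, on Windows), a small probe can pick a writable directory before you assign it to the model. This is a minimal sketch using only the Python standard library; the writable_dir helper is a hypothetical name and is not part of PyCaret or Snowflake.

```python
import tempfile


def writable_dir(preferred: str = "/tmp") -> str:
    """Return preferred if a file can be created there, else the platform temp dir."""
    for candidate in (preferred, tempfile.gettempdir()):
        try:
            # Creating (and automatically deleting) a file proves write access.
            with tempfile.NamedTemporaryFile(dir=candidate):
                return candidate
        except OSError:
            continue
    raise RuntimeError("no writable temporary directory found")


cache_dir = writable_dir()
print(cache_dir)
```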

Step 3: Test the custom model

Test the PyCaret model locally using code like the following.

test_data = [
    [1, 237, 1, 1.75, 1.99, 0.00, 0.00, 0, 0, 0.5, 1.99, 1.75, 0.24, 'No', 0.0, 0.0, 0.24, 1],
    # Additional test rows...
]
col_names = ['Id', 'WeekofPurchase', 'StoreID', 'PriceCH', 'PriceMM', 'DiscCH', 'DiscMM',
            'SpecialCH', 'SpecialMM', 'LoyalCH', 'SalePriceMM', 'SalePriceCH',
            'PriceDiff', 'Store7', 'PctDiscMM', 'PctDiscCH', 'ListPriceDiff', 'STORE']

test_df = pd.DataFrame(test_data, columns=col_names)

my_pycaret_model = PyCaretModel(pycaret_model_context)
output_df = my_pycaret_model.predict(test_df)

Step 4: Define a model signature

In this example, use the sample data to infer a model signature for input validation.

from snowflake.ml.model import model_signature
predict_signature = model_signature.infer_signature(input_data=test_df, output_data=output_df)

Step 5: Log the model

The following code logs (registers) the model in the Snowflake Model Registry.

from snowflake.ml.registry import Registry

# session is an existing Snowpark session
snowml_registry = Registry(session)

custom_mv = snowml_registry.log_model(
    my_pycaret_model,
    model_name="my_pycaret_best_model",
    version_name="version_1",
    conda_dependencies=["pycaret==3.0.2", "scipy==1.11.4", "joblib==1.2.0"],
    options={"relax_version": False},
    signatures={"predict": predict_signature},
    comment="My PyCaret classification experiment using the CustomModel API",
)
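
The example pins exact versions and disables relax_version, so the pins should match the environment in which the model was trained and pickled. One way to derive them, sketched below under the assumption that the packages are installed locally, is to read the installed versions with importlib.metadata. The pin_requirements helper is hypothetical, and package names on PyPI can differ from the names used by the conda channel, so review the result before passing it to conda_dependencies.

```python
from importlib import metadata


def pin_requirements(packages: list[str]) -> list[str]:
    """Return 'name==version' for locally installed packages, bare name otherwise."""
    pinned = []
    for name in packages:
        try:
            pinned.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            pinned.append(name)  # not installed locally; leave unpinned
    return pinned


print(pin_requirements(["pycaret", "scipy", "joblib"]))
```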

Step 6: Verify the model in the registry

To verify that the model is available in the Model Registry, use the show_models function.

snowml_registry.show_models()

Step 7: Make predictions with the registered model

Use the run function to call the model for prediction.

# col_names is the column list defined in Step 3
snowpark_df = session.create_dataframe(test_data, schema=col_names)

custom_mv.run(snowpark_df).show()

Next steps

After deploying a PyCaret model by way of the Snowflake Model Registry, you can view the model in Snowsight. In the navigation menu, select AI & ML » Models. If you do not see it there, make sure you are using the ACCOUNTADMIN role or the role you used to log the model.

To use the model from SQL, run a query like the following:

SELECT
    my_pycaret_best_model!predict(*) AS predict_dict,
    predict_dict['prediction_label']::text AS prediction_label,
    predict_dict['prediction_score']::double AS prediction_score
FROM pycaret_input_data;