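Bring your own model types via serialized files

The model registry supports several built-in model types. You can also log other types of models, including models trained with external tools or obtained from open source repositories, as long as they are serializable and extend the snowflake.ml.model.custom_model.CustomModel class.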

This guide explains how to:

Create custom models.
Log them to the Snowflake Model Registry.
Deploy them for inference.

Note

This quickstart (https://quickstarts.snowflake.com/guide/deploying_custom_models_to_snowflake_model_registry/index.html#0) provides an example of logging a custom PyCaret model.

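Defining model context by keyword arguments

Snowflake ML accepts any number of keyword arguments when you instantiate the ModelContext class. This makes it easy to include parameters, configuration files, or instances of your own model classes when you define and initialize a custom model.

An attribute of the model context can be a supported model type, such as a built-in model type, or a path, such as the path to a file or directory that contains a model, parameters, or configuration. A common use is to load a pickle or JSON file in the custom model's __init__ method or in an inference method.

The following example shows how to pass keyword arguments through the model context and how to use them in a custom model class: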

import pickle
import pandas as pd
from snowflake.ml.model import custom_model

# Initialize ModelContext with keyword arguments
# my_model can be any supported model type
# my_file_path is a local pickle file path
mc = custom_model.ModelContext(
    my_model=my_model,
    my_file_path='/path/to/file.pkl',
)

# Define a custom model class that utilizes the context
class ExampleBringYourOwnModel(custom_model.CustomModel):
    def __init__(self, context: custom_model.ModelContext) -> None:
        super().__init__(context)

        # Use 'my_file_path' to load the pickled object
        with open(self.context['my_file_path'], 'rb') as f:
            self.obj = pickle.load(f)


    @custom_model.inference_api
    def predict(self, input: pd.DataFrame) -> pd.DataFrame:
        # Use the model 'my_model' from the context to make predictions
        model_output = self.context['my_model'].predict(input)
        return pd.DataFrame({'output': model_output})
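
The example above assumes that a pickle file already exists at my_file_path. As a minimal sketch using only the Python standard library, the following shows one way to produce such a file and read it back with the same pattern that __init__ uses. The dictionary named context is only a stand-in for ModelContext, and the file location and the contents of the pickled object are hypothetical.

```python
import os
import pickle
import tempfile

# Hypothetical artifact: any picklable object works, e.g. preprocessing parameters.
artifact = {"threshold": 0.5, "labels": ["CH", "MM"]}

# Write the artifact to a local file; this is the kind of path my_file_path points to.
tmp_dir = tempfile.mkdtemp()
file_path = os.path.join(tmp_dir, "file.pkl")
with open(file_path, "wb") as f:
    pickle.dump(artifact, f)

# Stand-in for ModelContext (assumption): a plain dict keyed by the keyword name.
context = {"my_file_path": file_path}

# Same loading pattern as in ExampleBringYourOwnModel.__init__.
with open(context["my_file_path"], "rb") as f:
    loaded = pickle.load(f)

print(loaded == artifact)
```

Only unpickle files that you trust, because loading a pickle can execute arbitrary code.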


Testing and logging a custom model

You can test a custom model by running it locally.

my_model = ExampleBringYourOwnModel(mc)
output_df = my_model.predict(input_df)


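When the model works as intended, log it to the Snowflake Model Registry. As shown in the next code example, provide conda_dependencies (or pip_requirements) to specify the libraries that the model class needs. Provide sample_input_data (a pandas or Snowpark DataFrame) to infer the input signature of the model. Alternatively, provide a model signature.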

from snowflake.ml.registry import Registry

# sp_session is an existing Snowpark session; train_features is a DataFrame of sample inputs
reg = Registry(session=sp_session, database_name="ML", schema_name="REGISTRY")
mv = reg.log_model(my_model,
            model_name="my_custom_model_pipeline",
            version_name="v1",
            conda_dependencies=["scikit-learn"],
            comment="My Custom ML Model Pipeline",
            sample_input_data=train_features)
output_df = mv.run(input_df)


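Example: Logging a PyCaret model

PyCaret is a low-code, high-efficiency third-party package that Snowflake does not support natively. You can bring your own model types in a similar way.

Step 1: Define the model context

Before you log the model, define a ModelContext that refers to your own model type, one that Snowflake ML does not support natively. In this case, we use the context's model_file attribute to specify the path to the serialized (pickled) model.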

pycaret_mc = custom_model.ModelContext(
  model_file = 'pycaret_best_model.pkl',
)


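Step 2: Create a custom model class

Define a custom model class to log a model type that has no native support. In this example, a PyCaretModel class derived from CustomModel is defined so that the model can be logged in the registry.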

from pycaret.classification import load_model, predict_model

class PyCaretModel(custom_model.CustomModel):
    def __init__(self, context: custom_model.ModelContext) -> None:
        super().__init__(context)
        model_dir = self.context["model_file"][:-4]  # Remove '.pkl' suffix
        self.model = load_model(model_dir, verbose=False)
        self.model.memory = '/tmp/'  # Update memory directory

    @custom_model.inference_api
    def predict(self, X: pd.DataFrame) -> pd.DataFrame:
        model_output = predict_model(self.model, data=X)
        return pd.DataFrame({
            "prediction_label": model_output['prediction_label'],
            "prediction_score": model_output['prediction_score']
        })
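
The slice [:-4] in __init__ assumes that the path always ends in '.pkl'; as the comment in the code indicates, the model is loaded by its name without that extension. A slightly more defensive sketch, using a hypothetical helper named strip_pkl_suffix that is not part of PyCaret or Snowflake ML, removes the suffix only when it is actually present:

```python
def strip_pkl_suffix(path: str) -> str:
    """Return path without a trailing '.pkl'; leave other paths unchanged."""
    suffix = ".pkl"
    if path.endswith(suffix):
        return path[: -len(suffix)]
    return path


print(strip_pkl_suffix("pycaret_best_model.pkl"))
print(strip_pkl_suffix("pycaret_best_model"))
```

With a helper like this, a path that was already given without the extension is passed through unchanged instead of losing its last four characters.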


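Note

As shown above, set the model's memory directory to /tmp/. Snowflake's warehouse nodes have restricted directory access. /tmp is always writable and is a safe choice when the model needs a place to write files. This might not be necessary for other types of models.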
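
When testing locally, it can help to confirm that a directory is writable before pointing a model's cache at it. The following is a minimal sketch using only the Python standard library; the helper first_writable_dir and the candidate paths are hypothetical and are not part of Snowflake ML or PyCaret.

```python
import tempfile


def first_writable_dir(candidates):
    """Return the first directory in candidates where a file can be created, else None."""
    for directory in candidates:
        try:
            # Creating (and automatically deleting) a temporary file tests write access directly.
            with tempfile.NamedTemporaryFile(dir=directory):
                pass
            return directory
        except OSError:
            # Missing directory or no permission: try the next candidate.
            continue
    return None


# Hypothetical candidates: a path that should not exist, then the system temp directory.
cache_dir = first_writable_dir(["/nonexistent/cache", tempfile.gettempdir()])
print(cache_dir)
```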

Step 3: Test the custom model

Test the PyCaret model locally using code like the following.

test_data = [
    [1, 237, 1, 1.75, 1.99, 0.00, 0.00, 0, 0, 0.5, 1.99, 1.75, 0.24, 'No', 0.0, 0.0, 0.24, 1],
    # Additional test rows...
]
col_names = ['Id', 'WeekofPurchase', 'StoreID', 'PriceCH', 'PriceMM', 'DiscCH', 'DiscMM',
            'SpecialCH', 'SpecialMM', 'LoyalCH', 'SalePriceMM', 'SalePriceCH',
            'PriceDiff', 'Store7', 'PctDiscMM', 'PctDiscCH', 'ListPriceDiff', 'STORE']

test_df = pd.DataFrame(test_data, columns=col_names)

my_pycaret_model = PyCaretModel(pycaret_mc)
output_df = my_pycaret_model.predict(test_df)
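
Each test row must supply one value per column name. As a small sketch, the hypothetical helper check_rows below (not part of pandas or Snowflake ML) verifies this before the DataFrame is built, which gives a clearer error message when a hand-written row is missing a value. The row and column names are copied from the example above.

```python
col_names = ['Id', 'WeekofPurchase', 'StoreID', 'PriceCH', 'PriceMM', 'DiscCH', 'DiscMM',
             'SpecialCH', 'SpecialMM', 'LoyalCH', 'SalePriceMM', 'SalePriceCH',
             'PriceDiff', 'Store7', 'PctDiscMM', 'PctDiscCH', 'ListPriceDiff', 'STORE']

test_data = [
    [1, 237, 1, 1.75, 1.99, 0.00, 0.00, 0, 0, 0.5, 1.99, 1.75, 0.24, 'No', 0.0, 0.0, 0.24, 1],
]


def check_rows(rows, columns):
    """Raise ValueError if any row's length differs from the number of columns."""
    for i, row in enumerate(rows):
        if len(row) != len(columns):
            raise ValueError(f"row {i} has {len(row)} values, expected {len(columns)}")


# Passes silently when every row matches the column list.
check_rows(test_data, col_names)
print(len(col_names))
```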


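Step 4: Define a model signature

In this example, use the sample data to infer a model signature for input validation:

from snowflake.ml.model import model_signature

predict_signature = model_signature.infer_signature(input_data=test_df, output_data=output_df)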


Step 5: Log the model

The following code logs (registers) the model in the Snowflake Model Registry.

snowml_registry = Registry(session)

custom_mv = snowml_registry.log_model(
    my_pycaret_model,
    model_name="my_pycaret_best_model",
    version_name="version_1",
    conda_dependencies=["pycaret==3.0.2", "scipy==1.11.4", "joblib==1.2.0"],
    options={"relax_version": False},
    signatures={"predict": predict_signature},
    comment = 'My PyCaret classification experiment using the CustomModel API'
)
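
The pinned versions passed to conda_dependencies above generally need to match the environment in which the model was trained. As a sketch under that assumption, the following builds such version strings from the locally installed packages using the standard library's importlib.metadata. The helper pinned_requirements is hypothetical and not part of Snowflake ML, and package names on conda channels can differ from the names reported by the local installation.

```python
from importlib import metadata


def pinned_requirements(packages):
    """Return 'name==version' strings for the packages that are installed locally."""
    pinned = []
    for name in packages:
        try:
            pinned.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            # Skip packages that are not installed in this environment.
            continue
    return pinned


# Prints whichever of these packages are installed, each pinned to its local version.
print(pinned_requirements(["pycaret", "scipy", "joblib"]))
```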


Step 6: Verify the model in the registry

To verify that the model is in the Model Registry, use the show_models function.

snowml_registry.show_models()


Step 7: Make predictions with the registered model

Use the run function to call the model for prediction.

snowpark_df = session.create_dataframe(test_data, schema=col_names)

custom_mv.run(snowpark_df).show()


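Next steps

After you deploy a PyCaret model through the Snowflake Model Registry, you can view the model in Snowsight. Navigate to the Models page under AI & ML. If you do not see the model there, make sure that you are using the ACCOUNTADMIN role or the role that you used to log the model.

To use the model from SQL, use SQL like the following: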

SELECT
    my_pycaret_best_model!predict(*) AS predict_dict,
    predict_dict['prediction_label']::text AS prediction_label,
    predict_dict['prediction_score']::double AS prediction_score
FROM pycaret_input_data;
