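Using models for preprocessing and postprocessing

This topic uses several model types and scenarios to show how to create models, log them to the Snowflake Model Registry, and deploy them. The examples cover:

  • In-memory scikit-learn models and pipelines.

  • Your own custom models.

  • Multiple models.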

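In-memory scikit-learn models and pipelines

Snowflake ML uses keyword arguments together with the ModelContext class to integrate in-memory scikit-learn models with the Model Registry. The following example passes an in-memory scikit-learn model to the model context as a keyword argument and calls it from a custom model class.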

from sklearn import datasets, svm
import pandas as pd
from snowflake.ml.model import custom_model

# Step 1: Import the Iris dataset
iris_X, iris_y = datasets.load_iris(return_X_y=True)

# Step 2: Initialize a scikit-learn LinearSVC model and train it
svc = svm.LinearSVC()
svc.fit(iris_X, iris_y)

# Step 3: Initialize ModelContext with keyword arguments
mc = custom_model.ModelContext(
    my_model=svc,
)

# Step 4: Define a custom model class to utilize the context
class ExampleSklearnModel(custom_model.CustomModel):
    def __init__(self, context: custom_model.ModelContext) -> None:
        super().__init__(context)

    @custom_model.inference_api
    def predict(self, input: pd.DataFrame) -> pd.DataFrame:
        # Use the model from the context for predictions
        model_output = self.context['my_model'].predict(input)
        # Return the predictions in a DataFrame
        return pd.DataFrame({'output': model_output})
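
The lookup `self.context['my_model']` in the example above rests on one idea: the name given as a keyword argument when the context is built is the key used to retrieve the object later. The following pure-Python sketch mirrors only that observable behavior. `SketchContext`, `ThresholdModel`, and `run_inference` are hypothetical stand-ins invented for illustration. They are not snowflake-ml classes and make no claim about how ModelContext is implemented.

```python
class SketchContext:
    """Hypothetical stand-in: stores keyword arguments and returns them by name."""

    def __init__(self, **model_refs):
        self._model_refs = dict(model_refs)

    def __getitem__(self, name):
        return self._model_refs[name]


class ThresholdModel:
    """Hypothetical model: predicts 1 when the first feature exceeds a cutoff."""

    def __init__(self, cutoff):
        self.cutoff = cutoff

    def predict(self, rows):
        return [1 if row[0] > self.cutoff else 0 for row in rows]


def run_inference(context, rows):
    # Same shape as the predict method above: look up by name, then call predict
    return {"output": context["my_model"].predict(rows)}


context = SketchContext(my_model=ThresholdModel(cutoff=5.0))
result = run_inference(context, [[4.9, 3.0], [6.3, 2.5]])
```

Because the lookup is by name, the key used inside `predict` has to match the keyword used when the context was created. In this sketch a mismatched name raises a `KeyError`.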

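Using scikit-learn pipelines with Snowflake ML

The following example shows how to use a scikit-learn pipeline in Snowflake ML. The pipeline runs preprocessing steps (such as scaling or imputation) followed by a predictive model, all managed through ModelContext inside a custom model class.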

from sklearn import datasets
from sklearn.svm import SVC
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.impute import SimpleImputer
import pandas as pd
from snowflake.ml.model import custom_model

# Step 1: Load the Iris dataset
iris_X, iris_y = datasets.load_iris(return_X_y=True)

# Step 2: Create a scikit-learn pipeline
# The pipeline includes:
# - A SimpleImputer to handle missing values
# - A StandardScaler to standardize the data
# - A Support Vector Classifier (SVC) for predictions
pipeline = Pipeline([
    ('imputer', SimpleImputer(strategy='mean')),
    ('scaler', StandardScaler()),
    ('classifier', SVC(kernel='linear', probability=True))
])

# Step 3: Fit the pipeline to the dataset
pipeline.fit(iris_X, iris_y)

# Step 4: Initialize ModelContext with the pipeline
mc = custom_model.ModelContext(
    pipeline_model=pipeline,
)

# Step 5: Define a custom model class to utilize the pipeline
class ExamplePipelineModel(custom_model.CustomModel):
    def __init__(self, context: custom_model.ModelContext) -> None:
        super().__init__(context)

    @custom_model.inference_api
    def predict(self, input: pd.DataFrame) -> pd.DataFrame:
        # Use the pipeline from the context to process input and make predictions
        predictions = self.context['pipeline_model'].predict(input)
        probabilities = self.context['pipeline_model'].predict_proba(input)

        # Return predictions and probabilities as a DataFrame
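        return pd.DataFrame({
            'predictions': predictions,
            'probability_class_0': probabilities[:, 0],
            'probability_class_1': probabilities[:, 1],
            # Iris has three classes, so predict_proba returns three columns
            'probability_class_2': probabilities[:, 2]
        })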

# Example usage:
# Convert new input data into a DataFrame
new_input = pd.DataFrame(iris_X[:5])  # Using the first 5 samples for demonstration

# Initialize the custom model and run predictions
custom_pipeline_model = ExamplePipelineModel(context=mc)
result = custom_pipeline_model.predict(new_input)

print(result)
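
To make the two preprocessing steps concrete, here is a small worked sketch of the arithmetic on a single column, using only the Python standard library. It assumes the usual definitions: mean imputation replaces each missing value with the mean of the observed values, and standardization subtracts the mean and divides by the population standard deviation. The helper names `impute_mean` and `standardize` are made up for this illustration, and the sketch does not handle a constant column (zero standard deviation).

```python
import math
from statistics import fmean, pstdev


def impute_mean(column):
    """Replace NaN entries with the mean of the observed values."""
    observed = [v for v in column if not math.isnan(v)]
    fill = fmean(observed)
    return [fill if math.isnan(v) else v for v in column]


def standardize(column):
    """Subtract the mean and divide by the population standard deviation."""
    mean = fmean(column)
    std = pstdev(column)
    return [(v - mean) / std for v in column]


raw = [1.0, float("nan"), 3.0, 5.0]
imputed = impute_mean(raw)      # the missing entry becomes the observed mean
scaled = standardize(imputed)   # resulting column has zero mean and unit variance
```

Note that a fitted pipeline learns its fill value, mean, and standard deviation from the training data and reuses them at prediction time. This sketch computes them from the same column it transforms, only to keep the arithmetic visible.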

Using your own model

The following example wraps your own model as a custom model.

from snowflake.ml.model import custom_model
import pandas as pd

# your_own_model is a placeholder for a model object you have already created;
# it must expose a predict method.
mc = custom_model.ModelContext(
    my_model=your_own_model,
)

class ExampleYourOwnModel(custom_model.CustomModel):
    def __init__(self, context: custom_model.ModelContext) -> None:
        super().__init__(context)

    @custom_model.inference_api
    def predict(self, input: pd.DataFrame) -> pd.DataFrame:
        # Pass the input DataFrame to the model stored in the context
        model_output = self.context['my_model'].predict(input)
        return pd.DataFrame({'output': model_output})

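Using multiple models

The following custom model combines multiple models and uses a configuration file to apply a bias when generating predictions.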

from snowflake.ml.model import custom_model

mc = custom_model.ModelContext(
    model1=model1,
    model2=model2,
    feature_preproc=preproc,
)

Note

model1 and model2 are objects of any model type natively supported by the registry. feature_preproc is a scikit-learn pipeline object.

from snowflake.ml.model import custom_model
import pandas as pd
import json

# Skeleton of the custom model class; the complete version follows below
class ExamplePipelineModel(custom_model.CustomModel):

    @custom_model.inference_api
    def predict(self, input: pd.DataFrame) -> pd.DataFrame:
        ...
        return pd.DataFrame(...)


# Here is the fully-functional custom model that uses both model1 and model2
class ExamplePipelineModel(custom_model.CustomModel):
    def __init__(self, context: custom_model.ModelContext) -> None:
        super().__init__(context)

    @custom_model.inference_api
    def predict(self, input: pd.DataFrame) -> pd.DataFrame:
        features = self.context['feature_preproc'].transform(input)
        model_output = self.context['model1'].predict(
            self.context['model2'].predict(features)
        )
        return pd.DataFrame({'output': model_output})
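
The order of evaluation in the nested call above can be easy to misread: the preprocessor runs first, then model2, and model1 runs last on model2's output. The following pure-Python sketch traces that flow with tiny stand-in objects. Everything in it is hypothetical: `Doubler`, `RowSum`, and `AddBias` are invented classes, a plain dict stands in for the model context, and the `{"bias": 0.5}` layout is a made-up configuration format parsed from a string rather than a file. The source does not show where the configuration is read or how the bias is applied, so handing it to the outer model here is only an assumption.

```python
import json


class Doubler:
    """Hypothetical preprocessing step exposing a transform method."""

    def transform(self, rows):
        return [[2 * v for v in row] for row in rows]


class RowSum:
    """Hypothetical inner model (plays the role of model2)."""

    def predict(self, rows):
        return [sum(row) for row in rows]


class AddBias:
    """Hypothetical outer model (plays the role of model1)."""

    def __init__(self, bias):
        self.bias = bias

    def predict(self, values):
        return [v + self.bias for v in values]


# Made-up configuration layout, parsed from a string instead of a file
config = json.loads('{"bias": 0.5}')

# A plain dict stands in for the model context in this sketch
context = {
    "feature_preproc": Doubler(),
    "model2": RowSum(),
    "model1": AddBias(config["bias"]),
}


def predict(context, rows):
    # Step 1: preprocess, step 2: inner model, step 3: outer model
    features = context["feature_preproc"].transform(rows)
    return context["model1"].predict(context["model2"].predict(features))


output = predict(context, [[1, 2], [3, 4]])
```

One consequence of this chaining is that model1 must accept whatever model2 returns, so the output type of the inner model is part of the contract between the two.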