Specifying model signatures

To ensure a consistent experience no matter where a model is run, the Snowflake Model Registry needs to know the input and output schema of the model’s inference methods: that is, the name and type of all columns in the input or output DataFrame. This allows these columns to be mapped between Python and SQL data types when necessary. This schema is referred to as a signature by analogy to the arguments of a function and their types.

For certain ML frameworks, the model registry can infer these schemas, either from data structures in the model itself or from sample input data. However, models often accept or return objects that lack this information, such as NumPy arrays. In these cases, Snowpark ML infers the input feature names as input_feature_1, input_feature_2, and so on. Similarly, output features are named output_feature_1, output_feature_2, and so on.

To use more meaningful names in your custom models, you can use one of the following methods:

  • Update sample_input_data with column names, usually by converting the dataset to a pandas or Snowpark DataFrame.

  • Explicitly pass signatures to log_model. When a model does not produce names in its output, explicit signatures might be the only option.
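Converting a bare NumPy array to a pandas DataFrame is a minimal way to apply the first option. The sketch below uses synthetic data and hypothetical column names (age, income, tenure_months). It assumes reg is a snowflake.ml.registry.Registry you have already created, and passes the named DataFrame as sample_input_data so the registry can infer input names from it.

```python
import numpy as np
import pandas as pd

# Synthetic feature matrix: 10 rows, 3 unnamed columns (a bare NumPy array).
features = np.random.default_rng(seed=0).random((10, 3))

# Hypothetical column names; replace them with your own feature names.
feature_names = ["age", "income", "tenure_months"]

# Wrapping the array in a DataFrame attaches the names that NumPy lacks.
sample_input_df = pd.DataFrame(features, columns=feature_names)


def log_with_named_inputs(reg, model, sample_df):
    """Log `model` so the registry can infer input names from `sample_df`.

    `reg` is assumed to be an existing snowflake.ml.registry.Registry.
    """
    return reg.log_model(
        model,
        model_name="my_model",
        version_name="v1",
        sample_input_data=sample_df,
    )
```

The same approach works with a Snowpark DataFrame, whose columns already carry names.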

Inferring a signature

Just as the model registry infers signatures from sample data, you can infer them yourself. Use snowflake.ml.model.model_signature.infer_signature to infer a signature from sample input, sample output, and column names, and then apply that signature to the appropriate methods when logging the model, as in the following example:

from sklearn import svm, datasets

from snowflake.ml.model import model_signature

digits = datasets.load_digits()
target_digit = 6

def one_vs_all(dataset, digit):
    # True where the label equals the target digit, False otherwise.
    return [x == digit for x in dataset]

train_features = digits.data[:10]
train_labels = one_vs_all(digits.target[:10], target_digit)
clf = svm.SVC(gamma=0.001, C=10.0, probability=True)
clf.fit(train_features, train_labels)

# Infer the signature from sample input and the output of `predict` on that input.
sig = model_signature.infer_signature(
    train_features,
    clf.predict(train_features),
    input_feature_names=['column1', 'column2', ...],  # one name per input column
    output_feature_names=['is_target_digit'])

# `reg` is an existing snowflake.ml.registry.Registry instance.
# Supply a signature for every function the model exposes, in this case only `predict`.
mv = reg.log_model(
    clf,
    model_name='my_model',
    version_name='v1',
    signatures={"predict": sig}
)

This example applies the signature to only one method, but you can infer a signature for each method your model exposes. You can use the same signature object (sig in the example) for all methods that have the same signature.

Constructing a signature

You can also manually construct a signature by using snowflake.ml.model.model_signature.ModelSignature. Both scalar and tensor types (including ragged tensors) are supported.

Example:

from snowflake.ml.model.model_signature import ModelSignature, FeatureSpec, DataType

sig = ModelSignature(
    inputs=[
        # Scalar feature.
        FeatureSpec(dtype=DataType.DOUBLE, name="f_0"),
        # Fixed-shape tensor feature: every row holds a 5x5 array.
        FeatureSpec(dtype=DataType.INT64, name="sparse_0_fixed_len", shape=(5, 5)),
        # Ragged tensor feature: -1 marks a dimension of variable length.
        FeatureSpec(dtype=DataType.INT64, name="sparse_1_variable_len", shape=(-1,)),
    ],
    outputs=[
        FeatureSpec(dtype=DataType.FLOAT, name="output"),
    ]
)

Then pass the signature object, sig, to log_model in the signatures argument, keyed by the name of each method it applies to, as in the earlier example.
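Sample data matching the signature above can be built in pandas by storing one NumPy array per row in each tensor column, held in an object-dtype column as described under Data type mappings. This is a sketch with random values; it assumes the registry accepts tensor features in this per-row array form, and object_column is a hypothetical helper.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)
n_rows = 3


def object_column(cells):
    """Build a 1-D object array holding one NumPy array per row."""
    column = np.empty(len(cells), dtype=object)
    for i, cell in enumerate(cells):
        column[i] = cell  # store each array as a single object
    return column


sample_input = pd.DataFrame({
    # Scalar feature: float64 corresponds to DOUBLE.
    "f_0": rng.random(n_rows),
    # Fixed-shape tensor: every row is a 5x5 int64 array.
    "sparse_0_fixed_len": object_column(
        [rng.integers(0, 10, size=(5, 5)) for _ in range(n_rows)]
    ),
    # Ragged tensor: each row is a 1-D int64 array of a different length.
    "sparse_1_variable_len": object_column(
        [rng.integers(0, 10, size=k) for k in (2, 4, 7)]
    ),
})
```

Filling the object array element by element keeps NumPy from stacking equal-shape arrays into a single higher-dimensional array.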

Data type mappings

This section describes the equivalence of types in the Snowflake Model Registry for supported type systems.

Column data types

The following table shows the equivalence of model signature (SQL) type, pandas DataFrames (NumPy) type, and Snowpark Python type.

Model signature (SQL) type    pandas DataFrame (NumPy) type    Snowpark Python type
INT8                          np.int8                          ByteType
INT16                         np.int16                         ShortType
INT32                         np.int32                         IntegerType
INT64                         np.int64                         LongType
FLOAT                         np.float32                       FloatType
DOUBLE                        np.float64                       DoubleType
UINT8                         np.uint8                         ByteType
UINT16                        np.uint16                        ShortType
UINT32                        np.uint32                        IntegerType
UINT64                        np.uint64                        LongType
BOOL                          np.bool_                         BooleanType
STRING                        np.str_                          StringType
BYTES                         np.bytes_                        BinaryType
TIMESTAMP_NTZ                 np.datetime64                    TimestampType

In pandas DataFrames, tensor features (features with a specified shape) are represented with the np.object_ dtype.
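The column table can be written as a lookup for checking, before logging, which signature type each pandas column corresponds to. This is an illustrative sketch, not the registry's own inference code: it covers only the dtypes listed in the table, and it does not resolve object columns, which pandas typically uses for strings and for tensor features.

```python
import numpy as np
import pandas as pd

# Mirrors the "Column data types" table (NumPy dtype -> signature type).
NUMPY_TO_SIGNATURE = {
    np.dtype(np.int8): "INT8",
    np.dtype(np.int16): "INT16",
    np.dtype(np.int32): "INT32",
    np.dtype(np.int64): "INT64",
    np.dtype(np.float32): "FLOAT",
    np.dtype(np.float64): "DOUBLE",
    np.dtype(np.uint8): "UINT8",
    np.dtype(np.uint16): "UINT16",
    np.dtype(np.uint32): "UINT32",
    np.dtype(np.uint64): "UINT64",
    np.dtype(np.bool_): "BOOL",
}


def signature_type(dtype):
    """Return the signature type name for a NumPy dtype from the table."""
    dtype = np.dtype(dtype)
    if np.issubdtype(dtype, np.datetime64):  # any datetime64 resolution
        return "TIMESTAMP_NTZ"
    if dtype.kind == "U":  # np.str_
        return "STRING"
    if dtype.kind == "S":  # np.bytes_
        return "BYTES"
    try:
        return NUMPY_TO_SIGNATURE[dtype]
    except KeyError:
        # This sketch raises for anything the table does not list (e.g. object).
        raise TypeError(f"No signature type listed for {dtype}") from None


def describe_columns(df):
    """Map each DataFrame column name to its signature type name."""
    return {name: signature_type(dtype) for name, dtype in df.dtypes.items()}
```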

Missing values

NULL values are not permitted in the sample input data or the inference input data.
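Because missing values are rejected, it can help to check a DataFrame before using it as sample or inference input. A minimal sketch with pandas, assuming your data is already in a DataFrame; the column names are hypothetical, and dropping incomplete rows is only one option (imputing values may suit your model better).

```python
import numpy as np
import pandas as pd


def columns_with_missing(df):
    """Return the names of columns that contain at least one missing value."""
    return [name for name, has_na in df.isna().any().items() if has_na]


def require_no_missing(df):
    """Raise ValueError if any column has missing values; otherwise return df."""
    bad = columns_with_missing(df)
    if bad:
        raise ValueError(f"Columns with missing values: {bad}")
    return df


sample = pd.DataFrame({"age": [34.0, np.nan, 51.0], "visits": [3, 5, 2]})
print(columns_with_missing(sample))  # ['age']

# Drop incomplete rows, then confirm the result is clean.
clean_sample = require_no_missing(sample.dropna())
```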

Conversion from NumPy

If a NumPy data type can be safely cast to one of the NumPy types shown in Column data types, it is inferred as the corresponding model signature type.

Conversion from PyTorch

PyTorch type    Model signature (SQL) type
torch.uint8     UINT8
torch.int8      INT8
torch.int16     INT16
torch.int32     INT32
torch.int64     INT64
torch.float32   FLOAT
torch.float64   DOUBLE
torch.bool      BOOL
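To check tensors before logging, the PyTorch table can be written as a lookup keyed by str(tensor.dtype), which PyTorch renders as strings such as "torch.float32". torch_signature_type is a hypothetical helper, not part of Snowpark ML. Types the table does not list, such as torch.float16 or torch.bfloat16, are reported as errors here; casting with tensor.float(), which converts to torch.float32, is one possible workaround.

```python
# Mirrors the "Conversion from PyTorch" table. Keys are str(dtype), e.g.
# str(torch.float32) == "torch.float32", so this runs without importing torch.
TORCH_TO_SIGNATURE = {
    "torch.uint8": "UINT8",
    "torch.int8": "INT8",
    "torch.int16": "INT16",
    "torch.int32": "INT32",
    "torch.int64": "INT64",
    "torch.float32": "FLOAT",
    "torch.float64": "DOUBLE",
    "torch.bool": "BOOL",
}


def torch_signature_type(dtype):
    """Return the signature type for a torch dtype (or its string form).

    Hypothetical helper: raises TypeError for dtypes the table does not list.
    """
    key = str(dtype)
    if key not in TORCH_TO_SIGNATURE:
        raise TypeError(
            f"{key} is not in the conversion table; consider casting, "
            "e.g. tensor.float() for torch.float32"
        )
    return TORCH_TO_SIGNATURE[key]
```

For example, torch_signature_type(torch.zeros(3).dtype) returns "FLOAT", since torch.zeros creates torch.float32 tensors by default.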

Conversion from Snowpark

In addition to the mappings shown in Column data types, the following conversions apply:

  • DecimalType with scale of 0 maps to INT64.

  • DecimalType with scale greater than 0 maps to DOUBLE.
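The DecimalType rule depends only on the column's scale. A minimal sketch, assuming you read the scale from the column definition (snowflake.snowpark.types.DecimalType exposes it as its scale attribute); the column names below are hypothetical.

```python
def signature_type_for_decimal(scale):
    """Apply the DecimalType rule: scale 0 -> INT64, scale > 0 -> DOUBLE."""
    if scale < 0:
        raise ValueError("scale must be non-negative")
    return "INT64" if scale == 0 else "DOUBLE"


# Hypothetical columns declared as NUMBER(precision, scale) in Snowflake.
column_scales = {"QUANTITY": 0, "UNIT_PRICE": 2}

mapped = {name: signature_type_for_decimal(scale) for name, scale in column_scales.items()}
print(mapped)  # {'QUANTITY': 'INT64', 'UNIT_PRICE': 'DOUBLE'}
```

Because DOUBLE is a 64-bit floating-point type, decimal values with more than about 15 significant digits may not be represented exactly after this conversion.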
