CREATE SNOWFLAKE.ML.ANOMALY_DETECTION

Creates a new anomaly detection model, or replaces an existing one, using the training data you provide.

Syntax

CREATE [ OR REPLACE ] SNOWFLAKE.ML.ANOMALY_DETECTION <model_name>(
  INPUT_DATA => <reference_to_training_data>,
  [ SERIES_COLNAME => '<series_column_name>', ]
  TIMESTAMP_COLNAME => '<timestamp_column_name>',
  TARGET_COLNAME => '<target_column_name>',
  LABEL_COLNAME => '<label_column_name>',
  [ CONFIG_OBJECT => <config_object> ]
)
[ [ WITH ] TAG ( <tag_name> = '<tag_value>' [ , <tag_name> = '<tag_value>' , ... ] ) ]
[ COMMENT = '<string_literal>' ]
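As a minimal sketch of the syntax above, the following statement trains a single-series model from unlabeled data and attaches a comment. The view `sales_training_view` and its columns `ts` (TIMESTAMP_NTZ) and `sales` (numeric) are hypothetical names chosen for illustration.

```sql
-- Hypothetical view: ts is TIMESTAMP_NTZ, sales is numeric.
CREATE OR REPLACE SNOWFLAKE.ML.ANOMALY_DETECTION sales_detector(
  INPUT_DATA => TABLE(sales_training_view),
  TIMESTAMP_COLNAME => 'ts',
  TARGET_COLNAME => 'sales',
  LABEL_COLNAME => ''  -- no labeled data, so an empty string is passed
)
COMMENT = 'Anomaly detector for sales';
```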

Parameters

model_name

Specifies the identifier (model_name) for the anomaly detector object; the identifier must be unique for the schema in which the object is created.

In addition, the identifier must start with an alphabetic character and cannot contain spaces or special characters unless the entire identifier string is enclosed in double quotes (for example, "My object"). Identifiers enclosed in double quotes are also case-sensitive. For more details, see Identifier requirements.
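For example, a detector name that contains a space must be enclosed in double quotes, and that exact quoted form, including its case, must be repeated whenever the object is referenced later. The view and column names below are hypothetical.

```sql
-- The quoted identifier keeps its space and its exact case.
CREATE SNOWFLAKE.ML.ANOMALY_DETECTION "Sales Detector"(
  INPUT_DATA => TABLE(sales_training_view),
  TIMESTAMP_COLNAME => 'ts',
  TARGET_COLNAME => 'sales',
  LABEL_COLNAME => ''
);
-- Later statements must use "Sales Detector" exactly;
-- "sales detector" (different case) would name a different object.
```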

Constructor arguments

Required:

INPUT_DATA => reference_to_training_data: Specifies a reference to the table, view, or query that returns the training data for the model.

To create this reference, you can use the TABLE keyword with the table name, view name, or query, or you can call the SYSTEM$REFERENCE or SYSTEM$QUERY_REFERENCE function.
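TIMESTAMP_COLNAME => 'timestamp_column_name': Specifies the name of the column containing the timestamps (TIMESTAMP_NTZ) in the time series data.
TARGET_COLNAME => 'target_column_name': Specifies the name of the column containing the data to analyze (NUMERIC or FLOAT).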
LABEL_COLNAME => 'label_column_name': Specifies the name of the column containing the labels for the data. Labels are Boolean (true/false) values indicating whether a given row is a known anomaly. If you do not have labeled data, pass an empty string ('') for this argument.
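The ways of building the INPUT_DATA reference can be sketched as follows. Each statement trains the same hypothetical model; the table `sales_training`, its TIMESTAMP_NTZ column `ts`, its numeric column `sales`, and its BOOLEAN label column `is_anomaly` are assumed names, not part of any real schema.

```sql
-- 1. TABLE keyword with a query that restricts the training window.
CREATE OR REPLACE SNOWFLAKE.ML.ANOMALY_DETECTION sales_detector(
  INPUT_DATA => TABLE(SELECT ts, sales, is_anomaly FROM sales_training WHERE ts < '2024-01-01'),
  TIMESTAMP_COLNAME => 'ts',
  TARGET_COLNAME => 'sales',
  LABEL_COLNAME => 'is_anomaly'  -- TRUE marks rows known to be anomalies
);

-- 2. Explicit reference to a table via SYSTEM$REFERENCE.
CREATE OR REPLACE SNOWFLAKE.ML.ANOMALY_DETECTION sales_detector(
  INPUT_DATA => SYSTEM$REFERENCE('TABLE', 'sales_training'),
  TIMESTAMP_COLNAME => 'ts',
  TARGET_COLNAME => 'sales',
  LABEL_COLNAME => 'is_anomaly'
);

-- 3. Explicit reference to a query via SYSTEM$QUERY_REFERENCE.
CREATE OR REPLACE SNOWFLAKE.ML.ANOMALY_DETECTION sales_detector(
  INPUT_DATA => SYSTEM$QUERY_REFERENCE('SELECT ts, sales, is_anomaly FROM sales_training'),
  TIMESTAMP_COLNAME => 'ts',
  TARGET_COLNAME => 'sales',
  LABEL_COLNAME => 'is_anomaly'
);
```

The TABLE form is the most compact; the SYSTEM$REFERENCE and SYSTEM$QUERY_REFERENCE calls are alternatives that build the reference explicitly.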

Optional:

SERIES_COLNAME => 'series_column_name'

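The name of the column containing the series identifiers (for multi-series data). This column should be a VARIANT, because an identifier can be a value of any type, or an array that combines values of any types from multiple columns.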
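A sketch of multi-series training, assuming a hypothetical table `store_sales` with columns `store_id`, `item`, `ts`, and `sales`. The two identifying columns are combined into an array and cast to VARIANT, so each (store, item) pair is treated as its own series.

```sql
CREATE OR REPLACE SNOWFLAKE.ML.ANOMALY_DETECTION store_item_detector(
  INPUT_DATA => TABLE(
    SELECT [store_id, item]::VARIANT AS store_item,  -- one series per (store, item) pair
           ts,
           sales
    FROM store_sales
  ),
  SERIES_COLNAME => 'store_item',
  TIMESTAMP_COLNAME => 'ts',
  TARGET_COLNAME => 'sales',
  LABEL_COLNAME => ''
);
```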

CONFIG_OBJECT => config_object

An OBJECT containing key-value pairs used to configure the model training job.

Key	Type	Default	Description
`aggregation_categorical`	STRING	`'MODE'`	Aggregation method for categorical features. Supported values: `'MODE'`: The most frequent value. `'FIRST'`: The earliest value. `'LAST'`: The latest value.
`aggregation_numeric`	STRING	`'MEAN'`	Aggregation method for numeric features. Supported values: `'MEAN'`: The average of the values. `'MEDIAN'`: The middle value. `'MODE'`: The most frequent value. `'MIN'`: The smallest value. `'MAX'`: The largest value. `'SUM'`: The total of the values. `'FIRST'`: The earliest value. `'LAST'`: The latest value.
`aggregation_target`	STRING	Same as `aggregation_numeric`, or `'MEAN'` if not specified	Aggregation method for the target value. Supported values: `'MEAN'`: The average of the values. `'MEDIAN'`: The middle value. `'MODE'`: The most frequent value. `'MIN'`: The smallest value. `'MAX'`: The largest value. `'SUM'`: The total of the values. `'FIRST'`: The earliest value. `'LAST'`: The latest value.
`evaluate`	BOOLEAN	TRUE	Whether evaluation metrics should be generated. If TRUE, additional models are trained for cross-validation using the parameters in the `evaluation_config`.
`evaluation_config`	OBJECT	See Evaluation configuration.	An optional config object that specifies how out-of-sample evaluation metrics are generated. Its keys are described in Evaluation configuration.
`frequency`	STRING	n/a	The frequency of the time series. If not specified, the model infers the frequency. The value must be a string representing a time period, such as `'1 day'`. Supported units include seconds, minutes, hours, days, weeks, months, quarters, and years. You may use singular (“hour”) or plural (“hours”) for the interval name, but may not abbreviate.
`lower_bound`	FLOAT or NULL	NULL	The lower bound for the target value. If specified, the model will not predict values below this threshold.
`upper_bound`	FLOAT or NULL	NULL	The upper bound for the target value. If specified, the model will not predict values above this threshold.
`on_error`	STRING	`'ABORT'`	A string (constant) that specifies the error-handling method for training. This is most useful when training multiple series. Supported values: `'abort'`: Abort training if an error is encountered in any time series. `'skip'`: Skip any time series where training encounters an error. This allows training to succeed for other time series. To see which series failed during model training, call the model’s method.
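A sketch of passing several of these keys as an OBJECT constant. It assumes the hypothetical view `sales_training_view` with columns `ts` and `sales`; the values are illustrative, not recommendations.

```sql
CREATE OR REPLACE SNOWFLAKE.ML.ANOMALY_DETECTION sales_detector(
  INPUT_DATA => TABLE(sales_training_view),
  TIMESTAMP_COLNAME => 'ts',
  TARGET_COLNAME => 'sales',
  LABEL_COLNAME => '',
  CONFIG_OBJECT => {
    'frequency': '1 hour',        -- state the frequency instead of letting the model infer it
    'aggregation_target': 'SUM',  -- combine target values with SUM instead of the default
    'lower_bound': 0,             -- do not predict sales below zero
    'evaluate': FALSE             -- skip training the extra cross-validation models
  }
);
```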

Evaluation configuration

The evaluation_config object contains key-value pairs that configure cross-validation. These parameters are from the scikit-learn TimeSeriesSplit (https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.TimeSeriesSplit.html) cross-validator.

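Key	Type	Default	Description
`n_splits`	INTEGER	5	Number of splits.
`max_train_size`	INTEGER or NULL	NULL	Maximum size of a single training set. NULL means no maximum.
`test_size`	INTEGER or NULL	NULL	Used to limit the size of the test set.
`gap`	INTEGER	0	Number of samples to exclude from the end of each training set, before the test set.
`prediction_interval`	FLOAT	0.95	Prediction interval used to compute interval metrics.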
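The evaluation_config object is nested inside CONFIG_OBJECT. In this sketch, the view `sales_training_view` and its columns are hypothetical, and the split values are arbitrary illustrations.

```sql
CREATE OR REPLACE SNOWFLAKE.ML.ANOMALY_DETECTION sales_detector(
  INPUT_DATA => TABLE(sales_training_view),
  TIMESTAMP_COLNAME => 'ts',
  TARGET_COLNAME => 'sales',
  LABEL_COLNAME => '',
  CONFIG_OBJECT => {
    'evaluate': TRUE,
    'evaluation_config': {
      'n_splits': 3,               -- three train/test folds instead of five
      'test_size': 14,             -- each test fold holds 14 samples
      'gap': 1,                    -- drop 1 sample between each training set and its test fold
      'prediction_interval': 0.90  -- interval metrics use a 90% prediction interval
    }
  }
);
```

Assuming these keys behave as they do in scikit-learn's TimeSeriesSplit, the three test folds are the last 42 samples taken in consecutive blocks of 14, and each training set ends one sample before its test fold begins.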

Usage notes

If the column names specified by the TIMESTAMP_COLNAME, TARGET_COLNAME, or LABEL_COLNAME arguments do not exist in the table, view, or query specified by the INPUT_DATA argument, an error occurs.
Replication is supported only for instances of the CUSTOM_CLASSIFIER class.

Examples

For a representative example, see the anomaly detection example.