CREATE SNOWFLAKE.ML.ANOMALY_DETECTION
Creates a new anomaly detection model, or replaces an existing one, using the training data that you provide.
Syntax
Parameters
model_name
Specifies the identifier (model_name) for the anomaly detector object; the identifier must be unique for the schema in which the object is created.
In addition, the identifier must start with an alphabetic character and cannot contain spaces or special characters unless the entire identifier string is enclosed in double quotes (for example, "My object"). Identifiers enclosed in double quotes are also case-sensitive. For more details, see Identifier requirements.
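The naming rule above can be checked before the statement is run. The following is a minimal Python sketch. It assumes that letters, digits, underscores, and dollar signs are the only non-special characters allowed in an unquoted identifier. The function name is hypothetical, and the Identifier requirements page remains the authoritative reference.

```python
import re

# Simplified unquoted-identifier rule, following the text above: a leading
# letter, then only letters, digits, underscores, or dollar signs.
# (Assumption: "special characters" means anything outside this set.)
_UNQUOTED = re.compile(r"[A-Za-z][A-Za-z0-9_$]*")


def is_valid_model_name(name: str) -> bool:
    """Return True if `name` is usable as a model identifier under the simplified rules."""
    if len(name) >= 2 and name.startswith('"') and name.endswith('"'):
        # Quoted identifiers may contain spaces and special characters;
        # an embedded double quote must be escaped by doubling it ("").
        inner = name[1:-1]
        return inner != "" and '"' not in inner.replace('""', "")
    return _UNQUOTED.fullmatch(name) is not None
```

For example, is_valid_model_name("My object") is False, while is_valid_model_name('"My object"') is True. Remember that the quoted form is case-sensitive.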
Constructor arguments
Required:
INPUT_DATA => reference_to_training_data
Specifies a reference to the table, view, or query that returns the training data for the model.
To create this reference, you can use the TABLE keyword with the table name, view name, or query, or you can call the SYSTEM$REFERENCE or SYSTEM$QUERY_REFERENCE function.
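TIMESTAMP_COLNAME => 'timestamp_column_name'
Specifies the name of the column that contains the timestamps (TIMESTAMP_NTZ) in the time-series data.
TARGET_COLNAME => 'target_column_name'
Specifies the name of the column that contains the data to analyze (NUMERIC or FLOAT).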
LABEL_COLNAME => 'label_column_name'
Specifies the name of the column that contains the labels for the data. Labels are Boolean (true/false) values indicating whether a given row is a known anomaly. If you do not have labeled data, pass an empty string ('') for this argument.
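As a rough illustration of how the required arguments fit together, the hypothetical Python helper below assembles the statement text. It assumes the model name is already a valid identifier. It also assumes the INPUT_DATA reference is passed through verbatim, for example TABLE(my_view) or SYSTEM$REFERENCE('VIEW', 'my_view'), where my_view is a placeholder. The helper only builds a string and does not connect to Snowflake.

```python
def quote_literal(value: str) -> str:
    """Render a Python string as a single-quoted SQL string literal."""
    return "'" + value.replace("'", "''") + "'"


def build_create_statement(
    model_name: str,
    input_data: str,
    timestamp_col: str,
    target_col: str,
    label_col: str = "",
) -> str:
    """Assemble a CREATE SNOWFLAKE.ML.ANOMALY_DETECTION statement (hypothetical helper).

    `input_data` is inserted verbatim. The default empty `label_col`
    renders as LABEL_COLNAME => '', which marks the data as unlabeled.
    """
    args = [
        f"INPUT_DATA => {input_data}",
        f"TIMESTAMP_COLNAME => {quote_literal(timestamp_col)}",
        f"TARGET_COLNAME => {quote_literal(target_col)}",
        f"LABEL_COLNAME => {quote_literal(label_col)}",
    ]
    return (
        f"CREATE SNOWFLAKE.ML.ANOMALY_DETECTION {model_name}(\n  "
        + ",\n  ".join(args)
        + "\n)"
    )
```

Calling build_create_statement("my_model", "TABLE(my_view)", "ts", "y") produces a statement that ends with LABEL_COLNAME => '', which matches the unlabeled case described above.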
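Optional:
SERIES_COLNAME => 'series_column_name'
Name of the column that contains the series identifiers (for multi-series data). This column should be a VARIANT, because an identifier can be a value of any type, or a combination of values of any types from multiple columns, held in an array.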
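The following minimal Python sketch shows how a VARIANT series identifier partitions the rows. Arrays, modeled here as Python lists, act as composite keys. The grouping function and its names are illustrative assumptions, not the model's internal implementation. In SQL, one way to build such a composite identifier might be TO_VARIANT(ARRAY_CONSTRUCT(store_id, region)), where store_id and region are hypothetical columns.

```python
from collections import defaultdict
from typing import Any, Dict, Hashable, List


def series_key(value: Any) -> Hashable:
    """Normalize a series identifier: arrays (lists) become tuples so they can be dict keys."""
    if isinstance(value, list):
        return tuple(series_key(v) for v in value)
    return value


def split_series(rows: List[Dict[str, Any]], series_col: str) -> Dict[Hashable, List[Dict[str, Any]]]:
    """Group rows into one list per series, keyed by the normalized identifier."""
    groups: Dict[Hashable, List[Dict[str, Any]]] = defaultdict(list)
    for row in rows:
        groups[series_key(row[series_col])].append(row)
    return dict(groups)
```

Scalar identifiers and array identifiers can coexist in this sketch because each normalized value is simply a distinct key.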
CONFIG_OBJECT => config_object
An OBJECT containing key-value pairs used to configure the model training job.
The configuration object accepts the following keys (type, default):
aggregation_categorical (STRING, default 'MODE')
The aggregation method for categorical features. Supported values:
- 'MODE': The most frequent value.
- 'FIRST': The earliest value.
- 'LAST': The latest value.
aggregation_numeric (STRING, default 'MEAN')
The aggregation method for numeric features. Supported values:
- 'MEAN': The average of the values.
- 'MEDIAN': The middle value.
- 'MODE': The most frequent value.
- 'MIN': The smallest value.
- 'MAX': The largest value.
- 'SUM': The total of the values.
- 'FIRST': The earliest value.
- 'LAST': The latest value.
aggregation_target (STRING, default: same as aggregation_numeric, or 'MEAN' if aggregation_numeric is not specified)
The aggregation method for target values. Supported values are the same as for aggregation_numeric: 'MEAN', 'MEDIAN', 'MODE', 'MIN', 'MAX', 'SUM', 'FIRST', and 'LAST'.
evaluate (BOOLEAN, default TRUE)
Whether evaluation metrics should be generated. If TRUE, additional models are trained for cross-validation using the parameters in evaluation_config.
evaluation_config (OBJECT, default: see Evaluation configuration)
An optional configuration object that specifies how out-of-sample evaluation metrics are generated. See the next section.
frequency (STRING, no default)
The frequency of the time series. If not specified, the model infers the frequency. The value must be a string representing a time period, such as '1 day'. Supported units are seconds, minutes, hours, days, weeks, months, quarters, and years. You can use the singular ("hour") or plural ("hours") form of the unit name, but you cannot abbreviate it.
lower_bound (FLOAT or NULL, default NULL)
The lower bound for the target value. If specified, the model will not predict values below this threshold.
upper_bound (FLOAT or NULL, default NULL)
The upper bound for the target value. If specified, the model will not predict values above this threshold.
on_error (STRING, default 'ABORT')
A string (constant) that specifies how errors are handled during training. This is most useful when training multiple series. Supported values include:
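The frequency and aggregation settings work together. Rows are grouped into intervals of the given frequency, and each interval's values are collapsed with the chosen method. The Python sketch below models that behavior for aggregation_target under stated assumptions:

- Only fixed-length units (seconds through weeks) are handled.
- Buckets are aligned to the Unix epoch.
- Ties in MODE follow Python's statistics.mode.

Snowflake's actual bucketing and tie-breaking may differ. The names parse_frequency and aggregate are hypothetical.

```python
import statistics
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Callable, Dict, List, Tuple

# Fixed-length units only; months, quarters, and years are calendar-based
# and are deliberately left out of this sketch.
_UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600, "day": 86400, "week": 604800}

# Each method receives one bucket's values, already sorted by timestamp.
_AGGREGATORS: Dict[str, Callable[[List[float]], float]] = {
    "MEAN": statistics.mean,
    "MEDIAN": statistics.median,
    "MODE": statistics.mode,
    "MIN": min,
    "MAX": max,
    "SUM": sum,
    "FIRST": lambda vs: vs[0],
    "LAST": lambda vs: vs[-1],
}


def parse_frequency(freq: str) -> timedelta:
    """Parse strings such as '1 day' or '15 minutes'; abbreviated units are rejected."""
    count_text, unit = freq.split()
    unit = unit.lower()
    if unit.endswith("s"):
        unit = unit[:-1]  # accept the plural form, e.g. "hours" -> "hour"
    if unit not in _UNIT_SECONDS:
        raise ValueError(f"unsupported or abbreviated unit: {freq!r}")
    return timedelta(seconds=int(count_text) * _UNIT_SECONDS[unit])


def aggregate(points: List[Tuple[datetime, float]], freq: str, method: str = "MEAN") -> Dict[datetime, float]:
    """Floor each timestamp to a multiple of `freq` (counted from the epoch), then aggregate."""
    step = parse_frequency(freq)
    epoch = datetime(1970, 1, 1)
    buckets: Dict[datetime, List[float]] = defaultdict(list)
    for ts, value in sorted(points):
        buckets[epoch + ((ts - epoch) // step) * step].append(value)
    return {start: _AGGREGATORS[method](vals) for start, vals in sorted(buckets.items())}
```

The same FIRST, LAST, and MODE functions would also cover the categorical methods accepted by aggregation_categorical.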
Evaluation configuration
The evaluation_config object contains key-value pairs that configure cross-validation. These parameters are from the scikit-learn TimeSeriesSplit (https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.TimeSeriesSplit.html) cross-validator.
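Because these parameters come from TimeSeriesSplit, the fold layout can be previewed locally. The sketch below reimplements scikit-learn's index logic in plain Python so that it runs without scikit-learn installed. It assumes that evaluation_config uses the same parameter names (n_splits, test_size, gap, max_train_size). With scikit-learn available, sklearn.model_selection.TimeSeriesSplit(...).split(X) should yield the same indices as NumPy arrays.

```python
from typing import Iterator, List, Optional, Tuple


def time_series_split(
    n_samples: int,
    n_splits: int = 5,
    test_size: Optional[int] = None,
    gap: int = 0,
    max_train_size: Optional[int] = None,
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) pairs, mirroring scikit-learn's TimeSeriesSplit."""
    if n_splits + 1 > n_samples:
        raise ValueError("n_splits + 1 must not exceed the number of samples")
    if test_size is None:
        test_size = n_samples // (n_splits + 1)
    if n_samples - gap - test_size * n_splits <= 0:
        raise ValueError("too few samples for the requested splits, test_size, and gap")
    # Test windows are contiguous and end exactly at the last sample.
    for test_start in range(n_samples - n_splits * test_size, n_samples, test_size):
        train_end = test_start - gap  # leave `gap` samples between train and test
        if max_train_size and max_train_size < train_end:
            train_start = train_end - max_train_size  # sliding window
        else:
            train_start = 0  # expanding window
        yield list(range(train_start, train_end)), list(range(test_start, test_start + test_size))
```

Each fold trains only on data that precedes its test window. This is why the scheme suits time series, where shuffled k-fold splits would leak future values into training.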
Usage notes
- If the column names specified by the TIMESTAMP_COLNAME, TARGET_COLNAME, or LABEL_COLNAME arguments do not exist in the table, view, or query specified by the INPUT_DATA argument, an error occurs.
- Replication is supported only for instances of the CUSTOM_CLASSIFIER class.
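The missing-column error in the first usage note can be caught on the client before the statement is sent. The minimal sketch below assumes you already have the input's column names, for example from DESCRIBE TABLE output or a query's result metadata. missing_columns is a hypothetical helper. Snowflake stores unquoted identifiers in uppercase, so normalize names the same way before comparing.

```python
from typing import Iterable, List


def missing_columns(available: Iterable[str], timestamp_col: str, target_col: str, label_col: str = "") -> List[str]:
    """Return the required column names that are absent from `available`.

    An empty label_col means the data is unlabeled, so no label column is required.
    Names are compared exactly; case normalization is left to the caller.
    """
    present = set(available)
    required = [timestamp_col, target_col] + ([label_col] if label_col else [])
    return [name for name in required if name not in present]
```

An empty result means every column named in TIMESTAMP_COLNAME, TARGET_COLNAME, and (if non-empty) LABEL_COLNAME was found.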
Examples
For a representative example, see the anomaly detection example.