使用特征视图

Note

The Snowflake Feature Store API is available in the Snowpark ML Python package (snowflake-ml-python) v1.5.0 and later.

特征视图 将原始数据转换封装到一个或多个相关 特征 中。特征视图中的所有特征均按相同的计划刷新。特征平台由存储特征的 特征表 提供支持。

Snowflake 特征平台支持两种不同类型的特征视图:

  • Snowflake-managed feature view: The feature table is automatically refreshed from raw data by Snowflake on a schedule you specify. A feature view is considered Snowflake-managed if you provide a schedule for refreshing it.
  • External feature view: If you don’t provide a schedule for refreshing the feature view, it’s considered external. You are responsible for maintaining the feature table, updating features from raw data as needed, for example using a tool such as dbt (https://www.getdbt.com/).

The class snowflake.ml.feature_store.FeatureView is the Python API for interacting with feature views. The FeatureView constructor accepts a Snowpark DataFrame that contains the feature generation logic. The provided DataFrame must also contain the join_keys columns specified in the entities associated with the feature view. A timestamp column name is required if your feature view includes time-series features.

See the Feature Store API Reference for full details of the Python API.

创建 Snowflake 管理的特征视图

A Snowflake-managed feature view uses a dynamic table as the feature table. Features are extracted from the source data on a schedule you specify, handling new data efficiently and incrementally. The illustration below shows the flow of data from its source, through feature transformations, into a feature table.

A managed feature view in the Snowflake Feature Store

To create a Snowflake-managed feature view, use code like the following Python block, where entity is the entity that the features are associated with, and my_df is the Snowpark DataFrame that contains your feature transformation logic based on your source data.

Setting the refresh_freq parameter designates the feature view as Snowflake-managed. The value can be a time delta (minimum value 1 minute), or it can be a cron expression with time zone (e.g. * * * * * America/Los_Angeles).

from snowflake.ml.feature_store import FeatureView

managed_fv = FeatureView(
    name="MY_MANAGED_FV",
    entities=[entity],
    feature_df=my_df,                   # Snowpark DataFrame containing feature transformations
    timestamp_col="ts",                 # optional timestamp column name in the dataframe
    refresh_freq="5 minutes",           # how often feature data refreshes
    desc="my managed feature view"      # optional description
)

You can write feature transformations using Snowpark Python or in SQL. The Snowpark Python API provides utility functions for defining common feature types such as windowed aggregations. See Common feature and query patterns for examples of using these functions.

To qualify for incremental refresh, each source table must have change tracking enabled. If change tracking is not already enabled on a source table, Snowflake attempts to enable it automatically when creating the feature view’s dynamic table. This requires OWNERSHIP of the table. If you do not own the table, ask the owner to enable change tracking, or create the feature view with refresh_mode='FULL', which fully reads the source table for each refresh.

创建外部特征视图

Features generated outside of the Snowflake Feature Store can be registered by setting the refresh_freq parameter to None when creating them. In this situation, you must create and maintain the feature table yourself. The feature DataFrame is based on the feature table, not on the raw data source, and usually contains a simple projection from this table, with no transformations.

Note

You can perform feature transformations in the feature DataFrame; these calculations are carried out as needed when you retrieve data from the feature view. However, external feature views are primarily intended for use with tools such as dbt (https://www.getdbt.com/) that you already use to perform feature transformations. Generally, you should use Snowflake-managed feature views if you want Snowflake to perform feature transformation.

下图展示了数据从源开始,通过外部工具(此处为 dbt)执行特征转换,最终进入特征表的流程。

A managed feature view in the Snowflake Feature Store

External feature views are implemented as views on your feature table, so they incur no additional storage cost.

下方代码展示了如何创建外部特征视图。

external_fv = FeatureView(
    name="MY_EXTERNAL_FV",
    entities=[entity],
    feature_df=my_df,                   # Snowpark DataFrame referencing the feature table
    timestamp_col="ts",                 # optional timestamp column name in the dataframe
    refresh_freq=None,                  # None means the feature view is external
    desc="my external feature view"     # optional description
)

让特征视图更易被发现

Adding per-feature descriptions to the FeatureView makes it easier to find features using Snowsight Universal Search. The following example uses a feature view’s attach_feature_desc method to provide a short description of each included feature in a Python dictionary:

external_fv = external_fv.attach_feature_desc(
    {
        "SENDERID": "Sender account ID for the transaction",
        "RECEIVERID": "Receiver account ID for the transaction",
        "IBAN": "International Bank Identifier for the receiver bank",
        "AMOUNT": "Amount of the transaction"
    }
)

这两种特征视图都可以用特征描述来充实。

注册特征视图

Once a feature view has been completely defined, you can register it in the feature store using the feature store’s register_feature_view method, with a customized name and version. Incremental maintenance (for supported query types) and automatic refresh occur based on the specified refresh frequency.

When the provided query cannot be maintained via incremental maintenance using a dynamic table, the table will be fully refreshed from the query at the specified frequency. This may lead to greater lag in feature refresh and higher maintenance costs. You can alter the query logic, breaking the query into multiple smaller queries that support incremental maintenance, or provision a larger virtual warehouse for dynamic table maintenance. See General limitations for the latest information on dynamic table limitations.

registered_fv: FeatureView = fs.register_feature_view(
    feature_view=managed_fv,    # feature view created above, could also use external_fv
    version="1",
    block=True,         # whether function call blocks until initial data is available
    overwrite=False,    # whether to replace existing feature view with same name/version
)

特征视图管道定义在注册后不可变,只要特征视图存在,就可以提供一致的特征计算。

检索特征视图

Once a feature view has been registered with the feature store, you can retrieve it from there when you need it by using the feature store’s get_feature_view method:

retrieved_fv: FeatureView = fs.get_feature_view(
    name="MY_MANAGED_FV",
    version="1"
)

发现特征视图

You can list all registered feature views in the feature store, optionally filtering by entity name or feature view name, using the list_feature_views method. Information about the matching features is returned as a Snowpark DataFrame. The following code shows an example of getting a list of feature views; fs is a reference to the feature store.

fs.list_feature_views(
    entity_name="<entity_name>",                # optional
    feature_view_name="<feature_view_name>",    # optional
).show()

此外,还可以使用 Snowsight 特征平台 UI 或 Universal Search 来发现特征。

更新特征视图

You can update some properties of a feature view you have registered in the feature store using the feature store’s update_feature_view method. The updatable properties are:

  • 特征视图的刷新频率
  • 执行特征转换的仓库
  • 特征视图的描述

不能修改特征定义和列。要更改特征平台中的特征,请创建新版本的特征视图。

When you call update_feature_view, specify the feature view version to be updated by providing its name and version. The additional parameters specify the properties to be updated; you can specify just the ones you want to change. The following code shows an example of changing feature view properties; fs is a reference to the feature store.

fs.update_feature_view(
    name="<name>",
    version="<version>",
    refresh_freq="<new_fresh_freq>",    # optional
    warehouse="<new_warehouse>",        # optional
    desc="<new_description>",           # optional
)

删除特征视图

You can delete a feature view from the feature store with the feature store’s delete_feature_view method. The following code shows an example of deleting a feature view; fs is a reference to the feature store.

fs.delete_feature_view(
    feature_view="<name>",
    version="<version>",
)

Warning

删除特征视图版本会中断使用该版本的所有管线。在删除之前,请确保特征视图版本未在使用中。

成本注意事项

Snowflake-managed feature views use Snowflake dynamic tables. See Monitor dynamic tables for information on monitoring dynamic tables and Understanding costs for dynamic tables for information on the costs of dynamic tables. External feature views use views, which do not incur additional storage costs.

已知限制

  • The maximum number of Snowflake-managed feature views and the feature transformation queries in feature views are subject to the limitations of dynamic tables.
  • Not all feature transformation queries are supported by dynamic incremental refresh. See the limitations.
  • Feature view names are SQL identifiers and subject to Snowflake identifier requirements.
  • Feature view versions are strings and have a maximum length of 128 characters. Some characters are not permitted and will produce an error message.