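YAML specification for semantic views¶

A semantic view is a schema-level object that defines business concepts over your data, making it easier for users to query and analyze data in business terms. You can create a semantic view from a YAML specification in Cortex Analyst, or by passing the YAML specification to the SYSTEM$CREATE_SEMANTIC_VIEW_FROM_YAML stored procedure.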

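Overview¶

Semantic views are the recommended way to define business semantics in Snowflake. They are schema-level objects that integrate with Snowflake's privilege system, sharing mechanisms, and metadata catalog.

Note

For backward compatibility, legacy semantic model YAML files (stored on a stage) can still be used with Cortex Analyst, but we recommend semantic views for new implementations.

Compared with legacy semantic models, semantic views offer the following advantages:

Native Snowflake integration: schema-level objects with full RBAC, sharing, and catalog support
Advanced features: support for derived metrics and access modifiers (public/private)
Better governance: integration with Snowflake's privilege and sharing systems
Simplified management: no YAML files to manage on a stage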

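YAML format¶

A semantic view can be defined with a YAML (https://yaml.org/) specification, which gives you a readable, plain-text definition.

The general syntax of the semantic view YAML specification is: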

# Name and description of the semantic view.
name: <name>
description: <string>

# Logical table-level concepts
# A semantic view can contain one or more logical tables.
tables:
  # A logical table on top of a base table.
  - name: <name>
    description: <string>
    # The fully qualified name of the base table.
    base_table:
      database: <database>
      schema: <schema>
      table: <base table name>

    # Dimension columns in the logical table.
    dimensions:
      - name: <name>
        synonyms: <array of strings>
        description: <string>
        expr: <SQL expression>
        data_type: <data type>
        unique: <boolean>
        cortex_search_service:
          service: <string>
          literal_column: <string>
          database: <string>
          schema: <string>
        is_enum: <boolean>
      - ...
    # Time dimension columns in the logical table.
    time_dimensions:
      - name: <name>
        synonyms: <array of strings>
        description: <string>
        expr: <SQL expression>
        data_type: <data type>
        unique: <boolean>

    # Fact columns in the logical table.
    facts:
      - name: <name>
        synonyms: <array of strings>
        description: <string>
        access_modifier: <public_access | private_access>  # Default is public_access.
        expr: <SQL expression>
        data_type: <data type>

    # Regular metrics scoped to the logical table.
    metrics:
      - name: <name>
        synonyms: <array of strings>
        description: <string>
        access_modifier: <public_access | private_access>  # Default is public_access.
        expr: <SQL expression>
        non_additive_dimensions:
        - table: <table name>
          dimension: <dimension name>
          sort_direction: <ascending | descending>
          null_order: <first | last>
        using_relationships:
        - <relationship_name>

    # Commonly used filters over the logical table.
    filters:
      - name: <name>
        synonyms: <array of strings>
        description: <string>
        expr: <SQL expression>

# View-level concepts
# Relationships between logical tables
relationships:
  - name: <string>
    left_table: <table>
    right_table: <table>
    relationship_columns:
      - left_column: <column>
        right_column: <column>
      - left_column: <column>
        right_column: <column>

# Derived metrics scoped to the semantic view.
# Derived metrics combine metrics from multiple tables.
metrics:
  - name: <name>
    synonyms: <array of strings>
    description: <string>
    access_modifier: <public_access | private_access>  # Default is public_access
    expr: <SQL expression>

# Additional context concepts
# Verified queries with example questions and queries that answer them
verified_queries:
  - name: <string>       # A descriptive name of the query.
    question: <string>   # The natural language question that this query answers.
    verified_at: <int>   # Optional: Time (in seconds since the UNIX epoch, January 1, 1970) when the query was verified.
    verified_by: <string> # Optional: Name of the person who verified the query.
    use_as_onboarding_question: <boolean>  # Optional: Marks this question as an onboarding question for the end user.
    sql: <string>        # The SQL query for answering the question

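Important

Semantic views do not require the join_type or relationship_type fields used in legacy semantic models. Relationship types are inferred automatically from the data.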

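Key concepts¶

Tables¶

A logical table represents a business entity (for example, customers, orders, or products) and maps to a physical database table. Each logical table can define:

Base table: the fully qualified name of the physical table
Primary key: the columns that uniquely identify a row
Synonyms: alternative names for the table
Description: a business-friendly explanation of what the table represents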
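
As a minimal sketch of the mapping described above, the following fragment defines one logical table whose business-facing name differs from the physical table it sits on. The database, schema, and table names are hypothetical, and only fields that appear in the general syntax are used.

```yaml
tables:
  # Logical name that users and Cortex Analyst see.
  - name: customers
    description: "One row per customer account."
    # Physical table that backs the logical table (hypothetical names).
    base_table:
      database: sales_db
      schema: public
      table: customer_master
```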

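Dimensions¶

Dimensions are categorical attributes that give an analysis its context. They answer "who", "what", "where", and "when" questions. A dimension can be:

Regular dimension: text, numeric, or other categorical values
Time dimension: a date or timestamp column that receives special time-based handling

Properties of dimensions¶

expr: the SQL expression that computes the dimension value
synonyms: alternative terms that users might use
unique: whether values are unique across rows
is_enum: whether the dimension has a fixed set of values
cortex_search_service: an optional Cortex Search service for semantic search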

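Optional properties of physical dimensions¶

These fields are optional but recommended, because they help semantic view search produce higher-quality results.

synonyms

A list of other terms or phrases used to refer to this dimension. Each synonym must be unique across all synonyms in this semantic model.

description

A brief description of this dimension. Include information that provides useful context, such as what data the dimension represents.

unique

A boolean indicating that this dimension has unique values.

sample_values

Sample values for this column, if any. Add any values that users are likely to reference in their questions.

is_enum

A boolean. If true, the values in the sample_values field are treated as the complete list of possible values, and the model chooses only from those values when filtering on this column.

cortex_search_service

Specifies the Cortex Search service to use for this dimension. It has the following fields:

service: The name of the Cortex Search service.
literal_column: (Optional) The column in the Cortex Search service that contains the literal values.
database: (Optional) The database in which the Cortex Search service is located. Defaults to the database of base_table.
schema: (Optional) The schema in which the Cortex Search service is located. Defaults to the schema of base_table.

cortex_search_service replaces the cortex_search_service_name field, which could specify only a name. cortex_search_service_name is deprecated.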
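
The two ways of telling the model which literal values a dimension can take might be sketched as follows. The column names, sample values, and the search service name are hypothetical, and the fragment assumes sample_values is written as a list of strings.

```yaml
dimensions:
  # Small closed set: list every value and mark the dimension as an enum,
  # so that only these values are used when filtering.
  - name: order_status
    synonyms: ["status", "order state"]
    description: "Current fulfillment status of the order."
    expr: o_orderstatus
    data_type: VARCHAR
    sample_values: ["F", "O", "P"]
    is_enum: true

  # High-cardinality text: point at a Cortex Search service instead.
  - name: product_name
    description: "Display name of the product."
    expr: p_name
    data_type: VARCHAR
    cortex_search_service:
      service: product_name_search   # hypothetical service name
      literal_column: p_name         # column holding the literal values
      # database and schema are omitted, so they default to those of base_table.
```

An enum suits columns with a handful of stable values; a search service is the better fit when the values are too numerous to list.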

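Optional properties of time dimensions¶

These fields are optional but recommended, because they help semantic view search produce higher-quality results.

synonyms: A list of other terms or phrases used to refer to this time dimension. Each synonym must be unique across all synonyms in this semantic model.
description: A brief description of this dimension. Include information that provides useful context, such as the time zone that the dimension uses as its reference point.
unique: A boolean indicating that this column has unique values.
sample_values: Sample values for this column, if any. Add any values that users are likely to reference in their questions.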
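
A time dimension that follows these recommendations might look like the sketch below. The column name and the UTC assumption are illustrative only; the point is that the description states the reference time zone.

```yaml
time_dimensions:
  - name: order_date
    synonyms: ["purchase date", "date of order"]
    # State the reference time zone so that date questions are unambiguous.
    description: "Date on which the order was placed, in UTC."
    expr: o_orderdate
    data_type: DATE
    unique: false
```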

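Facts¶

Facts are row-level quantitative attributes that represent a specific business event or transaction. They capture "how much" or "how many" at the finest grain, such as an individual sale amount, a quantity purchased, or a cost.

Facts usually act as "helper" concepts in a semantic view, used to build dimensions and metrics.

The properties of a fact include:

expr: the SQL expression that computes the fact value
access_modifier: when set to private_access, the fact is hidden from queries (useful for intermediate calculations)
data_type: the data type of the fact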

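Metrics¶

Metrics are quantifiable measures of business performance, computed by aggregating facts or other columns with functions such as SUM, AVG, and COUNT.

There are two types of metrics:

Table-level metrics: scoped to a specific logical table and aggregate the data in that table
Derived metrics: view-level metrics that combine metrics from multiple tables

Properties of metrics¶

expr: a SQL expression that contains an aggregate function
access_modifier: when set to private_access, the metric is hidden from queries (useful for intermediate calculations)
synonyms: alternative terms for the metric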
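
The relationship between a helper fact and a table-level metric can be sketched as follows. Both entries would sit under one item of tables. The column names are hypothetical, and the fragment assumes that a metric expression may refer to a fact of the same logical table by its name.

```yaml
facts:
  # Row-level helper value, hidden from end-user queries.
  - name: line_revenue
    description: "Revenue for one order line after discount."
    access_modifier: private_access
    expr: l_extendedprice * (1 - l_discount)
    data_type: NUMBER

metrics:
  # Aggregates the helper fact; this is what users query.
  - name: total_revenue
    synonyms: ["sales", "revenue"]
    description: "Sum of discounted line revenue."
    access_modifier: public_access
    expr: SUM(line_revenue)   # assumption: the fact is referenced by its logical name
```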

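Optional properties of metrics¶

To specify the non-additive dimensions of a metric, use the following fields:

non_additive_dimensions

Specifies the dimensions over which the metric should not be aggregated.

table

The name of the logical table that contains the dimension.

dimension

The name of the dimension.

sort_direction

The sort order for the non-additive dimension. You can specify one of the following values:

ascending: Sorts the dimension values in ascending order.

descending: Sorts the dimension values in descending order.

Default: ascending

null_order

Specifies whether NULLs are sorted before or after non-NULL values. You can specify one of the following values:

first: NULLs are sorted before non-NULL values.

last: NULLs are sorted after non-NULL values.

Default: none. The resulting order depends on the value of the sort_direction field (ascending or descending); see the usage notes in the ORDER BY documentation.

Note

Because rows are sorted by the non-additive dimensions, the order in which you specify the dimensions matters, just as the order of columns matters in an ORDER BY clause.

The following example specifies that the m_account_balance metric must not be aggregated over the year_dim and month_dim dimensions: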
```
metrics:
  - name: m_account_balance
    ...
    non_additive_dimensions:
    - table: bank_accounts
      dimension: year_dim
      sort_direction: ascending
      null_order: last
    - table: bank_accounts
      dimension: month_dim
      sort_direction: descending
      null_order: first
```
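If there are multiple relationship paths between two particular logical tables in the semantic view, use the following field to specify which path to use:

using_relationships

Preview Feature – Open

Available to all accounts.

Specifies the names of the relationships used to join the logical tables when computing the metric.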
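
A typical case is a table that references another table twice. The sketch below assumes hypothetical flights and airports logical tables joined by two relationships, and pins one metric to the departure path. Only the parts relevant to using_relationships are shown; the airports table and the remaining fields are omitted.

```yaml
tables:
  - name: flights
    # base_table, dimensions, and facts omitted for brevity
    metrics:
      - name: departing_flights
        description: "Number of flights, attributed to the departure airport."
        expr: COUNT(flight_id)
        # Resolve the ambiguity between the two paths to airports.
        using_relationships:
          - flight_departure_airport

relationships:
  - name: flight_departure_airport
    left_table: flights
    right_table: airports
    relationship_columns:
      - left_column: departure_airport
        right_column: airport_code
  - name: flight_arrival_airport
    left_table: flights
    right_table: airports
    relationship_columns:
      - left_column: arrival_airport
        right_column: airport_code
```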

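Derived metrics¶

Derived metrics are view-level metrics that are not bound to a specific table. They can combine metrics from multiple tables or perform calculations across the whole view.

Example of a derived metric: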

metrics:
  - name: total_profit_margin
    description: "Overall profit margin across all products"
    expr: (orders.total_revenue - orders.total_cost) / orders.total_revenue
    access_modifier: public_access

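Relationships¶

Relationships define how logical tables are joined. Each relationship specifies:

left_table: the table that contains the foreign key
right_table: the table that is referenced
relationship_columns: the pairs of columns used for the join, given as left_column and right_column

Relationship types (one-to-one, many-to-one) are inferred automatically from the data and the primary key definitions.

Note

Unlike legacy semantic models, semantic views do not require explicit join_type or relationship_type specifications. These are determined automatically.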
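
When the foreign key spans more than one column, each column pair gets its own entry under relationship_columns. The following sketch uses hypothetical table and column names:

```yaml
relationships:
  - name: line_items_to_part_suppliers
    left_table: line_items        # holds the foreign key columns
    right_table: part_suppliers   # the referenced table
    relationship_columns:
      # Both pairs must match for two rows to join.
      - left_column: l_partkey
        right_column: ps_partkey
      - left_column: l_suppkey
        right_column: ps_suppkey
```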

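Filters¶

Filters define commonly used filter conditions that can be referenced by name. This helps keep filtering logic consistent across queries.

Example: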

filters:
  - name: active_customers
    description: "Customers who have made a purchase in the last 12 months"
    expr: "customer_last_purchase_date >= DATEADD(month, -12, CURRENT_DATE())"

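Verified queries¶

Verified queries are example questions paired with the SQL queries that answer them. They help Cortex Analyst learn how to answer similar questions, and they serve as documentation for users.

Properties:

question: the natural language question
sql: the SQL query that answers the question
verified_by: (Optional) the person who verified that the query is correct
verified_at: (Optional) the timestamp of the verification
use_as_onboarding_question: (Optional) whether the question is shown to users as a suggestion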
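
The optional audit fields might be filled in as in the sketch below. The reviewer name, the question, and the query are hypothetical; the SQL mirrors the shape of the complete example in this topic (it selects from a semantic view named revenue_analysis), so adjust it to however your verified queries reference their tables.

```yaml
verified_queries:
  - name: monthly_order_count
    question: "How many orders were placed each month?"
    verified_at: 1735689600          # 2025-01-01 00:00:00 UTC, in seconds since the UNIX epoch
    verified_by: "Jane Doe"          # hypothetical reviewer
    use_as_onboarding_question: false
    sql: |
      SELECT
        DATE_TRUNC('month', order_date) AS order_month,
        COUNT(*) AS order_count
      FROM revenue_analysis
      GROUP BY order_month
      ORDER BY order_month
```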

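Access modifiers¶

Semantic views support access modifiers on facts and metrics, which let you control visibility:

public_access (default): visible to users and available in queries
private_access: hidden from queries and used only for intermediate calculations

Example: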

facts:
  - name: internal_cost
    expr: unit_cost * quantity
    data_type: NUMBER
    access_modifier: private_access  # Not visible in queries

metrics:
  - name: total_revenue
    expr: SUM(sale_amount)
    access_modifier: public_access  # Visible in queries

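Cortex Analyst custom instructions¶

You can use SQL commands to provide custom instructions in the semantic view definition. These instructions guide how queries are generated and how questions are classified. They are not part of the YAML specification; you set them with the CREATE SEMANTIC VIEW command.

For more information, see Providing custom instructions for Cortex Analyst.

Semantic view YAML example¶

The following is a complete example of a semantic view YAML specification: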

name: revenue_analysis
description: "Semantic view for analyzing revenue across products and customers"

tables:
  - name: customers
    description: "Customer information"
    base_table:
      database: sales_db
      schema: public
      table: customers
    dimensions:
      - name: customer_name
        synonyms: ["client name", "customer"]
        description: "Full name of the customer"
        expr: c_name
        data_type: VARCHAR
      - name: customer_segment
        synonyms: ["segment", "market segment"]
        description: "Customer market segment"
        expr: c_mktsegment
        data_type: VARCHAR
        is_enum: true
    metrics:
      # Referenced by the derived metric revenue_per_customer.
      - name: customer_count
        description: "Number of customers"
        expr: COUNT(c_custkey)

  - name: orders
    description: "Order information"
    base_table:
      database: sales_db
      schema: public
      table: orders
    dimensions:
      - name: order_date
        description: "Date when order was placed"
        expr: o_orderdate
        data_type: DATE
    time_dimensions:
      - name: order_year
        description: "Year when order was placed"
        expr: YEAR(o_orderdate)
        data_type: NUMBER
    facts:
      - name: order_total
        description: "Total order amount"
        expr: o_totalprice
        data_type: NUMBER
    metrics:
      - name: total_orders
        description: "Total number of orders"
        expr: COUNT(*)
      # Referenced by the derived metric revenue_per_customer.
      - name: total_revenue
        description: "Total revenue across orders"
        expr: SUM(o_totalprice)
      - name: average_order_value
        description: "Average order value"
        expr: AVG(o_totalprice)

relationships:
  - name: orders_to_customers
    left_table: orders
    right_table: customers
    relationship_columns:
      - left_column: o_custkey
        right_column: c_custkey

metrics:
  - name: revenue_per_customer
    description: "Average revenue per customer"
    expr: orders.total_revenue / customers.customer_count
    access_modifier: public_access

verified_queries:
  - name: top_customers_by_revenue
    question: "Who are the top 10 customers by revenue?"
    sql: |
      SELECT
        customer_name,
        SUM(order_total) as total_revenue
      FROM revenue_analysis
      GROUP BY customer_name
      ORDER BY total_revenue DESC
      LIMIT 10
    use_as_onboarding_question: true

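Creating a semantic view from YAML¶

To create a semantic view from a YAML specification, use the SYSTEM$CREATE_SEMANTIC_VIEW_FROM_YAML stored procedure.

For more information, see Creating a semantic view from a YAML specification.

Getting the YAML from a semantic view¶

To export a semantic view in YAML format, use the SYSTEM$READ_YAML_FROM_SEMANTIC_VIEW function.

For more information, see Getting the YAML specification for a semantic view.

Differences from legacy semantic models¶

If you are migrating from a legacy semantic model YAML file to a semantic view, note the following key differences: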


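Feature	Legacy semantic model	Semantic view
Storage	YAML file on a stage	Schema-level object in the database
Permissions	Stage-based access control	Full Snowflake RBAC integration
Sharing	Manual file sharing	Native Snowflake sharing
Join types	Requires `join_type` and `relationship_type`	Inferred automatically
Derived metrics	Not supported	Fully supported
Access modifiers	Not supported	`public_access` / `private_access`
Custom instructions	In the YAML file	Set through SQL commands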

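When converting a legacy semantic model to a semantic view:

Remove join_type and relationship_type from relationships
Consider using derived metrics for view-level calculations
Add access_modifier to the facts and metrics that you want to make private
Move custom instructions to the SQL CREATE SEMANTIC VIEW command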
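
As a sketch of the first and third steps, the fragment below shows a relationship after migration, with the two legacy fields left in as comments to mark what was removed, and a fact that has been made private. The table, column, and fact names are hypothetical, and the legacy values shown (left_outer, many_to_one) are examples of what a legacy file might contain.

```yaml
relationships:
  - name: orders_to_customers
    left_table: orders
    right_table: customers
    # join_type: left_outer            # legacy field, delete when migrating
    # relationship_type: many_to_one   # legacy field, delete when migrating
    relationship_columns:
      - left_column: o_custkey
        right_column: c_custkey

# Under the relevant entry of tables:
facts:
  - name: internal_cost
    expr: unit_cost * quantity
    data_type: NUMBER
    access_modifier: private_access   # new in semantic views: hides the fact from queries
```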