How Snowflake validates semantic views

Snowflake verifies that it complies with a set of validation rules when you define a semantic view. These rules ensure your semantic model is well-formed and will function correctly.

These rules are explained in the next sections:

General validation rules

The following rules apply to semantic views in general:

  • Required elements: A semantic view must define at least one dimension or metric.

    For example, your TPC-H semantic view needs at least one dimension (like customer_name) or a metric (like order_average_value).

  • Primary and foreign keys: In the primary and foreign key definitions, you must use physical base table columns or expressions defined in logical tables that directly refer to a base table column (for example, t1.fact AS t1.col).

    For example, in the TPC-H schema, you can use c_custkey as the primary key for the customer table and o_custkey as the foreign key in the orders table. c_custkey and o_custkey are columns in the physical base tables.

  • Table alias references: When referring to tables in relationships or expressions, you must use their defined aliases.

    For example, if you define the table alias orders AS snowflake_sample_data.tpch.orders_table, you must use the table alias orders (not orders_table) in the definitions of your metrics.

    If you don’t specify an alias for a logical table, you must use the logical table name in any expressions.

Validation rules for relationships

The following rules apply to relationships in semantic views:

  • Many-to-one relationships: Relationships work like foreign key constraints.

    For example, when you define a relationship as orders (o_custkey) REFERENCES customers (c_custkey), you’re creating a many-to-one relationship from orders to customers (many orders can belong to one customer), where c_custkey must be a primary key.

  • Transitive relationships: Snowflake automatically derives indirect relationships.

    For example, if you define a relationship between line_items and orders and another relationship between orders and customer, Snowflake understands there’s also a relationship between line_items and customer.

  • No circular relationships: You cannot define circular relationships, even through transitive paths.

    For example, you cannot define a relationship from orders to customer and another relationship from customer to orders.

  • No self-references: Currently, a table cannot reference itself (like an employee manager hierarchy where employees can reference other employees as their manager).

  • Multi-path relationship restrictions: You can define multiple relationships between two tables, but there are limitations.

    For example, if line_items is related to orders through both order_key and another column, those tables cannot refer to each other’s semantic expressions.

  • One-to-one relationship restrictions: One-to-one relationships have limitations.

    For example, suppose that you define a relationship orders(id) REFERENCES order_summary, where id is a primary key orders. order_summary cannot refer to semantic expressions in orders semantic expressions, though orders can refer to semantic expressions in order_summary.

Expression validation rules

The following rules apply to semantic expressions in facts, dimensions, and metrics:

General rules about expressions

The following rules apply to semantic expressions in general:

  • Expression types: Dimensions and facts are row-level expressions (unaggregated), while metrics are aggregate-level expressions.

    For example, customer_name is a dimension (row-level), while order_average_value is a metric (aggregate-level).

  • Table association: Every semantic expression must be associated with a table.

    For example, customer_name must be defined as customer.customer_name and order_average_value as orders.order_average_value.

  • Same-table references: Expressions can refer to base table columns or other expressions on the same logical table using either qualified or unqualified names.

    For example, in the orders table, you could define orders.shipping_month as

    • MONTH(o_shipdate) (using the unqualified column name)

    • MONTH(orders.o_shipdate) (using the qualified name)

  • Cross-table limitations: Expressions cannot refer to base table columns from other tables or expressions from unrelated logical tables.

    For example, customer.customer_name cannot directly reference an expression from the orders table unless there’s a relationship between them. To work with data across tables, you must:

    1. Define relationships between logical tables (for example, between customer and orders through c_custkey).

    2. Define a fact on the source table (for example, orders.total_value).

    3. Refer to these expressions from a connected logical table (for example, customer.order_value can refer to orders.total_value).

  • Name resolution: If both a semantic expression and a column have the same name, references to that name resolve to the semantic expression.

    For example, if you define a region dimension and there’s also a region column, region in expressions resolves to the dimension, not the column. An exception is when an expression refers to the same name in its definition (for example, customer.c_name AS customers.c_name). The reference resolves to the column, rather than to the defining expression itself.

  • Expression reference cycles: You cannot create circular references between expressions.

    For example, you cannot define customer.total_value based on orders.customer_value and then define orders.customer_value based on customer.total_value.

  • Table reference cycles: You cannot create circular references between logical tables in expression definitions.

    For example, you cannot define customer.total_value based on orders.customer_value and then define orders.customer_count based on customer.c_custkey.

  • Function usage: You can use scalar functions like YEAR* / DAY* / WEEK* / MONTH / QUARTER in dimensions, but table functions are not allowed. Window functions are allowed in dimensions and facts but not in metrics.

Rules for row-level expressions (dimensions and facts)

The following rules apply to row-level expressions in dimensions and facts:

  • Same-table references: A row-level expression can directly refer to columns from its own table.

    For example, customers.customer_name can be defined as customers.c_name directly.

  • Equal or lower granularity: A row-level expression can directly refer to other row-level expressions at the same or lower granularity.

    For example, orders.order_details can refer to customer.customer_name because customer is at a lower granularity than orders (one customer can have many orders).

  • Higher granularity references: When referencing row-level expressions at higher granularity, a row-level expression must use aggregation.

    For example, customer.total_orders must use COUNT(orders.o_orderkey) because orders is at a higher granularity than customer (one customer can have many orders).

  • Aggregate references: A dimension like orders.order_type cannot refer to a metric like orders.order_average_value, but customer.customer_segment can refer to orders.order_average_value because customer is at a lower granularity than orders.

Rules for aggregate-level expressions (metrics)

The following rules apply to aggregate-level expressions in metrics:

  • Basic aggregation: A metric must use an aggregate function.

    For example, orders.order_average_value must use AVG(orders.o_totalprice).

  • Equal or lower granularity: When referring to row-level expressions at equal or lower granularity, a metric must use a single aggregate.

    For example, orders.total_value can use SUM(line_items.discounted_price) because line_items is at lower granularity than orders.

  • Higher granularity references: When referring to row-level expressions at higher granularity, a metric must use nested aggregation.

    For example, customer.average_order_value must use AVG(SUM(orders.o_totalprice)) because orders is at higher granularity than customer.

  • Other aggregate references: A metric can directly refer to other metrics at equal or lower granularity without aggregation.

    For example, orders.profit_margin can be defined as orders.total_revenue / orders.total_cost without additional aggregation. However, when referring to metrics at higher granularity, an aggregation is required.

Language: English