Aggregation policies¶
An aggregation policy is a schema-level object that controls what type of query can access data from a table or view. When an aggregation policy is applied to a table, queries against that table must aggregate data into groups of a minimum size in order to return results, thereby preventing a query from returning information from an individual record. A table or view with an aggregation policy assigned to it is said to be aggregation-constrained.
Aggregation policies can be used with or without an entity key. When aggregation policies are used without an entity key, they protect the privacy of individual rows in the data set (that is, row-level privacy). If you use an aggregation policy with an entity key, it protects the privacy of an entity, even if information about that entity appears in multiple rows (that is, entity-level privacy).
For more information about combining aggregation policies with an entity key, see Implementing entity-level privacy with aggregation policies.
Overview¶
A core feature of Snowflake is the ability to share data sets with other entities. Aggregation policies allow a provider (data owner) to exercise control over what can be done with their data even after it is shared with a consumer. Specifically, the provider can require a consumer of a table to aggregate the data rather than retrieve individual records.
When creating an aggregation policy, the provider’s policy administrator specifies a minimum group size (i.e. the number of rows that must be aggregated together into a group). The larger the minimum group size, the less likely it is that a consumer could use the query results to deduce the contents of a single record. Once the aggregation policy is applied to a table or view, a query against it must conform to two requirements:
The query must aggregate the data. If the query uses an aggregation function, it must be one of the allowed aggregation functions.
Each group created by the query must include the aggregate of at least X records, where X is the minimum group size of the aggregation policy.
If the query returns a group that contains fewer records than the minimum group size of the policy, then Snowflake combines those groups
into a remainder group. Snowflake applies the aggregation function to the appropriate column to return a value for the remainder group.
However, because that value is calculated from rows that belong to more than one group, the value of the GROUP BY key column is NULL. For
example, if the query includes the clause GROUP BY state
, then the value of state
in the remainder group is NULL.
A query that does not return enough results to populate a remainder group still works, but returns a NULL value in every field of the results.
Limitations¶
You cannot protect an external table with an aggregation policy.
If the query uses an explicit grouping construct, it must be a GROUP BY clause. The query cannot use related constructs like GROUP BY ROLLUP, GROUP BY CUBE, or GROUP BY GROUPING SETS.
Most set operators are not allowed when one of the queries acts on an aggregation-constrained table. As an exception, UNION ALL is supported, but each result group must satisfy the minimum group size of the aggregation-constrained tables being queried (see Query requirements for details).
If a column of an aggregation-constrained table is protected by a projection policy, a query against that table cannot use the column as an argument of the COUNT function.
Recursive CTEs are not allowed in queries against an aggregation-constrained table or view.
Window functions are not allowed in queries against an aggregation-constrained table or view.
A query against an aggregation-constrained table cannot use a correlated subquery or lateral join when there are references to or from the portion of the query that meets the requirements of the aggregation policy. The following examples illustrate the types of queries that are prohibited.
- Example 1
Assuming
protected_table
is aggregation-constrained, the following query is not allowed because the portion of the query that aggregates data references another part of the query outside of the subquery:SELECT c1, c2 FROM open_table WHERE c1 = (SELECT x FROM protected_table WHERE y = open_table.c2);
- Example 2
Assuming
protected_table
is aggregation-constrained, the following query is not allowed because the subquery references the part of the query that aggregates data, which is outside of the subquery:SELECT SUM(SELECT COUNT(*) FROM open_table ot WHERE pt.id = ot.id) FROM protected_table pt;
Considerations¶
Consider the following when using aggregation policies to protect sensitive data:
Aggregation policies protect data for an individual record, not an entity. If a data set contains multiple records belonging to the same entity, an aggregation policy only protects the privacy of a specific record pertaining to that entity, not the entire entity.
While aggregation policies limit access to individual records, they do not guarantee a malicious actor could not use deliberate queries to obtain potentially sensitive data from an aggregation-constrained table. With enough query attempts, a malicious actor could potentially work around the aggregation requirements to ascertain a value from an individual row. Aggregation policies are best suited for use with partners and customers with whom you have an existing level of trust. In addition, providers should be vigilant about potential misuses of their data (for example, reviewing the access history for their listings).
Create an aggregation policy¶
The syntax for creating an aggregation policy is:
CREATE [ OR REPLACE ] AGGREGATION POLICY <name> AS () RETURNS AGGREGATION_CONSTRAINT -> <body> [ COMMENT = '<string_literal>' ];
Where:
name
specifies the name of the policy.AS () RETURNS AGGREGATION_CONSTRAINT
is the signature and return type of the policy. The signature does not accept any arguments and the return type is AGGREGATION_CONSTRAINT, which is an internal data type. All aggregation policies have the same signature and return type.body
is a SQL expression that determines the restrictions of an aggregation policy.
Calling functions from the body¶
The body of an aggregation policy uses two functions to define the constraints of the policy: NO_AGGREGATION_CONSTRAINT and AGGREGATION_CONSTRAINT. When the conditions of the body call one of these functions, the return value from the function determines how queries against the aggregation-constrained table or view must be formulated to return results.
- NO_AGGREGATION_CONSTRAINT
Use the body’s expression to call the NO_AGGREGATION_CONSTRAINT function when you want a query to have unrestricted access to the table or view to which the aggregation policy is assigned.
- AGGREGATION_CONSTRAINT
Use the body’s expression to call the AGGREGATION_CONSTRAINT function to require that queries aggregate data in order to return results. Use the MIN_GROUP_SIZE argument to specify how many rows or entities must be included in each aggregation group.
For the complete syntax for the NO_AGGREGATION_CONSTRAINT and AGGREGATION_CONSTRAINT functions, see CREATE AGGREGATION POLICY.
Note
The body of an aggregation policy cannot reference a user-defined function, table, or view.
Example policies¶
- Fixed minimum group size
The simplest aggregation policy calls the AGGREGATION_CONSTRAINT function directly and defines a constant minimum group size that is applied to all queries against the table. For example, the following command creates an aggregation policy with a minimum group size of 5:
CREATE AGGREGATION POLICY my_agg_policy AS () RETURNS AGGREGATION_CONSTRAINT -> AGGREGATION_CONSTRAINT(MIN_GROUP_SIZE => 5);
- Conditional policy
Policy administrators can define the SQL expression of an aggregation policy so different queries have different restrictions based on factors such as the role of the user executing the query. This strategy can allow one user to query a table without restriction while requiring others to aggregate results.
For example, the following aggregation policy gives users with the role
ADMIN
unrestricted access to a table while requiring all other queries to aggregate data into groups of at least 5 rows or entities.CREATE AGGREGATION POLICY my_agg_policy AS () RETURNS AGGREGATION_CONSTRAINT -> CASE WHEN CURRENT_ROLE() = 'ADMIN' THEN NO_AGGREGATION_CONSTRAINT() ELSE AGGREGATION_CONSTRAINT(MIN_GROUP_SIZE => 5) END;
Tip
You can use the following strategies when using context functions like CURRENT_ROLE in a conditional policy:
Context functions return strings, so comparisons using them are case-sensitive. You can use LOWER to convert strings to all lowercase if you’d like to do a case-insensitive comparison.
The POLICY_CONTEXT function helps you evaluate whether a policy body is returning the correct value when a context function returns a certain value. The POLICY_CONTEXT function simulates query results based upon a specified value of one or more context functions.
Modify an aggregation policy¶
You can use the ALTER AGGREGATION POLICY command to modify the SQL expression that determines the minimum group size of the aggregation policy. You can also rename the policy or change its comment.
Before modifying an aggregation policy, you can execute the DESCRIBE AGGREGATION POLICY command or
GET_DDL function to review the current SQL expression of the policy. The SQL expression that determines the
minimum group size appears in the BODY
column.
As an example, you can execute the following command to change the SQL expression of the aggregation policy my_policy
to require a
minimum group size of 2 rows in all circumstances:
ALTER AGGREGATION POLICY my_policy SET BODY -> AGGREGATION_CONSTRAINT(MIN_GROUP_SIZE=>2);
Assign an aggregation policy¶
Once created, an aggregation policy can be applied to one or more tables or views to make it aggregation-constrained. A table or view can only have one aggregation policy attached.
Use the SET AGGREGATION POLICY clause of a ALTER TABLE or ALTER VIEW command to assign an aggregation policy to an existing table or view:
ALTER { TABLE | VIEW } <name> SET AGGREGATION POLICY <policy_name> [ FORCE ]
Where:
name
specifies the name of the table or view.policy_name
specifies the name of the aggregation policy.FORCE
is an optional parameter that allows the command to assign the aggregation policy to a table or view that already has an aggregation policy assigned to it. The new aggregation policy atomically replaces the existing one.
For example, to assign the policy my_agg_policy
to the table t1
, execute:
ALTER TABLE t1 SET AGGREGATION POLICY my_agg_policy;
You can also use the WITH clause of the CREATE TABLE and CREATE VIEW commands to assign
an aggregation policy to a table or view at creation time. For example, to assign the policy my_agg_policy
to a new table, execute:
CREATE TABLE t1 WITH AGGREGATION POLICY my_agg_policy;
Replace an aggregation policy¶
The recommended method of replacing an aggregation policy is to use the FORCE
parameter to detach the existing aggregation policy and
assign the new one in a single command. This allows you to atomically replace the old policy, leaving no gap in protection.
For example, to assign a new aggregation policy to a table that is already aggregation-constrained:
ALTER TABLE privacy SET AGGREGATION POLICY agg_policy_2 FORCE;
You can also detach the aggregation policy from a table or view in one statement (… UNSET AGGREGATION POLICY) and then set a new policy on the table or view in a different statement (… SET AGGREGATION POLICY <name>). If you choose this method, the table is not protected by an aggregation policy in between detaching one policy and assigning another. A query could potentially access sensitive data during this time.
Detach an aggregation policy¶
Use the UNSET AGGREGATION POLICY clause of an ALTER TABLE or ALTER VIEW command to detach an aggregation policy from a table or view in order to remove the need to aggregate data. The name of the aggregation policy is not required because a table or view cannot have more than one aggregation policy attached.
ALTER {TABLE | VIEW} <name> UNSET AGGREGATION POLICY
Where:
name
specifies the name of the table or view.
For example, to detach an aggregation policy from view v1
, execute:
ALTER VIEW v1 UNSET AGGREGATION POLICY;
Monitor aggregation policies¶
It can be helpful to think of two general approaches to determine how to monitor aggregation policy usage.
Discover aggregation policies¶
You can use the AGGREGATION_POLICIES view in the Account Usage schema of the shared SNOWFLAKE database. This view is a catalog for all aggregation policies in your Snowflake account. For example:
SELECT * FROM SNOWFLAKE.ACCOUNT_USAGE.AGGREGATION_POLICIES ORDER BY POLICY_NAME;
Identify aggregation policy references¶
The POLICY_REFERENCES Information Schema table function can identify aggregation policy references. There are two different syntax options:
Return a row for each object (i.e. table or view) that has the specified aggregation policy set on it:
USE DATABASE my_db; USE SCHEMA information_schema; SELECT policy_name, policy_kind, ref_entity_name, ref_entity_domain, ref_column_name, ref_arg_column_names, policy_status FROM TABLE(information_schema.policy_references(policy_name => 'my_db.my_schema.aggpolicy'));
Return a row for each policy assigned to the table named
my_table
:USE DATABASE my_db; USE SCHEMA information_schema; SELECT policy_name, policy_kind, ref_entity_name, ref_entity_domain, ref_column_name, ref_arg_column_names, policy_status FROM TABLE(information_schema.policy_references(ref_entity_name => 'my_db.my_schema.my_table', ref_entity_domain => 'table'));
Query requirements¶
After an aggregation policy has been applied to a table or view, queries against that table or view must conform to certain requirements. This section discusses what is and isn’t allowed in a query against an aggregation-constrained table or view.
Note
Once part of the query properly aggregates data to satisfy the requirements of the aggregation policy, these query restrictions do not apply, and another part of the query can include things that are otherwise prohibited.
For example, the following query can use a SELECT statement that does not aggregate results because another part of the query has already satisfied the aggregation requirements of the policy that is assigned to protected_table
:
SELECT * FROM open_table ot WHERE ot.a > (SELECT SUM(id) FROM protected_table pt)
For additional restrictions on what can be included in a query, refer to Limitations.
- Aggregation Functions
The following aggregation functions are allowed in a query against an aggregation-constrained table:
A query can contain more than one of these allowed aggregation functions. A query fails if it attempts to use an aggregation function that is not allowed.
- Grouping Statement
A query against an aggregation-constrained table must aggregate data into groups of a minimum size. It can use an explicit grouping statement (i.e. a GROUP BY clause) or a scalar aggregation function that aggregates the entire data set (e.g.
COUNT(*)
).- Filters
In general, Snowflake does not restrict how a query uses WHERE and ON clauses to filter the aggregation-constrained table as long as it aggregates the rows selected by the filter.
- Joins
A query can join an aggregation-constrained table with another table, including another aggregation-constrained table.
Snowflake checks each aggregation group to make sure that the number of rows taken from an aggregation-constrained table meets or exceeds the minimum group size of that table. For example, if an aggregation-constrained table
table_a
with a minimum group size of 5 is joined withtable_b
with a minimum group size of 3, each group returned by the query must be created using at least 5 rows fromtable_a
and 3 rows fromtable_b
.Whether a query with a join meets the requirements of an aggregation-constrained table is determined by the number of rows taken from the table, not the size of a group. As a result, the size of a group created from the joined data could be greater than the minimum group size of the aggregation-constrained table, but still result in filtered data. For example, suppose:
agg_t
is aggregation constrained with a minimum group size of 2. This table contains a single integer columnc
that has the following content: {1
,2
,2
}.open_t
is unconstrained, and contains an integer columnc
with the following content: {1
,1
,1
,2
}.
A user executes the following query that joins the two tables:
SELECT c, COUNT(*) FROM agg_t, open_t WHERE agg_t.c = open_t.c GROUP BY agg_t.c;
The query will return:
+-----------------+ | c | COUNT(*) | |------+----------| | 2 | 2 | |------+----------| | null | 3 | +-----------------+
Even though the second group has 3 records, which is greater than the minimum group size, all of those records correspond to a single record in the aggregation-constrained table, so the value is filtered out.
- UNION ALL
A query can use UNION ALL to combine results of two subqueries, even if one or more of the queried tables are aggregation-constrained. Similar to joins, each group in the results must satisfy the minimum group size of every aggregation-constrained table being queried. For example, suppose:
Table
protected_table1
has a minimum group size of 2.Table
protected_table2
has a minimum group size of 5.
If you run the query:
SELECT a, COUNT(*) FROM ( SELECT a, b FROM protected_table1 UNION ALL SELECT a, b FROM protected_table2 ) GROUP BY a;
Each group formed by the key
a
must contain 2 records fromprotected_table1
and 5 records fromprotected_table2
, otherwise the records are placed in a remainder group.- External Functions
A query cannot call an external function unless another part of the query has properly aggregated results to meet the requirements of the aggregation-constrained table.
- Logging & Metrics
A query cannot log a column of an aggregation-constrained table via UDF logging or metrics.
- Data Type Conversions
A query that includes a data type conversion function in the SELECT statement must use the TRY version of the function. For example, the TRY_CAST function is allowed, but the CAST function is prohibited. The following data type conversion functions are allowed for numeric types:
- PIVOT
A query cannot use the PIVOT operator against a column in an aggregation-constrained table.
Extended example¶
Creating an aggregation policy and assigning the aggregation policy to a table follows the same general procedure as creating and assigning other policies, such as masking and projection policies:
If you are using a centralized management approach, create a custom role (e.g.
agg_policy_admin
) to manage the policy. Alternatively, you can use an existing role.Grant this role the privileges to create and assign an aggregation policy.
Create the aggregation policy.
Assign the aggregation policy to a table.
Once the aggregation policy is assigned to a table, successful queries against the table must aggregate its data.
The following extended example provides insight into each step in this process, from the provider’s access control administrator creating a custom role to a data consumer executing a query to return aggregated results.
- Access Control Administrator Tasks
Create a custom role to manage the aggregation policy. You could also re-use an existing role.
USE ROLE USERADMIN; CREATE ROLE AGG_POLICY_ADMIN;
Grant the
agg_policy_admin
custom role the privileges to create an aggregation policy in a schema and assign the aggregation policy to a table or view in the Snowflake account.This step assumes the aggregation policy will be stored in a database and schema named
privacy.agg_policies
and this database and schema already exist:GRANT USAGE ON DATABASE privacy TO ROLE agg_policy_admin; GRANT USAGE ON SCHEMA privacy.agg_policies TO ROLE agg_policy_admin; GRANT CREATE AGGREGATION POLICY ON SCHEMA privacy.agg_policies TO ROLE agg_policy_admin; GRANT APPLY AGGREGATION POLICY ON ACCOUNT TO ROLE agg_policy_admin;
The
agg_policy_admin
role can now be assigned to one or more users.For details about the privileges needed to work with aggregation policies, refer to Privileges and commands (in this topic).
- Aggregation Policy Administrator Tasks
Create an aggregation policy to require aggregation and define a minimum group size of 3:
USE ROLE agg_policy_admin; USE SCHEMA privacy.aggpolicies; CREATE AGGREGATION POLICY my_policy AS () RETURNS AGGREGATION_CONSTRAINT -> AGGREGATION_CONSTRAINT(MIN_GROUP_SIZE => 3);
Assign the aggregation policy to a table
t1
:ALTER TABLE t1 SET AGGREGATION POLICY my_policy;
- Consumer Query
Once the provider shares the aggregation-constrained table, the data consumer can execute queries against it. For this example, assume the aggregation-constrained table
t1
contains the following rows:peak
state
elevation
washington
NH
6288
cannon
NH
4080
kearsarge
NH
2937
mansfield
VT
4395
killington
VT
4229
wachusett
MA
2006
Now, assume that the consumer executes the following query against
t1
:SELECT state, AVG(elevation) AS avg_elevation FROM t1 GROUP BY state;
The results are:
+----------+-----------------+ | STATE | AVG_ELEVATION | |----------+-----------------+ | NH | 4435 | | NULL | 3543 | +----------+-----------------+
Note that the value of
state
in the second group isNULL
because it is a remainder group that averages the elevation of peaks in bothVT
andMA
.
Aggregation policies with Snowflake features¶
The following subsections briefly summarize how aggregation policies interact with various Snowflake features and services.
Other policies¶
This section describes how an aggregation policy interacts with other policies, including masking policies, row access policies, and projection policies.
You can attach other policies to an aggregation-constrained table. A successful query against the table must meet the requirements of all policies.
If a row access policy is assigned to an aggregation-constrained table, a row excluded from the query results based on the row access policy is not included when calculating the aggregated results.
The body of a masking policy, row access policy, or projection policy cannot reference an aggregation-constrained table, including its columns. Similarly, the body of the other policy cannot include a UDF that references the aggregation-constrained table.
Views and materialized views¶
You can assign an aggregation policy to both views and materialized views. When an aggregation policy is applied to a view, the underlying table does not become aggregation-constrained. This base table can still be queried without restriction.
To avoid the possibility of exposing sensitive data, all aggregation-constrained views are treated as if they are secure views even if they are not.
Whether you can create a view from an aggregation-constrained table depends on the type of view:
You can create a regular view from one or more aggregation-constrained tables, however queries against that view must aggregate data in a way that meets the restrictions of those base tables.
You cannot create a materialized view based on an aggregation-constrained table or view, nor can you assign an aggregation policy to a table or view upon which a materialized view is based.
Cloned objects¶
The following approach helps to safeguard data from users with the SELECT privilege on a cloned table or view that is stored in the cloned database or schema:
Cloning an individual aggregation policy object is not supported.
Cloning a database results in the cloning of all aggregation policies within the database.
Cloning a schema results in the cloning of all aggregation policies within the schema.
A cloned table maps to the same aggregation policies as the source table.
When a table is cloned in the context of its parent schema cloning, if the source table has a reference to an aggregation policy in the same parent schema (i.e. a local reference), the cloned table will have a reference to the cloned aggregation policy.
If the source table refers to an aggregation policy in a different schema (i.e. a foreign reference), then the cloned table retains the foreign reference.
For more information, see CREATE <object> … CLONE.
Replication¶
Aggregation policies and their assignments can be replicated using database replication and replication groups.
For database replication, the replication operation fails if either of the following conditions is true:
The primary database is in an Enterprise (or higher) account and contains a policy but one or more of the accounts approved for replication are on lower editions.
A table or view contained in the primary database has a dangling reference to an aggregation policy in another database.
The dangling reference behavior for database replication can be avoided when replicating multiple databases in a replication group.
Privileges and commands¶
The following subsections provide information to help manage aggregation policies.
Aggregation policy privileges¶
Snowflake supports the following privileges on the aggregation policy object.
The USAGE privilege on the parent database and schema are required to perform operations on any object in a schema.
Privilege |
Usage |
---|---|
APPLY |
Enables the set and unset operations for an aggregation policy on a table. |
OWNERSHIP |
Transfers ownership of the aggregation policy, which grants full control over the aggregation policy. Required to alter most properties of an aggregation policy. |
For details, see Summary of DDL commands, operations, and privileges (in this topic).
Aggregation policy DDL reference¶
Snowflake supports the following DDL to create and manage aggregation policies.
Summary of DDL commands, operations, and privileges¶
The following table summarizes the relationship between aggregation policy privileges and DDL operations.
The USAGE privilege on the parent database and schema are required to perform operations on any object in a schema.
Operation |
Privilege required |
---|---|
Create aggregation policy. |
A role with the CREATE AGGREGATION POLICY privilege in the same schema. |
Alter aggregation policy. |
The role with the OWNERSHIP privilege on the aggregation policy. |
Describe aggregation policy |
One of the following:
|
Drop aggregation policy. |
A role with the OWNERSHIP privilege on the aggregation policy. |
Show aggregation policies. |
One of the following:
|
Set or unset an aggregation policy on a table. |
One of the following:
|
Snowflake supports different permissions to create and set an aggregation policy on an object.
For a centralized aggregation policy management approach in which the
aggregation_policy_admin
custom role creates and sets aggregation policies on all tables, the following permissions are necessary:USE ROLE securityadmin; GRANT USAGE ON DATABASE mydb TO ROLE agg_policy_admin; GRANT USAGE ON SCHEMA mydb.schema TO ROLE proj_policy_admin; GRANT CREATE AGGREGATION POLICY ON SCHEMA mydb.schema TO ROLE aggregation_policy_admin; GRANT APPLY ON AGGREGATION POLICY ON ACCOUNT TO ROLE aggregation_policy_admin;
In a hybrid management approach, a single role has the CREATE AGGREGATION POLICY privilege to ensure aggregation policies are named consistently and individual teams or roles have the APPLY privilege for a specific aggregation policy.
For example, the custom role
finance_role
role can be granted the permission to set the aggregation policycost_center
on tables and views the role owns (i.e. the role has the OWNERSHIP privilege on the table or view):USE ROLE securityadmin; GRANT CREATE AGGREGATION POLICY ON SCHEMA mydb.schema TO ROLE aggregation_policy_admin; GRANT APPLY ON AGGREGATION POLICY cost_center TO ROLE finance_role;