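Adjusting privacy controls
This topic describes techniques that a data owner can use to adjust the privacy controls that Snowflake uses to introduce noise into results. Snowflake recommends trying these options in the order presented in this topic.
Snowflake provides parameters to adjust the privacy budget's limit on privacy loss and the maximum privacy budget used by each aggregate operation (collectively known as epsilon in the differential privacy literature).
Step 1: Adjust the privacy domain
Before adjusting the privacy budget, consider adjusting the privacy domain settings of the columns in the privacy-protected table. Snowflake introduces enough noise to obscure every value in a column, so the larger the range of values, the more noise must be introduced. Follow these guidelines: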
- If you want to increase the noise, broaden the range to include values that are greater or less than the actual values. Remember, the privacy domain defines all possible values, not actual values.
- If you want to decrease the noise, narrow the privacy domain to exclude or clamp values outside a useful range. For information about how values outside the privacy domain are treated, see Values outside a privacy domain.
Note
The analyst can also narrow a privacy domain to decrease noise. For more information, see Narrowing a privacy domain to improve results.
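As a rough sketch of the second guideline, a data owner might narrow the domain of a numeric column so that less noise is needed. The table name `patients`, the column `age`, and the bounds are hypothetical, and the `SET PRIVACY DOMAIN BETWEEN` clause is written from memory of the privacy domain topic, so confirm the exact syntax there before using it.

```sql
-- Hypothetical table and column; the bounds are illustrative only.
-- Assumed syntax: verify against the privacy domain reference.
-- A narrower range means Snowflake has less to obscure, so less noise is added.
ALTER TABLE patients ALTER COLUMN age SET PRIVACY DOMAIN BETWEEN (0, 90);
```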
Step 2: Adjust the MAX_BUDGET_PER_AGGREGATE parameter
If you’ve adjusted the privacy domain but still need to fine-tune your privacy controls, you can start modifying settings that affect the
privacy budget. The MAX_BUDGET_PER_AGGREGATE parameter in the body of a privacy policy controls how much of the privacy
budget can be spent on each aggregate in a query (that is, how much privacy loss an aggregate can incur). Adjusting this parameter changes
both the amount of noise added to each aggregate and the number of aggregates that can be executed before the privacy budget
limit is reached: a lower value adds more noise but allows more aggregates, and a higher value adds less noise but allows fewer.
The parameter sets the level for each aggregate, not each query. As an example, the query SELECT COUNT(*), AVG(a) ... has two
aggregates: COUNT(*) and AVG(a).
To adjust the maximum privacy loss incurred by each aggregate in a query, use the ALTER PRIVACY POLICY command to
set a new value for the MAX_BUDGET_PER_AGGREGATE parameter. For example:
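The statement below is a minimal sketch of such a change, not a recommended setting. The policy name `my_policy`, the budget name `'analysts'`, and the value `0.1` are placeholders chosen for illustration.

```sql
-- Hypothetical policy and budget names; 0.1 is an illustrative value.
-- Each aggregate in a query can incur at most this much privacy loss.
ALTER PRIVACY POLICY my_policy SET BODY ->
  PRIVACY_BUDGET(BUDGET_NAME => 'analysts', MAX_BUDGET_PER_AGGREGATE => 0.1);
```

With a setting like this, a query such as SELECT COUNT(*), AVG(a) ... could incur up to 0.1 of privacy loss for each of its two aggregates.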
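Step 3: Adjust the privacy budget limit
If adjusting the other privacy controls does not achieve the effect you want, you can adjust the privacy budget's limit on privacy loss. While the other privacy controls affect the amount of noise in query results, adjusting the budget limit affects how many queries an analyst can run.
Each time an analyst runs a query with an aggregate function against a privacy-protected table, the analyst's cumulative privacy loss is incremented and the estimated number of remaining aggregates is decremented. When the cumulative privacy loss reaches the privacy budget limit, the analyst cannot run any more queries until the budget is reset. If you want to maximize the usefulness of the data to analysts, you can set the budget limit based on the number of queries that you expect an analyst to run in each budget window.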
Note
Remember that cumulative privacy loss is reset to 0 on a fixed schedule, as defined by the budget window. When the privacy budget is reset, the analyst can run a fresh set of queries even if they reached the budget limit during the previous budget window.
The ESTIMATE_REMAINING_DP_AGGREGATES function helps estimate the number of queries remaining for a privacy
budget. In general, this number is based on the number of aggregates in each query and the value of the MAX_BUDGET_PER_AGGREGATE
parameter that you specified in the body of the privacy policy. For an extended example of using the ESTIMATE_REMAINING_DP_AGGREGATES
function to see the effects of queries on the privacy budget, see Tracking privacy budget spending.
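A call to the function might look like the following sketch. The table name `my_table` is hypothetical, and the fully qualified name in the `SNOWFLAKE.DATA_PRIVACY` schema is an assumption, so check the function reference for the exact signature and output columns.

```sql
-- Hypothetical privacy-protected table name.
-- Assumed location of the function: SNOWFLAKE.DATA_PRIVACY.
-- Run it before and after a series of queries to see how the estimate changes.
SELECT *
  FROM TABLE(SNOWFLAKE.DATA_PRIVACY.ESTIMATE_REMAINING_DP_AGGREGATES('my_table'));
```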
After you have used the ESTIMATE_REMAINING_DP_AGGREGATES function to get an idea of how much privacy budget is spent on a series of queries,
you can adjust the BUDGET_LIMIT parameter in the body of the privacy policy to set a new privacy budget limit. For example:
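The following sketch shows the kind of statement this step describes. The policy name `my_policy`, the budget name `'analysts'`, and both numeric values are placeholders, not recommendations.

```sql
-- Hypothetical policy and budget names; 300 and 0.1 are illustrative values.
-- MAX_BUDGET_PER_AGGREGATE is restated so that it keeps its earlier setting.
ALTER PRIVACY POLICY my_policy SET BODY ->
  PRIVACY_BUDGET(
    BUDGET_NAME => 'analysts',
    BUDGET_LIMIT => 300,
    MAX_BUDGET_PER_AGGREGATE => 0.1
  );
```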
Important
When you alter the policy to set BUDGET_LIMIT, also include the MAX_BUDGET_PER_AGGREGATE parameter that you set previously. If you don’t include a parameter
in the ALTER PRIVACY POLICY statement, it resets to its default value.