ZetaSQL supports differentially private aggregate functions. For an explanation of how aggregate functions work, see Aggregate function calls.
Differentially private aggregate functions can only be used with differentially private queries.
WITH ANONYMIZATION ...
ANON_AVG(expression [CLAMPED BETWEEN lower_bound AND upper_bound])
Description
Returns the average of non-NULL
, non-NaN
values in the expression.
This function first computes the average per privacy unit column, and then
computes the final result by averaging these averages.
This function must be used with the ANONYMIZATION
clause and
can support these arguments:
expression
: The input expression. This can be any numeric input type, such asINT64
.CLAMPED BETWEEN
clause: Perform clamping per privacy unit column averages.
Return type
DOUBLE
Examples
The following differentially private query gets the average number of each item
requested per professor. Smaller aggregations may not be included. This query
references a view called view_on_professors
.
-- With noise
SELECT
WITH ANONYMIZATION
OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1)
item,
ANON_AVG(quantity CLAMPED BETWEEN 0 AND 100) average_quantity
FROM {{USERNAME}}.view_on_professors
GROUP BY item;
-- These results will change each time you run the query.
-- Smaller aggregations may be removed.
+----------+------------------+
| item | average_quantity |
+----------+------------------+
| pencil | 38.5038356810269 |
| pen | 13.4725028762032 |
+----------+------------------+
-- Without noise (this un-noised version is for demonstration only)
SELECT
WITH ANONYMIZATION
OPTIONS(epsilon=1e20, delta=.01, max_groups_contributed=1)
item,
ANON_AVG(quantity) average_quantity
FROM {{USERNAME}}.view_on_professors
GROUP BY item;
-- These results will not change when you run the query.
+----------+------------------+
| item | average_quantity |
+----------+------------------+
| scissors | 8 |
| pencil | 40 |
| pen | 18.5 |
+----------+------------------+
Note: You can learn more about when and when not to use noise here.
WITH ANONYMIZATION ...
ANON_COUNT(*)
Description
Returns the number of rows in the
differentially private FROM
clause. The final result
is an aggregation across privacy unit columns.
Input values are clamped implicitly. Clamping is
performed per privacy unit column.
This function must be used with the ANONYMIZATION
clause.
Return type
INT64
Examples
The following differentially private query counts the number of requests for
each item. This query references a view called
view_on_professors
.
-- With noise
SELECT
WITH ANONYMIZATION
OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1)
item,
ANON_COUNT(*) times_requested
FROM {{USERNAME}}.view_on_professors
GROUP BY item;
-- These results will change each time you run the query.
-- Smaller aggregations may be removed.
+----------+-----------------+
| item | times_requested |
+----------+-----------------+
| pencil | 5 |
| pen | 2 |
+----------+-----------------+
--Without noise (this un-noised version is for demonstration only)
SELECT
WITH ANONYMIZATION
OPTIONS(epsilon=1e20, delta=.01, max_groups_contributed=1)
item,
ANON_COUNT(*) times_requested
FROM {{USERNAME}}.view_on_professors
GROUP BY item;
-- These results will not change when you run the query.
+----------+-----------------+
| item | times_requested |
+----------+-----------------+
| scissors | 1 |
| pencil | 4 |
| pen | 3 |
+----------+-----------------+
Note: You can learn more about when and when not to use noise here.
WITH ANONYMIZATION ...
ANON_COUNT(expression [CLAMPED BETWEEN lower_bound AND upper_bound])
Description
Returns the number of non-NULL
expression values. The final result is an
aggregation across privacy unit columns.
This function must be used with the ANONYMIZATION
clause and
can support these arguments:
expression
: The input expression. This can be any numeric input type, such asINT64
.CLAMPED BETWEEN
clause: Perform clamping per privacy unit column.
Return type
INT64
Examples
The following differentially private query counts the number of requests made
for each type of item. This query references a view called
view_on_professors
.
-- With noise
SELECT
WITH ANONYMIZATION
OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1)
item,
ANON_COUNT(item CLAMPED BETWEEN 0 AND 100) times_requested
FROM {{USERNAME}}.view_on_professors
GROUP BY item;
-- These results will change each time you run the query.
-- Smaller aggregations may be removed.
+----------+-----------------+
| item | times_requested |
+----------+-----------------+
| pencil | 5 |
| pen | 2 |
+----------+-----------------+
--Without noise (this un-noised version is for demonstration only)
SELECT
WITH ANONYMIZATION
OPTIONS(epsilon=1e20, delta=.01, max_groups_contributed=1)
item,
ANON_COUNT(item CLAMPED BETWEEN 0 AND 100) times_requested
FROM {{USERNAME}}.view_on_professors
GROUP BY item;
-- These results will not change when you run the query.
+----------+-----------------+
| item | times_requested |
+----------+-----------------+
| scissors | 1 |
| pencil | 4 |
| pen | 3 |
+----------+-----------------+
Note: You can learn more about when and when not to use noise here.
WITH ANONYMIZATION ...
ANON_PERCENTILE_CONT(expression, percentile [CLAMPED BETWEEN lower_bound AND upper_bound])
Description
Takes an expression and computes a percentile for it. The final result is an aggregation across privacy unit columns.
This function must be used with the ANONYMIZATION
clause and
can support these arguments:
expression
: The input expression. This can be most numeric input types, such asINT64
.NULL
s are always ignored.percentile
: The percentile to compute. The percentile must be a literal in the range [0, 1]CLAMPED BETWEEN
clause: Perform clamping per privacy unit column.
NUMERIC
and BIGNUMERIC
arguments are not allowed.
If you need them, cast them to
DOUBLE
first.
Return type
DOUBLE
Examples
The following differentially private query gets the percentile of items
requested. Smaller aggregations may not be included. This query references a
view called view_on_professors
.
-- With noise
SELECT
WITH ANONYMIZATION
OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1)
item,
ANON_PERCENTILE_CONT(quantity, 0.5 CLAMPED BETWEEN 0 AND 100) percentile_requested
FROM {{USERNAME}}.view_on_professors
GROUP BY item;
-- These results will change each time you run the query.
-- Smaller aggregations may be removed.
+----------+----------------------+
| item | percentile_requested |
+----------+----------------------+
| pencil | 72.00011444091797 |
| scissors | 8.000175476074219 |
| pen | 23.001075744628906 |
+----------+----------------------+
WITH ANONYMIZATION ...
ANON_QUANTILES(expression, number CLAMPED BETWEEN lower_bound AND upper_bound)
Description
Returns an array of differentially private quantile boundaries for values in
expression
. The first element in the return value is the
minimum quantile boundary and the last element is the maximum quantile boundary.
The returned results are aggregations across privacy unit columns.
This function must be used with the ANONYMIZATION
clause and
can support these arguments:
expression
: The input expression. This can be most numeric input types, such asINT64
.NULL
s are always ignored.number
: The number of quantiles to create. This must be anINT64
.CLAMPED BETWEEN
clause: Perform clamping per privacy unit column.
NUMERIC
and BIGNUMERIC
arguments are not allowed.
If you need them, cast them to
DOUBLE
first.
Return type
ARRAY
<DOUBLE
>
Examples
The following differentially private query gets the five quantile boundaries of
the four quartiles of the number of items requested. Smaller aggregations
may not be included. This query references a view called
view_on_professors
.
-- With noise
SELECT
WITH ANONYMIZATION
OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1)
item,
ANON_QUANTILES(quantity, 4 CLAMPED BETWEEN 0 AND 100) quantiles_requested
FROM {{USERNAME}}.view_on_professors
GROUP BY item;
-- These results will change each time you run the query.
-- Smaller aggregations may be removed.
+----------+----------------------------------------------------------------------+
| item | quantiles_requested |
+----------+----------------------------------------------------------------------+
| pen | [6.409375,20.647684733072918,41.40625,67.30848524305556,99.80078125] |
| pencil | [6.849259,44.010416666666664,62.64204,65.83806818181819,98.59375] |
+----------+----------------------------------------------------------------------+
WITH ANONYMIZATION ...
ANON_STDDEV_POP(expression [CLAMPED BETWEEN lower_bound AND upper_bound])
Description
Takes an expression and computes the population (biased) standard deviation of
the values in the expression. The final result is an aggregation across
privacy unit columns between 0
and +Inf
.
This function must be used with the ANONYMIZATION
clause and
can support these arguments:
expression
: The input expression. This can be most numeric input types, such asINT64
.NULL
s are always ignored.CLAMPED BETWEEN
clause: Perform clamping per individual entity values.
NUMERIC
and BIGNUMERIC
arguments are not allowed.
If you need them, cast them to
DOUBLE
first.
Return type
DOUBLE
Examples
The following differentially private query gets the
population (biased) standard deviation of items requested. Smaller aggregations
may not be included. This query references a view called
view_on_professors
.
-- With noise
SELECT
WITH ANONYMIZATION
OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1)
item,
ANON_STDDEV_POP(quantity CLAMPED BETWEEN 0 AND 100) pop_standard_deviation
FROM {{USERNAME}}.view_on_professors
GROUP BY item;
-- These results will change each time you run the query.
-- Smaller aggregations may be removed.
+----------+------------------------+
| item | pop_standard_deviation |
+----------+------------------------+
| pencil | 25.350871122442054 |
| scissors | 50 |
| pen | 2 |
+----------+------------------------+
WITH ANONYMIZATION ...
ANON_SUM(expression [CLAMPED BETWEEN lower_bound AND upper_bound])
Description
Returns the sum of non-NULL
, non-NaN
values in the expression. The final
result is an aggregation across privacy unit columns.
This function must be used with the ANONYMIZATION
clause and
can support these arguments:
expression
: The input expression. This can be any numeric input type, such asINT64
.CLAMPED BETWEEN
clause: Perform clamping per privacy unit column.
Return type
One of the following supertypes:
INT64
UINT64
DOUBLE
Examples
The following differentially private query gets the sum of items requested.
Smaller aggregations may not be included. This query references a view called
view_on_professors
.
-- With noise
SELECT
WITH ANONYMIZATION
OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1)
item,
ANON_SUM(quantity CLAMPED BETWEEN 0 AND 100) quantity
FROM {{USERNAME}}.view_on_professors
GROUP BY item;
-- These results will change each time you run the query.
-- Smaller aggregations may be removed.
+----------+-----------+
| item | quantity |
+----------+-----------+
| pencil | 143 |
| pen | 59 |
+----------+-----------+
-- Without noise (this un-noised version is for demonstration only)
SELECT
WITH ANONYMIZATION
OPTIONS(epsilon=1e20, delta=.01, max_groups_contributed=1)
item,
ANON_SUM(quantity) quantity
FROM {{USERNAME}}.view_on_professors
GROUP BY item;
-- These results will not change when you run the query.
+----------+----------+
| item | quantity |
+----------+----------+
| scissors | 8 |
| pencil | 144 |
| pen | 58 |
+----------+----------+
Note: You can learn more about when and when not to use noise here.
WITH ANONYMIZATION ...
ANON_VAR_POP(expression [CLAMPED BETWEEN lower_bound AND upper_bound])
Description
Takes an expression and computes the population (biased) variance of the values
in the expression. The final result is an aggregation across
privacy unit columns between 0
and +Inf
. You can
clamp the input values explicitly, otherwise input values
are clamped implicitly. Clamping is performed per individual entity values.
This function must be used with the ANONYMIZATION
clause and
can support these arguments:
expression
: The input expression. This can be any numeric input type, such asINT64
.NULL
s are always ignored.CLAMPED BETWEEN
clause: Perform clamping per individual entity values.
NUMERIC
and BIGNUMERIC
arguments are not allowed.
If you need them, cast them to
DOUBLE
first.
Return type
DOUBLE
Examples
The following differentially private query gets the
population (biased) variance of items requested. Smaller aggregations may not
be included. This query references a view called
view_on_professors
.
-- With noise
SELECT
WITH ANONYMIZATION
OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1)
item,
ANON_VAR_POP(quantity CLAMPED BETWEEN 0 AND 100) pop_variance
FROM {{USERNAME}}.view_on_professors
GROUP BY item;
-- These results will change each time you run the query.
-- Smaller aggregations may be removed.
+----------+-----------------+
| item | pop_variance |
+----------+-----------------+
| pencil | 642 |
| pen | 2.6666666666665 |
| scissors | 2500 |
+----------+-----------------+
WITH DIFFERENTIAL_PRIVACY ...
AVG(expression[, contribution_bounds_per_group => (lower_bound, upper_bound)])
Description
Returns the average of non-NULL
, non-NaN
values in the expression.
This function first computes the average per privacy unit column, and then
computes the final result by averaging these averages.
This function must be used with the DIFFERENTIAL_PRIVACY
clause and
can support these arguments:
expression
: The input expression. This can be any numeric input type, such asINT64
.contribution_bounds_per_group
: The contribution bounds named argument. Perform clamping per each group separately before performing intermediate grouping on the privacy unit column.
Return type
DOUBLE
Examples
The following differentially private query gets the average number of each item
requested per professor. Smaller aggregations may not be included. This query
references a table called professors
.
-- With noise
SELECT
WITH DIFFERENTIAL_PRIVACY
OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1, privacy_unit_column=id)
item,
AVG(quantity, contribution_bounds_per_group => (0,100)) average_quantity
FROM professors
GROUP BY item;
-- These results will change each time you run the query.
-- Smaller aggregations may be removed.
+----------+------------------+
| item | average_quantity |
+----------+------------------+
| pencil | 38.5038356810269 |
| pen | 13.4725028762032 |
+----------+------------------+
-- Without noise (this un-noised version is for demonstration only)
SELECT
WITH DIFFERENTIAL_PRIVACY
OPTIONS(epsilon=1e20, delta=.01, max_groups_contributed=1, privacy_unit_column=id)
item,
AVG(quantity) average_quantity
FROM professors
GROUP BY item;
-- These results will not change when you run the query.
+----------+------------------+
| item | average_quantity |
+----------+------------------+
| scissors | 8 |
| pencil | 40 |
| pen | 18.5 |
+----------+------------------+
Note: You can learn more about when and when not to use noise here.
WITH DIFFERENTIAL_PRIVACY ...
COUNT(* [, contribution_bounds_per_group => (lower_bound, upper_bound)]))
Description
Returns the number of rows in the
differentially private FROM
clause. The final result
is an aggregation across privacy unit columns.
This function must be used with the DIFFERENTIAL_PRIVACY
clause and
can support the following argument:
contribution_bounds_per_group
: The contribution bounds named argument. Perform clamping per each group separately before performing intermediate grouping on the privacy unit column.
Return type
INT64
Examples
The following differentially private query counts the number of requests for
each item. This query references a table called
professors
.
-- With noise
SELECT
WITH DIFFERENTIAL_PRIVACY
OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1, privacy_unit_column=id)
item,
COUNT(*) times_requested
FROM professors
GROUP BY item;
-- These results will change each time you run the query.
-- Smaller aggregations may be removed.
+----------+-----------------+
| item | times_requested |
+----------+-----------------+
| pencil | 5 |
| pen | 2 |
+----------+-----------------+
--Without noise (this un-noised version is for demonstration only)
SELECT
WITH DIFFERENTIAL_PRIVACY
OPTIONS(epsilon=1e20, delta=.01, max_groups_contributed=1, privacy_unit_column=id)
item,
COUNT(*) times_requested
FROM professors
GROUP BY item;
-- These results will not change when you run the query.
+----------+-----------------+
| item | times_requested |
+----------+-----------------+
| scissors | 1 |
| pencil | 4 |
| pen | 3 |
+----------+-----------------+
Note: You can learn more about when and when not to use noise here.
WITH DIFFERENTIAL_PRIVACY ...
COUNT(expression[, contribution_bounds_per_group => (lower_bound, upper_bound)])
Description
Returns the number of non-NULL
expression values. The final result is an
aggregation across privacy unit columns.
This function must be used with the DIFFERENTIAL_PRIVACY
clause and
can support these arguments:
expression
: The input expression. This can be any numeric input type, such asINT64
.contribution_bounds_per_group
: The contribution bounds named argument. Perform clamping per each group separately before performing intermediate grouping on the privacy unit column.
Return type
INT64
Examples
The following differentially private query counts the number of requests made
for each type of item. This query references a table called
professors
.
-- With noise
SELECT
WITH DIFFERENTIAL_PRIVACY
OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1, privacy_unit_column=id)
item,
COUNT(item, contribution_bounds_per_group => (0,100)) times_requested
FROM professors
GROUP BY item;
-- These results will change each time you run the query.
-- Smaller aggregations may be removed.
+----------+-----------------+
| item | times_requested |
+----------+-----------------+
| pencil | 5 |
| pen | 2 |
+----------+-----------------+
--Without noise (this un-noised version is for demonstration only)
SELECT
WITH DIFFERENTIAL_PRIVACY
OPTIONS(epsilon=1e20, delta=.01, max_groups_contributed=1, privacy_unit_column=id)
item,
COUNT(item, contribution_bounds_per_group => (0,100)) times_requested
FROM professors
GROUP BY item;
-- These results will not change when you run the query.
+----------+-----------------+
| item | times_requested |
+----------+-----------------+
| scissors | 1 |
| pencil | 4 |
| pen | 3 |
+----------+-----------------+
Note: You can learn more about when and when not to use noise here.
WITH DIFFERENTIAL_PRIVACY ...
PERCENTILE_CONT(expression, percentile, contribution_bounds_per_row => (lower_bound, upper_bound))
Description
Takes an expression and computes a percentile for it. The final result is an aggregation across privacy unit columns.
This function must be used with the DIFFERENTIAL_PRIVACY
clause and
can support these arguments:
expression
: The input expression. This can be most numeric input types, such asINT64
.NULL
s are always ignored.percentile
: The percentile to compute. The percentile must be a literal in the range [0, 1]contribution_bounds_per_row
: The contribution bounds named argument. Perform clamping per each row separately before performing intermediate grouping on the privacy unit column.
NUMERIC
and BIGNUMERIC
arguments are not allowed.
If you need them, cast them to
DOUBLE
first.
Return type
DOUBLE
Examples
The following differentially private query gets the percentile of items
requested. Smaller aggregations may not be included. This query references a
view called professors
.
-- With noise
SELECT
WITH DIFFERENTIAL_PRIVACY
OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1, privacy_unit_column=id)
item,
PERCENTILE_CONT(quantity, 0.5, contribution_bounds_per_row => (0,100)) percentile_requested
FROM professors
GROUP BY item;
-- These results will change each time you run the query.
-- Smaller aggregations may be removed.
+----------+----------------------+
| item | percentile_requested |
+----------+----------------------+
| pencil | 72.00011444091797 |
| scissors | 8.000175476074219 |
| pen | 23.001075744628906 |
+----------+----------------------+
WITH DIFFERENTIAL_PRIVACY ...
VAR_POP(expression[, contribution_bounds_per_row => (lower_bound, upper_bound)])
Description
Takes an expression and computes the population (biased) variance of the values
in the expression. The final result is an aggregation across
privacy unit columns between 0
and +Inf
. You can
clamp the input values explicitly, otherwise input values
are clamped implicitly. Clamping is performed per individual user values.
This function must be used with the DIFFERENTIAL_PRIVACY
clause and
can support these arguments:
expression
: The input expression. This can be any numeric input type, such asINT64
.NULL
s are always ignored.contribution_bounds_per_row
: The contribution bounds named argument. Perform clamping per each row separately before performing intermediate grouping on individual user values.
NUMERIC
and BIGNUMERIC
arguments are not allowed.
If you need them, cast them to
DOUBLE
first.
Return type
DOUBLE
Examples
The following differentially private query gets the
population (biased) variance of items requested. Smaller aggregations may not
be included. This query references a view called
professors
.
-- With noise
SELECT
WITH DIFFERENTIAL_PRIVACY
OPTIONS(epsilon=10, delta=.01, max_groups_contributed=1, privacy_unit_column=id)
item,
VAR_POP(quantity, contribution_bounds_per_row => (0,100)) pop_variance
FROM professors
GROUP BY item;
-- These results will change each time you run the query.
-- Smaller aggregations may be removed.
+----------+-----------------+
| item | pop_variance |
+----------+-----------------+
| pencil | 642 |
| pen | 2.6666666666665 |
| scissors | 2500 |
+----------+-----------------+
In differentially private queries, aggregation clamping is used to limit the contribution of outliers. You can clamp implicitly or explicitly.
If you clamp explicitly, you can clamp values with the
contribution bounds named argument (recommended) or
the CLAMPED BETWEEN
clause.
contribution_bounds_per_group => (lower_bound,upper_bound)
contribution_bounds_per_row => (lower_bound,upper_bound)
Use this named argument to clamp values per group or per row between a lower and upper bound.
contribution_bounds_per_row
: Contributions per privacy unit are clamped on a per-row (per-record) basis. This means that:- Upper and lower bounds are applied to column values in individual rows produced by the input subquery independently.
- The maximum possible contribution per privacy unit (and per grouping set)
is the product of the per-row contribution limit and
max_groups_contributed
differential privacy parameter.
contribution_bounds_per_group
: Contributions per privacy unit are clamped on a unique set of entity-specifiedGROUP BY
keys. The upper and lower bounds are applied to values per group after the values are aggregated per privacy unit.lower_bound
: Numeric literal that represents the smallest value to include in an aggregation.upper_bound
: Numeric literal that represents the largest value to include in an aggregation.
NUMERIC
and BIGNUMERIC
arguments are not allowed.
Examples
The following anonymized query clamps each aggregate contribution for each
differential privacy ID and within a specified range (0
and 100
).
As long as all or most values fall within this range, your results
will be accurate. This query references a view called
view_on_professors
.
--Without noise (this un-noised version is for demonstration only)
SELECT WITH DIFFERENTIAL_PRIVACY
OPTIONS (
epsilon = 1e20,
delta = .01,
privacy_unit_column=id
)
item,
AVG(quantity, contribution_bounds_per_group=>(0,100)) average_quantity
FROM view_on_professors
GROUP BY item;
+----------+------------------+
| item | average_quantity |
+----------+------------------+
| scissors | 8 |
| pencil | 40 |
| pen | 18.5 |
+----------+------------------+
Notice what happens when most or all values fall outside of the clamped range. To get accurate results, ensure that the difference between the upper and lower bound is as small as possible, and that most inputs are between the upper and lower bound.
--Without noise (this un-noised version is for demonstration only)
SELECT WITH DIFFERENTIAL_PRIVACY
OPTIONS (
epsilon = 1e20,
delta = .01,
privacy_unit_column=id
)
item,
AVG(quantity, contribution_bounds_per_group=>(50,100)) average_quantity
FROM view_on_professors
GROUP BY item;
+----------+------------------+
| item | average_quantity |
+----------+------------------+
| scissors | 54 |
| pencil | 58 |
| pen | 51 |
+----------+------------------+
To learn more about when and when not to use noise in differentially private queries, see Differentially privacy.
CLAMPED BETWEEN lower_bound AND upper_bound
Use this clause to clamp values between a lower and an upper bound.
lower_bound
: Numeric literal that represents the smallest value to include in an aggregation.upper_bound
: Numeric literal that represents the largest value to include in an aggregation.
NUMERIC
and BIGNUMERIC
arguments are not allowed.
Note: This is a legacy feature. If possible, use the contribution_bounds
named argument instead.
Examples
The following differentially private query clamps each aggregate contribution
for each privacy unit column and within a specified range (0
and 100
).
As long as all or most values fall within this range, your results will be
accurate. This query references a view called
view_on_professors
.
--Without noise (this un-noised version is for demonstration only)
SELECT WITH ANONYMIZATION
OPTIONS (
epsilon = 1e20,
delta = .01,
max_groups_contributed = 1
)
item,
AVG(quantity CLAMPED BETWEEN 0 AND 100) average_quantity
FROM view_on_professors
GROUP BY item;
+----------+------------------+
| item | average_quantity |
+----------+------------------+
| scissors | 8 |
| pencil | 40 |
| pen | 18.5 |
+----------+------------------+
Notice what happens when most or all values fall outside of the clamped range. To get accurate results, ensure that the difference between the upper and lower bound is as small as possible, and that most inputs are between the upper and lower bound.
--Without noise (this un-noised version is for demonstration only)
SELECT WITH ANONYMIZATION
OPTIONS (
epsilon = 1e20,
delta = .01,
max_groups_contributed = 1
)
item,
AVG(quantity CLAMPED BETWEEN 50 AND 100) average_quantity
FROM view_on_professors
GROUP BY item;
+----------+------------------+
| item | average_quantity |
+----------+------------------+
| scissors | 54 |
| pencil | 58 |
| pen | 51 |
+----------+------------------+
To learn more about when and when not to use noise in differentially private queries, see Differentially privacy.
In differentially private aggregate functions, clamping explicitly clamps the total contribution from each privacy unit column to within a specified range.
Explicit bounds are uniformly applied to all aggregations. So even if some aggregations have a wide range of values, and others have a narrow range of values, the same bounds are applied to all of them. On the other hand, when implicit bounds are inferred from the data, the bounds applied to each aggregation can be different.
Explicit bounds should be chosen to reflect public information. For example, bounding ages between 0 and 100 reflects public information because the age of most people generally falls within this range.
Important: The results of the query reveal the explicit bounds. Do not use explicit bounds based on the entity data; explicit bounds should be based on public information.
In differentially private aggregate functions, explicit clamping is optional. If you don't include this clause, clamping is implicit, which means bounds are derived from the data itself in a differentially private way. The process is somewhat random, so aggregations with identical ranges can have different bounds.
Implicit bounds are determined for each aggregation. So if some aggregations have a wide range of values, and others have a narrow range of values, implicit bounding can identify different bounds for different aggregations as appropriate. Implicit bounds might be an advantage or a disadvantage depending on your use case. Different bounds for different aggregations can result in lower error. Different bounds also means that different aggregations have different levels of uncertainty, which might not be directly comparable. Explicit bounds, on the other hand, apply uniformly to all aggregations and should be derived from public information.
When clamping is implicit, part of the total epsilon is spent picking bounds. This leaves less epsilon for aggregations, so these aggregations are noisier.