Vector summation DP aggregation #264

dvadym · 2022-04-18T12:42:09Z

Context

DPEngine.aggregate performs DP aggregations of scalar values (sum, count, mean, etc.). The set of computed metrics is controlled by the metrics field of the aggregate_params argument.
The result of this function is a collection of (partition_key, named_tuple_with_requested_metrics).

Note: More details on the terminology are here.

Goals

Support of vector_sum in `DPEngine.aggregate`

The goal is to implement full support of vector_sum in DPEngine.aggregate, i.e. the values to aggregate are arrays of the same size, and the output is (partition_key, named_tuple["array_sum": sum_of_vectors_per_partition_key]).

References:

All metrics are aggregated with combiners (e.g. SumCombiner )
There is already a low level function that applies Laplace/Gaussian mechanism to np arrays

This task can be split into 2 parts:

Implementing VectorSumCombiner, which performs the aggregation
Plumbing VectorSumCombiner into DPEngine.aggregate
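For illustration, the combiner part could be sketched roughly as below. This is a minimal, dependency-free sketch and not the PipelineDP implementation: the class name `VectorSumCombinerSketch`, its constructor arguments, and the precomputed `noise_scale` are all hypothetical, plain Python lists stand in for np arrays, and the Laplace sampler built on `random` is not suitable for real DP releases. Only the "array_sum" output name is taken from the description above.

```python
import random
from typing import Dict, List, Optional, Sequence

Vector = List[float]


class VectorSumCombinerSketch:
    """Hypothetical vector-sum combiner; NOT the PipelineDP implementation."""

    def __init__(self, vector_size: int, noise_scale: float,
                 seed: Optional[int] = None):
        self._vector_size = vector_size
        # Laplace scale b; assumed to be precomputed from the privacy budget.
        self._noise_scale = noise_scale
        self._rng = random.Random(seed)

    def create_accumulator(self, values: Sequence[Sequence[float]]) -> Vector:
        """Sums the vectors contributed to one partition, coordinate-wise."""
        accumulator = [0.0] * self._vector_size
        for vector in values:
            if len(vector) != self._vector_size:
                raise ValueError(
                    f"expected size {self._vector_size}, got {len(vector)}")
            accumulator = [a + x for a, x in zip(accumulator, vector)]
        return accumulator

    def merge_accumulators(self, left: Vector, right: Vector) -> Vector:
        """Merges two partial sums for the same partition."""
        return [a + b for a, b in zip(left, right)]

    def _laplace_noise(self) -> float:
        # The difference of two i.i.d. exponentials with mean b is Laplace(0, b).
        # Illustration only: not a secure noise sampler.
        if self._noise_scale == 0:
            return 0.0
        rate = 1.0 / self._noise_scale
        return self._rng.expovariate(rate) - self._rng.expovariate(rate)

    def compute_metrics(self, accumulator: Vector) -> Dict[str, Vector]:
        """Adds independent noise to every coordinate of the summed vector."""
        return {"array_sum": [x + self._laplace_noise() for x in accumulator]}


# Usage with noise disabled, so the result is the exact sum.
combiner = VectorSumCombinerSketch(vector_size=2, noise_scale=0.0)
part_a = combiner.create_accumulator([[1.0, 2.0], [3.0, 4.0]])
part_b = combiner.create_accumulator([[10.0, 20.0]])
print(combiner.compute_metrics(combiner.merge_accumulators(part_a, part_b)))
# {'array_sum': [14.0, 26.0]}
```

Keeping the accumulator a plain vector of the same size as the inputs means merging stays a coordinate-wise addition, which is the property the existing scalar SumCombiner relies on as well.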

Expose `vector_sum` computation to high-level Beam and Spark APIs.

The high-level Beam and Spark APIs are represented by the PrivatePCollection and PrivateRDD classes and the transformations on them. All DP computations are performed in DPEngine.
PrivatePCollection and PrivateRDD keep data in an internal collection (a PCollection or an RDD, respectively). They guarantee that only data that has been aggregated in a DP manner, using no more than the specified privacy budget, can be extracted.
Private Beam and Private Spark transformations are wrappers around DPEngine.aggregate. There are transformations for COUNT, MEAN, etc.

variance transformation can be used as a good example:


rialg · 2022-05-10T09:13:46Z

I can take a look at this one

dvadym · 2022-05-10T10:55:26Z

Sure, go ahead! Thanks!

rialg · 2022-05-12T13:46:11Z

IIUC, the VectorSumCombiner class will be similar to SumCombiner, but the AccumulatorType = np.ndarray. Is this correct?

dvadym · 2022-05-12T13:59:23Z

Yes, correct

rialg · 2022-05-12T15:22:44Z

In order to use add_noise_vector, an object of AdditiveVectorNoiseParams needs to be created. AFAIK, CombinerParams should contain the attributes needed to populate AdditiveVectorNoiseParams. Would it make sense to extend AggregateParams with the missing fields for the vector noise?

For instance:

    max_norm: float
    l0_sensitivity: float
    linf_sensitivity: float
    norm_kind: pipeline_dp.aggregate_params.NormKind

dvadym · 2022-05-13T06:49:37Z

Good question, we need to introduce max_norm and norm_kind in AggregateParams.

l0_sensitivity = max_partitions_contributed
linf_sensitivity = max_contributions_per_partition
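To make the mapping above concrete, here is a minimal sketch of how the vector noise parameters could be derived from the contribution bounds, together with one common way of clipping a vector to max_norm. Everything here is an assumption for illustration: `NormKindSketch`, `VectorNoiseParamsSketch`, `params_from_bounds` and `clip_vector` are hypothetical names, not the PipelineDP API, and the clipping rule (coordinate-wise clamp for Linf, rescaling for L1/L2) is a standard choice rather than something confirmed in this thread.

```python
import math
from dataclasses import dataclass
from enum import Enum
from typing import List, Sequence


class NormKindSketch(Enum):
    """Hypothetical stand-in for a norm-kind enum."""
    LINF = "linf"
    L1 = "l1"
    L2 = "l2"


@dataclass
class VectorNoiseParamsSketch:
    """Hypothetical container mirroring the fields listed in this thread."""
    max_norm: float
    l0_sensitivity: float
    linf_sensitivity: float
    norm_kind: NormKindSketch


def params_from_bounds(max_partitions_contributed: int,
                       max_contributions_per_partition: int,
                       max_norm: float,
                       norm_kind: NormKindSketch) -> VectorNoiseParamsSketch:
    # l0 and linf sensitivities come straight from the contribution bounds.
    return VectorNoiseParamsSketch(
        max_norm=max_norm,
        l0_sensitivity=max_partitions_contributed,
        linf_sensitivity=max_contributions_per_partition,
        norm_kind=norm_kind)


def clip_vector(vector: Sequence[float], max_norm: float,
                norm_kind: NormKindSketch) -> List[float]:
    """Returns a copy of `vector` whose chosen norm is at most `max_norm`."""
    if norm_kind is NormKindSketch.LINF:
        # Clamp every coordinate into [-max_norm, max_norm].
        return [min(max(x, -max_norm), max_norm) for x in vector]
    if norm_kind is NormKindSketch.L1:
        norm = sum(abs(x) for x in vector)
    else:
        norm = math.sqrt(sum(x * x for x in vector))
    if norm <= max_norm:
        return list(vector)
    # Rescale so that the norm equals max_norm exactly.
    scale = max_norm / norm
    return [x * scale for x in vector]


params = params_from_bounds(max_partitions_contributed=3,
                            max_contributions_per_partition=2,
                            max_norm=2.0,
                            norm_kind=NormKindSketch.LINF)
print(clip_vector([3.0, -4.0], params.max_norm, params.norm_kind))
# [2.0, -2.0]
```

Clipping each input vector before summation is what bounds a single contribution, so the sensitivities above can be turned into a noise scale.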

rialg · 2022-05-13T13:55:38Z

I'm trying to include VectorSumCombiner in DPEngine.aggregate, but I need to understand whether it should be used with the CompoundCombiners class. Should this case be handled as a separate branch in create_compound_combiner, similar to what happens with the metric pipeline_dp.Metrics.PRIVACY_ID_COUNT?

dvadym added the Type: New Feature ➕ Introduction of a completely new addition to the codebase label Apr 18, 2022

dvadym assigned rialg May 10, 2022

rialg mentioned this issue May 13, 2022: Implementing VectorSumCombiner #276 (Merged)

rialg mentioned this issue Jun 13, 2022: Expose vector_sum computation to high-level Beam API #293 (Closed)
