For an overview of differentially private (DP) learning methods, we refer to this PATCG presentation.
Criteo's ML challenge has already shown that learning on aggregated data under global DP constraints can achieve good results. This performance relies mostly on two assets:
- a small, un-obfuscated dataset of display-level events;
- aggregated reports containing label proportions (i.e., average label information) associated with a fixed set of user features (a toy example follows below).
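To make the second asset concrete, here is a minimal sketch of such an aggregated label-proportion report; the column names and data are purely illustrative, not the competition's actual schema:

```python
import pandas as pd

# Toy display-level data: one row per display, one user feature bucket, a click label.
df = pd.DataFrame({
    "feature_bucket": [0, 0, 1, 1, 1, 2],
    "click":          [1, 0, 0, 1, 1, 0],
})

# Aggregated report: per feature bucket, the display count and the label
# proportion (average click rate). In a DP deployment this report would
# additionally be noised before release.
report = df.groupby("feature_bucket")["click"].agg(count="size", label_proportion="mean")
print(report)
```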
However, it remains unclear in what form a small granular display-level training dataset sharing the same user distribution could persist in a future world without third-party cookies. The best results obtained without such a small display-level dataset are still significantly below those of a logistic regression model trained on granular data (see the table here).
In this repository, we explore global DP learning on display-level data inside a trusted server. Instead of applying local DP noise to raw user data, this approach trains the model directly on the full granular display-level dataset and publishes a DP-noised model using the DP-SGD method.
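To illustrate the idea of DP-SGD (a self-contained NumPy sketch, not this repository's implementation; all function and parameter names are hypothetical), each update clips per-example gradients to a norm bound C and adds Gaussian noise calibrated to that bound:

```python
import numpy as np

def dp_sgd_logistic(X, y, *, steps, lot_size, lr, clip_norm, noise_multiplier, seed=0):
    """Minimal DP-SGD for logistic regression (Abadi et al., 2016):
    clip each per-example gradient, then add Gaussian noise to the sum."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        # Sample a lot of examples (the original analysis uses Poisson subsampling).
        idx = rng.choice(n, size=lot_size, replace=False)
        preds = 1.0 / (1.0 + np.exp(-X[idx] @ w))
        grads = (preds - y[idx])[:, None] * X[idx]                      # per-example gradients
        norms = np.linalg.norm(grads, axis=1, keepdims=True)
        grads *= np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))  # clip to norm <= C
        noise = rng.normal(0.0, noise_multiplier * clip_norm, size=d)   # sigma = z * C
        w -= lr * (grads.sum(axis=0) + noise) / lot_size                # noisy average step
    return w

# Toy usage on synthetic data.
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 10))
y = (X @ rng.normal(size=10) > 0).astype(float)
w = dp_sgd_logistic(X, y, steps=100, lot_size=100, lr=0.5,
                    clip_norm=1.0, noise_multiplier=1.1)
```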
An overview of how this method could be embedded inside a trusted server using TEE technology can be found here.
The DP parameter epsilon is computed based on the DP accountant approach, publicly available in Google's differential privacy library.
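As a sketch of how such an accounting call typically looks with the dp_accounting package (the concrete parameters below are illustrative, not this repository's settings):

```python
import dp_accounting

def compute_epsilon(n, lot_size, steps, noise_multiplier, delta):
    """Epsilon for DP-SGD via the RDP accountant: `steps` compositions of a
    Poisson-subsampled Gaussian mechanism with sampling rate lot_size / n."""
    accountant = dp_accounting.rdp.RdpAccountant()
    event = dp_accounting.SelfComposedDpEvent(
        dp_accounting.PoissonSampledDpEvent(
            sampling_probability=lot_size / n,
            event=dp_accounting.GaussianDpEvent(noise_multiplier)),
        steps)
    accountant.compose(event)
    return accountant.get_epsilon(target_delta=delta)

# Illustrative numbers only: 90M displays, lots of 10k, 10k steps, z = 1.1.
print(compute_epsilon(n=90_000_000, lot_size=10_000, steps=10_000,
                      noise_multiplier=1.1, delta=1e-8))
```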
We use the full display-level dataset published for Criteo's Privacy Preserving ML Competition (90 million rows, 2.5 GB).
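At this size the file does not comfortably fit in memory on small machines; one way to scan it is chunked reading with pandas (the file name below is an assumption, the actual files produced by the setup scripts may differ):

```python
import pandas as pd

# Hypothetical path; see download_dataset.sh for the real file layout.
n_rows = 0
for chunk in pd.read_csv("data/criteo_dataset.csv", chunksize=1_000_000):
    n_rows += len(chunk)
print(f"total rows: {n_rows}")
```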
- Create a venv with Python 3.9 and install this package:

  python3.9 -m venv venv
  . venv/bin/activate
  pip install -e .

- Install dp_accounting: https://github.com/google/differential-privacy/tree/main/python/dp_accounting
- Download the Criteo ML challenge data:

  ./download_dataset.sh data

- Initialize the pandas data structures:

  python init_dataset.py --datapath data

- Run a sample training:

  python sample_run.py --datapath data