Code for "Machine Learning with Differentially Private Labels: Mechanisms and Frameworks" in PoPETs 2022.
├── LabelDP
|   ├── SCAN                          # unsupervised-learning-based pipeline
|   |   ├── simiclr.py
|   |   ├── scan.py
|   |   ├── selflabel.py
|   |   ├── eval.py
|   |   ├── agm.py                    # analytic Gaussian mechanism calibration
|   |   └── NoiseCluster.py           # adds Gaussian noise to clusters
|   ├── FixMatch                      # semi-supervised-learning-based pipeline
|   |   └── train.py
|   ├── AugDescent                    # learning-with-noisy-labels-based pipeline
|   |   ├── train_cifar.py
|   |   └── train_cinic.py
|   ├── PATEFM                        # pipeline 3 (PATE-FM)
|   |   ├── accoutant.py
|   |   ├── train_teacher.py
|   |   ├── teacher_vote.py
|   |   ├── teacher_vote_add_noise.py
|   |   ├── train_student.py
|   |   └── eval_student.py
|   ├── mypath.py                     # specify ckpt_path and data_path in this file
|   ├── generate_noise.py             # generates RandRes (Randomized Response) labels
|   ├── requirement.txt               # required packages
|   ├── cinic10_conver_img2npy.py     # converts CINIC-10 images to .npy
|   └── EvalOnly
|       ├── SCAN
|       ├── FixMatch
|       ├── AugDescent
|       ├── mypath.py
|       └── evaldplabel.py
├── ckpt_path
|   ├── cifar10
|   ├── cifar100
|   └── cinic10
└── data_path
    ├── cifar10
    |   ├── cifar-10-batches-py
    |   └── dplabel
    |       ├── pate
    |       └── rr
    ├── cifar100
    |   ├── cifar-100-python
    |   └── dplabel
    |       ├── pate
    |       └── rr
    └── cinic10
        ├── train
        ├── test
        ├── valid
        ├── npy
        └── dplabel
            ├── pate
            └── rr
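The tree lists agm.py as an analytic Gaussian mechanism calibration. As a sketch of what such a calibration involves, the code below finds, by bisection, the smallest noise scale sigma that satisfies the exact (epsilon, delta) condition of the analytic Gaussian mechanism (Balle and Wang, 2018). The function names and the search bounds are our own assumptions; agm.py may be organized differently.

```python
import math


def std_normal_cdf(x: float) -> float:
    """Phi(x) for a standard normal; erfc keeps precision in the left tail."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def gaussian_delta(sigma: float, epsilon: float, sensitivity: float) -> float:
    """Smallest delta for which adding N(0, sigma^2) noise to a query with the
    given L2 sensitivity is (epsilon, delta)-DP (Balle & Wang, 2018, Thm. 8)."""
    a = sensitivity / (2.0 * sigma)
    b = epsilon * sigma / sensitivity
    return std_normal_cdf(a - b) - math.exp(epsilon) * std_normal_cdf(-a - b)


def calibrate_sigma(epsilon: float, delta: float, sensitivity: float = 1.0,
                    rel_tol: float = 1e-12) -> float:
    """Smallest sigma (up to rel_tol) with gaussian_delta(sigma) <= delta.

    gaussian_delta decreases as sigma grows, so bisection applies.
    """
    lo, hi = 0.0, sensitivity
    # Double the upper bound until it satisfies the privacy constraint.
    while gaussian_delta(hi, epsilon, sensitivity) > delta:
        lo, hi = hi, 2.0 * hi
    # Invariant: hi satisfies the constraint; lo does not (or is 0).
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        if gaussian_delta(mid, epsilon, sensitivity) <= delta:
            hi = mid
        else:
            lo = mid
    return hi
```

For example, calibrate_sigma(1.0, 1e-5) returns the noise scale for epsilon = 1, delta = 1e-5 and L2 sensitivity 1. The sensitivity agm.py actually uses is not stated in this README.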
Specify your ckpt_path and data_path in LabelDP/mypath.py.
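The contents of LabelDP/mypath.py are not shown in this README. As a hypothetical sketch, a file of this kind might define the two roots and derive the per-dataset folders from the tree above. Only the names ckpt_path and data_path come from the README; the helpers dataset_dir, dplabel_dir and ckpt_dir are illustrative, not the repository's API.

```python
import os

# Hypothetical example values: edit these two roots to match your machine.
ckpt_path = os.path.expanduser("~/labeldp/ckpt_path")
data_path = os.path.expanduser("~/labeldp/data_path")

DATASETS = ("cifar10", "cifar100", "cinic10")
MECHANISMS = ("rr", "pate")  # sub-folders of dplabel/ in the tree above


def _check_dataset(name: str) -> None:
    if name not in DATASETS:
        raise ValueError(f"unknown dataset {name!r}; expected one of {DATASETS}")


def dataset_dir(name: str) -> str:
    """Hypothetical helper: data_path/<dataset>."""
    _check_dataset(name)
    return os.path.join(data_path, name)


def dplabel_dir(name: str, mechanism: str) -> str:
    """Hypothetical helper: data_path/<dataset>/dplabel/<rr|pate>."""
    if mechanism not in MECHANISMS:
        raise ValueError(f"unknown mechanism {mechanism!r}; expected one of {MECHANISMS}")
    return os.path.join(dataset_dir(name), "dplabel", mechanism)


def ckpt_dir(name: str) -> str:
    """Hypothetical helper: ckpt_path/<dataset>."""
    _check_dataset(name)
    return os.path.join(ckpt_path, name)
```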
We leverage advances in machine learning, namely unsupervised learning (SCAN source repo), semi-supervised learning (FixMatch source repo), and learning with noisy labels (Aug-Descent source repo), to improve the utility of machine learning models trained under label differential privacy. Specifically, we propose NoiseCluster and DenoiseSSL for this purpose.
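The tree describes NoiseCluster.py only as adding Gaussian noise to clusters. One plausible reading, sketched below as an assumption rather than the paper's exact procedure: given cluster assignments (for example from SCAN, which never sees labels) and the private labels, build a label histogram per cluster, perturb it with Gaussian noise of scale sigma, and give every sample in a cluster the noisy majority label. Each sample contributes to exactly one histogram, so changing one label moves one count between two bins of one cluster (L2 sensitivity sqrt(2)). The function name is hypothetical.

```python
import numpy as np


def noise_cluster_labels(cluster_ids, labels, num_classes, sigma, rng=None):
    """Assign each sample the noisy majority label of its cluster (sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    cluster_ids = np.asarray(cluster_ids)
    labels = np.asarray(labels)
    num_clusters = int(cluster_ids.max()) + 1
    # hist[c, k] = number of samples in cluster c whose private label is k.
    hist = np.zeros((num_clusters, num_classes))
    np.add.at(hist, (cluster_ids, labels), 1.0)
    # Gaussian noise on every histogram entry; argmax is post-processing.
    noisy_hist = hist + rng.normal(0.0, sigma, size=hist.shape)
    cluster_label = noisy_hist.argmax(axis=1)
    return cluster_label[cluster_ids]
```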
- CIFAR10/CIFAR100: downloaded automatically when differentially private labels are generated with Randomized Response. The other scripts assume CIFAR10/CIFAR100 are already in the expected folders (see the file tree above) and do not set download=True.
- CINIC10: download the original train/valid/test splits and save them to data_path/cinic10, then run python cinic10_convert_img2npy.py, which saves the dataset in npy format to data_path/cinic10/npy.
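A minimal sketch of the CINIC-10 conversion step, assuming the standard CINIC-10 layout (one sub-folder per class inside train/valid/test, 32x32 PNG images) and that Pillow is installed. The output names {split}_x.npy and {split}_y.npy and the class-to-index order are our assumptions; the repository's conversion script may use different ones.

```python
import os

import numpy as np

# CINIC-10 class folder names; the index order here is an assumption.
CLASSES = ("airplane", "automobile", "bird", "cat", "deer",
           "dog", "frog", "horse", "ship", "truck")


def _load_rgb(path):
    from PIL import Image  # imported lazily; assumes Pillow is installed
    with Image.open(path) as img:
        # Some CINIC-10 images are grayscale; force 3 channels.
        return np.asarray(img.convert("RGB"), dtype=np.uint8)


def convert_split(root, split, out_dir, loader=_load_rgb):
    """Stack root/<split>/<class>/*.png into arrays and save them as .npy."""
    images, labels = [], []
    for label, cls in enumerate(CLASSES):
        cls_dir = os.path.join(root, split, cls)
        for fname in sorted(os.listdir(cls_dir)):
            if fname.lower().endswith(".png"):
                images.append(loader(os.path.join(cls_dir, fname)))
                labels.append(label)
    x = np.stack(images)  # shape (N, 32, 32, 3)
    y = np.asarray(labels, dtype=np.int64)
    os.makedirs(out_dir, exist_ok=True)
    np.save(os.path.join(out_dir, f"{split}_x.npy"), x)
    np.save(os.path.join(out_dir, f"{split}_y.npy"), y)
    return x, y
```

Usage would look like convert_split("data_path/cinic10", "train", "data_path/cinic10/npy"), repeated for valid and test.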
Alternatively, you can prepare the datasets directly by downloading the archive we provide (Google Drive link) and extracting it as data_path.
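After extracting the archive, a quick sanity check is to list which folders from the file tree are missing under data_path. This is a hypothetical helper, not a script in the repository; the expected folder list is copied from the tree.

```python
import os

# Expected sub-folders under data_path, taken from the file tree.
EXPECTED = {
    "cifar10": ["cifar-10-batches-py", "dplabel/pate", "dplabel/rr"],
    "cifar100": ["cifar-100-python", "dplabel/pate", "dplabel/rr"],
    "cinic10": ["train", "test", "valid", "npy", "dplabel/pate", "dplabel/rr"],
}


def missing_dirs(data_path):
    """Return the expected directories that do not exist under data_path."""
    missing = []
    for dataset, subdirs in EXPECTED.items():
        for sub in subdirs:
            path = os.path.join(data_path, dataset, *sub.split("/"))
            if not os.path.isdir(path):
                missing.append(path)
    return missing
```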
- RandRes (Randomized Response)
python generate_noise.py --dataset cifar10
python generate_noise.py --dataset cifar100
python generate_noise.py --dataset cinic10
- PATE-FM: see PATEFM for instructions.
- Pre-generated RandRes and PATE labels: as mentioned above, you can also download the archive we provide (Google Drive link), which includes the label files (the dplabel folder under each dataset), and extract it as data_path.
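For intuition about the two label mechanisms listed above, the sketch below shows textbook K-ary randomized response (keep the true label with probability e^epsilon / (e^epsilon + K - 1), otherwise report one of the other K - 1 classes uniformly at random) and a PATE-style noisy argmax over teacher votes using Gaussian noise. The function names are hypothetical, and generate_noise.py and teacher_vote_add_noise.py may differ in details such as the noise distribution.

```python
import numpy as np


def randomized_response(labels, num_classes, epsilon, rng=None):
    """K-ary randomized response: an epsilon-label-DP release of each label."""
    rng = np.random.default_rng() if rng is None else rng
    labels = np.asarray(labels)
    keep_prob = np.exp(epsilon) / (np.exp(epsilon) + num_classes - 1)
    keep = rng.random(labels.shape) < keep_prob
    # Offsets in [1, K-1] map each label to a uniformly random *different* class.
    offsets = rng.integers(1, num_classes, size=labels.shape)
    return np.where(keep, labels, (labels + offsets) % num_classes)


def noisy_teacher_labels(teacher_preds, num_classes, sigma, rng=None):
    """Noisy argmax of teacher votes; teacher_preds has shape (teachers, queries)."""
    rng = np.random.default_rng() if rng is None else rng
    teacher_preds = np.asarray(teacher_preds)
    num_queries = teacher_preds.shape[1]
    votes = np.zeros((num_queries, num_classes))
    for preds in teacher_preds:  # one row of predicted classes per teacher
        votes[np.arange(num_queries), preds] += 1
    noisy_votes = votes + rng.normal(0.0, sigma, size=votes.shape)
    return noisy_votes.argmax(axis=1)
```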
The code is tested with Python 3.8.5 and PyTorch 1.11.0. The complete list of required packages is available in requirement.txt and can be installed with pip install -r requirement.txt.
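To compare your environment with the tested versions, a small check along these lines can help. It is a hypothetical helper, not part of the repository, and only reports versions without enforcing them.

```python
import platform
from importlib import metadata

# Versions reported as tested in this README.
TESTED_VERSIONS = {"python": "3.8.5", "torch": "1.11.0"}


def installed_versions():
    """Return the running Python version and the installed torch version (or None)."""
    versions = {"python": platform.python_version()}
    try:
        versions["torch"] = metadata.version("torch")
    except metadata.PackageNotFoundError:
        versions["torch"] = None
    return versions


if __name__ == "__main__":
    found = installed_versions()
    for name, tested in TESTED_VERSIONS.items():
        print(f"{name}: installed {found[name]}, tested {tested}")
```

Note that PyTorch wheels may report a local suffix such as +cu113, so compare the leading version number rather than the full string.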
- See SCAN/FixMatch/AugDescent for further instructions.
- We also provide evaluation of the results in our paper, including the generated labels and checkpoints. See EvalOnly for further instructions.
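This README does not describe what evaldplabel.py computes. As a hedged illustration of one basic check on generated labels, the sketch measures how often privatized labels match the ground truth and gives the expected match rate of K-ary randomized response for comparison. The on-disk format of the dplabel folders is not specified here, so the sketch takes plain integer arrays; the function names are hypothetical.

```python
import numpy as np


def label_accuracy(dp_labels, true_labels):
    """Fraction of privatized labels that equal the ground-truth labels."""
    dp_labels = np.asarray(dp_labels)
    true_labels = np.asarray(true_labels)
    if dp_labels.shape != true_labels.shape:
        raise ValueError("label arrays must have the same shape")
    return float((dp_labels == true_labels).mean())


def expected_rr_accuracy(epsilon, num_classes):
    """Expected accuracy of K-ary randomized response: e^eps / (e^eps + K - 1)."""
    return float(np.exp(epsilon) / (np.exp(epsilon) + num_classes - 1))
```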