Reproduce the results in the paper #8

jiahuigeng · 2022-10-17T08:16:36Z

Hi, I tried to reproduce the experimental results in the paper using the command below, but the logs do not look correct. Could you share the command line you used for the paper? I am really interested in your work and would like to explore sketching techniques further.

python cv_train.py --dataset_name CIFAR10 --iid --num_workers 2 --lr_scale 0.4 --local_momentum=0.0 --num_devices 2 --num_devices=2 --num_clients 2

MY PID: 31280
5315 port in use, trying next...
Namespace(checkpoint_path='./checkpoint', dataset_dir='./dataset', dataset_name='CIFAR10', device='cuda', do_batchnorm=False, do_checkpoint=False, do_dp=False, do_finetune=False, do_iid=True, do_test=False, do_topk_down=False, dp_mode='worker', error_type='none', eval_before_start=False, fedavg_batch_size=-1, fedavg_lr_decay=1, finetune_path='./finetune', finetuned_from=None, k=50000, l2_norm_clip=1.0, lm_coef=1.0, local_batch_size=8, local_momentum=0.0, lr_scale=0.4, max_grad_norm=None, max_history=2, mc_coef=1.0, microbatch_size=-1, mode='sketch', model='ResNet9', model_checkpoint='gpt2', nan_threshold=999, noise_multiplier=0.0, num_blocks=20, num_candidates=2, num_clients=2, num_cols=500000, num_devices=2, num_epochs=24, num_fedavg_epochs=1, num_results_train=2, num_results_val=2, num_rows=5, num_workers=2, personality_permutations=1, pivot_epoch=5, port=5646, seed=21, share_ps_gpu=False, train_dataloader_workers=0, use_tensorboard=False, val_dataloader_workers=0, valid_batch_size=8, virtual_momentum=0, weight_decay=0.0005)
50000 625
Using BatchNorm: False
Finished initializing in 11.00 seconds
miniconda3/lib/python3.8/site-packages/torch/optim/lr_scheduler.py:131: UserWarning: Detected call of lr_scheduler.step() before optimizer.step(). In PyTorch 1.1.0 and later, you should call them in the opposite order: optimizer.step() before lr_scheduler.step(). Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
warnings.warn("Detected call of lr_scheduler.step() before optimizer.step(). "
CommEfficient/CommEfficient/utils.py:258: UserWarning: This overload of add_ is deprecated:
add_(Number alpha, Tensor other)
Consider using one of the following signatures instead:
add_(Tensor other, *, Number alpha) (Triggered internally at ../torch/csrc/utils/python_arg_parser.cpp:1055.)
grad_vec.add_(args.weight_decay / args.num_workers, weights)
epoch  lr      train_time  train_loss  train_acc  test_loss  test_acc  down (MiB)  up (MiB)  total_time
1      0.0800  655.4752    2.3025      0.1009     2.3025     0.1014    0           59606     679.6477
2      0.1600  649.9156    2.3025      0.1008     2.3025     0.1014    0           59606     1343.1710
3      0.2400  649.3290    2.3025      0.1011     2.3025     0.1014    0           59606     2006.0574


kiddyboots216 · 2022-12-15T19:53:47Z

Hello. Could you try using the settings that we use in the paper? That is, don't add the --iid flag, and use the number of workers and number of clients that we use instead of 2. With 2 clients and 2 workers, you are splitting the entire CIFAR10 dataset into 2 chunks and then training on the entire dataset at each epoch. That setting is nearly identical to full-batch training, so you may need to follow the optimization guidelines from something like the LAMB optimizer.
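
To make the arithmetic behind this reply concrete, here is a minimal sketch in Python. It is not code from the repository: `partition_summary` is a hypothetical helper, and it assumes the 50,000 CIFAR10 training images are split evenly across clients and that `num_workers` clients take part in each round, as the reply describes. It also checks the reported loss against the value a 10-class classifier gets when it predicts uniformly, which is ln(10).

```python
import math


def partition_summary(dataset_size, num_clients, num_workers):
    """Hypothetical helper (not part of the repository).

    Assumes an even split of the dataset across clients and that
    num_workers clients participate in every round.
    """
    examples_per_client = dataset_size // num_clients
    participation = num_workers / num_clients
    return examples_per_client, participation


# Values taken from the command and log in this thread.
examples_per_client, participation = partition_summary(
    dataset_size=50_000, num_clients=2, num_workers=2
)

# Cross-entropy of a uniform prediction over 10 classes, and its accuracy.
chance_loss = math.log(10)
chance_acc = 1 / 10

print(f"examples per client: {examples_per_client}")
print(f"fraction of clients per round: {participation:.2f}")
print(f"chance-level loss: {chance_loss:.4f}, chance-level accuracy: {chance_acc:.2f}")
```

Under these assumptions each client holds half of the dataset and every client participates in every round. The logged train_loss of 2.3025 and accuracy of about 0.10 match the chance-level values, which suggests the model is not learning at all with these settings.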

Antonio-demo · 2023-02-23T14:21:11Z

Hello, I'm trying to reproduce your experimental results with the code provided for this paper, but I cannot get it to run correctly. Could you tell me how to run this code correctly?

kiddyboots216 · 2023-02-25T22:27:15Z

Hi @Antonio-demo I think you can create a new issue and provide some more details, e.g. the command that you are running.
