Tutorial: computer vision #216

Open · 1 of 9 tasks
justheuristic opened this issue Apr 9, 2021 · 0 comments
Labels: enhancement (New feature or request), help wanted (Extra attention is needed)
justheuristic commented Apr 9, 2021

Let's add a tutorial for training ViT/ResNet50 with Decentralized SGD.

The intent is to use the DecentralizedSGD optimizer with the vissl library for SwAV.

Here's a basic tutorial for training SimCLR in vissl: https://colab.research.google.com/drive/1Rt3Plt3ph84i1A-eolLFafybwjrBFxYe?usp=sharing

The engineering details are up to you, but the hardest part appears to be the following.

The main request for DecentralizedSGD is a training regime where it can run averaging all the time with the latest parameters. The problem with the current implementation is twofold: DecentralizedSGD spends up to half of its time looking for groups, so by the time it actually averages model parameters, those parameters are a stale snapshot taken before the averager began matchmaking. Moreover, when the averager finally writes the result back, it overwrites the model parameters and thereby discards any local updates made while averaging was in progress.

Here are a few ideas on how to improve DecentralizedSGD:

  • after an averaging round, instead of overwriting the parameters, it would be better to compute `weight = weight + averaged_weight - weight_before_averaging_step`. This prevents DecentralizedSGD from discarding local updates made concurrently with averager.step (see the sketch after this list).
  • in DecentralizedAverager, we can implement a callback that lets the user update the model parameters right before AllReduce begins (i.e. after the group is formed). This should significantly reduce the staleness of the averaged parameters.
  • in DecentralizedSGD, we can modify the code that calls the averager step to allow concurrent matchmaking and AllReduce: once the averager has found one group, let it immediately start looking for the next group while still running AllReduce.
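
Below is a minimal sketch of the first two ideas in plain PyTorch. The helper names (`capture_snapshot`, `apply_averaged_delta`) and the callback timing are illustrative assumptions, not existing hivemind APIs; designing the real DecentralizedAverager hook is part of the work.

```python
import torch

@torch.no_grad()
def capture_snapshot(model):
    # Idea 2: in the proposed design, this would run inside a callback fired
    # right after the averaging group is assembled, so the tensors handed to
    # AllReduce are as fresh as possible.
    return [param.detach().clone() for param in model.parameters()]

@torch.no_grad()
def apply_averaged_delta(model, snapshot, averaged):
    # Idea 1: weight <- weight + averaged_weight - weight_before_averaging.
    # Adding the delta instead of overwriting keeps any local optimizer steps
    # taken while AllReduce was in flight.
    for param, before, avg in zip(model.parameters(), snapshot, averaged):
        param.add_(avg - before)
```

Here `averaged` stands for the tensors returned by the all-reduce round; the current implementation writes them back verbatim, which is exactly what the first idea changes.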

Turning this into an example will require the following steps:

  • create a root folder, e.g. ./hivemind/examples/swav, containing...
  • a modified training runner that uses DecentralizedSGD (see the sketch after this list)
  • a basic README, similar to the ones in existing examples, that should:
    • describe what it does (and how)
    • list the additional requirements
    • provide a full how-to-run guide
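
For the runner itself, here is a rough sketch of how DecentralizedSGD plugs into a training loop, with a toy model standing in for the vissl/SwAV trainer. The constructor arguments follow the hivemind quickstart from around the time this issue was filed (a torch.optim.SGD-style interface plus DHT options); the exact signature may differ between versions, so check the current docs before reusing it.

```python
import torch
import torch.nn as nn
import hivemind

model = nn.Linear(512, 128)  # toy stand-in for the SwAV model

# Start (or join) the DHT used for peer discovery; a real run would pass
# initial_peers pointing at an already-running peer.
dht = hivemind.DHT(initial_peers=[], start=True)

opt = hivemind.optim.DecentralizedSGD(
    model.parameters(),
    lr=0.01,
    dht=dht,
    prefix="swav_example",   # peers with the same prefix train together
    target_group_size=16,    # number of peers per averaging group
)

for step in range(100):
    x = torch.randn(32, 512)
    loss = model(x).pow(2).mean()  # placeholder loss
    opt.zero_grad()
    loss.backward()
    opt.step()  # local SGD step; parameter averaging runs in the background
```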
justheuristic added the enhancement (New feature or request) and help wanted (Extra attention is needed) labels on Apr 9, 2021
justheuristic changed the title from "Training SWAV with Decentralized SGD" to "Tutorial: training SWAV with Decentralized SGD" on Apr 10, 2021
justheuristic changed the title from "Tutorial: training SWAV with Decentralized SGD" to "Tutorial: computer vision" on Feb 24, 2022
justheuristic linked a pull request on Jun 1, 2022 that will close this issue