Hi, awesome project. I learned a lot from it, thanks for your great work!
But the model partition policy confused me. From the blog I read, the ZeRO1 policy only partitions the optimizer state (self.m and self.v in the code), but the code shown below splits the model parameters. Would you mind explaining why?
@calico-niko Not sure if it's useful to you, but here's what I understood:
The optimizer state consists of a copy of the parameters and momentum parameters:
Along with the parameter sharding you mentioned, the momentums are sharded here (because it uses _local_params, the final current_offset covers only the local parameters):
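To make that concrete, here is a toy single-process sketch of the idea. It is not the code from this repository: the names `shard_bounds` and `ShardedAdamState`, the parameter sizes, and the contiguous split are all assumptions for illustration. It only shows why sizing `m` and `v` from a walk over the local parameters means the momentums end up sharded as well.

```python
# Toy illustration of ZeRO-1 style optimizer-state sharding.
# NOT the min-fsdp implementation; every name below is hypothetical.

def shard_bounds(num_params, world_size, rank):
    """Return the [start, end) range of parameter indices owned by `rank`."""
    per_rank = (num_params + world_size - 1) // world_size  # ceil division
    start = min(rank * per_rank, num_params)
    end = min(start + per_rank, num_params)
    return start, end


class ShardedAdamState:
    """Optimizer state held by one rank: only its own slice of parameters."""

    def __init__(self, param_sizes, world_size, rank):
        start, end = shard_bounds(len(param_sizes), world_size, rank)
        # Each rank still keeps the full model for forward/backward;
        # this list only selects which parameters it will update.
        self.local_param_ids = list(range(start, end))

        # Walk the local parameters to compute offsets into a flat buffer.
        current_offset = 0
        self.offsets = {}
        for pid in self.local_param_ids:
            self.offsets[pid] = current_offset
            current_offset += param_sizes[pid]

        # m and v are sized by the *local* total, so they are sharded too.
        self.m = [0.0] * current_offset
        self.v = [0.0] * current_offset


param_sizes = [4, 6, 2, 8]  # element counts of four made-up parameters
states = [ShardedAdamState(param_sizes, world_size=2, rank=r) for r in range(2)]
for rank, state in enumerate(states):
    print(rank, state.local_param_ids, len(state.m))
```

Summed over ranks, the `m` buffers cover each parameter element exactly once, which is the memory saving ZeRO1 is after. In a real multi-process setup each rank would also need to broadcast or all-gather its updated parameters after the step; that part is left out here.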
min-fsdp/journey/understanding_zero/4_zero1.py, lines 70 to 76 in 750b4f4