Apply missing lr_mult and wd_mult to the lr and weight_decay of megatron param groups. #1960

Apply missing lr_mult and wd_mult to the lr and weight_decay of megatron param groups.
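
A minimal sketch of what this change appears to describe, assuming each param group is a plain dict carrying optional `lr_mult` and `wd_mult` keys (the key names come from the title above). The helper name `apply_multipliers`, the arguments `base_lr` and `base_weight_decay`, and the default multiplier of 1.0 are hypothetical and used only for illustration; the actual patch may be structured differently.

```python
def apply_multipliers(param_groups, base_lr, base_weight_decay):
    """Scale the base lr and weight decay by each group's multipliers.

    Hypothetical helper: groups without `lr_mult` / `wd_mult` are assumed
    to use a multiplier of 1.0, i.e. the base values unchanged.
    """
    for group in param_groups:
        group["lr"] = base_lr * group.get("lr_mult", 1.0)
        group["weight_decay"] = base_weight_decay * group.get("wd_mult", 1.0)
    return param_groups


# Example groups: the second one halves the lr and disables weight decay,
# as is commonly done for biases and normalization parameters.
groups = [
    {"params": [], "lr_mult": 1.0, "wd_mult": 1.0},
    {"params": [], "lr_mult": 0.5, "wd_mult": 0.0},
]
apply_multipliers(groups, base_lr=1e-4, base_weight_decay=0.1)
```

Without this step, every group would carry the same `lr` and `weight_decay` and the per-group multipliers would be silently ignored.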