📉 Optimize GRPO memory usage by redefining per_device_batch_size as generations per device #6443
Job | Run time |
---|---|
2m 3s | |
2m 3s |