Allow for >1 batch size in Splatfacto #3582
base: main
Conversation
Hey Alex! This is super cool, especially for MCMC, which doesn't require gradient thresholds at all. #3216 might have mildly broken parts of this PR since it merged in parallel dataloading, but it shouldn't be too bad; let us know if you want any help fixing conflicts!
@akristoffersen I think you might want to modify
2d95d1e to cbb5ceb
Works with masks now. As expected, I noticed an almost 2x increase in rays/s with a batch size of two, and a very slight performance drop with a batch size of 1 compared to baseline (50.1 M rays/sec -> 48 M rays/sec).
@hardikdava do you mean that the tuning might be different for the thresholds? Yeah, I don't know exactly what to do there; maybe someone else has an opinion?

Some quick stats on the poster dataset: the splitting / densification outcomes are affected by batch size. Similarly, train rays/sec start higher due to the larger batch size, but drop as you'd expect with the higher number of gaussians. Some good news: the training loss reaches better values more quickly as the batch size increases.
@akristoffersen Currently, densification, splitting, and culling are implemented inside the strategy, and the logic is keyed off the training step count. In simple terms: suppose the batch size is 2 and the opacity reset is configured to happen every 3000 steps. Accounting for the batch size, it should instead happen every 1500 steps. With your current implementation it will still fire every 3000 steps, which in terms of images seen is actually the 6000th image (batch size * step).
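A minimal sketch of the adjustment being described, assuming the step-based intervals live in a simple config object (the field names below are hypothetical, not the actual Splatfacto/gsplat strategy API):

```python
# Sketch: shrink step-based strategy intervals when each step consumes
# batch_size images, so the schedule tracks images seen rather than steps.
from dataclasses import dataclass


@dataclass
class StrategyConfig:
    # Hypothetical field names for illustration only.
    refine_every: int = 100        # densify/split/cull interval (in steps)
    reset_alpha_every: int = 3000  # opacity reset interval (in steps)
    stop_refine_at: int = 15000    # step after which refinement stops


def scale_for_batch_size(cfg: StrategyConfig, batch_size: int) -> StrategyConfig:
    """Divide step-based intervals by the batch size so an interval of N steps
    still corresponds to roughly N images."""
    return StrategyConfig(
        refine_every=max(1, cfg.refine_every // batch_size),
        reset_alpha_every=max(1, cfg.reset_alpha_every // batch_size),
        stop_refine_at=max(1, cfg.stop_refine_at // batch_size),
    )


# With batch_size=2, an opacity reset configured for every 3000 steps
# becomes every 1500 steps, i.e. still roughly every 3000 images.
print(scale_for_batch_size(StrategyConfig(), batch_size=2))
```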
@hardikdava I think dividing those parameters by the batch size assumes that every image produces gradients for a unique set of gaussians. If there's any overlap, then the gaussians seen by 2 images get a single gradient descent update (albeit possibly a better one, because of the signal from both images), whereas with a batch size of 1 those gaussians would have received 2 gradient descent updates. I still think dividing those params by the batch size could be a good approximation; I'll try it and see how the losses look.
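A toy PyTorch illustration of that overlap argument (this is not Splatfacto code; the parameter vector is a stand-in for per-gaussian parameters): a gaussian "seen" by both images in a batch of 2 contributes to a combined gradient and receives one optimizer step, whereas two batch-size-1 iterations would have updated it twice.

```python
import torch

params = torch.zeros(5, requires_grad=True)  # stand-in for gaussian params
opt = torch.optim.SGD([params], lr=0.1)


def loss_for_image(visible_idx):
    # Only the gaussians "visible" in this image contribute to its loss.
    return ((params[visible_idx] - 1.0) ** 2).sum()


# Batch of 2 images whose visible sets overlap on gaussian index 2.
opt.zero_grad()
(loss_for_image([0, 1, 2]) + loss_for_image([2, 3, 4])).backward()
opt.step()  # one update; gaussian 2 gets a single (combined) gradient step

# By contrast, two batch-size-1 iterations would have applied two separate
# updates to gaussian 2, which is why dividing step-based thresholds by the
# batch size is only an approximation.
```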
I tested these changes on 2 of my datasets with the following commands:
- ns-train
- ns-render
- ns-eval
- ns-export
These all worked well! @jeffreyhuparallel do you have any comments about the hyperparameter strategy stuff?
WIP: preliminary testing suggests it's working, but I want to verify further.