speedup mask sampling, properly save masks in images pipeline #2346
base: main
Conversation
Added a change that speeds up mask sampling by caching the mask's non-zero indices; it speeds up sampling by a few orders of magnitude (before this, mask sampling was pretty much unusable).
LGTM
chosen_indices = random.sample(range(len(nonzero_indices)), k=batch_size)
indices = nonzero_indices[chosen_indices]
if not hasattr(self, "nonzero_indices"):
    self.nonzero_indices = torch.nonzero(mask[..., 0], as_tuple=False)
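A minimal, self-contained sketch of the caching pattern in this snippet (the class and method names here are hypothetical, not the actual nerfstudio sampler):

```python
import random
import torch


class PixelSampler:
    """Sketch of the PR's idea: cache non-zero mask indices on first use."""

    def sample(self, mask: torch.Tensor, batch_size: int) -> torch.Tensor:
        # torch.nonzero runs once; subsequent calls reuse the cached indices
        # instead of rescanning the mask every training step.
        if not hasattr(self, "nonzero_indices"):
            self.nonzero_indices = torch.nonzero(mask[..., 0], as_tuple=False)
        chosen = random.sample(range(len(self.nonzero_indices)), k=batch_size)
        return self.nonzero_indices[chosen]


sampler = PixelSampler()
mask = torch.zeros(2, 4, 4, 1)
mask[:, 1:3, 1:3, :] = 1  # 2 images x 2x2 valid region = 8 valid pixels
indices = sampler.sample(mask, batch_size=4)
print(indices.shape)  # one (image, row, col) triple per sample
```

Note the trade-off the reviewers discuss below: because the cache lives on `self`, a later call with a *different* mask silently reuses the stale indices.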
This seems a bit weird: if we call this function twice but with different mask args, the second will be ignored. Maybe we can refactor the method to take nonzero_indices as an argument instead?
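The stateless refactor being suggested could look something like this (a sketch with assumed names; the caller owns the cache, so different masks can never collide with a stale attribute):

```python
import random
import torch


def sample_from_nonzero(nonzero_indices: torch.Tensor, batch_size: int) -> torch.Tensor:
    """Stateless variant: the caller precomputes and owns nonzero_indices."""
    chosen = random.sample(range(len(nonzero_indices)), k=batch_size)
    return nonzero_indices[chosen]


mask_a = torch.zeros(3, 3)
mask_a[0, 0] = 1
mask_b = torch.ones(3, 3)

# The caller computes the indices once per mask and passes them in.
idx_a = torch.nonzero(mask_a, as_tuple=False)
idx_b = torch.nonzero(mask_b, as_tuple=False)

sample_a = sample_from_nonzero(idx_a, batch_size=1)  # only (0, 0) is valid
sample_b = sample_from_nonzero(idx_b, batch_size=4)
```

This pushes the cache hit/miss question up a level, which is exactly the trade-off debated later in the thread.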
torch.nonzero may require a lot of memory :(
@kerrj does this implementation assume that each batch has the same set of masks? I think it'd be nicer / more correct if it handled the case where different images have different masks.
Also, it looks like the goal is to avoid redundant calls to nonzero. If each image has a different mask, would it make sense to use an img_to_nonzero_indices dict or something?
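The per-image dict being proposed could be sketched like this (hypothetical names; one cached index tensor per image id, so differing masks each get their own entry):

```python
import random
import torch


class PerImageSampler:
    """Sketch of the suggested img_to_nonzero_indices cache."""

    def __init__(self):
        self.img_to_nonzero_indices = {}  # image id -> cached non-zero indices

    def sample(self, image_id: int, mask: torch.Tensor, batch_size: int) -> torch.Tensor:
        # nonzero runs once per image, not once per step.
        if image_id not in self.img_to_nonzero_indices:
            self.img_to_nonzero_indices[image_id] = torch.nonzero(mask, as_tuple=False)
        cached = self.img_to_nonzero_indices[image_id]
        chosen = random.sample(range(len(cached)), k=batch_size)
        return cached[chosen]


sampler = PerImageSampler()
m0 = torch.zeros(4, 4)
m0[0] = 1       # image 0: only the top row is valid
m1 = torch.zeros(4, 4)
m1[:, 3] = 1    # image 1: only the last column is valid
s0 = sampler.sample(0, m0, batch_size=2)
s1 = sampler.sample(1, m1, batch_size=2)
```

One dict entry per image trades memory for speed, which ties into the memory concern raised above.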
@brentyi agreed; I wanted to keep the interface the same, though, since providing nonzero_indices kicks the can up to the caller, which would run into the same problem of changing masks.
@Ilyabasharov It requires the same amount of memory, since the indices are stored in CPU RAM. I tested with and without the cache and it's the same; intuitively, instantiating the nonzero_indices array each step allocates the memory anyway and then immediately deallocates it, which in practice is essentially the same as keeping it allocated, since this happens every single step.
@kevin-thankyou-lin the mask parameter is shape NxHxW, so it includes all the images in the dataset. If the masks are different for each image it will take that into account.
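The point about the NxHxW mask can be checked directly: torch.nonzero on a stacked mask returns one (image, row, col) triple per valid pixel, so per-image differences are preserved. A small illustration:

```python
import torch

# Two images with different masks, stacked along the batch dimension.
mask = torch.zeros(2, 3, 3)
mask[0, 0, 0] = 1  # image 0: only pixel (0, 0) is valid
mask[1, 2, 2] = 1  # image 1: only pixel (2, 2) is valid

# Each row of the result is (image index, row, col), in row-major order.
indices = torch.nonzero(mask, as_tuple=False)
print(indices.tolist())  # [[0, 0, 0], [1, 2, 2]]
```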
@kerrj thanks for the explanation! I've tested torch.nonzero on GPU with large images and ran into OOM :( but if we use the CPU it will be much better.
> @brentyi agreed; I wanted to keep the interface the same, though, since providing nonzero_indices kicks the can up to the caller, which would run into the same problem of changing masks.

Agreed that the cache hit / miss logic still needs to be solved, but kicking the can up seems nice. As a heuristic, it seems ideal to avoid statefulness in lower-level primitives like pixel sampling; it seems like a risk for memory leaks, etc.
Re: @kevin-thankyou-lin's point, the masks will only be of shape N,H,W if the entire dataset is cached, right? If someone uses the dataset without caching all of it, then this will cause issues, I think. Is that true @kerrj?
@ethanweber if that's true, I think the current behavior is also bugged, since it will only ever sample pixels from the provided masks.
Hi, not sure what the status of this PR is, but I wanted to suggest a speedup to mask sampling that has worked for me: #2585
Masks were not saved properly in the images processing pipeline; now they are