Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improvements to RandomBatchGeoSampler #380

Closed
tritolol opened this issue Jan 31, 2022 · 2 comments · Fixed by #755
Closed

Improvements to RandomBatchGeoSampler #380

tritolol opened this issue Jan 31, 2022 · 2 comments · Fixed by #755
Assignees
Labels
samplers Samplers for indexing datasets
Milestone

Comments

@tritolol
Copy link
Contributor

tritolol commented Jan 31, 2022

I propose two improvements to RandomBatchGeoSampler:

  1. The size parameter should be defined in pixels, not in CRS units. The output of a DataLoader using this sampler will be fed directly into a CNN or similar in most cases, which expect a certain defined input size in pixels.
    To accomplish this, the user has to perform the conversion of CRS to pixels manually like this
tile_size_pix = 256
sampler_size = tile_size_pix * input_ds.res
sampler = RandomBatchGeoSampler(input_ds, size=sampler_size, ...)

which is inconvenient.

  1. The length parameter should have a meaningful default value. Currently, it is described as "length (int) – number of samples per epoch". Since an epoch is defined as processing an entire dataset exactly once, a meaningful default value could be calculated by using the dataset area and the sampler tile area
sampler_len = int(get_bbox_area(input_ds.bounds)/(sampler_size*sampler_size))
sampler = RandomBatchGeoSampler(input_ds, size=sampler_size, batch_size=8, length=sampler_len)

with

# This function would be nice to have in the BoundingBox class
def get_bbox_area(box):
    return (box.maxx - box.minx) * (box.maxy - box.miny)

Let me know if this makes sense.

@calebrob6
Copy link
Member

Thanks @tritolol! Both changes make sense to me.

@RitwikGupta has something similar to your first change here #294.

@adamjstewart is adding a volume attribute to BoundingBox, adding area in the same PR would make sense to me.

@adamjstewart adamjstewart added the samplers Samplers for indexing datasets label Jan 31, 2022
@adamjstewart adamjstewart added this to the 0.3.0 milestone Mar 16, 2022
@adamjstewart adamjstewart modified the milestones: 0.3.0, 0.3.1 Jul 9, 2022
@TCherici
Copy link
Contributor

With regards to point 1. by @tritolol, the actual output size of the sample mask is not guaranteed to be the actual number of pixels set by tile_size_pix (see #674). I am however assuming that this is a bug, and that it will be fixed soon

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
samplers Samplers for indexing datasets
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants