
Add dataset/sampler benchmarking script #115

Merged 17 commits into main on Sep 22, 2021
Conversation

adamjstewart (Collaborator)

This PR adds a script to benchmark our GeoDataset/GeoSampler implementations. I would like to commit this script to the repo for three reasons:

  1. Reproducibility: if someone wants to reproduce the results in our paper, it should be easy
  2. Regression: we should run these benchmarks again before each release to make sure sampling hasn't gotten slower
  3. Refinement: if someone submits a PR to improve performance, we should have a well-defined metric for comparison

Now I just need to download some data and run it. Before I do, we should make sure the default sizes/CRS/res/etc. look reasonable.

Closes #81

adamjstewart added the datasets (Geospatial or benchmark datasets) and samplers (Samplers for indexing datasets) labels on Sep 8, 2021
calebrob6 (Member) commented Sep 12, 2021

Some fun stuff I found while trying to get this to work:

  • The CDL dataset will think /datadrive/cdl is corrupt -- see Dataset downloading expected behavior pt. 2 #99
  • This line is mega broken: https://github.com/microsoft/torchgeo/blob/paper/benchmarks/torchgeo/datasets/geo.py#L234
    • Specifically, it adds each TIFF to the index occupying a single point in time, so the index will only return hits if a query has exactly the same timestamp for mint and maxt. The bounds of the index will usually look valid, but you'll never get results back (e.g. this is what happens in benchmark.py).
    • A simple workaround, if you just want to benchmark stuff, is to make maxt = mint + 1 year (see the sketch after this list)
  • I don't think the RandomGeoSampler is going to work (unless you have a layer of Landsat scenes covering the extent of CDL)
  • The Landsat8 dataset will fail unintuitively if all the bands aren't downloaded (and bands is set to the default)
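
For reference, a minimal sketch of that maxt workaround, assuming the dataset populates a 3-dimensional rtree index with (minx, maxx, miny, maxy, mint, maxt) tuples; the coordinate values below are hypothetical:

    from rtree.index import Index, Property

    ONE_YEAR = 60 * 60 * 24 * 365  # seconds, approximately

    index = Index(interleaved=False, properties=Property(dimension=3))

    # Hypothetical bounds standing in for values parsed from a scene's metadata
    minx, maxx, miny, maxy = 0.0, 10000.0, 0.0, 10000.0
    mint = 1609459200.0  # the file only records a single timestamp

    # Pad the temporal extent so that time-interval queries intersect the entry
    index.insert(0, (minx, maxx, miny, maxy, mint, mint + ONE_YEAR), obj="scene.tif")

    # A query anywhere inside that year now returns a hit
    hits = list(index.intersection((0, 1, 0, 1, mint, mint + 3600), objects=True))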

The ZipDataset looks to be working after the hack though:

[screenshot: benchmark output showing the ZipDataset working]

calebrob6 (Member) commented Sep 12, 2021

Changed a few things and the script works now.

benchmark.py notes:

  • I added a way to specify either the total number of samples or the total number of batches to load, mainly for convenience.
  • Note: previously, len(samples) was used in the dataloader loop to count the number of patches, but this returns the number of keys in the dict, not the batch size.
  • We need to add a way to save results to file; appending to a CSV with column headers for all the args is sufficient, no need to over-engineer (see the sketch after this list).
  • I was wrong about the RandomGeoSampler -- it works totally fine!
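
A minimal sketch of that CSV appending (the field names here are hypothetical; the real columns would mirror the script's arguments plus the measured rate):

    import csv
    import os

    def save_results(path: str, row: dict) -> None:
        """Append one benchmark run to a CSV, writing the header on first use."""
        write_header = not os.path.exists(path)
        with open(path, "a", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(row.keys()))
            if write_header:
                writer.writeheader()
            writer.writerow(row)

    save_results("results.csv", {"sampler": "RandomGeoSampler", "batch_size": 8, "patches_per_sec": 42.0})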

adamjstewart (Collaborator, Author)

Undid some of the hacks and opened up separate PRs to fix them (#134, #138). Once those are merged I'll rebase this PR and start doing some benchmarking.

adamjstewart (Collaborator, Author) commented Sep 16, 2021

The most recent commit fixes a couple of bugs I found:

  • We were running the GridGeoSampler for N+1 batches instead of N batches
  • We were counting the same number of total patches even if the GridGeoSampler finished early (see the sketch after this list)
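
A minimal sketch of the corrected counting, with illustrative names rather than the script's exact code:

    from itertools import islice
    from typing import Iterable

    def count_patches(dataloader: Iterable[dict], num_batches: int) -> int:
        """Load at most num_batches batches and count the patches actually returned."""
        num_total_patches = 0
        # islice stops after exactly num_batches batches (the old loop ran N+1),
        # and counting each batch's actual size stays honest if the sampler runs dry early
        for batch in islice(dataloader, num_batches):
            num_total_patches += len(batch["image"])
        return num_total_patches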

Results are still mostly the same: GridGeoSampler is significantly faster, and I'm not sure why.

adamjstewart (Collaborator, Author)

Found another bug: we're reusing the same dataset for all samplers. For the first sampler, the cache is empty, but for later samplers it is not, so the comparison between samplers is unfair.

Which of the following options should we use?

  1. Create a new dataset for each sampler
  2. Run some random sampler first before testing all 3 samplers

Option 1 represents the average sampling rate when starting from scratch with an empty cache. Option 2 represents a better average after the cache has already been populated. I'm leaning towards 2 since the cache will be full for the vast majority of the training time when running through multiple epochs.
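
A rough sketch of option 2, where the warm-up sampler and the timing helper are hypothetical stand-ins:

    def benchmark_with_warm_cache(dataset, samplers, warmup_sampler, benchmark):
        """Populate the dataset's cache once, then time each sampler against it."""
        for query in warmup_sampler:
            dataset[query]  # load and discard; this only fills the LRU cache
        return {type(s).__name__: benchmark(dataset, s) for s in samplers}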

adamjstewart (Collaborator, Author)

We can also manually clear the cache between samplers: https://www.geeksforgeeks.org/clear-lru-cache-in-python/
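
For example, functools exposes this directly on any decorated function; the loader below is a stand-in for the dataset's cached file reader:

    from functools import lru_cache

    @lru_cache(maxsize=128)
    def load_file(path: str) -> str:
        return path.upper()  # stand-in for an expensive rasterio open/warp

    load_file("scene.tif")
    print(load_file.cache_info())  # CacheInfo(hits=0, misses=1, maxsize=128, currsize=1)
    load_file.cache_clear()        # reset between samplers so each run starts cold
    print(load_file.cache_info())  # CacheInfo(hits=0, misses=0, maxsize=128, currsize=0)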

adamjstewart marked this pull request as ready for review on September 19, 2021 at 19:01
Review thread on a benchmark.py snippet:

    # Benchmark the model on random stand-in batches instead of real data loading
    for _ in range(num_batches):
        num_total_patches += args.batch_size
        x = torch.rand(args.batch_size, len(bands), args.patch_size, args.patch_size)
        # y = torch.randint(0, 256, (args.batch_size, args.patch_size, args.patch_size))
adamjstewart (Collaborator, Author)
This line is left over from when I was thinking about benchmarking a segmentation model instead of ResNet. I think segmentation is one of the more common tasks in remote sensing, and the models are more complex and therefore slower. If we want to have a slower model for comparison with our data loading rates, it might be good to use something like Mask R-CNN instead.
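
If we go that route, a rough sketch of the timing, assuming a recent torchvision (3-band inputs here purely for illustration; Landsat has more bands):

    import time

    import torch
    from torchvision.models.detection import maskrcnn_resnet50_fpn

    # Time a heavier model on random batches to compare against data loading rates
    model = maskrcnn_resnet50_fpn(weights=None, num_classes=2).eval()
    images = [torch.rand(3, 256, 256) for _ in range(8)]  # eval mode takes a list of CHW tensors

    with torch.no_grad():
        tic = time.time()
        model(images)
        print(f"{len(images) / (time.time() - tic):.1f} patches/sec")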

adamjstewart merged commit 04c754f into main on Sep 22, 2021
adamjstewart deleted the paper/benchmarks branch on September 22, 2021 at 14:47
adamjstewart added this to the 0.1.0 milestone on Nov 20, 2021
yichiac pushed a commit to yichiac/torchgeo that referenced this pull request on Apr 29, 2023
* Add dataset/sampler benchmarking script

* Some changes to get the benchmark script working

* Added writing results to file

* Added script for running a grid of benchmark experiments

* Actual experiment configuration

* Improve help message formatting

* Remove default for mutually exclusive required group

* Rounding, units, ignore output file

* Fix a couple bugs in counting patches

* Display cache info

* Increase cache size

* Cache is shared

* Benchmark model as well

* Warp to same CRS/res as CDL

* Work around bug in sampler

* Fix isort

* Allow specification of CPU/GPU device

Co-authored-by: Caleb Robinson <[email protected]>