
Add dataset/sampler benchmarking script #115

Merged 17 commits into main on Sep 22, 2021
Conversation

adamjstewart (Collaborator)

This PR adds a script to benchmark our GeoDataset/GeoSampler implementations. I would like to commit this script to the repo for three reasons:

  1. Reproducibility: if someone wants to reproduce the results in our paper, it should be easy
  2. Regression: we should run these benchmarks again before each release to make sure sampling hasn't gotten slower
  3. Refinement: if someone submits a PR to improve performance, we should have a well-defined metric for comparison

Now I just need to download some data and run it. Before I do, we should make sure the default sizes/CRS/res/etc. look reasonable.

Closes #81

adamjstewart added the datasets (Geospatial or benchmark datasets) and samplers (Samplers for indexing datasets) labels on Sep 8, 2021
calebrob6 (Member) commented Sep 12, 2021

Some fun stuff I found while trying to get this to work:

  • The CDL dataset will think /datadrive/cdl is corrupt -- see Dataset downloading expected behavior pt. 2 #99
  • This line is mega broken: https://github.com/microsoft/torchgeo/blob/paper/benchmarks/torchgeo/datasets/geo.py#L234
    • Specifically, it adds each TIFF to the index occupying a single point in time, so the index will only return hits if a query has exactly the same timestamp for mint and maxt. The bounds of the index will usually look valid, but you'll never get results back (e.g. this is what happens in benchmark.py).
    • A simple workaround, if you just want to benchmark stuff, is to make maxt = mint + 1 year (see the sketch after this list)
  • I don't think the RandomGeoSampler is going to work (unless you have a layer of Landsat scenes covering the extent of CDL)
  • The Landsat8 dataset will fail unintuitively if all the bands aren't downloaded (and bands is set to the default)
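
For reference, a minimal sketch of that maxt workaround, assuming the dataset populates a 3-dimensional rtree index with (minx, maxx, miny, maxy, mint, maxt) tuples; the coordinate values below are hypothetical:

    from rtree.index import Index, Property

    ONE_YEAR = 60 * 60 * 24 * 365  # seconds, approximately

    index = Index(interleaved=False, properties=Property(dimension=3))

    # Hypothetical bounds standing in for values parsed from a scene's metadata
    minx, maxx, miny, maxy = 0.0, 10000.0, 0.0, 10000.0
    mint = 1609459200.0  # the file only records a single timestamp

    # Pad the temporal extent so that time-interval queries intersect the entry
    index.insert(0, (minx, maxx, miny, maxy, mint, mint + ONE_YEAR), obj="scene.tif")

    # A query anywhere inside that year now returns a hit
    hits = list(index.intersection((0, 1, 0, 1, mint, mint + 3600), objects=True))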

The ZipDataset looks to be working after the hack though:

[screenshot: benchmark output showing the ZipDataset working]

calebrob6 (Member) commented Sep 12, 2021

Changed a few things and the script works now.

benchmark.py notes:

  • I added a way to specify either the total number of samples or the total number of batches to load, mainly for convenience.
  • Note: previously, len(samples) was used in the dataloader loop to count the number of patches, but this returns the number of keys in the dict, not the batch size.
  • We need to add a way to save results to file; appending to a CSV with column headers for all the args is sufficient, no need to over-engineer (see the sketch after this list).
  • I was wrong about the RandomGeoSampler -- it works totally fine!
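
A minimal sketch of that CSV appending (the field names here are hypothetical; the real columns would mirror the script's arguments plus the measured rate):

    import csv
    import os

    def save_results(path: str, row: dict) -> None:
        """Append one benchmark run to a CSV, writing the header on first use."""
        write_header = not os.path.exists(path)
        with open(path, "a", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(row.keys()))
            if write_header:
                writer.writeheader()
            writer.writerow(row)

    save_results("results.csv", {"sampler": "RandomGeoSampler", "batch_size": 8, "patches_per_sec": 42.0})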

adamjstewart (Collaborator, Author)

Undid some of the hacks and opened up separate PRs to fix them (#134, #138). Once those are merged I'll rebase this PR and start doing some benchmarking.

adamjstewart (Collaborator, Author) commented Sep 16, 2021

The most recent commit fixes a couple of bugs I found:

  • We were running the GridGeoSampler for N+1 batches instead of N batches
  • We were counting the same number of total patches even if the GridGeoSampler finished early (see the sketch after this list)
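
A minimal sketch of the corrected counting, with illustrative names rather than the script's exact code:

    from itertools import islice
    from typing import Iterable

    def count_patches(dataloader: Iterable[dict], num_batches: int) -> int:
        """Load at most num_batches batches and count the patches actually returned."""
        num_total_patches = 0
        # islice stops after exactly num_batches batches (the old loop ran N+1),
        # and counting each batch's actual size stays honest if the sampler runs dry early
        for batch in islice(dataloader, num_batches):
            num_total_patches += len(batch["image"])
        return num_total_patches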

Results are still mostly the same: GridGeoSampler is significantly faster, and I'm not sure why.

adamjstewart (Collaborator, Author)

Found another bug: we're reusing the same dataset for all samplers. For the first sampler, the cache is empty, but for later samplers it is not, so the comparison between samplers is unfair.

Which of the following options should we use?

  1. Create a new dataset for each sampler
  2. Run some random sampler first before testing all 3 samplers

Option 1 represents the average sampling rate when starting from scratch with an empty cache. Option 2 represents a better average after the cache has already been populated. I'm leaning towards 2 since the cache will be full for the vast majority of the training time when running through multiple epochs.
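
A rough sketch of option 2, where the warm-up sampler and the timing helper are hypothetical stand-ins:

    def benchmark_with_warm_cache(dataset, samplers, warmup_sampler, benchmark):
        """Populate the dataset's cache once, then time each sampler against it."""
        for query in warmup_sampler:
            dataset[query]  # load and discard; this only fills the LRU cache
        return {type(s).__name__: benchmark(dataset, s) for s in samplers}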

adamjstewart (Collaborator, Author)

We can also manually clear the cache between samplers: https://www.geeksforgeeks.org/clear-lru-cache-in-python/
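
For example, functools exposes this directly on any decorated function; the loader below is a stand-in for the dataset's cached file reader:

    from functools import lru_cache

    @lru_cache(maxsize=128)
    def load_file(path: str) -> str:
        return path.upper()  # stand-in for an expensive rasterio open/warp

    load_file("scene.tif")
    print(load_file.cache_info())  # CacheInfo(hits=0, misses=1, maxsize=128, currsize=1)
    load_file.cache_clear()        # reset between samplers so each run starts cold
    print(load_file.cache_info())  # CacheInfo(hits=0, misses=0, maxsize=128, currsize=0)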

adamjstewart marked this pull request as ready for review on September 19, 2021 at 19:01
Review thread on a benchmark.py snippet:

    # Benchmark the model on random stand-in batches instead of real data loading
    for _ in range(num_batches):
        num_total_patches += args.batch_size
        x = torch.rand(args.batch_size, len(bands), args.patch_size, args.patch_size)
        # y = torch.randint(0, 256, (args.batch_size, args.patch_size, args.patch_size))
adamjstewart (Collaborator, Author)
This line is left over from when I was thinking about benchmarking a segmentation model instead of ResNet. I think segmentation is one of the more common tasks in remote sensing, and the models are more complex and therefore slower. If we want to have a slower model for comparison with our data loading rates, it might be good to use something like Mask R-CNN instead.
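
If we go that route, a rough sketch of the timing, assuming a recent torchvision (3-band inputs here purely for illustration; Landsat has more bands):

    import time

    import torch
    from torchvision.models.detection import maskrcnn_resnet50_fpn

    # Time a heavier model on random batches to compare against data loading rates
    model = maskrcnn_resnet50_fpn(weights=None, num_classes=2).eval()
    images = [torch.rand(3, 256, 256) for _ in range(8)]  # eval mode takes a list of CHW tensors

    with torch.no_grad():
        tic = time.time()
        model(images)
        print(f"{len(images) / (time.time() - tic):.1f} patches/sec")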

adamjstewart merged commit 04c754f into main on Sep 22, 2021
adamjstewart deleted the paper/benchmarks branch on September 22, 2021 at 14:47
adamjstewart added this to the 0.1.0 milestone on Nov 20, 2021
yichiac pushed a commit to yichiac/torchgeo that referenced this pull request on Apr 29, 2023
* Add dataset/sampler benchmarking script

* Some changes to get the benchmark script working

* Added writing results to file

* Added script for running a grid of benchmark experiments

* Actual experiment configuration

* Improve help message formatting

* Remove default for mutually exclusive required group

* Rounding, units, ignore output file

* Fix a couple bugs in counting patches

* Display cache info

* Increase cache size

* Cache is shared

* Benchmark model as well

* Warp to same CRS/res as CDL

* Work around bug in sampler

* Fix isort

* Allow specification of CPU/GPU device

Co-authored-by: Caleb Robinson <[email protected]>