Custom Sampler - Unknown Exception #594
-
If you just load the entire GeoTIFF into memory, do you see this error?
```python
import rasterio

with rasterio.open("MODIS/2015/20150701-ESACCI-L3S_FIRE-BA-MODIS-AREA_3-fv5.1-JD.tif") as f:
    data = f.read()
```
On Wed, Mar 2, 2022 at 9:36 AM Hamish Campbell <***@***.***> wrote:
Hi! I am currently trying to use a custom sampler that I have written using the TorchGeo framework; however, I keep getting an exception that I don't understand.
Using the RandomBatchGeoSampler source code as is, everything works fine. However, simply adding the two new lines shown below gives me an exception. I believe this may have something to do with the threading processes involved (which I have very limited knowledge of).
I'm more than happy to give more info on why I need this operation, but any help in solving it would be really appreciated!
```python
# NEW LINE 1 (no problem by itself)
self.dataset = dataset

def __iter__(self) -> Iterator[List[BoundingBox]]:
    """Return the indices of a dataset.

    Returns:
        batch of (minx, maxx, miny, maxy, mint, maxt) coordinates to index a dataset
    """
    for _ in range(len(self)):
        # Choose a random tile
        hit = random.choice(self.hits)
        bounds = BoundingBox(*hit.bounds)

        # Choose random indices within that tile
        batch = []
        for _ in range(self.batch_size):
            bounding_box = get_random_bounding_box(bounds, self.size, self.res)
            # NEW LINE 2
            test = self.dataset[bounding_box]['mask']
            batch.append(bounding_box)

        yield batch
```
Error message:
```
RasterioIOError: Caught RasterioIOError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "rasterio/_io.pyx", line 701, in rasterio._io.DatasetReaderBase._read
  File "rasterio/shim_rasterioex.pxi", line 162, in rasterio._shim.io_multi_band
  File "rasterio/_err.pyx", line 193, in rasterio._err.exc_wrap_int
rasterio._err.CPLE_AppDefinedError: MODIS/2015/20150701-ESACCI-L3S_FIRE-BA-MODIS-AREA_3-fv5.1-JD.tif, band 1: IReadBlock failed at X offset 95, Y offset 55: TIFFReadEncodedTile() failed.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.7/dist-packages/torchgeo/datasets/geo.py", line 910, in __getitem__
    samples = [ds[query] for ds in self.datasets]
  File "/usr/local/lib/python3.7/dist-packages/torchgeo/datasets/geo.py", line 910, in <listcomp>
    samples = [ds[query] for ds in self.datasets]
  File "/usr/local/lib/python3.7/dist-packages/torchgeo/datasets/geo.py", line 429, in __getitem__
    data = self._merge_files(filepaths, query)
  File "/usr/local/lib/python3.7/dist-packages/torchgeo/datasets/geo.py", line 464, in _merge_files
    dest, _ = rasterio.merge.merge(vrt_fhs, bounds, self.res)
  File "/usr/local/lib/python3.7/dist-packages/rasterio/merge.py", line 333, in merge
    resampling=resampling,
  File "rasterio/_io.pyx", line 367, in rasterio._io.DatasetReaderBase.read
  File "rasterio/_io.pyx", line 704, in rasterio._io.DatasetReaderBase._read
rasterio.errors.RasterioIOError: Read or write failed. MODIS/2015/20150701-ESACCI-L3S_FIRE-BA-MODIS-AREA_3-fv5.1-JD.tif, band 1: IReadBlock failed at X offset 95, Y offset 55: TIFFReadEncodedTile() failed.
```
-
No problems doing that.
-
Are you able to share the file? I'd like to try to reproduce this.
-
I would also be interested in knowing why you need this. There might be an easier way to get what you need that doesn't involve querying the dataset inside the sampler.
-
The reason I want to do this is to make a kind of "constrained sampler", i.e. a sampler that rejects samples which do not satisfy some criteria (drawing another sample instead) and keeps samples which do meet these constraints. To check the constraints I'm interested in, I need to access the data contained within a given sample. The overall point of the constrained sampler is to create a balanced training dataset, since "fire" is much less common in our dataset than "no fire".
The data file (there appears to be nothing special about this file, as I get errors with many different files) can be found here: https://drive.google.com/file/d/10M-fL8ha0WKUdo10QWdf_4sWWUSzkbAg/view?usp=sharing I also include a link to some code that reproduces the error: https://colab.research.google.com/drive/1MqcLrwEprWc1nE-2JfNkc9aRXdROPwTW?usp=sharing
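The constrained sampler described above is essentially rejection sampling: draw a candidate window, inspect it, and keep it only if it meets the constraint. Below is a minimal, framework-agnostic sketch, assuming the constraint can be written as a predicate on a candidate. `balanced_batches`, `draw_window`, `has_fire`, `no_fire` and the toy 20x20 mask are all hypothetical stand-ins: in a TorchGeo sampler, `draw` would play the role of `get_random_bounding_box`, and the predicate would read the sample's mask, which is exactly the dataset query that triggered the error in this thread. To balance classes, the sketch alternates between a "fire" and a "no fire" predicate across the slots of each batch.

```python
import random
from typing import Callable, Iterator, List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def balanced_batches(
    draw: Callable[[random.Random], T],
    predicates: Sequence[Callable[[T], bool]],
    batch_size: int,
    num_batches: int,
    max_attempts: int = 10_000,
    seed: Optional[int] = None,
) -> Iterator[List[T]]:
    """Yield batches built by rejection sampling.

    Slot i of each batch is filled by the first candidate from `draw` that
    satisfies predicates[i % len(predicates)]; other candidates are rejected.
    """
    if not predicates:
        raise ValueError("need at least one predicate")
    rng = random.Random(seed)
    for _ in range(num_batches):
        batch: List[T] = []
        for slot in range(batch_size):
            predicate = predicates[slot % len(predicates)]
            for _ in range(max_attempts):
                candidate = draw(rng)
                if predicate(candidate):
                    batch.append(candidate)
                    break
            else:
                # Give up rather than loop forever if the constraint is never met.
                raise RuntimeError(
                    f"no candidate satisfied predicate {slot % len(predicates)} "
                    f"after {max_attempts} attempts"
                )
        yield batch


# Toy data: a 20x20 mask where only a 3x3 corner patch is "fire" (value 1).
GRID, WIN = 20, 4
mask = [[1 if r < 3 and c < 3 else 0 for c in range(GRID)] for r in range(GRID)]


def draw_window(rng: random.Random) -> Tuple[int, int]:
    # Top-left corner of a random WIN x WIN window that fits inside the grid.
    return rng.randrange(GRID - WIN + 1), rng.randrange(GRID - WIN + 1)


def has_fire(window: Tuple[int, int]) -> bool:
    r0, c0 = window
    return any(mask[r][c] for r in range(r0, r0 + WIN) for c in range(c0, c0 + WIN))


def no_fire(window: Tuple[int, int]) -> bool:
    return not has_fire(window)


# Alternate "fire" and "no fire" slots so each batch is half and half.
batches = list(
    balanced_batches(draw_window, [has_fire, no_fire], batch_size=4, num_batches=3, seed=0)
)
```

The `max_attempts` cap matters when the positive class is rare: in this toy grid a random 4x4 window touches the fire patch with probability (3/17)^2, roughly 3%, so most draws for the "fire" slots are rejected.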
-
Thanks @Hamish-Cam, that sounds like a perfectly reasonable use case to me. I'm not sure exactly how to do what you want. I would suggest posting on https://discuss.pytorch.org/ to get broader PyTorch expertise. Someone must have tried to do this before.
-
Update: setting … However, I have limited knowledge of the hardware processes involved in PyTorch, so I am not 100% sure what …
-
This is definitely related to multithreading; see rasterio/rasterio#2053. Setting …
-
There's one thing that confuses me about this issue. According to rasterio/rasterio#2053 (comment):
So one solution should be to close the file handle after opening. But we leave file handles open all the time in …
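Some background on why sharing file handles across processes is fragile: under POSIX `fork`, a child's file descriptors refer to the same open file descriptions as the parent's, including the file offset, so two processes reading through a handle opened before the fork can disturb each other's seeks and reads. Whether that is exactly what happens inside GDAL/libtiff here is an assumption. Besides closing each handle after use, another common pattern (not one proposed in this thread) is to keep handles open but only within the process that opened them. A minimal sketch, assuming a hypothetical `PerProcessHandleCache` class and an `opener` callable that would be something like `rasterio.open` in practice:

```python
import os
from typing import Any, Callable, Dict, Tuple


class PerProcessHandleCache:
    """Cache open file handles, but never reuse one across processes.

    Handles are keyed by (process id, path), so a DataLoader worker forked
    from the main process opens its own handle instead of reading through
    one that its parent opened.
    """

    def __init__(self, opener: Callable[[str], Any]) -> None:
        self._opener = opener
        self._handles: Dict[Tuple[int, str], Any] = {}

    def get(self, path: str) -> Any:
        pid = os.getpid()
        key = (pid, path)
        if key not in self._handles:
            # Forget handles opened by another process (e.g. inherited through
            # fork) without using or closing them; the opening process owns them.
            self._handles = {k: h for k, h in self._handles.items() if k[0] == pid}
            self._handles[key] = self._opener(path)
        return self._handles[key]

    def close_all(self) -> None:
        """Close every handle this process opened, then empty the cache."""
        pid = os.getpid()
        for (owner, _), handle in self._handles.items():
            if owner == pid:
                handle.close()
        self._handles.clear()
```

Because the key includes `os.getpid()`, the first `get` for a path inside a worker opens a fresh handle even if the parent process had already cached one for the same path; repeated calls within one process reuse the cached handle.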
-
Of note to this discussion: @bw4sz has found slowdowns in DataLoaders that use Datasets where rasterio file handles are kept open (e.g. opened then stored somewhere during …). @bw4sz, am I summarizing this correctly? Do you want to elaborate at all?