Lazy loading, on-the-fly TorchGeo dataset creator method for ML usecases #1

Open
print-sid8 opened this issue Jan 7, 2025 · 1 comment
Assignees
Labels
enhancement New feature or request

Comments

print-sid8 commented Jan 7, 2025

I imagine it would be useful to have a TorchGeo dataset creator for ML model training and inference, one that can create image chips of satellite data.

Thinking it might look something like this -

from rasteret import Rasteret

# Create regular collection
processor = Rasteret(
    custom_name="sentinel2",
    data_source="sentinel-2-l2a" 
)

collection = processor.create_collection(
    bbox=[10.1, 45.5, 10.5, 45.8],
    date_range=["2023-01-01", "2023-12-31"]
)

# Convert to ML dataset
dataset = collection.to_ml_dataset(
    chip_size=256,
    bands=["B02", "B03", "B04", "B08"],  # RGB + NIR
    geometries=[aoi_polygon]  # Optional; aoi_polygon is a user-supplied AOI geometry (not defined here)
)

# Use with PyTorch/torchgeo
import torch
from torch.utils.data import DataLoader

loader = DataLoader(dataset, batch_size=32)


# Load trained model (a fully pickled model, so only load files you trust)
model = torch.load("path/to/model.pth", weights_only=False)
model.eval()


# Run inference
predictions = []
with torch.no_grad():
    for batch in loader:
        pred = model(batch)
        predictions.append(pred)

TorchGeo GeoDatasets and most of its other classes already work with remote COGs. I'm going to attempt to create the TorchGeo dataset via Rasteret to see whether it makes loading even faster.
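
To make the "lazy, on-the-fly" part concrete, here is a minimal sketch of what chip-level lazy loading could look like. It is not Rasteret's or TorchGeo's implementation: `iter_chip_windows`, `LazyChipDataset`, and the injected `read_window` callable are all hypothetical names, and the fake reader below stands in for a real ranged read against a COG. The only point is that chip windows are computed up front while pixel I/O happens when a chip is indexed.

```python
from typing import Callable, Iterator, List, Tuple

# (col_off, row_off, width, height) in pixels
Window = Tuple[int, int, int, int]


def iter_chip_windows(width: int, height: int, chip_size: int) -> Iterator[Window]:
    """Yield non-overlapping chip windows covering a width x height raster.

    Edge chips are clipped to the raster bounds, so they may be smaller.
    """
    if chip_size <= 0:
        raise ValueError("chip_size must be positive")
    for row_off in range(0, height, chip_size):
        for col_off in range(0, width, chip_size):
            yield (
                col_off,
                row_off,
                min(chip_size, width - col_off),
                min(chip_size, height - row_off),
            )


class LazyChipDataset:
    """Map-style dataset that reads a chip only when it is indexed.

    `read_window` is a hypothetical callable that performs the actual I/O.
    """

    def __init__(
        self,
        width: int,
        height: int,
        chip_size: int,
        read_window: Callable[[Window], object],
    ) -> None:
        self.windows: List[Window] = list(iter_chip_windows(width, height, chip_size))
        self.read_window = read_window

    def __len__(self) -> int:
        return len(self.windows)

    def __getitem__(self, index: int):
        # No pixels are touched until this point.
        return self.read_window(self.windows[index])


# Usage with a fake reader that just records which windows were requested.
reads: List[Window] = []


def fake_read(window: Window) -> Window:
    reads.append(window)
    return window


ds = LazyChipDataset(width=600, height=500, chip_size=256, read_window=fake_read)
print(len(ds))   # 6 chips: 3 columns x 2 rows
print(ds[5])     # (512, 256, 88, 244), the clipped bottom-right chip
print(reads)     # only the one window that was actually indexed
```

Because the class only implements `__len__` and `__getitem__`, it can be passed to a `DataLoader` as a map-style dataset once `read_window` returns tensors.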

Would love to hear thoughts on this.

@print-sid8 print-sid8 added the enhancement New feature or request label Jan 7, 2025
@print-sid8 print-sid8 self-assigned this Jan 7, 2025
@print-sid8 print-sid8 changed the title Satellite image ChipDataset creator for ML usecases Lazy loading, on-the-fly TorchGeo dataset creator method for ML usecases Jan 11, 2025

print-sid8 commented Jan 13, 2025

@calebrob6 thanks for correcting me on this. TorchGeo does work with remote COG files, and it is preferable to work that way.

I will edit my line in the issue, from:

TorchGeo GeoDatasets and most of its other Classes expect data to be downloaded to local disk.

to:

TorchGeo GeoDatasets and most of its other classes already work with remote COGs. I'm going to attempt to create the TorchGeo dataset via Rasteret to see whether it makes loading even faster.
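
For the "faster or not" question, one simple way to compare the two approaches would be to time how quickly each loader yields batches. The helper below is a rough sketch under that assumption; `measure_throughput` is a hypothetical name, it works on any iterable (for example a `DataLoader`), and it measures wall-clock time only, so network caching and warm-up effects would need to be controlled separately.

```python
import time
from typing import Dict, Iterable


def measure_throughput(batches: Iterable, max_batches: int = 50) -> Dict[str, float]:
    """Iterate over at most `max_batches` items and report wall-clock throughput."""
    count = 0
    start = time.perf_counter()
    for _ in batches:
        count += 1
        if count >= max_batches:
            break
    elapsed = time.perf_counter() - start
    return {
        "batches": count,
        "seconds": elapsed,
        "batches_per_second": count / elapsed if elapsed > 0 else float("inf"),
    }


# Usage: any iterable works, so a plain range stands in for a DataLoader here.
stats = measure_throughput(range(10), max_batches=4)
print(stats["batches"])
```

Running this once over a plain TorchGeo loader and once over a Rasteret-backed loader, with the same batch size and area of interest, would give directly comparable numbers.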
