Job upscaling: spatial splitting utilities #735

Open
jdries opened this issue Feb 19, 2025 · 4 comments

@jdries
Collaborator

jdries commented Feb 19, 2025

When using the job manager, users still need to somehow construct the GeoDataFrame that defines initial job splitting.

A typical use case is that users want to run a job over e.g. a full country, and don't want to know the details about tile grids.

There are 'well-known' tile grids that apply: UTM at different sizes for global processing, and LAEA for Europe.
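To illustrate the UTM case, here is a minimal sketch of how a utility could pick a UTM projection for an area of interest from a representative point (for example, its centroid). The function name `utm_epsg_for_point` is hypothetical and not part of any existing openEO API. The sketch uses the standard 6-degree zone formula and deliberately ignores the special zones around Norway and Svalbard.

```python
import math


def utm_epsg_for_point(lon: float, lat: float) -> int:
    """Return the EPSG code of the standard UTM zone containing (lon, lat).

    Hypothetical helper for illustration only. The Norway/Svalbard zone
    exceptions are not handled here.
    """
    if not -180.0 <= lon <= 180.0 or not -80.0 <= lat <= 84.0:
        raise ValueError("point lies outside the usual UTM domain")
    # Zones are 6 degrees wide and numbered 1..60, starting at 180 degrees west.
    zone = min(int(math.floor((lon + 180.0) / 6.0)) + 1, 60)
    # EPSG:326xx covers the northern hemisphere, EPSG:327xx the southern one.
    return (32600 if lat >= 0 else 32700) + zone


# Brussels lies in UTM zone 31N.
print(utm_epsg_for_point(4.35, 50.85))
```

An area spanning several zones would need one projection per zone or a different grid (such as LAEA for Europe); that choice is left open here.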

The utility should focus on making UDP based upscaling as simple as possible, reducing the required input parameters to a minimum, while having optional parameters in case the user has preferences.
It is also an option for the UDP itself to formally indicate splitting options, like a preferred tile grid. For instance, if the UDP author knows that the included job options are optimized for 20km tiles, it makes sense for the job splitter to take this into account, allowing for a more predictable outcome.
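As a rough sketch of the "minimal required input, optional preferences" idea, the function below splits a bounding box in a projected CRS into fixed-size tiles, with the tile size defaulting to 20 km. Everything here is an assumption for illustration: the name `split_bbox`, its parameters and the dict layout are hypothetical, and the grid is simply aligned to multiples of the tile size rather than to any official UTM or LAEA tile grid.

```python
import math
from typing import Dict, List


def split_bbox(
    west: float,
    south: float,
    east: float,
    north: float,
    tile_size: float = 20_000.0,  # metres; assumed default, e.g. a UDP's preferred tile size
    crs: str = "EPSG:32631",      # assumed projected CRS with metre units
) -> List[Dict]:
    """Cover a bounding box with square tiles aligned to multiples of tile_size."""
    if tile_size <= 0:
        raise ValueError("tile_size must be positive")
    if east <= west or north <= south:
        raise ValueError("bounding box must have a positive width and height")
    # Snap outwards to the grid so the tiles fully cover the requested extent.
    x0, x1 = math.floor(west / tile_size), math.ceil(east / tile_size)
    y0, y1 = math.floor(south / tile_size), math.ceil(north / tile_size)
    tiles = []
    for ix in range(x0, x1):
        for iy in range(y0, y1):
            tiles.append({
                "tile_id": f"{ix}_{iy}",
                "west": ix * tile_size,
                "south": iy * tile_size,
                "east": (ix + 1) * tile_size,
                "north": (iy + 1) * tile_size,
                "crs": crs,
            })
    return tiles


tiles = split_bbox(500_000, 5_600_000, 545_000, 5_630_000)
print(len(tiles), tiles[0]["tile_id"])
```

Each dict could become one row of the GeoDataFrame that drives the job manager, with the bounds turned into a geometry. Because tiles snap to a fixed grid, repeated runs over overlapping areas produce the same tile IDs, which helps with the predictable outcome mentioned above.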

Add example with geometry handling.

Existing and similar code in aggregator:
https://github.com/Open-EO/openeo-aggregator/blob/master/src/openeo_aggregator/partitionedjobs/splitting.py

@HansVRP
Contributor

HansVRP commented Feb 19, 2025

We now have 3 types of job splitters which we are looking into, @VictorVerhaert.

@jdries would you propose here to keep it 'non-optimized', with fixed splitting into 20km tiles?

Or should we eventually also offer alternative options, such as Sentinel-2 tile based splitting?

@jdries
Collaborator Author

jdries commented Feb 19, 2025

Should we perhaps turn this into an epic? Because non-optimized would be the most basic thing to start with, and then we should indeed add things like UTM grid. (Overlapping S2 tiles is not a grid I typically recommend, so that would be more like an advanced option.)

@VictorVerhaert
Contributor

To get the correct scope of this issue: is it only for splitting bounding boxes (e.g. for inference) or also for geometries (e.g. for point extractions)?

The 3 existing job splitters in gfmap are mainly meant for extractions, I would say, so they may not be that relevant for this issue.

@jdries do we know what the current maximum spatial extent is, given all the optimizations and parallelisations done for lcfm? Perhaps running a job for a whole country is already possible, and we could rather focus on testing it while setting the output tiling grid (like the save_result option for GTiffs) to ensure we don't create COGs that are too large.

@HansVRP
Contributor

HansVRP commented Feb 24, 2025

@jdries perhaps good if we indeed create a couple of sub-tasks for this one.

It would be good to have a clear view of how we want users to 'interact' with or 'experience' this feature.
