load_uploaded_files #106
Conversation
Seems like this could indeed be used instead of our current read_vector process, so overall the proposal seems fine.
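For illustration, a minimal sketch of what a process-graph node calling the proposed process could look like in place of a read_vector call; the argument names ("paths", "format") are assumptions taken from this discussion, not a finalized spec.

```python
# Hedged sketch only: argument names are assumptions from this discussion.
load_node = {
    "process_id": "load_uploaded_files",
    "arguments": {
        # Paths are relative to the user's uploaded-files workspace (/files).
        "paths": ["boundaries/region.geojson"],
        "format": "GeoJSON",
    },
}
```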
Looks good. I would indeed leave "format" as required; it's more explicit and it may be helpful when loading data from a folder.
How is loading of a folder defined? Loading all files in the folder? Only files matching the format?
@soxofaan I tried to incorporate your feedback and make it as useful as possible, but I'm not so sure about that part:
A re-review would be appreciated.
Another minor note:
@soxofaan No, I don't think this is a good idea. This process is really only for loading user-uploaded files, i.e. files uploaded through the /files endpoints, and there's no absolute path for them (one could argue the root of each user's workspace is /, though), but I guess it's simpler to restrict it to relative paths?! If you want to load data from other sources, that is really up to the processes proposed in #105.
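To make the "relative to the user workspace" point concrete, here is a hedged sketch of uploading a file through the /files endpoint and then referencing the same relative path from the process; the back-end URL, token, and argument names are illustrative assumptions.

```python
import requests

API = "https://backend.example/openeo/1.0"            # hypothetical back-end URL
HEADERS = {"Authorization": "Bearer <access-token>"}   # placeholder token

# Upload a file into the user workspace via PUT /files/{path}; the path is
# relative to the workspace root, there is no absolute path for uploads.
with open("local/utm.tif", "rb") as f:
    requests.put(f"{API}/files/cubes/utm.tif", data=f, headers=HEADERS)

# The same relative path is what the proposed process would reference
# (argument names "paths"/"format" are assumptions from this discussion):
node = {
    "process_id": "load_uploaded_files",
    "arguments": {"paths": ["cubes/utm.tif"], "format": "GTiff"},
}
```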
Indeed, folder support adds a bunch of complications. Maybe we could start with just providing a
My reason to add it was, for example, a (larger) set of GeoTIFF files I want to load. It could be hundreds to be combined into a single data cube. Listing them all individually is a pain, but the user experience could probably be somewhat improved by clients. When adding the paragraph about skipping files, I was thinking about skipping accompanying metadata files (e.g. if you upload a STAC catalog). What other difficulties are you thinking of, @soxofaan?
As you hinted yourself in the process documentation, folder support introduces some aspects that complicate the API and back-end implementation. Over time one might even want to expose these things as options to the user:
A file-only API is more explicit and avoids a lot of the above issues. That being said, I was coming to this issue from a "single file" use case, so folder support seemed overkill. If you indeed are thinking of "lots of files" use cases, folder support makes sense. On the other hand, wouldn't these "lots of files" use cases be better served by the
@soxofaan Okay, I'm fine with a simpler specification, so I removed folder support from the API subtypes and this process, but added "Clients should assist to generate a list of files for folders." to the API subtype
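A hedged client-side sketch of the "Clients should assist to generate a list of files for folders" idea: expanding a folder-like prefix into an explicit list of uploaded file paths via GET /files. The response shape (a "files" array of objects with a "path" field) follows the openEO API, but the helper itself is purely illustrative.

```python
import requests

def list_uploaded_paths(api_url, token, prefix, extensions=(".tif", ".tiff")):
    """Expand a folder-like prefix into an explicit list of uploaded file paths.

    Illustrative helper: it lists the user's uploaded files via GET /files and
    keeps only those under the given prefix with a matching extension, so that
    accompanying metadata files (e.g. a STAC catalog) are skipped client-side.
    """
    resp = requests.get(f"{api_url}/files",
                        headers={"Authorization": f"Bearer {token}"})
    resp.raise_for_status()
    files = resp.json().get("files", [])
    return [f["path"] for f in files
            if f["path"].startswith(prefix)
            and f["path"].lower().endswith(extensions)]

# e.g. paths = list_uploaded_paths(API, TOKEN, "sentinel_tiles/")
```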
No, "import" and "load" refer to different data sources, not the amount of data loaded. "import" is for loading from non-API sources, "load" is for loading from API sources (e.g. from /collections and /files). For example, if we'd go for a "load from folder" process separately, I'd probably name it "load_uploaded_folder" or so.
I understand, but I meant it more like: if you want to use openEO on a lot of (possibly large) files of your own, maybe the openEO "upload"+"load" feature is not the best approach and you are better served with an approach that uses external storage (like S3) that you then "import".
Sure, this could be a better alternative if supported by the back-end. If the proposal is fine now, I'd appreciate an approval for the PR. :-)
Added load_uploaded_files process.
Notes:
Should "format" be required, or be optional so that, if not specified, the back-end can try to guess based on the file extensions?
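On the optional-"format" question, a hedged sketch of how a back-end could guess the format from file extensions; the extension-to-format mapping and the helper are illustrative assumptions, not part of the proposal.

```python
import os

# Illustrative extension-to-format mapping (an assumption, not part of the spec).
EXTENSION_TO_FORMAT = {
    ".tif": "GTiff",
    ".tiff": "GTiff",
    ".nc": "netCDF",
    ".json": "GeoJSON",
    ".geojson": "GeoJSON",
}

def guess_format(paths):
    """Guess a single file format from the extensions of the given paths."""
    guesses = {EXTENSION_TO_FORMAT.get(os.path.splitext(p)[1].lower()) for p in paths}
    guesses.discard(None)
    if len(guesses) != 1:
        # Ambiguous or unknown: the user would have to pass "format" explicitly.
        raise ValueError("Cannot guess a single format; please specify it explicitly.")
    return guesses.pop()
```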