Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

load_uploaded_files #106

Merged
merged 4 commits into from
Dec 17, 2019
Merged

load_uploaded_files #106

merged 4 commits into from
Dec 17, 2019

Conversation

m-mohr
Copy link
Member

@m-mohr m-mohr commented Dec 13, 2019

Added load_uploaded files process.

Notes:

  • Input file formats are specified in /file_formats
  • Process to transform files into data cubes must be described in the /file_formats descriptions.
  • Should format be required or be optional and if not specified the back-end can try to guess based on the file_extensions?
  • For API changes see commit Open-EO/openeo-api@c8eab70

load_uploaded_files.json Outdated Show resolved Hide resolved
Copy link
Contributor

@jdries jdries left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like this could indeed be used instead of our current read_vector process. So overall proposal seems fine.

Copy link
Contributor

@lforesta lforesta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I would leave indeed "format" as required, it's more explicit and it may be helpful when loading data from a folder.

load_uploaded_files.json Outdated Show resolved Hide resolved
load_uploaded_files.json Outdated Show resolved Hide resolved
@soxofaan
Copy link
Member

how is loading of a folder defined? loading all files in the folder? only files matching the format?

@m-mohr
Copy link
Member Author

m-mohr commented Dec 17, 2019

@soxofaan I tried to incorporate your feedback. I tried to make it as useful as possible, but I'm not so sure about that part:

Implementations may skip files in folders that clearly can't be read using the specified format

A re-review would be appreciated.

@soxofaan
Copy link
Member

Another minor note: file-path and folder-path from Open-EO/openeo-api@c8eab70 explicitly state "relative path to a user-uploaded file". It might be useful to relax this a bit and allow absolute paths as well, with the caveat that this assumes backend-specific knowledge. This would make sense for some use cases at VITO. Alternatively this could also be covered by suggestions from #105

@m-mohr
Copy link
Member Author

m-mohr commented Dec 17, 2019

@soxofaan No, I don't think this is a good idea. This process is really only for loading files from the user-uploaded files, uploaded using the /files endpoints and there's no absolute path for them (one could argue the root of each users workspace is / though), but I guess it's simpler to restrict to relative?! If you want to load data from other sources, it really is up to the ones proposed in #105.

@m-mohr m-mohr requested a review from soxofaan December 17, 2019 11:16
@soxofaan
Copy link
Member

but I'm not so sure about that part: Implementations may skip files in folders that clearly can't be read using the specified format

Indeed, folder support adds a bunch of complications. Maybe we could start with just providing a file-path only version of this process and keeping the folder support feature for further discussion and follow-up PR? I also haven't seen enough use cases to estimate the value of folder support at the moment.

@m-mohr
Copy link
Member Author

m-mohr commented Dec 17, 2019

My reason to add it was for example a (larger) set of GeoTIFF files I want to load. Could be hundreds to be combined in a single data cube. Listing them all individually is a pain, but the user experience could probably be somewhat improved by clients.

Adding the paragraph about skipping files, I'm thinking about skipping accompanying metadata files (e.g. if you upload a STAC catalog).

What other difficulties you are thinking of, @soxofaan ?

@soxofaan
Copy link
Member

As you hinted yourself in the process documentation, folder support introduces some aspects that complicate the API and backend implementation. Over time one might even want to expose these things as options to the user:

  • recurse in subfolders or not? or skip particular ones?
  • filter out some files (black listing)
  • select files (based on format guessing, file extension, file globbing, ...)
  • fail on unreadable files or skip them?

A file-only API is more explicit and avoids a lot of the above issues.
And indeed, clients are a good place to alleviate the pain of explicit file listing.

That being said, I was coming to this issue from a "single file" use case, so folder support seemed overkill. If you indeed are thinking of "lots of files" use cases, folder support makes sense. On the other hand, wouldn't there "lots of files" use cases be better served by the import_ ideas of #105 ?

m-mohr added a commit to Open-EO/openeo-api that referenced this pull request Dec 17, 2019
m-mohr added a commit that referenced this pull request Dec 17, 2019
@m-mohr
Copy link
Member Author

m-mohr commented Dec 17, 2019

@soxofaan Okay, I'm fine with a simpler specification, so I removed folder support from the API subtypes and this process, but added that "Clients should assist to generate a list of files for folders." to the API subtype file-paths.

On the other hand, wouldn't there "lots of files" use cases be better served by the import_ ideas of #105 ?

No, "import" and "load" refer to different data sources, not the amount of data loaded. "import" is for loading from non-API sources, "load" is for loading from API sources (e.g. from /collections and /files). For example, if we'd go for a "load from folder" process separately, I'd probably name it "load_uploaded_folder" or so.

@soxofaan
Copy link
Member

No, "import" and "load" refer to different data sources, not the amount of data loaded.

I understand, but I meant it more like: if you want to use openEO on a lot of (possibly large) files of your own, maybe the openEO "upload"+"load" feature is not the best approach and you are better served with an approach that uses external storage (like S3) that you then "import".

@m-mohr
Copy link
Member Author

m-mohr commented Dec 17, 2019

Sure, this could be a better alternative if supported by the back-end.

If the proposal if fine now, I'd appreciate an approval for the PR. :-)

@m-mohr m-mohr merged commit a4334b2 into draft Dec 17, 2019
@m-mohr m-mohr deleted the issue-83 branch December 17, 2019 15:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Extra attention is needed new process
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants