load_uploaded_files #106

m-mohr · 2019-12-13T14:11:27Z

Added load_uploaded_files process.

Notes:

Input file formats are specified in /file_formats
Process to transform files into data cubes must be described in the /file_formats descriptions.
Should format be required, or should it be optional, with the back-end trying to guess it from the file_extensions if it is not specified?
For API changes see commit Open-EO/openeo-api@c8eab70
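To make the "guess based on the file_extensions" option concrete, here is a minimal sketch of how a back-end might do it. The `INPUT_FORMATS` mapping, its format names, and the `guess_format` helper are all hypothetical; the actual structure of the /file_formats response is not shown in this thread.

```python
from pathlib import PurePosixPath
from typing import Dict, List, Optional

# Hypothetical mapping of input format names to file extensions.
# This is an assumption for illustration, not the real /file_formats schema.
INPUT_FORMATS: Dict[str, List[str]] = {
    "GTiff": [".tif", ".tiff"],
    "GeoJSON": [".geojson", ".json"],
}


def guess_format(path: str,
                 formats: Dict[str, List[str]] = INPUT_FORMATS) -> Optional[str]:
    """Return the one format whose extensions match the path, else None."""
    ext = PurePosixPath(path).suffix.lower()
    matches = [name for name, exts in formats.items() if ext in exts]
    # Refuse to guess when no format or more than one format matches.
    return matches[0] if len(matches) == 1 else None
```

Note that the guess fails as soon as two formats share an extension, which is one argument for keeping format explicit.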

load_uploaded_files.json

jdries

Seems like this could indeed be used instead of our current read_vector process. So overall proposal seems fine.

lforesta

Looks good. I would indeed leave "format" as required; it's more explicit and it may be helpful when loading data from a folder.

load_uploaded_files.json

soxofaan · 2019-12-17T10:44:35Z

How is loading of a folder defined? Loading all files in the folder? Only files matching the format?

m-mohr · 2019-12-17T11:09:31Z

@soxofaan I tried to incorporate your feedback. I tried to make it as useful as possible, but I'm not so sure about that part:

Implementations may skip files in folders that clearly can't be read using the specified format

A re-review would be appreciated.

soxofaan · 2019-12-17T11:12:26Z

Another minor note: file-path and folder-path from Open-EO/openeo-api@c8eab70 explicitly state "relative path to a user-uploaded file". It might be useful to relax this a bit and allow absolute paths as well, with the caveat that this assumes backend-specific knowledge. This would make sense for some use cases at VITO. Alternatively this could also be covered by suggestions from #105

m-mohr · 2019-12-17T11:15:58Z

@soxofaan No, I don't think this is a good idea. This process is really only for loading files from the user-uploaded files, uploaded using the /files endpoints, and there's no absolute path for them (one could argue the root of each user's workspace is / though), but I guess it's simpler to restrict to relative?! If you want to load data from other sources, it really is up to the ones proposed in #105.
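As an illustration of the "relative paths only" restriction, a back-end could check incoming paths roughly like this. This is a sketch under assumptions: the helper name `is_valid_workspace_path` is hypothetical, and rejecting `..` segments is an extra safeguard that the thread does not discuss.

```python
from pathlib import PurePosixPath


def is_valid_workspace_path(path: str) -> bool:
    """Accept only non-empty relative paths that stay inside the user workspace."""
    if not path:
        return False
    p = PurePosixPath(path)
    if p.is_absolute():
        return False  # absolute paths are not user-uploaded files
    # Assumption: parent-directory segments are rejected so that a path
    # cannot escape the workspace root.
    return ".." not in p.parts
```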

soxofaan · 2019-12-17T11:24:30Z

> but I'm not so sure about that part: "Implementations may skip files in folders that clearly can't be read using the specified format"

Indeed, folder support adds a bunch of complications. Maybe we could start with just a file-path-only version of this process and keep folder support for further discussion and a follow-up PR? I also haven't seen enough use cases to estimate the value of folder support at the moment.

m-mohr · 2019-12-17T11:29:21Z

My reason to add it was for example a (larger) set of GeoTIFF files I want to load. Could be hundreds to be combined in a single data cube. Listing them all individually is a pain, but the user experience could probably be somewhat improved by clients.

Adding the paragraph about skipping files, I'm thinking about skipping accompanying metadata files (e.g. if you upload a STAC catalog).

What other difficulties are you thinking of, @soxofaan?

soxofaan · 2019-12-17T14:54:50Z

As you hinted yourself in the process documentation, folder support introduces some aspects that complicate the API and backend implementation. Over time one might even want to expose these things as options to the user:

recurse in subfolders or not? or skip particular ones?
filter out some files (black listing)
select files (based on format guessing, file extension, file globbing, ...)
fail on unreadable files or skip them?

A file-only API is more explicit and avoids a lot of the above issues.
And indeed, clients are a good place to alleviate the pain of explicit file listing.

That being said, I was coming to this issue from a "single file" use case, so folder support seemed overkill. If you are indeed thinking of "lots of files" use cases, folder support makes sense. On the other hand, wouldn't these "lots of files" use cases be better served by the import_ ideas of #105?

m-mohr · 2019-12-17T15:15:24Z

@soxofaan Okay, I'm fine with a simpler specification, so I removed folder support from the API subtypes and this process, but added that "Clients should assist to generate a list of files for folders." to the API subtype file-paths.
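A client-side helper along the lines of "Clients should assist to generate a list of files for folders" could look like the sketch below. It assumes the client already holds the list of workspace paths (for example from the /files endpoints). The function names, the glob-style `pattern` option, and the `paths` argument name in the generated node are assumptions for illustration; only `format` is named in this thread.

```python
from fnmatch import fnmatchcase
from typing import Any, Dict, Iterable, List


def expand_folder(paths: Iterable[str], folder: str, pattern: str = "*",
                  recursive: bool = False) -> List[str]:
    """Select workspace paths under `folder` whose file name matches `pattern`."""
    folder = folder.strip("/")
    prefix = folder + "/" if folder else ""
    selected = []
    for path in paths:
        if not path.startswith(prefix):
            continue
        rest = path[len(prefix):]
        if "/" in rest and not recursive:
            continue  # the file lives in a subfolder
        if fnmatchcase(rest.rsplit("/", 1)[-1], pattern):
            selected.append(path)
    return sorted(selected)


def load_uploaded_files_node(paths: List[str], fmt: str) -> Dict[str, Any]:
    """Build a process graph node; the argument names are assumptions."""
    return {
        "process_id": "load_uploaded_files",
        "arguments": {"paths": paths, "format": fmt},
    }
```

With this, the recursion and filtering choices stay on the client, and the process itself only ever sees an explicit list of files.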

> On the other hand, wouldn't these "lots of files" use cases be better served by the import_ ideas of #105?

No, "import" and "load" refer to different data sources, not the amount of data loaded. "import" is for loading from non-API sources, "load" is for loading from API sources (e.g. from /collections and /files). For example, if we'd go for a "load from folder" process separately, I'd probably name it "load_uploaded_folder" or so.

soxofaan · 2019-12-17T15:30:16Z

> No, "import" and "load" refer to different data sources, not the amount of data loaded.

I understand, but I meant it more like: if you want to use openEO on a lot of (possibly large) files of your own, maybe the openEO "upload"+"load" feature is not the best approach and you are better served with an approach that uses external storage (like S3) that you then "import".

m-mohr · 2019-12-17T15:32:41Z

Sure, this could be a better alternative if supported by the back-end.

If the proposal is fine now, I'd appreciate an approval for the PR. :-)

Added load_uploaded_files #83

9351fee

m-mohr added the new process label Dec 13, 2019

m-mohr added this to the v1.0 milestone Dec 13, 2019

m-mohr requested review from bossie, neteler, lforesta, jdries, aljacob, mkadunc and kempenep December 13, 2019 14:11

m-mohr added the help wanted label Dec 13, 2019

This was referenced Dec 13, 2019

load_user_data (or similar) #83

Closed

Support GeoJSON files? #100

Closed

Load assets (e.g. GeoJSON) from user workspace #47

Closed

jdries reviewed Dec 13, 2019

jdries reviewed Dec 13, 2019

"save to" => "read from" #106

eaa627f

lforesta approved these changes Dec 16, 2019

jdries mentioned this pull request Dec 17, 2019

Phase out read_vector usage Open-EO/openeo-python-client#104

Closed

soxofaan reviewed Dec 17, 2019

Clarifications for load_uploaded_files

3e3fab9

m-mohr force-pushed the issue-83 branch from eee550a to 3e3fab9 Compare December 17, 2019 11:09

m-mohr requested a review from soxofaan December 17, 2019 11:16

m-mohr added a commit to Open-EO/openeo-api that referenced this pull request Dec 17, 2019

Disallow folder paths, see Open-EO/openeo-processes#106

26c5d59

m-mohr added a commit that referenced this pull request Dec 17, 2019

Disallow folders #106

bd1de54

Disallow folders #106

b243591

m-mohr force-pushed the issue-83 branch from bd1de54 to b243591 Compare December 17, 2019 15:16

soxofaan approved these changes Dec 17, 2019

m-mohr merged commit a4334b2 into draft Dec 17, 2019

m-mohr deleted the issue-83 branch December 17, 2019 15:49
