load_user_data (or similar) #83

Closed
m-mohr opened this issue Sep 13, 2019 · 13 comments

m-mohr (Member) commented Sep 13, 2019

A new process was proposed during the 3rd-year planning: load_user_data (or similar).
It should load user-uploaded data and convert it into a data cube, similar to load_collection and load_result.
We need to check how to communicate to users which file formats are allowed to be uploaded. (Change /output_formats to /file_formats and add a list of supported formats for loading as a data cube?)
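
For illustration only, an invocation of such a process might look like the sketch below. Nothing here had been agreed on at this point; the process name is the working title from this issue, and the path and format arguments are hypothetical:

{
  "process_id": "load_user_data",
  "arguments": {
    "path": "/uploads/sentinel2_subset.nc",
    "format": "netCDF"
  }
}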

mkadunc (Member) commented Sep 13, 2019

👍 to using the existing endpoint

bossie commented Oct 10, 2019

Since it returns an image collection, maybe it makes sense to name it load_user_collection, analogous to load_collection. Any thoughts?

m-mohr (Member, Author) commented Oct 10, 2019

It doesn't return an image collection, but a data cube. load_user_data says where to load it from (the user workspace), which is consistent with load_collection (loads data made available via the collections endpoints) and load_result (loads a job result).
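
For context, the existing loaders already share this shape; minimal sketches (the collection and job IDs below are made up, extents nulled for brevity):

{
  "process_id": "load_collection",
  "arguments": {
    "id": "SENTINEL2_L2A",
    "spatial_extent": null,
    "temporal_extent": null
  }
}

{
  "process_id": "load_result",
  "arguments": {
    "id": "4da21a7b-fb8a-4f9e-8b0a-0a5f3d2e6c11"
  }
}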

bossie commented Oct 10, 2019

Maybe a general load_data process scales better if one wants to load a data cube from data that is neither uploaded to the user workspace nor the result of a batch job, e.g. data in S3. In that case the process might look like this:

{
  "process_id": "load_data",
  "arguments": {
    "format": "GTiff",
    "source": "S3",
    "options": {
      "uri": "s3://bucket/prefix",
      "more_options": "here"
    }
  }
}

m-mohr (Member, Author) commented Oct 10, 2019

I feel that this is a bit too generic, and it will be hard to document all the options. Wouldn't it be easier to use if we defined more processes for specific use cases? For example, load_s3_data, load_gcs_data, and so on?
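
A dedicated process could promote the generic options into named, documentable parameters. A purely hypothetical sketch (none of these parameter names were specified anywhere at this point):

{
  "process_id": "load_s3_data",
  "arguments": {
    "format": "GTiff",
    "bucket": "my-bucket",
    "prefix": "sentinel2/2019/",
    "region": "eu-central-1"
  }
}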

bossie commented Oct 11, 2019

We have a use case where we want to load GeoTIFFs from disk, so I'd like to add something like this:

{
  "process_id": "load_disk_data",
  "arguments": {
    "format": "GTiff",
    "glob_pattern": "/data/MTDA/CGS_S2/CGS_S2_FAPAR/2019/04/24/*/*/10M/*_FAPAR_10M_V102.tif",
    "options": {
      "date_regex": "_(\\d{4})(\\d{2})(\\d{2})T"
    }
  }
}

m-mohr (Member, Author) commented Oct 11, 2019

@bossie Go ahead and define such a function. I don't think this is a function for the process catalogue, though, as users usually won't know anything about the internal structure of your disks. I think this function would be a good start for a list of proprietary process extensions we could list somewhere here.

jdries (Contributor) commented Oct 11, 2019

Hi Matthias,
in fact the use case is that the user has put the data there themselves, so they do know the structure. It is basically the same as a user managing their files in object storage, except that we use good old NFS.
That's why we thought this might be a candidate for a generic process.

m-mohr (Member, Author) commented Oct 11, 2019

Go ahead and define such a function. I don't know what you need, so it is better if you make a proposal we can discuss. The process looks relatively complicated (regex etc.), so I'm not sure whether it might be too much for the "core". Also, I'm not sure whether this process is limited to your driver or whether other back-ends would also make use of it. I think we should discuss this process separately. In general, we should not discuss all kinds of loading functions in this single issue, but open a separate issue for each of them; otherwise it gets complicated to follow and manage.

m-mohr added the help wanted label on Nov 22, 2019
m-mohr added this to the v1.0 milestone on Nov 22, 2019
jdries (Contributor) commented Dec 12, 2019

Telco conclusion: hold off on a standardized definition until other back-ends (want to) implement this.
Meanwhile, here is the current VITO process definition:
http://openeo.vgt.vito.be/openeo/0.4.0/processes/load_disk_data

m-mohr (Member, Author) commented Dec 13, 2019

Thanks for the conclusion, @jdries. I'm not sure you discussed what this issue was originally about, though. load_user_data (but we may choose another name, maybe load_uploaded_files?) was already accepted as a solution at the Rome meeting for importing files from the uploaded files, and the API has already changed /output_formats to /file_formats so that supported input file formats are listed as well.

For the other processes that import from non-API sources: I would clearly separate and define functions such as import_s3 (or load_s3), import_nfs, import_gcs etc. (names to be discussed) whenever required. For this I'd propose opening separate issues or PRs for discussion. Edit: see #105
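
For reference, the /file_formats response separates input from output formats, roughly along these lines (entries abridged and illustrative):

{
  "input": {
    "GTiff": {
      "title": "GeoTIFF",
      "gis_data_types": ["raster"],
      "parameters": {}
    }
  },
  "output": {
    "GTiff": {
      "title": "GeoTIFF",
      "gis_data_types": ["raster"],
      "parameters": {}
    },
    "netCDF": {
      "title": "Network Common Data Form",
      "gis_data_types": ["raster"],
      "parameters": {}
    }
  }
}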

m-mohr added a commit to Open-EO/openeo-api that referenced this issue on Dec 13, 2019:
…pecify the data cube loading/storing mechanism in /file_formats, see Open-EO/openeo-processes#83
m-mohr added the work in progress label and removed the help wanted label on Dec 13, 2019
m-mohr added a commit that referenced this issue on Dec 13, 2019
m-mohr (Member, Author) commented Dec 13, 2019

See PR #106 for a proposal of load_uploaded_files.
See issue #105 for everything related to "non-API" imports.
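
For illustration, a call to the proposed load_uploaded_files might look like this (the path below is made up; see PR #106 for the authoritative parameter list, which may change during review):

{
  "process_id": "load_uploaded_files",
  "arguments": {
    "paths": ["scenes/S2_FAPAR_20190424.tif"],
    "format": "GTiff"
  }
}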

m-mohr (Member, Author) commented Dec 17, 2019

The PR has been merged.

m-mohr closed this as completed on Dec 17, 2019