Server-side virtual data processing #527

Open
marceloandrioni opened this issue Aug 16, 2024 · 21 comments

@marceloandrioni

marceloandrioni commented Aug 16, 2024

Hello. I was checking Unidata's news and stumbled upon this article.
It says that @matakleo implemented server-side virtual data processing in TDS as part of his summer intern project. Any idea when this feature will be available in a main release?
I watched his project presentation and he demonstrated the EnhancementProvider using a classifier.
Do you think it will also be possible to create a provider capable of getting two variables (u and v) and returning two more (magnitude and direction)?

I know converting u/v to magnitude/direction is a simple calculation, but I have lost count of how many times non-metocean people (e.g. naval architects, subsea engineers) got u/v data directly from our in-house TDS server and applied an incorrect transformation when converting to mag/dir. Thus, I believe a feature offering server-side calculations directly through the APIs (OPeNDAP, NCSS, WCS) would be very helpful.

Thank you and congratulations for the great work.
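
For reference, the u/v to magnitude/direction conversion discussed above can be sketched as follows. This is a minimal illustration, not the plugin's code: the function name `uv_to_magdir` is hypothetical, and it assumes `u` is the eastward component, `v` is the northward component, and the direction is reported in degrees clockwise from north, pointing where the flow is going to.

```python
import math


def uv_to_magdir(u, v):
    """Convert eastward (u) and northward (v) components to (magnitude, direction).

    Assumption: direction is in degrees clockwise from north and points
    where the flow is going TO.
    """
    mag = math.hypot(u, v)
    # atan2(u, v), not atan2(v, u): the angle is measured from north (the v axis),
    # which is the argument order that is most often swapped by mistake.
    direction = math.degrees(math.atan2(u, v)) % 360.0
    return mag, direction
```

A purely northward flow (`u=0, v=1`) gives a direction of 0 degrees and a purely eastward flow (`u=1, v=0`) gives 90 degrees.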

@tdrwenski
Contributor

tdrwenski commented Aug 16, 2024

Hi @marceloandrioni, very cool that you are interested in using this! The way it currently works, it can apply a transformation to a single variable. So some extra work may be necessary before you could transform two variables into two others.

It is available in the current 5.6-SNAPSHOT, which requires JDK 17 and some extra JVM args (see CHRONICLE_CACHE here). We are in the process of some security updates, after which we plan to make another release that will also contain this feature. It would be great if you could start testing with the 5.6-SNAPSHOT, because then we can make adjustments to the EnhancementProvider if you run into any issues.

@marceloandrioni
Author

Hi @tdrwenski, sorry for the late reply. I am glad to know this option is already available in the snapshot. I will try to set up the 5.6-SNAPSHOT + JDK 17 on my side to run some tests and get back to you.
Thank you.

@haileyajohnson

Hi @marceloandrioni - I have this implemented now here: https://github.com/haileyajohnson/vectorize-thredds-plugin
I'm not sure what the performance is like because I've only tested it on test data, but it's a start at least!

@haileyajohnson

also pinging @matakleo - if you wanna see your project in use :)

@marceloandrioni
Author

Thanks for this @haileyajohnson. I am out of the office at the moment, but I will try this as soon as possible, probably with some ERA5 wind data. I imagine an extra argument will be needed to indicate whether the vector direction calculated from U/V should be "reversed" to indicate "coming from", as in the wind and wave conventions.
Thank you!
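
The "coming from" reversal mentioned above amounts to a 180 degree rotation of the "going to" direction. A minimal sketch, assuming `u` is eastward, `v` is northward, and directions are in degrees clockwise from north; the function name and the `coming_from` flag are hypothetical and are not options of the plugin:

```python
import math


def direction_from_uv(u, v, coming_from=False):
    """Direction in degrees clockwise from north.

    coming_from=False: direction the flow is going to (currents).
    coming_from=True:  direction the flow is coming from (wind, waves).
    """
    direction = math.degrees(math.atan2(u, v)) % 360.0
    if coming_from:
        # Rotate by half a turn and wrap back into [0, 360).
        direction = (direction + 180.0) % 360.0
    return direction
```

With this sketch, a wind blowing toward the north (`u=0, v=1`) is reported as 0 degrees in the "going to" convention and 180 degrees (a southerly wind) in the "coming from" convention.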

@matakleo

Hi everyone! I've been following this, and I think it's amazing to see it already in use. It's great to know that my contribution can benefit others. Long live TDS and netCDF!

@marceloandrioni
Author

Hello @haileyajohnson, I managed to get a TDS running with the following versions:

Linux 5.4.0-155-generic
OpenJDK17U-jdk_x64_linux_hotspot_17.0.13_11
apache-tomcat-10.1.31
THREDDS Data Server 5.6 2024-10-16 (beta)

I ran "mvn package" for the vectorize plugin and moved the resulting vectorize-tds-plugin-1.0-SNAPSHOT.jar file to /usr/local/tds/tomcat/webapps/thredds##5.6/WEB-INF/lib.

Then I ran some tests using this netcdf with dims:

  • time = 3 ;
  • depth = 3 ;
  • latitude = 121 ;
  • longitude = 169 ;

In the thredds catalog.xml I added the following definitions:

    <dataset name="cmems_uv_only"
             ID="cmems_uv_only"
             urlPath="datasets/cmems/cmems_uv_only"
             dataType="Grid">
        <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"
                location="file:/usr/local/tds/datasets/cmems/cmems_forecast_20210101.nc"/>
    </dataset>

    <dataset name="cmems_uv_and_magdir"
             ID="cmems_uv_and_magdir"
             urlPath="datasets/cmems/cmems_uv_and_magdir"
             dataType="Grid">

        <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"
                location="file:/usr/local/tds/datasets/cmems/cmems_forecast_20210101.nc">

            <variable name="cspd" shape="time depth latitude longitude" type="float">
                <attribute name="vectorize_mag" value="uo/vo" />
                <attribute name="long_name" value="current speed" />
                <attribute name="units" value="m/s" />
            </variable>

            <variable name="cdir" shape="time depth latitude longitude" type="float">
                <attribute name="vectorize_dir" value="uo/vo" />
                <attribute name="long_name" value="current direction" />
                <attribute name="units" value="degrees" />
            </variable>

        </netcdf>

    </dataset>

When I tried to access cmems_uv_and_magdir the first time I got some errors:

Throwable exception handled : jakarta.servlet.ServletException: Handler dispatch failed: java.lang.UnsupportedClassVersionError: org/example/VectorMagnitude$Provider has been compiled by a more recent version of the Java Runtime (class file version 63.0), this version of the Java Runtime only recognizes class file versions up to 61.0 (unable to load class [org.example.VectorMagnitude$Provider])
	at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:1104)
	at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:979)
	at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:1014)
	at org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:903)

I then went back to the vectorize plugin and replaced version 19 with 17 in the source and target in the pom.xml:

    <properties>
        <maven.compiler.source>17</maven.compiler.source>
        <maven.compiler.target>17</maven.compiler.target>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>

and again moved the resulting vectorize-tds-plugin-1.0-SNAPSHOT.jar file to /usr/local/tds/tomcat/webapps/thredds##5.6/WEB-INF/lib.
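
The numbers in the UnsupportedClassVersionError above map directly to Java releases: class file version 63 is Java 19 and 61 is Java 17. A small sketch for checking which release a compiled class targets, based on the first 8 bytes of a `.class` file (the helper name is hypothetical):

```python
import struct


def class_file_java_version(header: bytes) -> int:
    """Return the Java release a .class file targets, given its first 8 bytes."""
    # Layout: 4-byte magic, 2-byte minor version, 2-byte major version (big-endian).
    magic, _minor, major = struct.unpack(">IHH", header[:8])
    if magic != 0xCAFEBABE:
        raise ValueError("not a Java class file")
    # For modern releases the Java version is major - 44 (61 -> 17, 63 -> 19).
    return major - 44
```

To use it, extract a class from the plugin jar and pass the first 8 bytes of that file, for example `open(path, "rb").read(8)`.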

With this new file the magnitude and direction appeared in the TDS interface:

Image

But when I tried to get the magnitude and direction values everything showed as zero, despite valid values of u/v.

Image

I am not sure if I am missing some steps. Is it enough to just put the plugin jar file in the WEB-INF/lib folder, or do I also need to declare it as an ioServiceProvider in the threddsConfig.xml config file?

  <!--
  Configuring the CDM (netcdf-java library)
  see https://www.unidata.ucar.edu/software/netcdf-java/reference/RuntimeLoading.html

  <nj22Config>
    <ioServiceProvider class="edu.univ.ny.stuff.FooFiles"/>
    <coordSysBuilder convention="foo" class="test.Foo"/>
    <coordTransBuilder name="atmos_ln_sigma_coordinates" type="vertical" class="my.stuff.atmosSigmaLog"/>
    <typedDatasetFactory datatype="Point" class="gov.noaa.obscure.file.Flabulate"/>
  </nj22Config>
  -->

Thank you!

@haileyajohnson

Cool! Thanks for trying it out! I think you need to put values in your magnitude and direction variables: each element needs to contain just the index of the corresponding u and v element (so the values run from 0 up to the length of u/v).
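
The index idea described above can be sketched roughly as follows. This is an illustration of the description in this comment, not the plugin's code: the helper names are hypothetical, and how the provider consumes the indices internally is an assumption.

```python
import math


def index_values(shape):
    """Placeholder values 0, 1, 2, ... for a variable of the given shape."""
    return list(range(math.prod(shape)))


def magnitude_from_indices(indices, u_flat, v_flat):
    """Replace each placeholder index with the magnitude of the matching u/v pair."""
    return [math.hypot(u_flat[i], v_flat[i]) for i in indices]
```

For the dataset in this thread (time=3, depth=3, latitude=121, longitude=169), `index_values((3, 3, 121, 169))` has 184041 elements, running from 0 to 184040. If a client sees those raw numbers, the placeholders were served without being converted.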

@marceloandrioni
Author

Hi @haileyajohnson. I am not sure I got this right. I included the values definition in the variables:

    <dataset name="cmems_uv_and_magdir"
             ID="cmems_uv_and_magdir"
             urlPath="datasets/cmems/cmems_uv_and_magdir"
             dataType="Grid">

        <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"
                location="file:/usr/local/tds/datasets/cmems/cmems_forecast_20210101.nc">

            <variable name="cspd" shape="time depth latitude longitude" type="float">
                <attribute name="vectorize_mag" value="uo/vo" />
                <values start="0" increment="1" />
                <attribute name="long_name" value="current speed" />
                <attribute name="units" value="m/s" />
            </variable>

            <variable name="cdir" shape="time depth latitude longitude" type="float">
                <attribute name="vectorize_dir" value="uo/vo" />
                <values start="0" increment="1" />
                <attribute name="long_name" value="current direction" />
                <attribute name="units" value="degrees" />
            </variable>

        </netcdf>

    </dataset>

But now, after downloading the file using NCSS, the largest value for magnitude and direction shows as 184040, that is, the last index of my dataset (3 x 3 x 121 x 169 = 184041 elements).

Image

@haileyajohnson

hmm that looks like it's getting the min/max values from the un-converted values, which would be a bug....
do the values themselves look right?

@marceloandrioni
Author

Running ncdump -v cspd cmems_uv_and_magdir.nc shows the index values instead of the magnitude.

Image

The funny thing is that I also tried a direct access using xarray and then the values were all zero.

Image
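
A quick way to tell these symptoms apart is to compare the served magnitude against a value recomputed from uo and vo. The sketch below works on flat Python lists that have already been read from the downloaded file or the remote dataset; the function name and the returned labels are hypothetical.

```python
import math


def diagnose(served, u, v, tol=1e-6):
    """Classify served magnitude values against magnitudes recomputed from u and v."""
    if not served:
        raise ValueError("no values to check")
    expected = [math.hypot(a, b) for a, b in zip(u, v)]
    if all(math.isnan(x) for x in served):
        return "all NaN"
    if all(math.isclose(x, e, abs_tol=tol) for x, e in zip(served, expected)):
        return "ok"
    if all(x == 0 for x in served):
        return "all zero"
    if all(x == i for i, x in enumerate(served)):
        return "raw indices"
    return "mismatch"
```

In this thread, the NCSS download would fall under "raw indices" and the direct xarray access under "all zero".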

@haileyajohnson

Doesn't look like it's working then haha. I'll take a look at it this afternoon, but we should maybe move this discussion to an issue on my repo and let unidata close this one.

@marceloandrioni
Author

No problem. Should I close this and open a new one on vectorize?

@marceloandrioni
Author

Hello. With the vectorize plugin it is now possible to calculate magnitude and direction directly on the server using u and v components.
Would it be possible to include this functionality in the main TDS codebase? I believe this option would be really useful to a lot of users.
Also, the mag/dir calculation is working fine for data retrieved using NCSS and WMS, but when the data is accessed using the opendap protocol, everything is returned as zero.

Thank you!

@lesserwhirls
Collaborator

I agree, this would be useful to have shipped with the TDS. I am working towards a release of the TDS this week - @haileyajohnson, how big of a lift would it be to get the code in the vectorize plugin repo into a PR?

@haileyajohnson

Should be as easy as copy-pasting into the filters package. I have a few docs contributions sitting in a branch somewhere, I'll take a look at getting it all into a PR this evening.

@lesserwhirls
Collaborator

It does not look like the TDS has its own filters package - is that right, @haileyajohnson? Since I just pushed out a netCDF-Java release, we won't be able to sneak it in there, but I could put it in the TDS, use the SPI to load it, and move it to netCDF-Java for the next release. If it is a simple copy/paste/change package statements job (plus the SPI resource file), I can take care of that.

"Also, the mag/dir calculation is working fine for data retrieved using NCSS and WMS, but when the data is accessed using the opendap protocol, everything is returned as zero."

@marceloandrioni I seem to recall that we do not apply any netCDF-Java "enhancements" to datasets served via OPeNDAP by default. We do some things in the code to map from the CDM to the OPeNDAP data model, but that's done outside of applying NetcdfDataset enhancements (like scale/offset, add coordinate systems, etc.). That said, you might be able to modify the NcML to do this - can you try adding the attribute enhance="all" to the netcdf element of the NcML? Something like:

<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"
        enhance="all"
        location="file:/usr/local/tds/datasets/cmems/cmems_forecast_20210101.nc">

@marceloandrioni
Author

marceloandrioni commented Jan 8, 2025

Hi @lesserwhirls, I did a test with

<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"
        enhance="all"
        location="file:/usr/local/tds/datasets/cmems/cmems_forecast_20210101.nc">

The results are different, but not correct. Without enhance="all", the server-side variables are returned as zero; with enhance="all", they are returned as NaN.

Without enhance="all"
Image

With enhance="all"
Image

Thank you.

@lesserwhirls
Collaborator

Ok, well, dang. It looks like maybe enhancements via NcML are being...weird...I will need to dig into that. Whatever the issue is, it is likely on the netCDF-Java side, so it won't make this release of the TDS (if it turns out to be something I can address in the TDS release, then we're in business and I'll get it in).

@lesserwhirls
Collaborator

I've been digging around, and so far what I have found is that when opening the dataset through the opendap service, the values explicitly set on the NcML-defined variables cspd and cdir are not being read correctly. These values are being read as 0 or NaN (depending on the value of the enhance attribute). The intended values (0, 1, 2, ...) are set in the VariableDS cache, but caching isn't enabled on the Variable that is ultimately read by the opendap service code, so the read call returns an empty array: it skips checking the cache and tries to read from the underlying netCDF file, which does not have these variables. Since those values (0, 1, 2, 3, ...) are intended to be used to index into uo and vo to compute the magnitude and direction, and we instead get 0 or NaN, the reads using those values for indexing into the uo and vo variables come back as 0 or NaN, and thus the calculations come back the same.

All of that said, I think there is a bug in the netCDF-Java side using the particular read path that the opendap server is using, so we'll need to track it down over there. Unfortunately that means this won't end up in the upcoming release of the TDS, but I think we'll get it in the SNAPSHOT pretty soon.
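
The read path described above can be pictured with a toy model. This is not netCDF-Java code: the class, its fields, and the choice to return zeros for a variable missing from the file are all assumptions made only to illustrate the cache bypass.

```python
class ToyVariable:
    """Toy stand-in for a variable whose values may exist only in a cache."""

    def __init__(self, name, size, cached_values=None, caching=True):
        self.name = name
        self.size = size
        self.cached_values = cached_values
        self.caching = caching

    def read(self, file_contents):
        # With caching enabled, values defined in NcML are served from the cache.
        if self.caching and self.cached_values is not None:
            return list(self.cached_values)
        # Otherwise the read falls through to the file. A variable that is not
        # in the file comes back without its intended values (zeros here).
        return list(file_contents.get(self.name, [0.0] * self.size))
```

In this model, a `cspd` variable with cached values `[0, 1]` returns `[0, 1]` when caching is on and `[0.0, 0.0]` when it is off, because the file only contains `uo` and `vo`.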

@marceloandrioni
Author

Hi @lesserwhirls , thank you for the detailed explanation and for looking into this issue. It’s not a problem at all having the bug fix in the snapshot version. I appreciate your efforts to track it down and address it on the netCDF-Java side. Let me know if there’s anything else I can help with! Maybe some testing after the TDS snapshot release.
