set HDF5_PLUGIN_PATH programmatically #2753

Open
edwardhartnett opened this issue Sep 19, 2023 · 21 comments

Comments

@edwardhartnett
Contributor

For filters (and for Zarr?) we are using the HDF5_PLUGIN_PATH environment variable to specify where plugins should be found.

But as I just learned today, we can actually set the plugin path programmatically now.

https://docs.hdfgroup.org/hdf5/develop/group___h5_p_l.html#title3

Turns out they have had this capability for years, so we should have known this when we did the compression work. But we did not. I guess there is a good reason to go to workshops in person.

How should we use this?

We could:

  1. Allow the builder to specify a default plugin path that will always be used whenever netCDF is run. We could do this by calling the appropriate functions on library startup. Then we don't have to care about the environment variable HDF5_PLUGIN_PATH (but would interoperate with it). This is easy and we should do this immediately.
  2. We could put new functions in the netCDF API to allow the user to directly append to, delete, and inquire the plugin path. This would be maximally flexible. However I don't think this is a good enough reason to extend the API.

So I suggest 1 but not 2. However I defer to @DennisHeimbigner the expert on filters.

@DennisHeimbigner
Collaborator

I note that there are also prepend (put at front) and insert (put anywhere in list) functions.

@DennisHeimbigner
Collaborator

Allow the builder to specify a default plugin path that will always be used whenever netCDF is run.

It occurs to me that I do not quite understand this proposal. I guess that you are proposing an option,
say --with-default-plugin-path=XXX.
Is my guess correct?

@edwardhartnett
Contributor Author

Yes, that's what I had in mind.

I think we already allow them to specify a plugin directory, right? And then they have to set the environment variable HDF5_PLUGIN_PATH to that location before running any programs that use the filter.

Instead, netcdf-c could set that programmatically, so there would be no need for the environment var to be set. This would help a lot, since many users are sure to get confused and forget to set the environment var.

@DennisHeimbigner
Collaborator

Also, we already have --with-plugin-dir to specify where to install plugins. How, if at all, should
this proposal interact with --with-plugin-dir?

@edwardhartnett
Contributor Author

I think it should just take that directory, and set it programmatically whenever netcdf starts. So the user does not have to set HDF5_PLUGIN_PATH environment variable for filters to work.

@DennisHeimbigner
Collaborator

DennisHeimbigner commented Sep 30, 2023

I don't follow. The --with-plugin-dir option says where to install the plugins, while --with-default-plugin-path
(my name for the option you propose) would specify from where to load plugins.
We could do, say, the following:
1. If --with-plugin-dir is set, and --with-default-plugin-path is not set, we use the former as the value of the latter.
2. If --with-plugin-dir is not set, and --with-default-plugin-path is set, we use the latter as the value of the former.
3. We leave them as completely independent.

@edwardhartnett
Contributor Author

What good can possibly come of allowing the user to install filters in one place, and then look for them in another?

Let there be zero or one directory specified at netCDF configure time. If zero, the HDF5 default plugin dir is used.
If one, then filters are installed there, and every time netcdf-c starts up, it prepends that directory to the plugin path programmatically.

I assume the HDF5 code would give the environment variable HDF5_PLUGIN_PATH the final say in the matter, so that is still available to advanced users.

@DennisHeimbigner
Collaborator

DennisHeimbigner commented Oct 1, 2023

What good can possibly come of allowing the user to install filters in one place, and then look for them in another?

It is quite conceivable that an organization might have a directory of globally available plugins and a user might have his own directory of plugins. In this case, the local directory is for installation and the path "global;local" is the path for dynamic loading. This raises the question of whether the default path should be a single
directory or should be allowed to be a true path (i.e. a sequence of directories).

@edwardhartnett
Contributor Author

edwardhartnett commented Oct 1, 2023

I would say that the netcdf installation has the global path and the user should set the HDF5_PLUGIN_PATH environment variable to include his local plugin directory.

Yes, a true path should be allowed.

@DennisHeimbigner
Collaborator

Proposal for Default Plugin Path

The plugin path list manipulation functions in HDF5 operate basically as follows.

  1. On startup, the internal plugin path list is one path that is either the built-in default (i.e. /usr/local/hdf5/lib/plugin) or the value of HDF5_PLUGIN_PATH (if defined).
  2. If HDF5_PLUGIN_PATH has multiple directories (e.g. dirpath1:dirpath2) then it is parsed as a list so the internal plugin path list is as follows.
     element [0]: dirpath1
     element [1]: dirpath2
  3. The append and prepend operations work as expected to insert a single directory at the front or end of the internal plugin path.

So, the proposal is as follows.

Add configure option

--with-default-plugin-path=*;<path>;<path>;...;*

The * indicates where the existing default plugin path goes.
So, the sequence of <path>s is parsed to produce a list
and the list is inserted (in order) at either the front or the end of the
existing internal plugin list as controlled by the placement of the star.
A leading star (or no star) means insert at the end of the existing path.
A trailing star means insert at the beginning of the existing path.

Note that the semicolon separator is always allowed, even on Linux machines.
The colon is also allowed on non-Windows platforms, but not on
platforms that allow the "X:" Windows drive letter notation in paths.

The --with-plugin-dir option is renamed --with-plugin-install-dir
but otherwise its semantics remain unchanged.
For backward compatibility, the --with-plugin-dir option
is deprecated but aliased to --with-plugin-install-dir for some
period of time before it is removed.

As noted above, if HDF5_PLUGIN_PATH is defined at startup, then it serves to initialize the internal path list.
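
A minimal sketch in C of how the proposed option value could be parsed, assuming the star semantics described above (leading star or no star means append, trailing star means insert at the front). `PathSpec`, `parse_path_spec`, `free_path_spec`, and `MAX_DIRS` are hypothetical names used only for illustration; none of them come from netcdf-c or HDF5.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_DIRS 16   /* arbitrary limit for this sketch */

/* Hypothetical result type; not part of netcdf-c or HDF5. */
typedef struct PathSpec {
    size_t ndirs;           /* number of directories parsed */
    char *dirs[MAX_DIRS];   /* heap-allocated copy of each directory */
    int insert_at_front;    /* 1 if the star trails the directories, else 0 */
} PathSpec;

/* Release the directory strings held by a PathSpec. */
static void free_path_spec(PathSpec *spec)
{
    size_t i;
    for (i = 0; i < spec->ndirs; i++) free(spec->dirs[i]);
    spec->ndirs = 0;
}

/* Parse "*;dir1;dir2", "dir1;dir2;*", or a star-free list.
 * allow_colon: also accept ':' as a separator (non-Windows platforms only).
 * Returns 0 on success, -1 on too many directories or allocation failure. */
static int parse_path_spec(const char *spec, int allow_colon, PathSpec *out)
{
    const char *seps = allow_colon ? ";:" : ";";
    const char *p = spec;

    assert(spec != NULL && out != NULL);
    out->ndirs = 0;
    out->insert_at_front = 0;

    while (*p != '\0') {
        size_t len = strcspn(p, seps);   /* length of the next element */
        if (len == 1 && p[0] == '*') {
            /* A star that follows directories means "insert at the front". */
            out->insert_at_front = (out->ndirs > 0);
        } else if (len > 0) {
            char *dir;
            if (out->ndirs == MAX_DIRS) { free_path_spec(out); return -1; }
            dir = malloc(len + 1);
            if (dir == NULL) { free_path_spec(out); return -1; }
            memcpy(dir, p, len);
            dir[len] = '\0';
            out->dirs[out->ndirs++] = dir;
            out->insert_at_front = 0;    /* a later directory cancels an earlier star */
        }
        p += len;
        if (*p != '\0') p++;             /* skip the separator */
    }
    return 0;
}
```

Passing `allow_colon = 0` on Windows keeps a drive letter such as `C:` from being split; empty elements (for example from a doubled separator) are simply skipped in this sketch.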

@Alexander-Barth
Contributor

Being able to set the plugin path programmatically would help us a lot in the julia community, where all packages are (typically) installed in the user's home directory and the path contains a hashsum of the package (e.g. /home/abarth/.julia/artifacts/980adb65c877c384f59bf7b2b800699e22a38815/lib/libhdf5.so). Multiple installations of HDF5 can coexist, but for a user these paths are difficult to know. I assume that the HDF5_PLUGIN_PATH is used when the HDF5 library is loaded (and later changes will be ignored, this is correct?). If a NetCDF package or the user can tell HDF5/NetCDF where to find the plugin within a julia session after the HDF5 library is loaded, this would help us a lot.

@DennisHeimbigner
Collaborator

This just slipped down the stack. I will put together a proposal for this and run it by
you and Unidata. May take a little while since it is not my highest priority.

@DennisHeimbigner
Collaborator

I assume that the HDF5_PLUGIN_PATH is used when the HDF5 library is loaded (and later changes will be ignored, this is correct?).

Sort of. AFAIK the value of HDF5_PLUGIN_PATH is parsed when the HDF5 library is initialized. You are correct
that any changes to HDF5_PLUGIN_PATH are ignored after that point.

If a NetCDF package or the user can tell HDF5/NetCDF where to find the plugin within a julia session after the HDF5 library is loaded, this would help us a lot.

If I understand, your requirements are somewhat more extensive than Ed's original proposal above.
It sounds like you want the equivalent of having later changes to HDF5_PLUGIN_PATH
propagated to the HDF5 library plugin search immediately, as opposed to Ed's proposal, which fixes the path at build time.

@Alexander-Barth
Contributor

Alexander-Barth commented Jan 30, 2024

Yes, indeed I would like a way to set the plugin path after the HDF5 and netCDF libraries are loaded, via an environment variable or via an API call. A somewhat similar problem was the discovery of the SSL certificates, which is possible via the nc_rc_set API that you implemented (thanks again :-)). Maybe an API call would be preferable, so that we do not leave a modified HDF5_PLUGIN_PATH variable once the julia program finishes.

As I understand, HDF5 expects all plugins inside its own plugin directory by default, and the NetCDF installation script copies them there automatically so that the user does not have to worry about setting the HDF5_PLUGIN_PATH variable.

In julia we currently do not ship NetCDF/HDF5 filters because we cannot modify the HDF5 package after building the HDF5 source code. Including the NetCDF filters in the HDF5 package would make HDF5 depend on netCDF (and vice versa). So one possibility for us would be to ship the NetCDF/HDF5 plugins with the NetCDF package. But then we need to tell HDF5 where to look for the plugins.

Maybe an alternative (simpler?) approach for our problem could be that, when libnetcdf.so is initialized, the path plugins or ../plugins (relative to libnetcdf.so) is appended using H5PLappend (in julia all libraries are installed with different path prefixes, unlike Linux systems where all libraries are installed in a common prefix, /usr for example).
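
The relative-path idea could be sketched roughly as follows in C. This only shows the string handling; how the library discovers its own location is platform-specific and left out, and `plugin_dir_from_lib` is a hypothetical helper, not an existing netcdf-c or HDF5 function. The resulting directory is what would then be handed to H5PLappend.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build "<directory of lib_path>/<rel>" into buf.
 * Returns 0 on success, -1 if lib_path contains no '/' or buf is too small. */
static int plugin_dir_from_lib(const char *lib_path, const char *rel,
                               char *buf, size_t buflen)
{
    const char *slash;
    int n;

    assert(lib_path != NULL && rel != NULL && buf != NULL);
    slash = strrchr(lib_path, '/');      /* last separator before the file name */
    if (slash == NULL) return -1;
    /* Copy everything before the last '/', then append "/<rel>". */
    n = snprintf(buf, buflen, "%.*s/%s", (int)(slash - lib_path), lib_path, rel);
    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}
```

The sketch leaves `..` in the result unresolved and assumes `/` as the separator, so it would need adjusting for Windows paths.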

@edwardhartnett
Contributor Author

OK, I am taking a look at this now.

If netcdf-c called H5PLappend() at initialization, feeding it the directory used in the configure, would that remove the need for setting HDF5_PLUGIN_PATH?

I would call H5PLappend() from the NC4_initialize() function, currently in libsrc4/nc4dispatch.c. I would move that function to its own file, with a .in ending, and then use the configure step to fill in the value of the directory specified at configure.

@DennisHeimbigner @Alexander-Barth any thoughts?

@edwardhartnett
Contributor Author

@Alexander-Barth the proposal to set the plugin path programmatically should be discussed as a separate issue. I'm not convinced we should expose this level of detail to netCDF users. I note however that you can easily call the HDF5 function H5PLappend() at any time and achieve this.

@DennisHeimbigner I don't think HDF5_PLUGIN_PATH should play any role at configure-time. It should be ignored. We do not want to increase use of environment variables, that brings many problems. And there can be no user expectation that HDF5_PLUGIN_PATH will have meaning at netCDF install time. It's clearly a run-time issue.

So that leaves only whatever the builder passes to configure with --with-plugin-dir= (and the CMake equivalent). I propose to append this value to the HDF5 plugin path. This will allow users to set the plugin dir at configure, and have it work out of the box. If they wish to set multiple directories, separated by semi-colons, they can do that. If they wish to override this later with HDF5_PLUGIN_PATH, they can easily do that.

But ordinary users who don't care will just get a working installation without having to set any environment variable.

Any objections?

@Alexander-Barth
Contributor

Ok, I will try to use H5PLappend(). I was not aware of this function and I guess too that this should work.

@edwardhartnett
Contributor Author

There's a bunch of related functions: https://docs.hdfgroup.org/hdf5/v1_12/group___h5_p_l.html

Unlike most HDF5 functions, these do not require any file or object IDs. So they can be called without any knowledge of netCDF internals.

@Alexander-Barth
Contributor

Thanks, this is very helpful to know :-)

@DennisHeimbigner
Collaborator

Please review discussion item #2495

DennisHeimbigner added a commit to DennisHeimbigner/netcdf-c that referenced this issue Sep 14, 2024
…earch path

re: Unidata#2753

As suggested by Ed Hartnett, this PR extends the netcdf.h API to support programmatic control over the search path used to locate plugins.

I created several different APIs, but finally settled on the following
API as being the simplest possible. It has the disadvantage that
it requires use of a global lock (not implemented) if used
in a threaded environment.

Specifically, note that modifying the plugin paths must be done "atomically".
That is, in a multi-threaded environment, it is important that the sequence of actions involved in setting up the plugin paths must be done by a single processor or in some other way as to guarantee that two or more processors are not simultaneously accessing the plugin path read/write operations.

As an example, assume there exists a mutex lock called PLUGINLOCK.
Then any processor accessing the plugin paths should operate as follows:
````
lock(PLUGINLOCK);
nc_plugin_path_read(...);
<rebuild plugin path>
nc_plugin_path_write(...);
unlock(PLUGINLOCK);
````
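
A slightly more concrete sketch of the locking pattern above, using a POSIX mutex. The plugin path itself is modelled by a small static array, and `demo_path_read` / `demo_path_write` are stand-ins invented for this example; they are not the netcdf-c functions and do not share their signatures.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_DIRS 8   /* arbitrary limit for this sketch */

/* Stand-in for the library's global plugin path list (NOT netcdf-c state). */
static const char *g_dirs[MAX_DIRS];
static size_t g_ndirs = 0;

/* Stand-in read: copy the current list into dirs and return its length. */
static size_t demo_path_read(const char **dirs)
{
    size_t i;
    for (i = 0; i < g_ndirs; i++) dirs[i] = g_dirs[i];
    return g_ndirs;
}

/* Stand-in write: replace the current list with dirs[0..n-1]. */
static int demo_path_write(const char **dirs, size_t n)
{
    size_t i;
    if (n > MAX_DIRS) return -1;
    for (i = 0; i < n; i++) g_dirs[i] = dirs[i];
    g_ndirs = n;
    return 0;
}

static pthread_mutex_t PLUGINLOCK = PTHREAD_MUTEX_INITIALIZER;

/* Append one directory as a single read-modify-write under the lock,
 * so no other thread can change the list between the read and the write. */
static int append_dir_locked(const char *dir)
{
    const char *dirs[MAX_DIRS];
    size_t n;
    int status = -1;

    assert(dir != NULL);
    pthread_mutex_lock(&PLUGINLOCK);
    n = demo_path_read(dirs);
    if (n < MAX_DIRS) {
        dirs[n++] = dir;                 /* rebuild the plugin path */
        status = demo_path_write(dirs, n);
    }
    pthread_mutex_unlock(&PLUGINLOCK);
    return status;
}
```

The lock only helps if every thread that modifies the path goes through the same mutex; code that changes the path some other way is not protected.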

The API proposed in this PR looks like this (from netcdf-c/include/netcdf_filter.h).

* ````int nc_plugin_path_read(int formatx, size_t* ndirsp, char** dirs);````

    This function returns the current sequence of directories in the internal plugin path list. Since this function does not modify the plugin path, it can be called at any time.

    The arguments are as follows:
    - _formatx_ specify which dispatch implementation to read: currently NC_FORMATX_NC_HDF5 or NC_FORMATX_NCZARR.
    - _ndirsp_ return the number of dirs in the internal path list
    - _dirs_ memory for storing the sequence of directories in the internal path list.

    In practice, this function needs to be called twice. The first time with ndirsp not NULL and dirs set to NULL to get the size of the path list. The second time with dirs not NULL to get the actual sequence of paths.

* ````int nc_plugin_path_write(int formatx, size_t ndirs, char** const dirs);````

    This function empties the current internal path sequence and replaces it with the sequence of directories argument. Using a dirs argument of NULL or an ndirs argument of 0 will clear the set of plugin paths.

    The arguments are as follows:
    - _formatx_ specify which dispatch implementation to write: currently NC_FORMATX_NC_HDF5 or NC_FORMATX_NCZARR or 0 (zero).
    - _ndirs_ length of the dirs argument
    - _dirs_ a vector of directory path strings used to overwrite the current internal path list

    If the value zero is used for the formatx argument, then the value being written is applied to all implementations: currently NC_FORMATX_NC_HDF5 and NC_FORMATX_NCZARR.
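
The two-call convention described for the read function can be illustrated with a small sketch. `demo_path_read` is a local stand-in that only mimics the described calling convention with fixed data; it is not the netcdf-c function, and the assumption that the caller owns and frees the returned strings is mine rather than something stated in the commit message.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in with the calling convention described above (NOT netcdf-c code).
 * Reports the count through ndirsp when it is non-NULL; when dirs is
 * non-NULL, fills it with heap-allocated copies owned by the caller. */
static int demo_path_read(int formatx, size_t *ndirsp, char **dirs)
{
    static const char *internal[] = { "/opt/plugins", "/usr/local/hdf5/lib/plugin" };
    const size_t n = sizeof internal / sizeof internal[0];
    size_t i;

    (void)formatx;                       /* unused in this stand-in */
    if (ndirsp != NULL) *ndirsp = n;
    if (dirs != NULL) {
        for (i = 0; i < n; i++) {
            size_t len = strlen(internal[i]) + 1;
            dirs[i] = malloc(len);
            if (dirs[i] == NULL) return -1;
            memcpy(dirs[i], internal[i], len);
        }
    }
    return 0;
}

/* Two-call idiom: ask for the count, allocate, then fetch the entries.
 * Returns a vector of *ndirsp strings, or NULL on failure. */
static char **read_all_dirs(int formatx, size_t *ndirsp)
{
    char **dirs;
    size_t i;

    assert(ndirsp != NULL);
    if (demo_path_read(formatx, ndirsp, NULL) != 0) return NULL;  /* call 1: size only */
    dirs = calloc(*ndirsp ? *ndirsp : 1, sizeof *dirs);
    if (dirs == NULL) return NULL;
    if (demo_path_read(formatx, NULL, dirs) != 0) {               /* call 2: contents */
        for (i = 0; i < *ndirsp; i++) free(dirs[i]);              /* free(NULL) is a no-op */
        free(dirs);
        return NULL;
    }
    return dirs;
}
```

Because the list could change between the two calls in a threaded program, the pair of calls belongs inside the same lock as any later write.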

In addition, two other API functions are defined.
````
int nc_plugin_path_initialize(void);
int nc_plugin_path_finalize(void);
````
As a rule, the initialize and finalize functions do not need to be explicitly called by the user because they are called as part of *nc_initialize()/nc_finalize()*.

In addition to the above changes, add a plugin path testcase:
unit_test/run_pluginpaths.sh+tst_pluginpaths.c.

## Misc. Changes
1. Added a version number for the formatx dispatcher.
2. Setup a per-dispatcher global state mechanism.
3. Add some path manipulation utilities to netcdf_aux.h
4. Fix the construction of netcdf_json.h as a BUILT_SOURCE.
5. Fix some minor bugs in netcdf_json.h
6. Fix the construction of netcdf_proplist.h as a BUILT_SOURCE.
@DennisHeimbigner
Collaborator

See #3024

DennisHeimbigner added a commit to DennisHeimbigner/netcdf-c that referenced this issue Sep 30, 2024
…earch path

re: Unidata#2753

As suggested by Ed Hartnett, this PR extends the netcdf.h API to support programmatic control over the search path used to locate plugins.

I created several different APIs, but finally settled on the following
API as being the simplest possible. It does have the disadvantage that
it requires use of a global lock (not implemented) if used
in a threaded environment.

Specifically, note that modifying the plugin path must be done "atomically".
That is, in a multi-threaded environment, it is important that the sequence of actions involved in setting up the plugin path must be done by a single processor or in some other way as to guarantee that two or more processors are not simultaneously accessing the plugin path get/set operations.

As an example, assume there exists a mutex lock called PLUGINLOCK.
Then any processor accessing the plugin paths should operate as follows:
````
lock(PLUGINLOCK);
nc_plugin_path_get(...);
<rebuild plugin path>
nc_plugin_path_set(...);
unlock(PLUGINLOCK);
````
## Internal Architecture

It is assumed here that there only needs to be a single set of plugin path
directories that is shared by all filter code and is independent of any file descriptor; it is global in other words.
This means, for example, that the path list for NCZarr and for HDF5 will always be the same.

However internally, processing the set of plugin paths depends on the particular
NC_FORMATX value (NC_FORMATX_NC_HDF5 and NC_FORMATX_NCZARR, currently).
So the *nc_plugin_path_set* function, will take the paths it is given and
propagate them to each of the NC_FORMATX dispatchers to store in a way that is
appropriate to the given dispatcher.

There is a complication with respect to the *nc_plugin_path_get* function.
It is possible for users to bypass the netcdf API and modify the HDF5 plugin paths directly. This can result in an inconsistent plugin path between the value
used by HDF5 and the global value used by netcdf-c. Since there is no obvious fix for this, we warn the user of this possibility and otherwise ignore it.

## Test Changes
* A new test -- unit_test/run_pluginpaths.sh -- was created to test this new capability.
* A new test utility has been added as *nczarr_test/ncpluginpath* to return information about the default plugin path list.

## Documentation
* A new file -- docs/pluginpath.md -- provides documentation of the new API.

## Misc. Changes
1. Add some path manipulation utilities to netcdf_aux.h
2. Fix the construction of netcdf_json.h as a BUILT_SOURCE
3. Fix some minor bugs in netcdf_json.h
4. Convert netcdf_json.h and netcdf_proplist.h to BUILT_SOURCE.
5. Add NETCDF_ENABLE_HDF5 as synonym for USE_HDF5
6. Fix some size_t <-> int conversion warnings.
7. Encountered and fixed the Windows \r\n problem in tst_pluginpaths.c.
8. Cleanup some minor CMakeLists.txt problems.

## Addendum: Proposed API

The API makes use of a counted vector of strings representing the sequence of directories in the path. The relevant type definition is as follows.
````
typedef struct NCPluginList {size_t ndirs; char** dirs;} NCPluginList;
````

The API proposed in this PR looks like this (from netcdf-c/include/netcdf_filter.h).

* ````int nc_plugin_path_ndirs(size_t* ndirsp);````

    This function returns the number of directories in the internal plugin path list.

    The argument is as follows:
    - *ndirsp* store the number of directories in this memory.

* ````int nc_plugin_path_get(NCPluginList* dirs);````

    This function returns the current sequence of directories from the internal plugin path list. Since this function does not modify the plugin path, it does not need to be locked; it is only when used to get the path to be modified that locking is required.

    The argument is as follows:
    - *dirs* counted vector for storing the sequence of directories in the internal path list.

    If the value of *dirs.dirs* is NULL (the normal case), then memory is allocated to hold the vector of directories. Otherwise, the memory of *dirs.dirs* is used to hold the vector of directories.

* ````int nc_plugin_path_set(const NCPluginList* dirs);````

    This function empties the current internal path sequence and replaces it with the sequence of directories argument. Using an *ndirs* argument of 0 will clear the set of plugin paths.

    The argument is as follows:
    - *dirs* counted vector providing the new sequence of directories for the internal path list.
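
As an illustration of the "rebuild plugin path" step between a get and a set, here is a minimal sketch that builds a new counted vector with one directory placed in front. The `NCPluginList` typedef is copied from the text above; `dup_string`, `pluginlist_free`, and `pluginlist_prepend` are hypothetical helpers written for this example and are not part of netcdf-c.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Type definition as given in the commit message. */
typedef struct NCPluginList {size_t ndirs; char** dirs;} NCPluginList;

/* Portable string copy (strdup is not in ISO C99). */
static char *dup_string(const char *s)
{
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (copy != NULL) memcpy(copy, s, len);
    return copy;
}

/* Free a list built by pluginlist_prepend and reset it to empty. */
static void pluginlist_free(NCPluginList *list)
{
    size_t i;
    for (i = 0; i < list->ndirs; i++) free(list->dirs[i]);
    free(list->dirs);
    list->ndirs = 0;
    list->dirs = NULL;
}

/* Build `out` as a deep copy of `old` with `dir` placed in front.
 * Returns 0 on success, -1 on allocation failure (out is left empty). */
static int pluginlist_prepend(const NCPluginList *old, const char *dir,
                              NCPluginList *out)
{
    size_t i;

    assert(old != NULL && dir != NULL && out != NULL);
    out->ndirs = 0;
    out->dirs = calloc(old->ndirs + 1, sizeof *out->dirs);
    if (out->dirs == NULL) return -1;
    for (i = 0; i <= old->ndirs; i++) {
        /* Slot 0 holds the new directory; the rest are shifted by one. */
        out->dirs[i] = dup_string(i == 0 ? dir : old->dirs[i - 1]);
        if (out->dirs[i] == NULL) { pluginlist_free(out); return -1; }
        out->ndirs = i + 1;
    }
    return 0;
}
```

In a real program the `old` list would presumably come from nc_plugin_path_get and the rebuilt list would be passed to nc_plugin_path_set, with both calls made under the same lock.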
DennisHeimbigner added a commit to DennisHeimbigner/netcdf-c that referenced this issue Sep 30, 2024
re: Unidata#2753

As suggested by Ed Hartnett, this PR extends the netcdf.h API to support programmatic control over the search path used to locate plugins.

I created several different APIs, but finally settled on the following API as being the simplest possible. It does have the disadvantage that it requires use of a global lock (not implemented) if used in a threaded environment.

Specifically, note that modifying the plugin path must be done "atomically". That is, in a multi-threaded environment, it is important that the sequence of actions involved in setting up the plugin path must be done by a single processor or in some other way as to guarantee that two or more processors are not simultaneously accessing the plugin path get/set operations.

As an example, assume there exists a mutex lock called PLUGINLOCK. Then any processor accessing the plugin paths should operate as follows:
````
lock(PLUGINLOCK);
nc_plugin_path_get(...);
<rebuild plugin path>
nc_plugin_path_set(...);
unlock(PLUGINLOCK);
````
## Internal Architecture

It is assumed here that there only needs to be a single set of plugin path directories that is shared by all filter code and is independent of any file descriptor; it is global in other words. This means, for example, that the path list for NCZarr and for HDF5 will always be the same.

However internally, processing the set of plugin paths depends on the particular NC_FORMATX value (NC_FORMATX_NC_HDF5 and NC_FORMATX_NCZARR, currently). So the *nc_plugin_path_set* function, will take the paths it is given and propagate them to each of the NC_FORMATX dispatchers to store in a way that is appropriate to the given dispatcher.

There is a complication with respect to the *nc_plugin_path_get* function. It is possible for users to bypass the netcdf API and modify the HDF5 plugin paths directly. This can result in an inconsistent plugin path between the value used by HDF5 and the global value used by netcdf-c. Since there is no obvious fix for this, we warn the user of this possibility and otherwise ignore it.

## Test Changes
* Two new tests
    a. unit_test/run_pluginpaths.sh -- was created to test this new capability.
    b. A new test utility has been added as *unit_test/run_dfaltpluginpath.sh* to test the default plugin path list.

## Documentation
* A new file -- docs/pluginpath.md -- provides documentation of the new API.

## Misc. Changes
1. Add some path manipulation utilities to netcdf_aux.h
2. Fix the construction of netcdf_json.h as a BUILT_SOURCE
3. Fix some minor bugs in netcdf_json.h
4. Convert netcdf_json.h and netcdf_proplist.h to BUILT_SOURCE.
5. Add NETCDF_ENABLE_HDF5 as synonym for USE_HDF5
6. Fix some size_t <-> int conversion warnings.
7. Encountered and fixed the Windows \r\n problem in tst_pluginpaths.c.
8. Cleanup some minor CMakeLists.txt problems.

## Addendum: Proposed API

The API makes use of a counted vector of strings representing the sequence of directories in the path. The relevant type definition is as follows.
````
typedef struct NCPluginList {size_t ndirs; char** dirs;} NCPluginList;
````

The API proposed in this PR looks like this (from netcdf-c/include/netcdf_filter.h).

* ````int nc_plugin_path_ndirs(size_t* ndirsp);````
    Arguments: *ndirsp* -- store the number of directories in this memory.

    This function returns the number of directories in the internal plugin path list.

* ````int nc_plugin_path_get(NCPluginList* dirs);````
    Arguments:  *dirs* -- counted vector for storing the sequence of directories in the internal path list.

    This function returns the current sequence of directories from the internal plugin path list. Since this function does not modify the plugin path, it does not need to be locked; it is only when used to get the path to be modified that locking is required.  If the value of *dirs.dirs* is NULL (the normal case), then memory is allocated to hold the vector of directories. Otherwise, the memory of *dirs.dirs* is used to hold the vector of directories.

* ````int nc_plugin_path_set(const NCPluginList* dirs);````
    Arguments: *dirs* -- counted vector for providing the new sequence of directories in the internal path list.

    This function empties the current internal path sequence and replaces it with the sequence of directories argument. Using an *ndirs* argument of 0 will clear the set of plugin paths.
DennisHeimbigner added a commit to DennisHeimbigner/netcdf-c that referenced this issue Oct 19, 2024
…earch path

Replaces PR Unidata#3024
         and PR Unidata#3033

re: Unidata#2753

As suggested by Ed Hartnett, this PR extends the netcdf.h API to support programmatic control over the search path used to locate plugins.

I created several different APIs, but finally settled on the following API as being the simplest possible. It does have the disadvantage that it requires use of a global lock (not implemented) if used in a threaded environment.

Specifically, note that modifying the plugin path must be done "atomically". That is, in a multi-threaded environment, it is important that the sequence of actions involved in setting up the plugin path must be done by a single processor or in some other way as to guarantee that two or more processors are not simultaneously accessing the plugin path get/set operations.

As an example, assume there exists a mutex lock called PLUGINLOCK. Then any processor accessing the plugin paths should operate as follows:
````
lock(PLUGINLOCK);
nc_plugin_path_get(...);
<rebuild plugin path>
nc_plugin_path_set(...);
unlock(PLUGINLOCK);
````
## Internal Architecture

It is assumed here that there only needs to be a single set of plugin path directories that is shared by all filter code and is independent of any file descriptor; it is global in other words. This means, for example, that the path list for NCZarr and for HDF5 will always be the same.

However internally, processing the set of plugin paths depends on the particular NC_FORMATX value (NC_FORMATX_NC_HDF5 and NC_FORMATX_NCZARR, currently). So the *nc_plugin_path_set* function, will take the paths it is given and propagate them to each of the NC_FORMATX dispatchers to store in a way that is appropriate to the given dispatcher.

There is a complication with respect to the *nc_plugin_path_get* function. It is possible for users to bypass the netcdf API and modify the HDF5 plugin paths directly. This can result in an inconsistent plugin path between the value used by HDF5 and the global value used by netcdf-c. Since there is no obvious fix for this, we warn the user of this possibility and otherwise ignore it.

## Test Changes
* New tests<br>
    a. unit_test/run_pluginpaths.sh -- was created to test this new capability.<br>
    b. A new test utility has been added as *unit_test/run_dfaltpluginpath.sh* to test the default plugin path list.
* New test support utilities<br>
    a. unit_test/ncpluginpath.c -- report current state of the plugin path<br>
    b. unit_test/tst_pluginpaths.c -- test program to support run_pluginpaths.sh

## Documentation
* A new file -- docs/pluginpath.md -- provides documentation of the new API. It includes some
  material taken from filters.md.

## Other Major Changes
1. Clean up the whole plugin path decision tree. This is described in the *docs/pluginpath.md* document and summarized in Addendum 2 below.
2. I noticed that the ncdump/testpathcvt.sh had been disabled, so fixed and re-enabled it. This necessitated some significant changes to dpathmgr.c.

## Misc. Changes
1. Add some path manipulation utilities to netcdf_aux.h
2. Fix some minor bugs in netcdf_json.h
3. Convert netcdf_json.h and netcdf_proplist.h to BUILT_SOURCE.
4. Add NETCDF_ENABLE_HDF5 as synonym for USE_HDF5
5. Fix some size_t <-> int conversion warnings.
6. Encountered and fixed the Windows \r\n problem in tst_pluginpaths.c.
7. Clean up some minor CMakeLists.txt problems.
8. Provide an implementation of echo -n since it appears to not be
   available on all platforms.
9. Add a property list mechanism to pass environmental information to filters.
10. Clean up Doxyfile.in
11. Fixed a memory leak in libdap2; surprised that I did not find this earlier.

## Addendum 1: Proposed API

The API makes use of a counted vector of strings representing the sequence of directories in the path. The relevant type definition is as follows.
````
typedef struct NCPluginList {size_t ndirs; char** dirs;} NCPluginList;
````
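
To illustrate how such a counted vector might be populated, here is a small sketch that splits a separator-delimited path string into an `NCPluginList`. The helper names `pluginlist_from_string` and `pluginlist_clear` are hypothetical and are not the utilities added to netcdf_aux.h. The ownership rule used here (every string and the vector itself are heap-allocated and freed by the caller) is an assumption made for the example.

```c
#include <stdlib.h>
#include <string.h>

typedef struct NCPluginList {size_t ndirs; char** dirs;} NCPluginList;

/* Release everything owned by the list and reset it to empty. */
void pluginlist_clear(NCPluginList* list) {
    for (size_t i = 0; i < list->ndirs; i++) free(list->dirs[i]);
    free(list->dirs);
    list->ndirs = 0;
    list->dirs = NULL;
}

/* Split `path` on `sep` into a freshly allocated list; empty segments
   are skipped. Returns 0 on success, -1 on bad input or allocation failure. */
int pluginlist_from_string(const char* path, char sep, NCPluginList* out) {
    out->ndirs = 0;
    out->dirs = NULL;
    if (sep == '\0') return -1;
    if (path == NULL || *path == '\0') return 0;

    /* Upper bound on the number of segments: separators + 1. */
    size_t max = 1;
    for (const char* p = path; *p; p++) if (*p == sep) max++;

    out->dirs = calloc(max, sizeof(char*));
    if (out->dirs == NULL) return -1;

    const char* start = path;
    for (;;) {
        const char* end = strchr(start, sep);
        size_t len = end ? (size_t)(end - start) : strlen(start);
        if (len > 0) {
            char* dir = malloc(len + 1);
            if (dir == NULL) { pluginlist_clear(out); return -1; }
            memcpy(dir, start, len);
            dir[len] = '\0';
            out->dirs[out->ndirs++] = dir;
        }
        if (end == NULL) break;
        start = end + 1;
    }
    return 0;
}
```

A caller would typically pass `':'` as the separator on Unix-like systems and `';'` on Windows, then hand the resulting list to the set function and call `pluginlist_clear` afterwards.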

The API proposed in this PR looks like this (from netcdf-c/include/netcdf_filter.h).

* ````int nc_plugin_path_ndirs(size_t* ndirsp);````
    Arguments: *ndirsp* -- store the number of directories in this memory.

    This function returns the number of directories in the internal plugin path list.

* ````int nc_plugin_path_get(NCPluginList* dirs);````
    Arguments:  *dirs* -- counted vector for storing the sequence of directories in the internal path list.

    This function returns the current sequence of directories from the internal plugin path list. Since this function does not modify the plugin path, it does not need to be locked; it is only when used to get the path to be modified that locking is required.  If the value of *dirs.dirs* is NULL (the normal case), then memory is allocated to hold the vector of directories. Otherwise, use the memory of *dirs.dirs* to hold the vector of directories.

* ````int nc_plugin_path_set(const NCPluginList* dirs);````
    Arguments: *dirs* -- counted vector for providing the new sequence of directories in the internal path list.

    This function empties the current internal path sequence and replaces it with the sequence of directories argument. Using an *ndirs* argument of 0 will clear the set of plugin paths.

## Addendum 2: Build-Time and Run-Time Constants

### Build-Time Constants
<table style="border:2px solid black;border-collapse:collapse">
<tr style="outline: thin solid;" align="center"><td colspan="4">Table showing the build-time computation of NETCDF_PLUGIN_INSTALL_DIR and NETCDF_PLUGIN_SEARCH_PATH.</td>
<tr style="outline: thin solid" ><th>--with-plugin-dir<th>--prefix<th>NETCDF_PLUGIN_INSTALL_DIR<th>NETCDF_PLUGIN_SEARCH_PATH
<tr style="outline: thin solid" ><td>undefined<td>undefined<td>undefined<td>PLATFORMDEFAULT
<tr style="outline: thin solid" ><td>undefined<td>&lt;abspath-prefix&gt;<td>&lt;abspath-prefix&gt;/hdf5/lib/plugin<td>&lt;abspath-prefix&gt;/hdf5/lib/plugin&lt;SEP&gt;PLATFORMDEFAULT
<tr style="outline: thin solid" ><td>&lt;abspath-plugins&gt;<td>N.A.<td>&lt;abspath-plugins&gt;<td>&lt;abspath-plugins&gt;&lt;SEP&gt;PLATFORMDEFAULT
</table>
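
The three rows of the build-time table can be read as a small decision procedure. The sketch below models it in C purely for illustration; the real computation happens in the configure/CMake machinery, and the function name `compute_plugin_paths`, its parameters, and its return codes are hypothetical. A NULL argument stands for "undefined", and `platform_default` stands for the platform default entry in the last column.

```c
#include <stdio.h>

/* Model of the build-time table.
   Returns 1 if an install dir was computed, 0 if it is undefined,
   and -1 if an output buffer is too small. */
int compute_plugin_paths(const char* with_plugin_dir, const char* prefix,
                         const char* platform_default, char sep,
                         char* install_dir, size_t install_len,
                         char* search_path, size_t search_len)
{
    int n;
    if (with_plugin_dir != NULL) {
        /* Row 3: --with-plugin-dir wins; --prefix is not consulted. */
        n = snprintf(install_dir, install_len, "%s", with_plugin_dir);
    } else if (prefix != NULL) {
        /* Row 2: derive the install dir from --prefix. */
        n = snprintf(install_dir, install_len, "%s/hdf5/lib/plugin", prefix);
    } else {
        /* Row 1: no install dir; search only the platform default. */
        if (install_len > 0) install_dir[0] = '\0';
        n = snprintf(search_path, search_len, "%s", platform_default);
        return (n < 0 || (size_t)n >= search_len) ? -1 : 0;
    }
    if (n < 0 || (size_t)n >= install_len) return -1;

    /* Rows 2 and 3: install dir first, then the platform default. */
    n = snprintf(search_path, search_len, "%s%c%s",
                 install_dir, sep, platform_default);
    if (n < 0 || (size_t)n >= search_len) return -1;
    return 1;
}
```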

<table style="border:2px solid black;border-collapse:collapse">
<tr style="outline: thin solid" align="center"><td colspan="2">Table showing the computation of the initial global plugin path</td>
<tr style="outline: thin solid"><th>HDF5_PLUGIN_PATH<th>Initial global plugin path
<tr style="outline: thin solid"><td>undefined<td>NETCDF_PLUGIN_SEARCH_PATH
<tr style="outline: thin solid"><td>&lt;path1;...pathn&gt;<td>&lt;path1;...pathn&gt;
</table>
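
The run-time table amounts to a single choice: the environment variable, when defined, replaces the build-time search path rather than being merged with it. A minimal sketch of that rule follows. The function names are hypothetical, and treating an empty but set HDF5_PLUGIN_PATH as "defined" is an assumption, since the table only distinguishes defined from undefined.

```c
#include <stdlib.h>

/* Pure selection rule from the table: env_value is the value of
   HDF5_PLUGIN_PATH, or NULL if the variable is undefined. */
const char* select_initial_plugin_path(const char* env_value,
                                       const char* build_search_path)
{
    return (env_value != NULL) ? env_value : build_search_path;
}

/* Convenience wrapper that consults the process environment. */
const char* initial_plugin_path(const char* build_search_path)
{
    return select_initial_plugin_path(getenv("HDF5_PLUGIN_PATH"),
                                      build_search_path);
}
```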