Change: Only embed specs/namespaces for types that are included in NWB file on export (#615)

* Only include namespaces for types that are included in NWB file on export (Issue #607)

* Add functionality for installing extensions

* Minor fixes

* Update comment

* Add comment + print message when extension has been installed

* Update installExtension.m

* Fix changed variable name

* Update matnwb_createNwbInstallExtension.m

Update docstring

* Create listNwbTypeHierarchy.m

Add utility function for listing the type hierarchy of an nwb type

* Add private method for embedding specifications to file on export

* Fix variable name

* Add workflow for updating nwbInstallExtension

* Add option to save extension in custom location

* Create InstallExtensionTest.m

* Update docstring

* Change dispExtensionInfo to return info instead of displaying + add test

* Reorganize code into separate functions and add tests

* Minor changes to improve test coverage

* add nwbInstallExtension to docs

* Update update_extension_list.yml

Add schedule event for workflow to update nwbInstallExtension

* Update downloadExtensionRepository.m

Remove local function

* Update docstring for nwbInstallExtension

* Fix docstring indentation in nwbInstallExtension

* Add doc pages describing how to use (ndx) extensions

* Fix typo

* Update +tests/+unit/InstallExtensionTest.m

Co-authored-by: Ben Dichter <[email protected]>

* Update docs/source/pages/getting_started/using_extensions/generating_extension_api.rst

Co-authored-by: Ben Dichter <[email protected]>

* Add docstrings for functions to retrieve and list extension info

* Fix docstring formatting/whitespace

* Update listExtensions.m

Add example to docstring

* Move static test methods into io.internal.h5 namespace

Introduce some functions that will be useful later

* Update writeEmbeddedSpecifications.m

Add arguments block, fix function name

* Add validateEmbeddedSpecifications

* Update NwbFile.m

Redefine listNwbTypes method, add validation of embedded namespaces

* Create listEmbeddedSpecNamespaces.m

* Update nwbExportTest.m

* Update test for spec/namespace embedding

* Update read_indexed_column.m

* Add disclaimer in deleteGroup function

* Update read_indexed_column.m

* Fix broken test

* add test-requirement

* Fix: Ensure object is group before deleting

* Fix error id

* Add unittests for functions in io.internal.h5 namespace

* Update nwbExportTest.m

Added comments and a better test to test for warning with ID 'NWB:validators:MissingEmbeddedNamespace'

* Fix failing tests

---------

Co-authored-by: Ben Dichter <[email protected]>
ehennestad and bendichter authored Feb 11, 2025
1 parent 1a2f696 commit c3eedb7
Showing 26 changed files with 1,362 additions and 46 deletions.
20 changes: 20 additions & 0 deletions +io/+internal/+h5/deleteAttribute.m
@@ -0,0 +1,20 @@
function deleteAttribute(fileReference, objectLocation, attributeName)
% deleteAttribute - Delete the specified attribute from an NWB file

    arguments
        fileReference {io.internal.h5.mustBeH5FileReference}
        objectLocation (1,1) string
        attributeName (1,1) string
    end

    objectLocation = io.internal.h5.validateLocation(objectLocation);

    % Open the HDF5 file in read-write mode
    [fileId, fileCleanupObj] = io.internal.h5.resolveFileReference(fileReference, "w"); %#ok<ASGLU>

    % Open the object (dataset or group)
    [objectId, objectCleanupObj] = io.internal.h5.openObject(fileId, objectLocation); %#ok<ASGLU>

    % Delete the attribute
    H5A.delete(objectId, attributeName);
end
37 changes: 37 additions & 0 deletions +io/+internal/+h5/deleteGroup.m
@@ -0,0 +1,37 @@
function deleteGroup(fileReference, groupLocation)
% deleteGroup - Delete the specified group from an NWB file
%
% NB NB NB: Deleting groups & datasets from an HDF5 file does not free up space
%
% HDF5 files use a structured format to store data in hierarchical groups and
% datasets. Internally, the file maintains a structure similar to a filesystem,
% with metadata pointing to the actual data blocks.
%
% Implication: When you delete a group or dataset in an HDF5 file, the metadata
% entries for that group or dataset are removed, so they are no longer accessible.
% However, the space previously occupied by the actual data is not reclaimed or
% reused by default. This is because HDF5 does not automatically reorganize or
% compress the file when items are deleted.

    arguments
        fileReference {io.internal.h5.mustBeH5FileReference}
        groupLocation (1,1) string
    end

    groupLocation = io.internal.h5.validateLocation(groupLocation);

    % Open the HDF5 file in read-write mode
    [fileId, fileCleanupObj] = io.internal.h5.resolveFileReference(fileReference, "w"); %#ok<ASGLU>

    % Open the object to check its type, then release it before deleting
    [objectId, objectCleanupObj] = io.internal.h5.openObject(fileId, groupLocation); %#ok<ASGLU>
    objInfo = H5O.get_info(objectId);
    clear objectCleanupObj

    if objInfo.type == H5ML.get_constant_value('H5O_TYPE_GROUP')
        % Delete the group
        H5L.delete(fileId, groupLocation, 'H5P_DEFAULT');
    else
        error('NWB:DeleteGroup:NotAGroup', ...
            'The h5 object in location "%s" is not a group', groupLocation)
    end
end
28 changes: 28 additions & 0 deletions +io/+internal/+h5/listGroupNames.m
@@ -0,0 +1,28 @@
function groupNames = listGroupNames(fileReference, h5Location)

    arguments
        fileReference {io.internal.h5.mustBeH5FileReference}
        h5Location (1,1) string
    end

    [fileId, fileCleanupObj] = io.internal.h5.resolveFileReference(fileReference); %#ok<ASGLU>

    % Open the specified location (group)
    [groupId, groupCleanupObj] = io.internal.h5.openGroup(fileId, h5Location); %#ok<ASGLU>

    % Use H5L.iterate to iterate over the links
    [~, ~, groupNames] = H5L.iterate(...
        groupId, "H5_INDEX_NAME", "H5_ITER_INC", 0, @collectGroupNames, {});

    % Define iteration function
    function [status, groupNames] = collectGroupNames(groupId, name, groupNames)
        % Only retrieve name of groups
        objId = H5O.open(groupId, name, 'H5P_DEFAULT');
        objInfo = H5O.get_info(objId);
        if objInfo.type == H5ML.get_constant_value('H5O_TYPE_GROUP')
            groupNames{end+1} = name;
        end
        H5O.close(objId);
        status = 0; % Continue iteration
    end
end
17 changes: 17 additions & 0 deletions +io/+internal/+h5/mustBeH5File.m
@@ -0,0 +1,17 @@
function mustBeH5File(value)
    arguments
        value {mustBeFile}
    end

    VALID_FILE_ENDING = ["h5", "nwb"];
    validExtensions = "." + VALID_FILE_ENDING;

    hasH5Extension = endsWith(value, validExtensions, 'IgnoreCase', true);

    if ~hasH5Extension
        exception = MException(...
            'NWB:validators:mustBeH5File', ...
            'Expected file "%s" to have .h5 or .nwb file extension', value);
        throwAsCaller(exception)
    end
end
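
The extension check in mustBeH5File relies on `endsWith` with the `'IgnoreCase'` option. A minimal sketch of how that comparison behaves, using hypothetical file names (nothing is read from disk, so the `mustBeFile` existence check is deliberately left out):

```matlab
% Hypothetical file names for illustration only; no files are accessed.
validExtensions = "." + ["h5", "nwb"];   % yields [".h5", ".nwb"]
candidates = ["session.nwb", "SESSION.NWB", "data.h5", "notes.txt"];

% endsWith tests each candidate against every pattern, ignoring case,
% and returns one logical value per candidate.
hasH5Extension = endsWith(candidates, validExtensions, 'IgnoreCase', true);

disp(hasH5Extension)
```

Because the comparison ignores case, upper-case extensions are accepted as well, while any other ending is rejected.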
15 changes: 15 additions & 0 deletions +io/+internal/+h5/mustBeH5FileReference.m
@@ -0,0 +1,15 @@
function mustBeH5FileReference(value)
    arguments
        value {mustBeA(value, ["char", "string", "H5ML.id"])}
    end

    if isa(value, "char") || isa(value, "string")
        try
            io.internal.h5.mustBeH5File(value)
        catch ME
            throwAsCaller(ME)
        end
    else
        % value is a H5ML.id, ok!
    end
end
46 changes: 46 additions & 0 deletions +io/+internal/+h5/openFile.m
@@ -0,0 +1,46 @@
function [fileId, fileCleanupObj] = openFile(fileName, permission)
% openFile Opens an HDF5 file with the specified permissions and ensures cleanup.
%
%   [fileId, fileCleanupObj] = io.internal.h5.openFile(fileName) opens the HDF5
%   file specified by fileName in read-only mode ('r') by default.
%
%   [fileId, fileCleanupObj] = io.internal.h5.openFile(fileName, permission)
%   opens the HDF5 file specified by fileName with the access mode defined by
%   permission.
%
%   Input Arguments:
%       fileName   - A string or character vector specifying the path to the
%                    HDF5 file. This must be a .h5 or .nwb file.
%
%       permission - (Optional) A scalar string specifying the file access mode.
%                    Valid values are "r" for read-only (default) and "w" for
%                    read-write.
%
%   Output Arguments:
%       fileId         - The file identifier returned by H5F.open, used to
%                        reference the open file.
%
%       fileCleanupObj - A cleanup object (onCleanup) that ensures the file is
%                        closed automatically when fileCleanupObj goes out of
%                        scope.
%
%   Example:
%       [fid, cleanupObj] = io.internal.h5.openFile("data.h5", "w");
%       % Use fid for file operations.
%       % When cleanupObj is cleared or goes out of scope, the file is
%       % automatically closed.

    arguments
        fileName {io.internal.h5.mustBeH5File}
        permission (1,1) string {mustBeMember(permission, ["r", "w"])} = "r"
    end

    switch permission
        case "r"
            accessFlag = 'H5F_ACC_RDONLY';
        case "w"
            accessFlag = 'H5F_ACC_RDWR';
    end
    fileId = H5F.open(fileName, accessFlag, 'H5P_DEFAULT');
    % onCleanup invokes its task without arguments, so the handle takes none
    fileCleanupObj = onCleanup(@() H5F.close(fileId));
end
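
The cleanup pattern described in the openFile docstring can be tried without any matnwb code. The following is a minimal sketch, assuming only MATLAB's built-in low-level HDF5 functions; the temporary file and the variable names `tempId` and `isOpenId` are illustrative and not part of this commit:

```matlab
% Create an empty HDF5 file in a temporary location (illustrative path).
fileName = [tempname, '.h5'];
tempId = H5F.create(fileName, 'H5F_ACC_TRUNC', 'H5P_DEFAULT', 'H5P_DEFAULT');
H5F.close(tempId);

% Reopen the file read-only and tie the close call to a cleanup object.
fileId = H5F.open(fileName, 'H5F_ACC_RDONLY', 'H5P_DEFAULT');
fileCleanupObj = onCleanup(@() H5F.close(fileId));

% ... use fileId for read operations here ...
isOpenId = isa(fileId, 'H5ML.id');

% Clearing the cleanup object runs H5F.close(fileId); the same happens
% implicitly when the object goes out of scope at the end of a function.
clear fileCleanupObj
delete(fileName)
```

Returning the cleanup object alongside the file ID lets the caller decide how long the file stays open, which is presumably why openFile exposes both outputs.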
13 changes: 13 additions & 0 deletions +io/+internal/+h5/openGroup.m
@@ -0,0 +1,13 @@
function [groupId, groupCleanupObj] = openGroup(fileId, h5Location)
% openGroup Opens an HDF5 group at given location and ensures cleanup.

    arguments
        fileId {mustBeA(fileId, "H5ML.id")}
        h5Location (1,1) string
    end

    % Open the specified location (group)
    groupLocation = io.internal.h5.validateLocation(h5Location);
    groupId = H5G.open(fileId, groupLocation);
    groupCleanupObj = onCleanup(@() H5G.close(groupId));
end
13 changes: 13 additions & 0 deletions +io/+internal/+h5/openObject.m
@@ -0,0 +1,13 @@
function [objectId, objectCleanupObj] = openObject(fileId, objectLocation)
% openObject Opens an HDF5 object at given location and ensures cleanup.

    arguments
        fileId {mustBeA(fileId, "H5ML.id")}
        objectLocation (1,1) string
    end

    % Open the object (dataset or group)
    objectLocation = io.internal.h5.validateLocation(objectLocation);
    objectId = H5O.open(fileId, objectLocation, 'H5P_DEFAULT');
    objectCleanupObj = onCleanup(@() H5O.close(objectId));
end
30 changes: 30 additions & 0 deletions +io/+internal/+h5/resolveFileReference.m
@@ -0,0 +1,30 @@
function [h5FileId, fileCleanupObj] = resolveFileReference(fileReference, permission)
% resolveFileReference - Resolve a file reference to a H5 File ID.
%
%   Utility method to resolve a file reference, which can be either a
%   file path or a file ID for an h5 file.
%
%   The returned value will always be a file ID. This allows functions that
%   operate on h5 files to accept either a file path or a file ID.
%
%   Note: If the file reference is a file ID for an open file, the permission
%   of that file might differ from the provided/requested permission.

    arguments
        fileReference {io.internal.h5.mustBeH5FileReference}
        permission (1,1) string {mustBeMember(permission, ["r", "w"])} = "r"
    end

    if isa(fileReference, "char") || isa(fileReference, "string")
        % Need to open the file
        if isfile(fileReference)
            [h5FileId, fileCleanupObj] = io.internal.h5.openFile(fileReference, permission);
        else
            error('File "%s" does not exist', fileReference)
        end
    else
        h5FileId = fileReference;
        % If the file is already open, we are not responsible for closing it
        fileCleanupObj = [];
    end
end
9 changes: 9 additions & 0 deletions +io/+internal/+h5/validateLocation.m
@@ -0,0 +1,9 @@
function locationName = validateLocation(locationName)
    arguments
        locationName (1,1) string
    end

    % Ensure the location is an absolute HDF5 path
    if ~startsWith(locationName, "/")
        locationName = "/" + locationName;
    end
end
11 changes: 11 additions & 0 deletions +io/+spec/listEmbeddedSpecNamespaces.m
@@ -0,0 +1,11 @@
function namespaceNames = listEmbeddedSpecNamespaces(fileReference)

    arguments
        fileReference {io.internal.h5.mustBeH5FileReference}
    end

    [fileId, fileCleanupObj] = io.internal.h5.resolveFileReference(fileReference); %#ok<ASGLU>

    specLocation = io.spec.internal.readEmbeddedSpecLocation(fileId);
    namespaceNames = io.internal.h5.listGroupNames(fileId, specLocation);
end
48 changes: 48 additions & 0 deletions +io/+spec/validateEmbeddedSpecifications.m
@@ -0,0 +1,48 @@
function validateEmbeddedSpecifications(h5_file_id, expectedNamespaceNames)
% validateEmbeddedSpecifications - Validate the embedded specifications
%
%   This function does two things:
%   1) Displays a warning if specifications of expected namespaces
%      are not embedded in the file.
%      E.g. if cached namespaces were cleared prior to export.
%
%   2) Deletes specifications for unused namespaces that are embedded.
%      E.g. if a neurodata type from an embedded namespace was removed and
%      the file was re-exported.

%   NB: Input h5_file_id must point to a file opened with write access

    specLocation = io.spec.internal.readEmbeddedSpecLocation(h5_file_id);
    embeddedNamespaceNames = io.internal.h5.listGroupNames(h5_file_id, specLocation);

    checkMissingNamespaces(expectedNamespaceNames, embeddedNamespaceNames)

    unusedNamespaces = checkUnusedNamespaces(...
        expectedNamespaceNames, embeddedNamespaceNames);

    if ~isempty(unusedNamespaces)
        deleteUnusedNamespaces(h5_file_id, unusedNamespaces, specLocation)
    end
end

function checkMissingNamespaces(expectedNamespaceNames, embeddedNamespaceNames)
% checkMissingNamespaces - Check if any namespace specs are missing from the file
    missingNamespaces = setdiff(expectedNamespaceNames, embeddedNamespaceNames);
    if ~isempty(missingNamespaces)
        missingNamespacesStr = strjoin(" " + string(missingNamespaces), newline);
        warning('NWB:validators:MissingEmbeddedNamespace', 'Namespace is missing:\n%s', missingNamespacesStr)
    end
end

function unusedNamespaces = checkUnusedNamespaces(expectedNamespaceNames, embeddedNamespaceNames)
% checkUnusedNamespaces - Check if any namespace specs in the file are unused
    unusedNamespaces = setdiff(embeddedNamespaceNames, expectedNamespaceNames);
end

function deleteUnusedNamespaces(fileId, unusedNamespaces, specRootLocation)
% deleteUnusedNamespaces - Delete the spec group of each unused namespace
    for i = 1:numel(unusedNamespaces)
        thisName = unusedNamespaces{i};
        namespaceSpecLocation = strjoin( {specRootLocation, thisName}, '/');
        io.internal.h5.deleteGroup(fileId, namespaceSpecLocation)
    end
end
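
Both checks in validateEmbeddedSpecifications reduce to a set difference taken in opposite directions. A minimal sketch with hypothetical namespace names ("ndx-example" stands in for an unused extension, and "/specifications" is an assumed location for the embedded specs, not a value taken from this commit):

```matlab
% Hypothetical inputs for illustration only.
expectedNamespaceNames = {'core', 'hdmf-common', 'hdmf-experimental'};
embeddedNamespaceNames = {'core', 'hdmf-common', 'ndx-example'};

% Expected but not embedded: these would trigger the warning.
missingNamespaces = setdiff(expectedNamespaceNames, embeddedNamespaceNames);

% Embedded but not expected: these are candidates for deletion.
unusedNamespaces = setdiff(embeddedNamespaceNames, expectedNamespaceNames);

% Build the HDF5 location of the group that would be deleted.
specRootLocation = '/specifications';   % assumed location
namespaceSpecLocation = strjoin({specRootLocation, unusedNamespaces{1}}, '/');

disp(missingNamespaces)
disp(namespaceSpecLocation)
```

Keeping the two directions in separate helper functions, as the commit does, makes it clear that a missing namespace only produces a warning while an unused one leads to a deletion.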
11 changes: 9 additions & 2 deletions +io/+spec/writeEmbeddedSpecifications.m
@@ -1,4 +1,11 @@
 function writeEmbeddedSpecifications(fid, jsonSpecs)
+% writeEmbeddedSpecifications - Write schema specifications to an NWB file
+
+    arguments
+        fid % File id for a h5 file
+        jsonSpecs % String representation of schema specifications in json format
+    end
+
     specLocation = io.spec.internal.readEmbeddedSpecLocation(fid);
 
     if isempty(specLocation)
@@ -37,8 +44,8 @@ function writeEmbeddedSpecifications(fid, jsonSpecs)
 function versionNames = getVersionNames(namespaceGroupId)
     [~, ~, versionNames] = H5L.iterate(namespaceGroupId,...
         'H5_INDEX_NAME', 'H5_ITER_NATIVE',...
-        0, @removeGroups, {});
-    function [status, versionNames] = removeGroups(~, name, versionNames)
+        0, @appendName, {});
+    function [status, versionNames] = appendName(~, name, versionNames)
         versionNames{end+1} = name;
         status = 0;
     end
21 changes: 21 additions & 0 deletions +schemes/+utility/listNwbTypeHierarchy.m
@@ -0,0 +1,21 @@
function parentTypeNames = listNwbTypeHierarchy(nwbTypeName)
% listNwbTypeHierarchy - List the NWB type hierarchy for an NWB type
    arguments
        nwbTypeName (1,1) string
    end

    parentTypeNames = string.empty; % Initialize an empty string array
    currentType = nwbTypeName; % Start with the specific type

    while ~strcmp(currentType, 'types.untyped.MetaClass')
        parentTypeNames(end+1) = currentType; %#ok<AGROW>

        % Use MetaClass information to get the parent type
        metaClass = meta.class.fromName(currentType);
        if isempty(metaClass.SuperclassList)
            break; % Reached the base type
        end
        % NWB parent type should always be the first superclass in the list
        currentType = metaClass.SuperclassList(1).Name;
    end
end
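
The hierarchy walk in listNwbTypeHierarchy can be illustrated with any class on the MATLAB path. The sketch below is not the function from this commit: it starts from the built-in class `containers.Map` (chosen purely for illustration) and stops when no superclass is left, rather than at `types.untyped.MetaClass`:

```matlab
% Illustrative starting class; an NWB type name would be used in practice.
startType = "containers.Map";

typeNames = string.empty;   % collected class names, most specific first
currentType = startType;

while true
    typeNames(end+1) = currentType; %#ok<AGROW>

    % Look up the class metadata and follow the first superclass, if any.
    metaClass = meta.class.fromName(char(currentType));
    if isempty(metaClass) || isempty(metaClass.SuperclassList)
        break   % unknown class or no further superclass
    end
    currentType = string(metaClass.SuperclassList(1).Name);
end

disp(typeNames)
```

Following only the first superclass mirrors the assumption stated in the commit that the NWB parent type is always listed first.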