First pass at validating an XML document against the TigerData XSD (#899)

* First pass at validating an XML document against the TigerData XSD
* Minor tweak
* Adding a few notes/questions to the example XML
* Added summary document
* Added note/question
* Moar documentation
* Clarified some sections
* Update lib/assets/xml.xsd

Co-authored-by: carolyncole

Commit 95c91ea (1 parent: 7884518). Showing 5 changed files with 3,081 additions and 0 deletions.

# TigerData Schema Overall Structure

An opinionated summary of some of the most important parts of the TigerData XML schema referenced in [GitHub issue #896](https://github.com/pulibrary/tigerdata-app/issues/896): https://drive.google.com/file/d/1qrdPXwAt57uqPmHxh0Y86zac4RvtoJ2e/view

This is a human-created summary that includes sections copied from the actual schema (e.g., the element descriptions were taken from the schema and lightly edited). When in doubt, the source of truth is the XML schema referenced above.

The XML schema defines both `primitives` (types, attributes, elements, groups) and the `payload` (i.e., the root element).

The XML schema is written from the most granular type up to the root element. (This bottom-up order is a convention; XSD does not require it, since top-level components can be referenced regardless of where they are declared.) This summary presents the data in the opposite direction: starting at the root and then diving into the more granular elements.

## Root

The root element of any metadata record for TigerData is `resource`. It can be used for both `Projects` and `Items` in TigerData; in this context, `Items` refers to files within a project.

If the `resourceClass` of the `resource` is `Project`, then fields from the `projectFields` group must be used. If the `resourceClass` is `Item`, then fields from the `itemFields` group must be used.

Below is a snippet of a project definition. Notice that `resourceClass` marks this record as a `Project`:

```xml
<resource resourceClass="Project" resourceID="10.34770/az09-0001" resourceIDType="DOI">
  <projectID projectIDType="DOI" inherited="false" discoverable="true" trackingLevel="ResourceRecord">10.34770/az09-0001</projectID>
  ...
</resource>
```

We have not yet explored how, in practice, `Items` will be managed within a `resource`, but in theory the schema supports managing both `Projects` and `Items`.
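The `resourceClass` rule above can be sketched in Python with the standard library's `xml.etree.ElementTree`. This is a minimal illustration, not part of the TigerData codebase: the function name `field_group_for` and the `FIELD_GROUP_BY_CLASS` lookup are hypothetical, and treating any other `resourceClass` value as an error is an assumption. In practice, the XSD is what enforces which fields are allowed.

```python
import xml.etree.ElementTree as ET

# Hypothetical lookup (assumption): resourceClass value -> xs:group the record must use.
FIELD_GROUP_BY_CLASS = {
    "Project": "projectFields",
    "Item": "itemFields",
}


def field_group_for(xml_text: str) -> str:
    """Return the field group a <resource> record must use, based on its resourceClass."""
    root = ET.fromstring(xml_text)
    if root.tag != "resource":
        raise ValueError(f"expected a <resource> root element, got <{root.tag}>")
    resource_class = root.get("resourceClass")
    # Assumption: any value other than Project/Item is rejected.
    if resource_class not in FIELD_GROUP_BY_CLASS:
        raise ValueError(f"unsupported resourceClass: {resource_class!r}")
    return FIELD_GROUP_BY_CLASS[resource_class]


snippet = """<resource resourceClass="Project" resourceID="10.34770/az09-0001" resourceIDType="DOI">
  <projectID projectIDType="DOI" inherited="false" discoverable="true"
             trackingLevel="ResourceRecord">10.34770/az09-0001</projectID>
</resource>"""

print(field_group_for(snippet))  # projectFields
```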

## Project Fields and Item Fields

There are two distinct groups within the TigerData schema: `projectFields` and `itemFields`.

* `projectFields`: A group of all elements/groups included in the TigerData standard metadata for projects.

  This includes `projectID`, `alternativeIDs`, `parentProject`, `projectRoles`, `projectDescription`, `storageAndAccess`, `additionalProjectInformation`, `supplementalMetadata`, and `projectProvenance`.

* `itemFields`: A group of all elements/groups included in the TigerData standard metadata for items.

  This includes `itemID`, `alternativeIDs`, `parentProject`, `dataUsers`, `title`, `description`, `resourceType`, `supplementalMetadata`, `languages`, `licenses`, `fundingReferences`, `duaReferences`, and `dates`.

Notice that `projectFields` apply only to projects (i.e., not to items), whereas `itemFields` apply only to items (i.e., not to projects). However, although `projectFields` and `itemFields` are mutually exclusive, they share many types and attribute types.
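Comparing the two lists as plain sets makes the overlap explicit. The sketch below uses only the top-level names listed in this summary; it does not read the XSD and does not expand nested groups such as `projectRoles`, so it understates how much the two groups really share.

```python
# Top-level members of each group, copied from the two lists above.
PROJECT_FIELDS = {
    "projectID", "alternativeIDs", "parentProject", "projectRoles",
    "projectDescription", "storageAndAccess", "additionalProjectInformation",
    "supplementalMetadata", "projectProvenance",
}
ITEM_FIELDS = {
    "itemID", "alternativeIDs", "parentProject", "dataUsers", "title",
    "description", "resourceType", "supplementalMetadata", "languages",
    "licenses", "fundingReferences", "duaReferences", "dates",
}

# Names that appear at the top level of both groups.
shared = sorted(PROJECT_FIELDS & ITEM_FIELDS)
# Names that appear in only one of the groups.
project_only = sorted(PROJECT_FIELDS - ITEM_FIELDS)
item_only = sorted(ITEM_FIELDS - PROJECT_FIELDS)

print(shared)  # ['alternativeIDs', 'parentProject', 'supplementalMetadata']
```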

## Element Groups

Group definitions for elements, including some references to common elements (see Common Elements below) and some new element definitions within:

* `projectRoles`: A group of all elements included in TigerData project roles.

  Does not apply to Items.

  Includes `dataSponsor`, `dataManager`, and `dataUsers` (`dataUsers` in turn contains many `dataUser` elements).

* `projectDescription`: A group of all elements included in TigerData project descriptions.

  Does not apply to Items.

  Includes `researchDomains`, `departments`, `projectDirectory`, `title`, `description`, and `languages`.

  **Note:** `projectDescription` and `description` will be a source of confusion; could we rename one of them?

* `storageAndAccess`: A group of all elements included in TigerData project storage and access needs.

  Does not apply to Items.

* `additionalProjectInformation`: A group of all elements included in TigerData additional project information fields.

  Does not apply to Items.

* `supplementalMetadata`: A group of all elements included in TigerData supplemental metadata fields.

  May apply to either Projects or Items.
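Because several `projectFields` entries are themselves groups, the overlap with `itemFields` grows once those groups are expanded into their member elements. A minimal Python sketch, assuming the memberships listed in this summary are complete for the two groups it expands; the helper `expand` is hypothetical, and groups whose members are not listed here are left unexpanded:

```python
# Group memberships as listed in this summary (assumption: complete for these two).
# storageAndAccess, additionalProjectInformation and supplementalMetadata are
# deliberately left unexpanded because their members are not listed here.
GROUP_MEMBERS = {
    "projectRoles": ["dataSponsor", "dataManager", "dataUsers"],
    "projectDescription": [
        "researchDomains", "departments", "projectDirectory",
        "title", "description", "languages",
    ],
}

PROJECT_FIELDS = [
    "projectID", "alternativeIDs", "parentProject", "projectRoles",
    "projectDescription", "storageAndAccess", "additionalProjectInformation",
    "supplementalMetadata", "projectProvenance",
]
ITEM_FIELDS = [
    "itemID", "alternativeIDs", "parentProject", "dataUsers", "title",
    "description", "resourceType", "supplementalMetadata", "languages",
    "licenses", "fundingReferences", "duaReferences", "dates",
]


def expand(names, groups=GROUP_MEMBERS):
    """Replace every known group name with its members (recursively), keeping order."""
    expanded = []
    for name in names:
        if name in groups:
            expanded.extend(expand(groups[name], groups))
        else:
            expanded.append(name)
    return expanded


item_names = set(expand(ITEM_FIELDS))
shared = [name for name in expand(PROJECT_FIELDS) if name in item_names]
print(shared)
# ['alternativeIDs', 'parentProject', 'dataUsers', 'title', 'description', 'languages', 'supplementalMetadata']
```

With the expansion, `dataUsers`, `title`, `description`, and `languages` join the directly shared names, which illustrates how much the two groups have in common despite being mutually exclusive.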

## Common Types

Simple and complex types that are either necessary building blocks for further types or that are used in various places (e.g., in both elements and attributes):

* `doiType`: Standard type used for DOI values (just the prefix and suffix; not a full URL).
* `projectIDValueType`: Standard type for the values of the `projectID` and `parentProject` fields. *Applies to both Projects and Items.*
* `netIDType`: Standard type used for values meant to be a Princeton NetID.
* `limitedTextType`: Specification for the practical limit applied to free-text values.
* `textType`: Standard type used for free-text values.
* `pathSafeType`: Primitive type used within `pathType`. Restricts values to alphanumeric characters, underscore, forward and back slashes, and hyphen-minus.
* `byteUnitType`: Standard type that defines the controlled vocabulary for byte units in `storageQuantityType` (B, KB, MB, ...).
* `dateOrRangeType`: Standard type used for values that may be either dates or date ranges.
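The constraints described for `pathSafeType` and `doiType` can be approximated with regular expressions. This is a sketch under assumptions: the actual `xs:pattern` facets in the XSD are not reproduced in this summary, "alphanumeric" is taken to mean ASCII letters and digits, and the DOI pattern is a commonly used approximation (`10.`, a 4 to 9 digit registrant code, `/`, then a suffix), not the schema's own definition. The function names are hypothetical.

```python
import re

# Hypothetical approximations (assumption); the XSD's real patterns may differ.
# pathSafeType: ASCII letters/digits, underscore, "/" and "\" slashes, hyphen-minus.
PATH_SAFE = re.compile(r"[A-Za-z0-9_/\\-]+")
# doiType: bare "prefix/suffix" DOI, never a https://doi.org/... URL.
DOI = re.compile(r"10\.\d{4,9}/[-._;()/:A-Z0-9]+", re.IGNORECASE)


def is_path_safe(value: str) -> bool:
    """True if every character is allowed by the approximated pathSafeType."""
    return PATH_SAFE.fullmatch(value) is not None


def is_doi(value: str) -> bool:
    """True if the value looks like a bare DOI (prefix and suffix only)."""
    return DOI.fullmatch(value) is not None


print(is_path_safe("/tigerdata/abc/123"))                 # True
print(is_path_safe(r"\\tigerdata\abc\123"))               # True
print(is_doi("10.34770/az09-0001"))                       # True
print(is_doi("https://doi.org/10.34770/az09-0001"))       # False
```

Under this reading a period is not path-safe, so a value such as `report.txt` is rejected.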

## Common Elements

Element definitions that appear in multiple groups and/or apply to both projects and items:

* `alternativeID`: An alternative identifier for the resource (not the standard TigerData projectID or itemID).
* `alternativeIDs`: The container element for all alternative IDs for a resource.
* `parentProject`: The ID of the project to which the resource belongs directly. *Applies to both Projects and Items.*
* `dataSponsor`: The person who takes primary responsibility for the project. Does not apply to Items.
* `dataManager`: The person who manages the day-to-day activities for the project. Does not apply to Items.
* `dataUser`: A person who has access privileges to the resource. *May apply to either Projects or Items.*
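Common elements are read the same way regardless of where they appear. As an illustration, this sketch pulls `parentProject` and the `dataUser` entries out of a trimmed excerpt of the project example in this commit, using only Python's standard library. It assumes `readOnly` is an `xs:boolean` (so `1`/`0` are accepted as well as `true`/`false`); the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Excerpt trimmed from the project example in this commit
# (child elements of the first dataUser omitted).
EXCERPT = """<resource resourceClass="Project">
  <parentProject projectIDType="DOI" inherited="true">10.34770/az09-0000</parentProject>
  <dataUsers trackingLevel="ResourceRecord">
    <dataUser userID="ghijk" userIDType="NetID" readOnly="true" inherited="true" discoverable="true"/>
    <dataUser userID="lmno8" userIDType="NetID" readOnly="false" inherited="false" discoverable="false"/>
  </dataUsers>
</resource>"""

# Assumption: readOnly is an xs:boolean, whose lexical forms include "1" and "0".
TRUE_VALUES = {"true", "1"}


def parent_project(xml_text: str):
    """Return the parentProject ID text, or None when the element is absent."""
    return ET.fromstring(xml_text).findtext("parentProject")


def data_user_access(xml_text: str) -> dict:
    """Map each dataUser's userID to True if its access is read-only."""
    root = ET.fromstring(xml_text)
    return {
        user.get("userID"): user.get("readOnly") in TRUE_VALUES
        for user in root.iterfind("dataUsers/dataUser")
    }


print(parent_project(EXCERPT))    # 10.34770/az09-0000
print(data_user_access(EXCERPT))  # {'ghijk': True, 'lmno8': False}
```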

`lib/assets/TigerData_MetadataExample-Project_2024-08-27.xml` (148 additions, 0 deletions):
```xml
<!--
  Root node is a `resource` with class `Project`.
  The elements inside of this node are all part of xs:group `projectFields`.
-->
<resource resourceClass="Project" resourceID="10.34770/az09-0001" resourceIDType="DOI">
  <projectID projectIDType="DOI" inherited="false" discoverable="true" trackingLevel="ResourceRecord">10.34770/az09-0001</projectID>
  <alternativeIDs discoverable="true" trackingLevel="ResourceRecord">
    <alternativeID alternativeIDType="Local accession number" inherited="false">abc123</alternativeID>
  </alternativeIDs>
  <!--
    I think `parentProject` is a new concept.
    Need to ask Matt.
    Is this to handle labs with many projects vs. normal/small projects?
  -->
  <parentProject projectIDType="DOI" inherited="true" discoverable="true" trackingLevel="ResourceRecord">10.34770/az09-0000</parentProject>
  <dataSponsor userID="abcd12" userIDType="NetID" discoverable="true" inherited="true" trackingLevel="ResourceRecord">
    <netID>abcd12</netID>
    <orcid>https://orcid.org/0000-0001-2345-6789</orcid>
    <fullName>Family, Given</fullName>
    <givenName>Given</givenName>
    <familyName>Family</familyName>
    <nameDate>2024-08-21</nameDate>
    <alternativeNameIdentifier nameIdentifierScheme="ScopusAuthorID" schemeURI="https://www.elsevier.com/products/scopus/author-profiles">123456789</alternativeNameIdentifier>
  </dataSponsor>
  <dataManager userID="def3" userIDType="NetID" discoverable="true" inherited="true" trackingLevel="ResourceRecord"/>
  <dataUsers trackingLevel="ResourceRecord">
    <dataUser userID="ghijk" userIDType="NetID" readOnly="true" inherited="true" discoverable="true">
      <netID>ghijk</netID>
      <orcid>https://orcid.org/0000-0001-2345-6789</orcid>
      <fullName>Family1 Family2, Given Jr.</fullName>
      <givenName>Given Jr.</givenName>
      <familyName>Family1 Family2</familyName>
      <nameDate>2024-08-21</nameDate>
      <alternativeNameIdentifier nameIdentifierScheme="ScopusAuthorID" schemeURI="https://www.elsevier.com/products/scopus/author-profiles">123456789</alternativeNameIdentifier>
    </dataUser>
    <dataUser userID="lmno8" userIDType="NetID" readOnly="false" inherited="false" discoverable="false"/>
  </dataUsers>
  <researchDomains discoverable="true" trackingLevel="ResourceRecord">
    <researchDomain inherited="true">Natural Sciences</researchDomain>
    <researchDomain inherited="true">Engineering</researchDomain>
  </researchDomains>
  <departments discoverable="true" trackingLevel="ResourceRecord">
    <!--
      Where are the departments inherited from? The parent project?
      Or does this mean inherited from a project into the files of the project?
    -->
    <department departmentCode="23500" departmentAbbreviation="CHM" inherited="true">Chemistry</department>
    <department departmentCode="25300" departmentAbbreviation="CBE" inherited="true">Chemical and Biological Engineering</department>
  </departments>
  <!--
    The protocol attribute in `projectDirectoryPath` is new, right?
    Can there be more than one?
  -->
  <projectDirectory inherited="false" discoverable="false" trackingLevel="InternalUseOnly">
    <projectDirectoryPath protocol="NFS">/tigerdata/abc/123</projectDirectoryPath>
    <projectDirectoryPath protocol="SMB">\\tigerdata\abc\123</projectDirectoryPath>
    <requested protocol="NFS">/tigerdata/abc/123</requested>
    <approved protocol="NFS">/tigerdata/abc/123</approved>
  </projectDirectory>
  <title xml:lang="en" inherited="false" discoverable="true" trackingLevel="ResourceRecord">Example Title</title>
  <description xml:lang="en" inherited="false" discoverable="true" trackingLevel="ResourceRecord">This is just an example description.</description>
  <storageCapacity inherited="false" discoverable="false" trackingLevel="InternalUseOnly">
    <storageCapacitySetting>
      <size>500</size>
      <unit>GB</unit>
    </storageCapacitySetting>
    <requested>
      <size>500</size>
      <unit>GB</unit>
    </requested>
    <approved>
      <size>500</size>
      <unit>GB</unit>
    </approved>
  </storageCapacity>
  <projectVisibility inherited="true" discoverable="false" trackingLevel="InternalUseOnly">Limited</projectVisibility>
  <storagePerformance inherited="true" discoverable="false" trackingLevel="InternalUseOnly">
    <storagePerformanceSetting>Standard</storagePerformanceSetting>
    <requested>Standard</requested>
    <approved>Standard</approved>
  </storagePerformance>
  <numberOfFiles inherited="false" discoverable="false" trackingLevel="InternalUseOnly">Less than 10,000</numberOfFiles>
  <!-- hpc can be yes, no, or not sure -->
  <hpc inherited="true" discoverable="false" trackingLevel="InternalUseOnly">No</hpc>
  <projectPurpose inherited="true" discoverable="true" trackingLevel="InternalUseOnly">Research</projectPurpose>
  <provisionalProject inherited="true" discoverable="true" trackingLevel="InternalUseOnly">false</provisionalProject>
  <grantFunded inherited="true" discoverable="false" trackingLevel="InternalUseOnly">true</grantFunded>
  <!--
    Notice that funding information mimics DataCite.
  -->
  <fundingReferences discoverable="true" trackingLevel="ResourceRecord">
    <fundingReference inherited="true">
      <funderName>Example Funder</funderName>
      <funderID funderIDType="Crossref Funder ID" funderIDSchema="https://www.crossref.org/services/funder-registry/">abc123</funderID>
      <awardNumber awardURI="www.fakeuri.fake">123456</awardNumber>
      <awardTitle>Example Award Title</awardTitle>
    </fundingReference>
  </fundingReferences>
  <dates discoverable="true" trackingLevel="ResourceRecord">
    <startDate inherited="true">2024-07-23</startDate>
    <endDate inherited="true">2026-12-31</endDate>
    <retirementDate inherited="true">2030-12-31</retirementDate>
    <publicationDate inherited="true">2027-01-01</publicationDate>
    <otherDate dateType="Collected" inherited="true">2024-07-23/2025-12-31</otherDate>
    <otherDate dateType="Updated" dateInformation="Error correction" inherited="true">2026-03-03</otherDate>
  </dates>
  <resourceType resourceTypeGeneral="Project" inherited="false" discoverable="true" trackingLevel="ResourceRecord">TigerData Project</resourceType>
  <licenses discoverable="true" trackingLevel="ResourceRecord">
    <license licenseURI="https://creativecommons.org/licenses/by/4.0/" licenseID="CC BY 4.0" licenseIDScheme="SPDX" licenseIDSchemeURI="https://spdx.org/licenses/" inherited="true">Creative Commons Attribution 4.0 International</license>
  </licenses>
  <dataUseAgreement inherited="true" discoverable="false" trackingLevel="InternalUseOnly">true</dataUseAgreement>
  <duaReferences discoverable="true" trackingLevel="ResourceRecord">
    <duaReference inherited="true">
      <grantorName>Example Grantor</grantorName>
      <duaID duaURI="www.fakeuri-dua.fake">123.456</duaID>
      <duaTitle xml:lang="en">Example DUA Title</duaTitle>
    </duaReference>
  </duaReferences>
  <keywords discoverable="true" trackingLevel="ResourceRecord">
    <keyword xml:lang="en" inherited="true">Example keyword</keyword>
    <keyword xml:lang="en" subjectScheme="Library of Congress Subject Headings (LCSH)" subjectSchemeURI="https://id.loc.gov/authorities/subjects.html" valueURI="https://id.loc.gov/authorities/subjects/sh2009009655.html" inherited="true">Climate change mitigation</keyword>
    <keyword xml:lang="en" subjectScheme="ANZSRC Fields of Research" subjectSchemeURI="https://www.abs.gov.au/statistics/classifications/australian-and-new-zealand-standard-research-classification-anzsrc" classificationCode="370201" inherited="true">Climate change processes</keyword>
  </keywords>
  <relations discoverable="true" trackingLevel="ResourceRecord">
    <relation relatedIDType="DOI" relationType="IsCitedBy" resourceTypeGeneral="JournalArticle" inherited="false">10.21384/bar1</relation>
    <relation relatedIDType="DOI" relationType="IsDerivedFrom" resourceTypeGeneral="Project" inherited="false">10.21384/bar2</relation>
    <relation relatedIDType="DOI" relationType="HasMetadata" relatedMetadataScheme="Metadata Title" relatedMetadataSchemeURI="www.fakeuri-m.fake" relatedMetadataSchemeType="Turtle" resourceTypeGeneral="Text" inherited="false">10.21384/bar2</relation>
  </relations>
  <extendedMetadataSchemas discoverable="false" trackingLevel="InternalUseOnly">
    <extendedMetadataSchema inherited="false">Example supported schema name</extendedMetadataSchema>
  </extendedMetadataSchemas>
  <projectProvenance>
    <submission>
      <requestedBy userID="abdc12" userIDType="NetID"/>
      <requestDateTime>2024-07-23T11:53:03-04:00</requestDateTime>
      <approvedBy userID="def34" userIDType="NetID"/>
      <approvalDateTime>2024-07-23T11:54:47-04:00</approvalDateTime>
      <eventNote>
        <noteBy userID="def34" userIDType="NetID"/>
        <noteDateTime>2024-07-23T11:54:12-04:00</noteDateTime>
        <eventType>Quota</eventType>
        <message>Delivering just 500 GB to start, and planning to increase to 500 TB by 2024-12-31</message>
      </eventNote>
    </submission>
    <status>Active</status>
    <schemaVersion>1.0</schemaVersion>
  </projectProvenance>
</resource>
```
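In this example, `discoverable` and `trackingLevel` are set independently: `projectPurpose` and `provisionalProject` are marked `discoverable="true"` while being tracked as `InternalUseOnly`. A minimal sketch for surfacing such combinations in a record, using a few elements copied from the example and only Python's standard library. The function names are hypothetical; only direct children of `resource` are inspected, and only the literal value `true` is checked for `discoverable`.

```python
import xml.etree.ElementTree as ET

# A few direct children copied from the example above.
EXCERPT = """<resource resourceClass="Project" resourceID="10.34770/az09-0001" resourceIDType="DOI">
  <title xml:lang="en" inherited="false" discoverable="true" trackingLevel="ResourceRecord">Example Title</title>
  <projectVisibility inherited="true" discoverable="false" trackingLevel="InternalUseOnly">Limited</projectVisibility>
  <projectPurpose inherited="true" discoverable="true" trackingLevel="InternalUseOnly">Research</projectPurpose>
  <hpc inherited="true" discoverable="false" trackingLevel="InternalUseOnly">No</hpc>
</resource>"""


def group_by_tracking_level(xml_text: str) -> dict:
    """Group the resource's direct children by their trackingLevel attribute."""
    levels = {}
    for child in ET.fromstring(xml_text):
        level = child.get("trackingLevel")
        if level is not None:
            levels.setdefault(level, []).append(child.tag)
    return levels


def discoverable_but_internal(xml_text: str) -> list:
    """Direct children marked discoverable even though tracked as InternalUseOnly."""
    return [
        child.tag
        for child in ET.fromstring(xml_text)
        if child.get("trackingLevel") == "InternalUseOnly"
        and child.get("discoverable") == "true"
    ]


print(group_by_tracking_level(EXCERPT))
# {'ResourceRecord': ['title'], 'InternalUseOnly': ['projectVisibility', 'projectPurpose', 'hpc']}
print(discoverable_but_internal(EXCERPT))  # ['projectPurpose']
```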