Make clarifications in 1.2 text #392

Open
wants to merge 4 commits into
base: main
107 changes: 48 additions & 59 deletions docs/_specification/1.2-DRAFT/data-entities.md
Expand Up @@ -354,13 +354,6 @@ As files on the web may change, the timestamp property [sdDatePublished] SHOULD
{% include callout.html type="note" content="Do not use web based URI identifiers for files which _are_ present in the crate root, see [below](#embedded-data-entities-that-are-also-on-the-web)." %}


### Encoding file paths

Note that all `@id` [identifiers must be valid URI references](appendix/jsonld#describing-entities-in-json-ld), care must be taken to express any relative paths using `/` separator, correct casing, and escape special characters like space (`%20`) and percent (`%25`), for instance a _File Data Entity_ from the Windows path `Results and Diagrams\almost-50%.png` becomes `"@id": "Results%20and%20Diagrams/almost-50%25.png"` in the _RO-Crate JSON-LD_.

In this document the term _URI_ includes international *IRI*s; the _RO-Crate Metadata File_ is always UTF-8 and international characters in identifiers SHOULD be written using native UTF-8 characters (*IRI*s), however traditional URL encoding of Unicode characters with `%` MAY appear in `@id` strings. Example: `"@id": "面试.mp4"` is preferred over the equivalent `"@id": "%E9%9D%A2%E8%AF%95.mp4"`
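
The encoding rules above can be sketched in Python. This is a minimal illustration, not part of the specification: the helper name `windows_path_to_id` is hypothetical, and it assumes a relative Windows path to a file (no drive letter, no trailing separator). It switches to the `/` separator, percent-encodes special ASCII characters such as space and `%`, and leaves non-ASCII characters as native UTF-8 (IRI form).

```python
from pathlib import PureWindowsPath
from urllib.parse import quote


def _encode_segment(segment: str) -> str:
    # Percent-encode unsafe ASCII characters only; keep non-ASCII
    # characters as-is so the result is an IRI rather than a URL.
    return "".join(ch if ord(ch) > 127 else quote(ch, safe="") for ch in segment)


def windows_path_to_id(path: str) -> str:
    # Hypothetical helper: split on Windows separators, then join with "/".
    # Assumes a relative file path without a drive letter.
    parts = PureWindowsPath(path).parts
    return "/".join(_encode_segment(part) for part in parts)


print(windows_path_to_id(r"Results and Diagrams\almost-50%.png"))
print(windows_path_to_id("面试.mp4"))
```

Casing is not handled here; a real implementation would also need to confirm that the `@id` matches the casing of the file on disk.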


### Embedded data entities that are also on the web

File Data Entities that are present as local files may already have a corresponding web presence, for instance a landing page that describes the file, including persistent identifiers (e.g. DOI) resolving to an intermediate HTML page instead of the downloadable file directly.
Expand Down Expand Up @@ -394,15 +387,56 @@ Note that if a local file is intended to be packaged within an _Attached RO-Crat

### Directories on the web; dataset distributions

A _Directory File Entry_ or [Dataset] identifier expressed as an absolute URL on the web can be harder to download than a [File] because it consists of multiple resources. It is RECOMMENDED that such directories have a complete listing of their content in [hasPart], enabling download traversal, or are themselves RO-Crates.
A _Directory File Entry_ or [Dataset] identifier expressed as an absolute URL on the web can be harder to download than a [File] because it consists of multiple resources. It is RECOMMENDED that such directories have a complete listing of their content in [hasPart], enabling download traversal, or are themselves RO-Crates (see [Referencing other RO-Crates](#referencing-other-ro-crates)).

#### Downloadable dataset

A common mechanism to provide downloads of a reasonably sized directory is as an archive file in formats such as [`application/zip`](https://www.nationalarchives.gov.uk/PRONOM/x-fmt/263) or [`application/gzip`](https://www.nationalarchives.gov.uk/PRONOM/x-fmt/266), described as a [DataDownload].

```json
{
"@id": "lots_of_little_files/",
"@type": "Dataset",
"name": "Too many files",
"description": "This directory contains many small files, that we're not going to describe in detail.",
"distribution": {"@id": "http://example.com/downloads/2020/lots_of_little_files.zip"}
},
{
"@id": "http://example.com/downloads/2020/lots_of_little_files.zip",
"@type": "DataDownload",
"encodingFormat": ["application/zip", {"@id": "https://www.nationalarchives.gov.uk/PRONOM/x-fmt/263"}],
"contentSize": "82818928"
}
```

Similarly, the _RO-Crate Root_ entity (or a reference to another RO-Crate as a `Dataset`) may provide a [distribution] URL, in which case the download SHOULD be an archive that contains the _RO-Crate Metadata Document_ (either directly in the archive's root, or within a single folder in the archive), indicated by a version-less `conformsTo`:

```json
{
"@id": "./",
"@type": "Dataset",
"identifier": "https://doi.org/10.48546/workflowhub.workflow.775.1",
"name": "Research Object Crate for Jupyter Notebook Molecular Structure Checking",
"distribution": {"@id": "https://workflowhub.eu/workflows/775/ro_crate?version=1"},
"…": ""
},
{
"@id": "https://workflowhub.eu/workflows/775/ro_crate?version=1",
"@type": "DataDownload",
"encodingFormat": ["application/zip", {"@id": "https://www.nationalarchives.gov.uk/PRONOM/x-fmt/263"}],
"conformsTo": { "@id": "https://w3id.org/ro/crate" }
}
```

In all cases, consumers should be aware that a `DataDownload` is a snapshot that may not reflect the current state of the `Dataset` or RO-Crate.
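
As a rough sketch of how a consumer might locate the metadata document inside a downloaded archive, the following Python function checks the two locations described above: the archive root, or a single top-level folder. The function name `find_metadata_in_archive` is hypothetical, and the file name `ro-crate-metadata.json` is assumed to be the one used for the _RO-Crate Metadata Document_.

```python
from typing import Iterable, Optional

# Assumed file name of the RO-Crate Metadata Document.
METADATA_NAME = "ro-crate-metadata.json"


def find_metadata_in_archive(names: Iterable[str]) -> Optional[str]:
    """Return the archive member holding the metadata document, or None."""
    members = list(names)
    # Case 1: the metadata document sits directly in the archive root.
    if METADATA_NAME in members:
        return METADATA_NAME
    # Case 2: everything is wrapped in exactly one top-level folder.
    top_level = {name.split("/", 1)[0] for name in members if name}
    if len(top_level) == 1:
        candidate = f"{next(iter(top_level))}/{METADATA_NAME}"
        if candidate in members:
            return candidate
    return None
```

For a ZIP download, the member names could be obtained with `zipfile.ZipFile(path).namelist()` from the Python standard library and passed to this function.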

#### Referencing other RO-Crates
## Referencing other RO-Crates

A referenced RO-Crate is also a [Dataset] data entity, but where its [hasPart] do not need to be listed. Instead, its content and further metadata is available from its own RO-Crate Metadata Document, which may be retrieved or packaged within an archive. An entity representing a referenced RO-Crate SHOULD have `conformsTo` pointing to the generic RO-Crate profile using the fixed URI `https://w3id.org/ro/crate`.

This section defines how a _referencing_ RO-Crate ("A") can declare data entities within A's RO-Crate Metadata Document, in order to indicate a _referenced_ RO-Crate ("B"). There are different options on how to find the identifier to assign the referenced RO-Crate in A, and how a consumer of A finding such a reference can find the corresponding RO-Crate Metadata Document for B.

##### Referencing RO-Crates that have a persistent identifier
### Referencing RO-Crates that have a persistent identifier

If the referenced RO-Crate B has an `identifier` declared as B's [Root Data Entity identifier](root-data-entity#root-data-entity-identifier), then this is a _persistent identifier_ which SHOULD be used as the URI in the `@id` of the corresponding entity in RO-Crate A. For instance, if RO-Crate B had declared the identifier `https://pid.example.com/another-crate/` then RO-Crate A can reference B as an entity:

Expand All @@ -423,7 +457,7 @@ Consumers that find a reference to a `Dataset` with the generic RO-Crate profile
If an `identifier` is not declared in a referenced RO-Crate B, but the determined absolute URI has [Signposting] declared for a `Link:` with `rel=cite-as`, then that link MAY be considered as an equivalent permalink for B.


##### Determining entity identifier for a referenced RO-Crate
### Determining entity identifier for a referenced RO-Crate

In some cases, if the referenced RO-Crate B has not got a resolvable `identifier` declared, additional steps are needed to find the correct `@id` to use:

Expand All @@ -434,7 +468,7 @@ In some cases, if the referenced RO-Crate B has not got a resolvable `identifier

If the RO-Crate Metadata Document is not available as a web resource, but only within an archive (e.g. ZIP), then instead reference it as a [Downloadable dataset](#downloadable-dataset).

##### Referencing another metadata document
### Referencing another metadata document

If a referenced RO-Crate Metadata Document is known at a given URI or path, but its corresponding RO-Crate identifier can't be determined as above (e.g. [Retrieving an RO-Crate](#retrieving-an-ro-crate) fails or requires heuristics), then a referenced metadata descriptor entity SHOULD be added. For instance, if `http://example.com/another-crate/ro-crate-metadata.json` resolves to an RO-Crate Metadata Document describing root `./`, but `http://example.com/another-crate/` always returns a HTML page without [Signposting] to the metadata document, then `subjectOf` SHOULD be added to an explicit metadata descriptor entity, which has `encodingFormat` declared for JSON-LD:

Expand All @@ -456,7 +490,7 @@ If a referenced RO-Crate Metadata Document is known at a given URI or path, but
{% include callout.html type="tip" content="Counter to [file format profile](data-entities.html#file-format-profiles) recommendations, the referenced RO-Crate metadata descriptor SHOULD NOT include its own `conformsTo` declarations to `https://w3id.org/ro/crate` or reference the dataset with `about`; this is to avoid confusion with the referencing RO-Crate's own [metadata descriptor](root-data-entity#ro-crate-metadata-descriptor). " %}


##### Profiles of referenced crates
### Profiles of referenced crates

If the referenced crate conforms to a given [RO-Crate profile](profiles), this MAY be indicated by expanding `conformsTo` on the `Dataset` to an array to reference the profile as a contextual entity:

Expand All @@ -478,52 +512,7 @@ If the referenced crate conforms to a given [RO-Crate profile](profiles), this M

{% include callout.html type="note" content="The profile declaration of a referenced crate is a hint. Consumers should check `conformsTo` as declared in the retrieved RO-Crate, as it may have been updated after this RO-Crate." %}



#### Downloadable dataset


Alternatively, a common mechanism to provide downloads of a reasonably sized directory is as an archive file in formats such as [`application/zip`](https://www.nationalarchives.gov.uk/PRONOM/x-fmt/263) or [`application/gzip`](https://www.nationalarchives.gov.uk/PRONOM/x-fmt/266), described as a [DataDownload].

```json
{
"@id": "lots_of_little_files/",
"@type": "Dataset",
"name": "Too many files",
"description": "This directory contains many small files, that we're not going to describe in detail.",
"distribution": {"@id": "http://example.com/downloads/2020/lots_of_little_files.zip"}
},
{
"@id": "http://example.com/downloads/2020/lots_of_little_files.zip",
"@type": "DataDownload",
"encodingFormat": ["application/zip", {"@id": "https://www.nationalarchives.gov.uk/PRONOM/x-fmt/263"}],
"contentSize": "82818928"
}
```

Similarly, the _RO-Crate Root_ entity (or a reference to another RO-Crate as a `Dataset`) may provide a [distribution] URL, in which case the download SHOULD be an archive that contains the _RO-Crate Metadata Document_ (either directly in the archive's root, or within a single folder in the archive), indicated by a version-less `conformsTo`:

```json
{
"@id": "./",
"@type": "Dataset",
"identifier": "https://doi.org/10.48546/workflowhub.workflow.775.1",
"name": "Research Object Crate for Jupyter Notebook Molecular Structure Checking",
"distribution": {"@id": "https://workflowhub.eu/workflows/775/ro_crate?version=1"},
"…": ""
},
{
"@id": "https://workflowhub.eu/workflows/775/ro_crate?version=1",
"@type": "DataDownload",
"encodingFormat": ["application/zip", {"@id": "https://www.nationalarchives.gov.uk/PRONOM/x-fmt/263"}],
"conformsTo": { "@id": "https://w3id.org/ro/crate" }
}
```

In all cases, consumers should be aware that a `DataDownload` is a snapshot that may not reflect the current state of the `Dataset` or RO-Crate.


#### Retrieving an RO-Crate
### Retrieving an RO-Crate

To resolve a reference to an RO-Crate, but where `subjectOf` or `distribution` is unknown (e.g. an RO-Crate is cited from a journal article), the below approach is recommended to retrieve its [RO-Crate Metadata Document](structure#ro-crate-metadata-document-ro-crate-metadatajson):

12 changes: 6 additions & 6 deletions docs/_specification/1.2-DRAFT/metadata.md
Expand Up @@ -4,7 +4,7 @@ redirect_from:
- /1.2-DRAFT/metadata
excerpt: |
RO-Crate aims to capture and describe the Research Object using
structured metadata. The RO-Crate Metadata Descriptor contains the
structured metadata. The RO-Crate Metadata Document contains the
metadata that describes the RO-Crate and its content. This machine-readable
metadata can also be represented for human consumption in the RO-Crate Website,
linking to data and Web resources.
Expand Down Expand Up @@ -38,9 +38,9 @@ parent: RO-Crate 1.2-DRAFT
1. TOC
{:toc}

RO-Crate aims to capture and describe the [Research Object][ResearchObject] using structured _metadata_.
RO-Crate aims to capture and describe the [Research Object][ResearchObject] using structured _metadata_. Specifically, an RO-Crate is described using _JSON-LD_ by an _RO-Crate Metadata Document_. As explained in section [RO-Crate Structure](structure) this may be stored in an _RO-Crate Metadata File_.

The _RO-Crate Metadata Descriptor_ contains the metadata that describes the RO-Crate and its content, in particular:
The _RO-Crate Metadata Document_ contains the metadata that describes the RO-Crate and its content, in particular:

* [Root Data Entity](root-data-entity) - the RO-Crate `Dataset` itself, a gathering of data
* [Data Entities](data-entities) - the _data_ payload, in the form of files and folders
Expand Down Expand Up @@ -75,7 +75,7 @@ For all entities listed in an _RO-Crate Metadata Document_ the following princip
3. The `@type` SHOULD include at least one [Schema.org] type that accurately describes the entity. [Thing] or [CreativeWork] are valid fallbacks if no alternative external or ad-hoc term is found (see [Extending RO-Crate](appendix/jsonld#extending-ro-crate)).
5. The entity SHOULD have a human-readable `name`, in particular if its `@id` does not go to a human-readable Web page.
6. The properties used on the entity SHOULD be applicable to the `@type` (or superclass) according to their definitions. For instance, the property [publisher] can be used on a [Dataset] as it applies to its superclass [CreativeWork].
7. Property references to other entities (e.g. `author` property to a `Person` entity) SHOULD use the `{ "@id": "..."}` object form (see [JSON-LD appendix](appendix/jsonld)).
7. Property references to other entities (e.g. `author` property to a `Person` entity) MUST use the `{ "@id": "..."}` object form (see [JSON-LD appendix](appendix/jsonld)).
8. The entity SHOULD be ultimately referenceable from the root data entity (possibly through another reachable [data entity](data-entities) or [contextual entity](contextual-entities)).


Expand All @@ -97,7 +97,7 @@ Generally, the standard _type_ and _property_ names (_terms_) from [Schema.org]
* `File` is mapped to <http://schema.org/MediaObject> which was chosen as a compromise as it has many of the properties that are needed to describe a generic file. Future versions of Schema.org or a research data extension may re-define `File`.
* `Journal` is mapped to <http://schema.org/Periodical>.

{% include callout.html type="warning" content="JSON-LD examples given on the [Schema.org] website might not be in _flattened_ form; any nested entities in _RO-Crate JSON-LD_ SHOULD be described as separate contextual entities in the flat `@graph` list. " %}
{% include callout.html type="warning" content="JSON-LD examples given on the [Schema.org] website might not be in _flattened_ form, but _RO-Crate JSON-LD_ is flattened; any nested entities in _RO-Crate JSON-LD_ MUST be described as separate contextual entities in the flat `@graph` list. " %}

To simplify processing and avoid confusion with string values, the _RO-Crate JSON-LD Context_ requires URIs and entity references to be given in the form `"author": {"@id": "http://example.com/alice"}`, even where [Schema.org] for some properties otherwise permit shorter forms like `"author": "http://example.com/alice"`.
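
One way a tool might check the flattening requirement is sketched below in Python. This is an illustrative assumption, not a validator defined by the specification: the function name `find_nested_entities` is hypothetical, and it treats any object value that has keys other than `@id` as a nested entity, skipping JSON-LD value objects that carry `@value`.

```python
from typing import Any, Dict, List, Tuple


def find_nested_entities(graph: List[Dict[str, Any]]) -> List[Tuple[Any, str]]:
    """Return (entity @id, property) pairs whose value is a nested entity."""
    problems = []
    for entity in graph:
        for prop, value in entity.items():
            # Normalise single values and arrays to one code path.
            values = value if isinstance(value, list) else [value]
            for item in values:
                # A plain reference is exactly {"@id": "..."}.
                # Value objects containing "@value" are not entities.
                if isinstance(item, dict) and "@value" not in item and set(item) != {"@id"}:
                    problems.append((entity.get("@id"), prop))
    return problems


flat = [
    {"@id": "./", "@type": "Dataset", "author": {"@id": "http://example.com/alice"}},
    {"@id": "http://example.com/alice", "@type": "Person", "name": "Alice"},
]
nested = [
    {"@id": "./", "@type": "Dataset", "author": {"@id": "http://example.com/alice", "name": "Alice"}},
]
print(find_nested_entities(flat))
print(find_nested_entities(nested))
```

The check does not detect references written as bare strings, such as `"author": "http://example.com/alice"`, because telling those apart from ordinary text values requires knowing which properties expect entity references.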

Expand Down Expand Up @@ -159,7 +159,7 @@ From [CodeMeta 3.0](https://w3id.org/codemeta/3.0):
* `referencePublication` mapped to <https://codemeta.github.io/terms/referencePublication>
* `softwareSuggestions` mapped to <https://codemeta.github.io/terms/softwareSuggestions>

{% include callout.html type="warning" content="As of 2024-05-23, the CodeMeta URIs do not resolve correctly, but are used here to match the Codemeta JSON-LD context <https://w3id.org/codemeta/3.0> (issue [#275](https://github.com/ResearchObject/ro-crate/issues/275)).
{% include callout.html type="warning" content="As of 2025-01-09, the CodeMeta URIs do not resolve correctly, but are used here to match the Codemeta JSON-LD context <https://w3id.org/codemeta/3.0> (issue [#275](https://github.com/ResearchObject/ro-crate/issues/275)).
The CodeMeta terms `maintainer` and `funding` are not mapped, as these are already defined by schema.org." %}


7 changes: 2 additions & 5 deletions docs/_specification/1.2-DRAFT/root-data-entity.md
Expand Up @@ -36,9 +36,6 @@ The **Root Data Entity** is a [Dataset] that represents the RO-Crate as a whole;
a _Research Object_ that includes the _Data Entities_ and the related
_Contextual Entities_.

An RO-Crate is described using _JSON-LD_ by an _RO-Crate Metadata Document_. As explained in section [RO-Crate Structure](structure) this may be stored in an _RO-Crate Metadata File_. In this section we describe the format of the JSON-LD document.


## RO-Crate Metadata Descriptor

The _RO-Crate Metadata Document_ MUST contain a self-describing
Expand Down Expand Up @@ -110,7 +107,7 @@ See also the appendix on

To ensure a base-line interoperability between RO-Crates, and for an RO-Crate to
be considered a _Valid RO-Crate_, a minimum set of metadata is required for the
_Root Data Entity_. As [stated earlier](structure#self-describing-and-self-contained-attached-ro-crates),
_Root Data Entity_. As [stated earlier](structure#flexibility-of-ro-crate-structure),
the _RO-Crate Metadata Document_ is not an
exhaustive manifest or inventory, that is, it does not necessarily list or
describe all files in the package. For this reason, there are no minimum
Expand All @@ -128,7 +125,7 @@ be minimally valid.

## Direct properties of the Root Data Entity

The _Root Data Entity_ MUST have the following properties:
The Root Data Entity MUST have all of the properties listed below. Each property also has requirements that apply to its value:

* `@type`: MUST be [Dataset] or an array that contains `Dataset`
* `@id`: SHOULD be the string `./` or an absolute URI (see [below](#root-data-entity-identifier))