[WIP] Create simpleHRT-example.json #80

Merged: 6 commits merged on Aug 1, 2024

rgopikrishnan91
Contributor

This is a first attempt at writing an AI BOM. I need help with:

  1. Checking the integrity of the example
  2. How to add licensing information

This AI BOM is for the application available here https://github.com/githubharald/SimpleHTR/tree/master

@kestewart

@bact
Contributor

bact commented May 18, 2024

@rgopikrishnan91 I made a PR to your branch here rgopikrishnan91#1 to fix a few JSON errors.

@bact
Contributor

bact commented May 24, 2024

For licensing information, I guess it will look more or less like this:

    {
      "type": "Relationship",
      "spdxId": "urn:",
      "relationshipType": "hasConcludedLicense",
      "from": "...",
      "to": "CC-BY-4.0",
      "creationInfo": "_:creationinfo"
    },
    {
      "type": "Relationship",
      "spdxId": "urn:",
      "relationshipType": "hasDeclaredLicense",
      "from": "...",
      "to": "CC-BY-4.0",
      "creationInfo": "_:creationinfo"
    }

The from property should be the ID of the element that has the licenses.

What I'm not sure about is the to property: should we use the SPDX short name of the license, or should it be some kind of IRI?

@rgopikrishnan91
Contributor Author

rgopikrishnan91 commented May 24, 2024 via email

@bact
Contributor

bact commented Jun 5, 2024

My problem is I am not sure how to define custom licenses and I think the dataset here needs a custom license

Right. Maybe this one is going to be a complex example.

@bact
Contributor

bact commented Jun 5, 2024

I found that dictionary entries are quite cumbersome for a big dictionary. Based on Gopi's work, it seems we have to write them like this:

            "ai_hyperparameter": [
                {
                    "type": "DictionaryEntry",
                    "key": "cnn_kernel_vals",
                    "value": "[5, 5, 3, 3, 3]"
                },
                {
                    "type": "DictionaryEntry",
                    "key": "cnn_nfeature_vals",
                    "value": "[1, 32, 64, 128, 128, 256]"
                }
            ]

Hopefully we will soon not have to do this by hand.
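Until tooling does this, a few lines of Python can generate the entries from an ordinary dict. This is only a sketch: to_dictionary_entries is a hypothetical helper, and it assumes non-string values should be stored as their JSON text, which matches the "[5, 5, 3, 3, 3]" style above.

```python
import json
from typing import Any


def to_dictionary_entries(params: dict[str, Any]) -> list[dict[str, str]]:
    """Turn a plain dict into a list of DictionaryEntry objects.

    Hypothetical helper: strings are kept as-is, anything else is
    serialized with json.dumps, so [5, 5, 3, 3, 3] becomes "[5, 5, 3, 3, 3]".
    """
    entries = []
    for key, value in params.items():
        text = value if isinstance(value, str) else json.dumps(value)
        entries.append({"type": "DictionaryEntry", "key": key, "value": text})
    return entries


hyperparams = {
    "cnn_kernel_vals": [5, 5, 3, 3, 3],
    "cnn_nfeature_vals": [1, 32, 64, 128, 128, 256],
}
# Prints the same structure as the hand-written snippet above.
print(json.dumps({"ai_hyperparameter": to_dictionary_entries(hyperparams)}, indent=4))
```

Keeping every value a string mirrors the hand-written entries; a consumer that needs the list back can call json.loads on the value.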

@bact
Contributor

bact commented Jun 5, 2024

Btw, good news: this example now passes validation by spdx3ToGraph (see rgopikrishnan91#1).

However, because some strings are very long (a few of them are values in dictionary entries) and JSON strings cannot be split across lines, PlantUML is unable to draw the diagram properly (the diagram gets cropped).
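To find which values are responsible, one option is to scan the document for unusually long strings before rendering. A minimal Python sketch: long_strings is a hypothetical helper, and the 80-character threshold is an arbitrary assumption, not a PlantUML limit. It only locates the values; it does not fix the cropping.

```python
import json
from typing import Any, Iterator


def long_strings(node: Any, limit: int = 80, path: str = "$") -> Iterator[tuple[str, int]]:
    """Yield (JSON path, length) for every string value longer than limit."""
    if isinstance(node, str):
        if len(node) > limit:
            yield path, len(node)
    elif isinstance(node, dict):
        for key, value in node.items():
            yield from long_strings(value, limit, f"{path}.{key}")
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from long_strings(item, limit, f"{path}[{i}]")


# A made-up document with one 120-character dictionary value.
doc = json.loads('{"@graph": [{"type": "DictionaryEntry", "key": "k", "value": "' + "x" * 120 + '"}]}')
for where, length in long_strings(doc):
    print(where, length)  # $.@graph[0].value 120
```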

@bact
Contributor

bact commented Jun 6, 2024

For a custom license, I think it will probably look like this:

        {
            "type": "expandedlicensing_CustomLicense",
            "spdxId": "https://spdx.org/spdxdocs/CustomLicense-c63547c2-62e0-48ec-b98d-ff1b917d67db",
            "creationInfo": "_:creationinfo",
            "simplelicensing_licenseText": "This database may be used for non-commercial research purpose only. If you publish material based on this database, we request you to include a reference to paper. U. Marti and H. Bunke. The IAM-database: An English Sentence Database for Off-line Handwriting Recognition. Int. Journal on Document Analysis and Recognition, Volume 5, pages 39 - 46, 2002.",
            "expandedlicensing_isFsfLibre": false,
            "expandedlicensing_isOsiApproved": false
        },
        {
            "type": "Relationship",
            "spdxId": "https://spdx.org/spdxdocs/Relationship/declaredLicense-2c9563dc-baa1-4385-be02-ad671976a8aa",
            "creationInfo": "_:creationinfo",
            "relationshipType": "hasDeclaredLicense",
            "from": "https://my-first-aibom.com/IAMdataset",
            "to": "https://spdx.org/spdxdocs/CustomLicense-c63547c2-62e0-48ec-b98d-ff1b917d67db"
        },
        {
            "type": "Relationship",
            "spdxId": "https://spdx.org/spdxdocs/Relationship/concludedLicense-3bcfa4ce-6a65-46e8-bed1-18985211bb9e",
            "creationInfo": "_:creationinfo",
            "relationshipType": "hasConcludedLicense",
            "from": "https://my-first-aibom.com/IAMdataset",
            "to": "https://spdx.org/spdxdocs/CustomLicense-c63547c2-62e0-48ec-b98d-ff1b917d67db"
        },

The above snippet uses the CustomLicense class, but I think we can also use the SimpleLicensingText class if we only want to record the license text.

I have added this to the PR to Gopi's fork.

@bact
Contributor

bact commented Jun 17, 2024

It now passes all validations (JSON Schema and SHACL).

Available for review at
rgopikrishnan91#2

@zvr
Member

zvr commented Jun 19, 2024

AFAIK, you cannot split strings across lines in JSON, so lines 1016-1039 produce a syntax error and have to be converted to two lines.
The formatting is strange in other places as well, with lots of whitespace in the middle of some strings.
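Python's standard json module shows the problem: a raw line break inside a string literal is rejected, while the same text on one line, or with the break escaped as \n, parses. The snippet is illustrative; is_valid_json is a hypothetical helper and the sample strings are made up.

```python
import json


def is_valid_json(text: str) -> bool:
    """Return True if text parses as JSON, False on a syntax error."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


# The Python "\n" below puts a real line break inside the JSON string value.
wrapped = '{"value": "first half of a long string\nsecond half"}'
# Joining the pieces onto one line fixes it...
one_line = '{"value": "first half of a long string second half"}'
# ...as does writing the break as the two-character JSON escape \n.
escaped = '{"value": "first half of a long string\\nsecond half"}'

for name, text in [("wrapped", wrapped), ("one_line", one_line), ("escaped", escaped)]:
    print(name, is_valid_json(text))
```

Only the wrapped variant fails; note that the escaped variant changes the value itself (it now contains a newline), so joining the lines is usually what a long license text or description wants.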

@zvr
Member

zvr commented Jun 19, 2024

For the licensing part, all licensing information is expressed via Relationships. There are two relationshipTypes that can be used: hasDeclaredLicense, to record the licensing information found in the artifact, and hasConcludedLicense, to record what the data creator states the artifact is licensed under. In both cases, a Relationship is needed, with from being an artifact and to being a license expression.

For example, recording the conclusion that your dataset is under the MIT license might look like:

{
  "spdxId": "https://my-first-aibom.com/rel-1",
  "type": "Relationship",
  "relationshipType": "hasConcludedLicense",
  "from": "https://my-first-aibom.com/IAMdataset",
  "to": "https://spdx.org/licenses/MIT",
  "creationInfo": "_:creationinfo"
}
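The same pattern can be generated in code. A Python sketch under a few assumptions: license_relationship is a hypothetical helper, the rel- prefix and UUID-based spdxId are an invented scheme (spdxIds are assumed to only need to be unique IRIs), and to is written as a one-item list, as in the pyshacl example in this thread. As discussed further down in this thread, https://spdx.org/licenses/ IRIs may not yet pass SHACL validation.

```python
import uuid

SPDX_LICENSE_LIST = "https://spdx.org/licenses/"
LICENSE_REL_TYPES = {"hasDeclaredLicense", "hasConcludedLicense"}


def license_relationship(rel_type: str, from_id: str, license_id: str,
                         id_prefix: str = "https://my-first-aibom.com/rel-") -> dict:
    """Build a license Relationship from an artifact to a listed license.

    license_id is an SPDX License List short identifier such as "MIT";
    it is expanded to an IRI under https://spdx.org/licenses/.
    """
    if rel_type not in LICENSE_REL_TYPES:
        raise ValueError(f"not a license relationship type: {rel_type}")
    return {
        "spdxId": f"{id_prefix}{uuid.uuid4()}",  # hypothetical ID scheme
        "type": "Relationship",
        "relationshipType": rel_type,
        "from": from_id,
        "to": [SPDX_LICENSE_LIST + license_id],
        "creationInfo": "_:creationinfo",
    }


rel = license_relationship("hasConcludedLicense",
                           "https://my-first-aibom.com/IAMdataset", "MIT")
print(rel["to"])  # ['https://spdx.org/licenses/MIT']
```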

If the license is not in the SPDX License List, you have to create a new CustomLicense and point a Relationship to it. For example:

{
  "spdxId": "https://my-first-aibom.com/rel-2",
  "type": "Relationship",
  "relationshipType": "hasConcludedLicense",
  "from": "https://my-first-aibom.com/IAMdataset",
  "to": "https://my-first-aibom.com/LicenseRef-MyStrangeLicense",
  "creationInfo": "_:creationinfo"
},
{
  "spdxId": "https://my-first-aibom.com/LicenseRef-MyStrangeLicense",
  "type": "expandedlicensing_CustomLicense",
  "name": "LicenseRef-MyStrangeLicense",
  "simplelicensing_licenseText": "Anyone can use this dataset without any obligation.",
  "creationInfo": "_:creationinfo"
},
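Generating both pieces together keeps the license element and the Relationships that point to it in sync. A Python sketch: custom_license_elements is hypothetical, the -rel-N suffix for Relationship IDs is an invented scheme, and it emits both hasDeclaredLicense and hasConcludedLicense (like the earlier CustomLicense snippet in this thread); drop one if only one applies. Following the example above, the license name carries a LicenseRef- prefix.

```python
def custom_license_elements(artifact_id: str, license_iri: str, name: str,
                            text: str) -> list[dict]:
    """Return a CustomLicense element plus declared/concluded Relationships to it."""
    elements = [{
        "spdxId": license_iri,
        "type": "expandedlicensing_CustomLicense",
        "name": name,
        "simplelicensing_licenseText": text,
        "creationInfo": "_:creationinfo",
    }]
    for n, rel_type in enumerate(["hasDeclaredLicense", "hasConcludedLicense"], start=1):
        elements.append({
            "spdxId": f"{license_iri}-rel-{n}",  # hypothetical ID scheme
            "type": "Relationship",
            "relationshipType": rel_type,
            "from": artifact_id,
            "to": [license_iri],
            "creationInfo": "_:creationinfo",
        })
    return elements


graph = custom_license_elements(
    "https://my-first-aibom.com/IAMdataset",
    "https://my-first-aibom.com/LicenseRef-MyStrangeLicense",
    "LicenseRef-MyStrangeLicense",
    "Anyone can use this dataset without any obligation.",
)
print(len(graph))  # 3
```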

@bact
Contributor

bact commented Jun 19, 2024

I have tried pointing the to of license relationships at https://spdx.org/licenses/MIT (and other URLs in the SPDX License List), but the SHACL validator reports that the value has the wrong class.

For example, this relationship:

{
  "type": "Relationship",
  "spdxId": "https://spdx.org/spdxdocs/Relationship/declaredLicense-e5536c5e-c8b5-4d24-947f-674c27c0b6c1",
  "creationInfo": "_:creationinfo",
  "relationshipType": "hasDeclaredLicense",
  "from": "https://spdx.org/spdxdocs/DatasetPackage1-035470d9-3ede-4952-91c8-c2abb943c90b",
  "to": [
    "https://spdx.org/licenses/CC-BY-4.0"
  ]
}

will get this error from pyshacl:

Conforms: False

Constraint Violation in ClassConstraintComponent (http://www.w3.org/ns/shacl#ClassConstraintComponent):

	Severity: sh:Violation
	Source Shape: [ sh:class <https://spdx.org/rdf/3.0.0/terms/Core/Element> ; sh:minCount Literal("1", datatype=xsd:integer) ; sh:nodeKind sh:BlankNodeOrIRI ; sh:path <https://spdx.org/rdf/3.0.0/terms/Core/to> ]
	Focus Node: <https://spdx.org/spdxdocs/Relationship/declaredLicense-e5536c5e-c8b5-4d24-947f-674c27c0b6c1>
	Value Node: <https://spdx.org/licenses/CC-BY-4.0>
	Result Path: <https://spdx.org/rdf/3.0.0/terms/Core/to>
	Message: Value does not have class <https://spdx.org/rdf/3.0.0/terms/Core/Element>

(I agree that we should be able to have the license IRI directly if it is in the SPDX license list. I think that should be an expected behavior.)
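Before running pyshacl, a quick Python pre-check can list license targets that are not defined anywhere in the document, which is what triggers the violation above. This is a rough approximation, not the SHACL rule itself: undeclared_license_targets is hypothetical, and it only checks that each to IRI appears as an spdxId in the same graph, not which class that element has. The example.org IDs are made up.

```python
LICENSE_REL_TYPES = {"hasDeclaredLicense", "hasConcludedLicense"}


def undeclared_license_targets(graph: list[dict]) -> list[str]:
    """List license-relationship targets that no element in graph declares."""
    declared = {el["spdxId"] for el in graph if "spdxId" in el}
    missing: list[str] = []
    for el in graph:
        if el.get("type") != "Relationship":
            continue
        if el.get("relationshipType") not in LICENSE_REL_TYPES:
            continue
        targets = el.get("to", [])
        if isinstance(targets, str):  # accept a single IRI as well as a list
            targets = [targets]
        for target in targets:
            if target not in declared and target not in missing:
                missing.append(target)
    return missing


graph = [
    {
        "type": "Relationship",
        "spdxId": "https://example.org/rel-1",
        "relationshipType": "hasDeclaredLicense",
        "from": "https://example.org/dataset",
        "to": ["https://spdx.org/licenses/CC-BY-4.0"],
    },
    {"type": "dataset_DatasetPackage", "spdxId": "https://example.org/dataset"},
]
print(undeclared_license_targets(graph))  # ['https://spdx.org/licenses/CC-BY-4.0']
```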

@zvr
Member

zvr commented Jun 20, 2024

Yes, the validator needs to know about every item in the SPDX License List.

Unfortunately, we do not yet generate such definitions for SPDXv3. We do generate them for the SPDXv2 RDF, as you can see in https://github.com/spdx/license-list-data/blob/main/jsonld/CC-BY-4.0.jsonld.

I've created an issue at the List Publisher about it.

Keep in mind that even if we had this data, it would not be straightforward to use, since which data you should use depends on the value of the licenseListVersion property.

@bact
Contributor

bact commented Jun 20, 2024

Thank you @zvr that is very clear and very useful.

@bact
Contributor

bact commented Jun 29, 2024

As a workaround to complete the validation, my approach is to declare an AnyLicenseInfo element with an spdxId the same as the target license IRI. Once we have ListedLicense in SPDX 3 RDF, this workaround element will be removed.
See: #84 (comment)
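The workaround can be applied mechanically. A minimal Python sketch: add_license_placeholders is hypothetical, only IRIs under https://spdx.org/licenses/ get a placeholder, and the placeholder type is assumed to be written as simplelicensing_AnyLicenseInfo (the compacted-name style used elsewhere in this thread; the exact element used in #84 is not reproduced here), so it is left as a parameter.

```python
SPDX_LICENSE_LIST = "https://spdx.org/licenses/"
LICENSE_REL_TYPES = {"hasDeclaredLicense", "hasConcludedLicense"}


def add_license_placeholders(graph: list[dict],
                             placeholder_type: str = "simplelicensing_AnyLicenseInfo") -> list[str]:
    """Append a placeholder element for each undeclared listed-license IRI.

    Returns the IRIs that were added. The placeholders are meant to be
    removed once ListedLicense definitions exist in SPDX 3 RDF.
    """
    declared = {el["spdxId"] for el in graph if "spdxId" in el}
    added: list[str] = []
    for el in list(graph):  # iterate over a copy, since we append to graph
        if el.get("type") != "Relationship" or el.get("relationshipType") not in LICENSE_REL_TYPES:
            continue
        targets = el.get("to", [])
        for target in [targets] if isinstance(targets, str) else targets:
            if target.startswith(SPDX_LICENSE_LIST) and target not in declared:
                graph.append({
                    "type": placeholder_type,  # assumed type name, see lead-in
                    "spdxId": target,          # same IRI as the listed license
                    "creationInfo": "_:creationinfo",
                })
                declared.add(target)
                added.append(target)
    return added


graph = [{
    "type": "Relationship",
    "spdxId": "https://example.org/rel-1",
    "relationshipType": "hasConcludedLicense",
    "from": "https://example.org/model",
    "to": ["https://spdx.org/licenses/MIT"],
}]
print(add_license_placeholders(graph))  # ['https://spdx.org/licenses/MIT']
print(add_license_placeholders(graph))  # [] (already declared)
```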

@rgopikrishnan91
Contributor Author

@bact @kestewart @zvr The example has now been updated to address all the comments and suggestions. Thank you, Art and Zvr, for the help!

Key additions:

  1. Added licensing info for all the dependencies.
  2. Removed the concluded license for the model and SimpleHRT, as we cannot assert it with confidence (the dataset has a non-commercial clause, while the model is licensed under MIT).

Please approve if everything looks good. Thanks once again!

@bact
Copy link
Contributor

bact commented Jul 28, 2024

Thank you.

Following the structure of the Software Profile examples, we may also want to rename the file/folder to something like:
ai/example01/spdx3.0/simpleHTR.spdx.json

(Btw, I'm not sure about the extension now, as we may need to update the naming convention for extensions; see spdx/spdx-spec#987 (comment). But we can change that later in another PR.)

@rgopikrishnan91
Copy link
Contributor Author

@kestewart Can you please merge this if you are happy with it too?

@bact
Copy link
Contributor

bact commented Jul 30, 2024

btw, we have both "simpleHRT" and "simpleHTR" here.


@bennetkl bennetkl left a comment

All changes requested in the comments have been made, and the example is approved.

@rgopikrishnan91
Contributor Author

btw, we have both "simpleHRT" and "simpleHTR" here.

Hey, I am not sure where we have both versions. Could you please point it out so that I can fix it?

…implehtr-example.json

I believe the name of the software is SimpleHTR
https://github.com/githubharald/SimpleHTR

The filename was previously "SimpleHRT" (R and T swapped).
Signed-off-by: Arthit Suriyawongkul <[email protected]>
@bact
Contributor

bact commented Aug 1, 2024

@rgopikrishnan91

I have put the PR to your repo here: rgopikrishnan91#3

SimpleHRT -> SimpleHTR; Put in example01 folder
@rgopikrishnan91
Contributor Author

@bennetkl Hey, can you merge it now?

@bact Thanks a bunch!

@kestewart kestewart merged commit dcbf8fd into spdx:master Aug 1, 2024
@bennetkl

bennetkl commented Aug 1, 2024

Looks like Kate merged this one.


5 participants