Invalid SPDX generated #6

Closed
vargenau opened this issue Mar 16, 2023 · 5 comments

Comments

@vargenau
Contributor

The SPDX file is in some cases invalid because of incorrect license identifiers.

scancode-toolkit.spdx.txt

Examples in the above scan:

PackageLicenseConcluded: Apache-2
PackageLicenseConcluded: ASL 2.0
PackageLicenseConcluded: BSD
PackageLicenseConcluded: LGPL
PackageLicenseConcluded: MIT/X

I understand the information is taken from package metadata that is not in SPDX format, but you should not output it as is.
Either you map it to a correct SPDX identifier, or you should create a custom LicenseRef- identifier.
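
The LicenseRef- fallback suggested above can be sketched as follows. This is a minimal illustration, not the tool's actual code: the function name `to_license_ref` and the `unknown` placeholder are hypothetical. It relies only on the SPDX rule that the part after `LicenseRef-` may contain letters, digits, `.` and `-`.

```python
import re


def to_license_ref(raw: str) -> str:
    """Turn a free-text license string into a LicenseRef- identifier.

    Hypothetical helper: the name and the 'unknown' fallback are assumptions.
    """
    # Replace every run of characters that SPDX does not allow in the
    # idstring (anything other than letters, digits, "." and "-") with "-".
    idstring = re.sub(r"[^A-Za-z0-9.-]+", "-", raw.strip()).strip("-")
    return f"LicenseRef-{idstring or 'unknown'}"


if __name__ == "__main__":
    for value in ("ASL 2.0", "MIT/X", "BSD"):
        print(f"PackageLicenseConcluded: {to_license_ref(value)}")
```

SPDX 2.x also expects each LicenseRef- used in a document to be declared with its extracted text in the other licensing information section, so a real implementation would need to emit that declaration as well.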

@anthonyharrison
Owner

@vargenau Thanks for the report. The aim was to identify any licences if they were included, but as you rightly point out, the license information is obtained from the metadata, and there are lots of issues with licenses in the metadata which need to be tidied up. Can I suggest you raise an issue with ScanCode to update the licences to be correct SPDX identifiers?

Automatically mapping it to the 'correct' identifier isn't feasible. For example, what would LGPL map to: LGPL2 or LGPL2.1?

However, I could simply ignore the license if it isn't a valid SPDX ID and not include it (note that the NONE or NOASSERTION semantics do not cover invalid licences), but this seems wrong when the author has attempted to specify a license. Or I could create a custom LicenseRef as you suggest, but this seems to be hiding the issue.

I will have a think how best to proceed.

BTW you could try using the --exclude-license option if there are lots of incorrect licenses.

@vargenau
Contributor Author

The report was produced for ScanCode, but the incorrect licenses are not from ScanCode but from dependencies.
I have created some pull requests for them:
kmike/text-unidecode#12
pdfminer/pdfminer.six#866
harlowja/fasteners#104

I agree that LGPL cannot be automatically mapped, but Apache-2 and ASL 2.0 could.
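
The distinction drawn here, between spellings that can be mapped safely and ones that cannot, could look like the sketch below. The `ALIASES` table and `map_alias` function are hypothetical names for illustration. Only the two spellings named in this thread as mappable are listed; ambiguous values such as `LGPL` or `BSD` are deliberately left unmapped.

```python
from typing import Optional

# Hypothetical alias table: only spellings that name exactly one license.
# Keys are lower-cased so the lookup is case-insensitive.
ALIASES = {
    "apache-2": "Apache-2.0",
    "asl 2.0": "Apache-2.0",
}


def map_alias(raw: str) -> Optional[str]:
    """Return the SPDX identifier for a known alias, or None if unknown."""
    return ALIASES.get(raw.strip().lower())


if __name__ == "__main__":
    for value in ("Apache-2", "ASL 2.0", "LGPL", "BSD"):
        print(f"{value!r} -> {map_alias(value)!r}")
```

Returning `None` rather than guessing leaves the caller free to decide what to do with an unmapped value, for example omitting it or falling back to a LicenseRef-.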

@anthonyharrison
Owner

@vargenau I have made a number of updates in the latest release (0.9.0) which hopefully should result in the generation of an SPDX document with valid licenses. Let me know if you have any issues.

@vargenau
Contributor Author

Hi @anthonyharrison
Thank you for your quick fix!
The SPDX code is now valid.

Two remarks:

In file cryptography,
cryptography.spdx.txt

BSD-3-Clause or Apache-2.0 should be BSD-3-Clause OR Apache-2.0
Keywords are case-sensitive and must be in upper case.

In file chardet,
chardet.spdx.txt

you guessed LGPL-2.0-or-later, it is in fact LGPL-2.1-or-later, but I do not know if it is easy to do better.
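
The first remark, about operator case, can be illustrated with a small normalisation step. This is a sketch under stated assumptions, not the fix that was applied in the tool: `normalize_operators` is a hypothetical name, and it assumes operators are always separated from identifiers by whitespace. Matching only whitespace-delimited words keeps identifiers such as `LGPL-2.1-or-later` intact.

```python
import re

# Match and/or/with in any case, but only when surrounded by whitespace,
# so the "or" inside "LGPL-2.1-or-later" is never touched.
_OPERATOR = re.compile(r"(?<=\s)(and|or|with)(?=\s)", re.IGNORECASE)


def normalize_operators(expression: str) -> str:
    """Upper-case the AND, OR and WITH operators in a license expression."""
    return _OPERATOR.sub(lambda m: m.group(1).upper(), expression)


if __name__ == "__main__":
    print(normalize_operators("BSD-3-Clause or Apache-2.0"))
    print(normalize_operators("LGPL-2.1-or-later"))
```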

@anthonyharrison
Owner

@vargenau

Thanks for pointing out the error with the case of the boolean operators in the license expression. I will work on a fix for this, although I note that the latest version of the cryptography module (40.0.1) appears to be correct (and the license has changed).

I was advised that LGPL is assumed to mean LGPL-2.0-or-later. Given the 'error' in this assumption for chardet, the only way to fix this is to ensure chardet specifies the correct license in its metadata.


2 participants