feat(license): improve license normalization #7131

Merged
merged 28 commits into from
Sep 11, 2024

Conversation

pbaumard
Contributor

@pbaumard pbaumard commented Jul 9, 2024

Description

  • Normalize "+", "-only" and "-or-later" suffixes
  • Many new mappings from oss-review-toolkit
  • Space normalization, including newlines
  • Normalize US / UK spelling for license / licence.
  • Remove "THE " prefix
  • Remove common suffixes: " LICENSE", " LICENSED", "-LICENSE" and "-LICENSED"
  • Add MIT-0
  • Add tests
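
The normalization steps listed above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: `standardizeKey` is a hypothetical helper, and the order in which the steps are applied is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// standardizeKey is a hypothetical helper (not Trivy's implementation).
// It returns the cleaned-up key and whether an "or later" marker was present.
func standardizeKey(name string) (string, bool) {
	// Collapse every whitespace run, including newlines, into one space.
	key := strings.ToUpper(strings.Join(strings.Fields(name), " "))
	// Fold the UK spelling into the US one.
	key = strings.ReplaceAll(key, "LICENCE", "LICENSE")
	key = strings.TrimPrefix(key, "THE ")

	// Detect and strip the "+", "-or-later" and "-only" markers.
	hasPlus := false
	switch {
	case strings.HasSuffix(key, "+"):
		key, hasPlus = strings.TrimSuffix(key, "+"), true
	case strings.HasSuffix(key, "-OR-LATER"):
		key, hasPlus = strings.TrimSuffix(key, "-OR-LATER"), true
	case strings.HasSuffix(key, "-ONLY"):
		key = strings.TrimSuffix(key, "-ONLY")
	}

	// Drop common trailing words that carry no information.
	for _, suffix := range []string{" LICENSED", "-LICENSED", " LICENSE", "-LICENSE"} {
		key = strings.TrimSuffix(key, suffix)
	}
	return strings.TrimSpace(key), hasPlus
}

func main() {
	for _, in := range []string{"The MIT\n Licence", "GPL-2.0-or-later", "Apache+"} {
		key, plus := standardizeKey(in)
		fmt.Printf("%q -> %q (or later: %v)\n", in, key, plus)
	}
}
```

The resulting key would then be looked up in a mapping table to obtain the SPDX identifier; that lookup is left out here.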

Related issues

Checklist

  • I've read the guidelines for contributing to this repository.
  • I've followed the conventions in the PR title.
  • I've added tests that prove my fix is effective or that my feature works.
  • I've updated the documentation with the relevant information (if needed).
  • I've added usage information (if the PR introduces new options)
  • I've included a "before" and "after" example to the description (if the PR is a user interface change).

- Remove "THE " prefix
- Remove "LICENSE " suffix
- Fix lowercase key
- Add BSD LICENSE 3
- Add ECLIPSE PUBLIC LICENSE 1.0
- Add MIT-0
- Add tests
@pbaumard pbaumard requested a review from knqyf263 as a code owner July 9, 2024 15:04
CLAassistant commented Jul 9, 2024

CLA assistant check
All committers have signed the CLA.

Contributor

@DmitriyLewen DmitriyLewen left a comment

Hello @pbaumard
Thanks for your work!

LGTM.
Can you fix tests?

Contributor

@DmitriyLewen DmitriyLewen left a comment

Hello @pbaumard
Sorry for delay.

Left comments. Take a look when you have time.

pkg/licensing/normalize.go
var mapping = make(map[string]expression.SimpleExpr)

func addMap(name, key string, hasPlus bool) {
license := normalizeKeyAndSuffix(name)

Contributor

IIUC licenses are already normalized.
Do we need this?

Contributor Author

The license from normalizeKeyAndSuffix is only used for the following assertion with the panic error.

I added some comments to make it more explicit.

The check could be made in unit test but it means having mapping public.

Contributor

I think I understand your idea - you want to check the newly added licenses (from another PR later).

I think we don't need to do this check every time we start Trivy.

The test can be done in a unit test, but this means that the mapping will be public.

You have added so many licenses, so I am not sure that many licenses will be added later.
Maybe a comment before the mapping, with instructions for adding new licenses, will be enough. wdyt?

Contributor Author

Especially since the latest commit added the version regular expression, it is becoming quite difficult to make sure that only standardized keys are added.

I added an InvalidMappingKeys function, which is used in tests and might also be useful later if adding new mappings from the CLI or by other means becomes possible.

@@ -288,7 +290,7 @@ func (*Marshaler) Licenses(licenses []string) *cdx.Licenses {
 	choices := lo.Map(licenses, func(license string, i int) cdx.LicenseChoice {
 		return cdx.LicenseChoice{
 			License: &cdx.License{
-				Name: license,
+				Name: NormalizeLicense(license),

Contributor

Can we use this function here?

func Normalize(name string) string {
	return NormalizeLicense(name).String()
}

Contributor Author

I removed the CycloneDX normalization from this PR to reduce the scope.

Following the CycloneDX spec, that would ideally mean having:

  • Id: "A valid SPDX license ID"
  • Name: "If SPDX does not define the license used, this field may be used to provide the license name",
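
The Id/Name split described above could look roughly like this. The sketch deliberately uses a local stand-in type instead of the cyclonedx-go structs, and `knownSPDX` and `toLicenseChoice` are hypothetical names; a real implementation would consult the full SPDX license list.

```go
package main

import "fmt"

// licenseChoice mirrors only the two CycloneDX license fields discussed above.
// It is a local stand-in for illustration, not the cyclonedx-go type.
type licenseChoice struct {
	ID   string // a valid SPDX license ID
	Name string // free-form name, used when SPDX does not define the license
}

// knownSPDX is a tiny illustrative subset of the SPDX license list (assumption).
var knownSPDX = map[string]struct{}{
	"Apache-2.0": {},
	"MIT":        {},
	"MIT-0":      {},
}

// toLicenseChoice fills ID for SPDX-defined licenses and Name otherwise.
func toLicenseChoice(normalized string) licenseChoice {
	if _, ok := knownSPDX[normalized]; ok {
		return licenseChoice{ID: normalized}
	}
	return licenseChoice{Name: normalized}
}

func main() {
	fmt.Printf("%+v\n", toLicenseChoice("MIT-0"))
	fmt.Printf("%+v\n", toLicenseChoice("Custom Corp License"))
}
```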

Contributor

We do not check all fields for OS packages:

for i := 0; i < len(expectedPkgs); i++ {
	require.Equal(t, expectedPkgs[i].Name, detail.Packages[i].Name, tc.name)
	require.Equal(t, expectedPkgs[i].Version, detail.Packages[i].Version, tc.name)
}

Therefore, there is no point in updating the OS packages for the files pkg/fanal/test/integration/testdata/goldens/packages/*.json.golden.

Contributor Author

I reverted those changes.

Contributor

Sorry, I confused you a little.
containerd tests use full files.
I have updated these golden files in 39f796e.

 },
 "Version": "1.6.3-r0",
 "Arch": "x86_64",
 "SrcName": "apr",
 "SrcVersion": "1.6.3-r0",
 "Licenses": [
-"ASL2.0"
+"Apache-2.02.0"

Contributor

@pbaumard can you take a look?

Contributor Author

Fixed in latest commit.

Comment on lines 12 to 14
func addMap(name, key string, hasPlus bool) {
	mapping[name] = expression.SimpleExpr{License: key, HasPlus: hasPlus}
}

Contributor

Looks like we don't need this function anymore; we can just put everything in the map right away.

Contributor Author

Done in the latest commit, though I am not sure it is more readable that way.

Comment on lines 593 to 599
var versionRegexpString = "([A-UW-Z)]{2,})( LICENSE)?\\s*[,(-]?\\s*(V|V\\.|VERSION|VERSION-|-)?\\s*([1-9](\\.\\d)*)[)]?"

// case insensitive version match anywhere in string
var versionRegexp = regexp.MustCompile("(?i)" + versionRegexpString)

// version suffix match
var versionSuffixRegexp = regexp.MustCompile(versionRegexpString + "$")

Contributor

I'm not sure I understand where we need this.
Can you add an example in the comment?

Contributor

I try to avoid using regex where possible.
Maybe we can avoid using this regex here?

Contributor Author

The many version variations are covered in normalize_test.

Most of the mappings in the current Trivy version or in oss-review-toolkit/ort exist because of slight variations in the way the version is declared in the license.

So this regexp makes it possible to:

  1. greatly limit the number of mappings
  2. avoid missing version mappings

The regexp stays strict because it only matches version suffixes.
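
To make the effect of the pattern concrete, here is a small sketch that applies the suffix regexp from the diff to a few version spellings. The pattern string is copied from the snippet under review; the `canonicalVersionSuffix` helper and the `NAME-VERSION` rewrite are assumptions for illustration and may differ from how the merged code uses the regexp.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Pattern copied verbatim from the diff under review.
const versionRegexpString = "([A-UW-Z)]{2,})( LICENSE)?\\s*[,(-]?\\s*(V|V\\.|VERSION|VERSION-|-)?\\s*([1-9](\\.\\d)*)[)]?"

// Anchored at the end so that only version suffixes are matched.
var versionSuffixRegexp = regexp.MustCompile(versionRegexpString + "$")

// canonicalVersionSuffix is a hypothetical helper: it upper-cases the name
// and rewrites a trailing version into the form NAME-VERSION.
func canonicalVersionSuffix(name string) string {
	return versionSuffixRegexp.ReplaceAllString(strings.ToUpper(name), "${1}-${4}")
}

func main() {
	for _, in := range []string{"GPL v2", "GPLv2", "Apache License, Version 2.0", "MPL 1.1", "MIT"} {
		fmt.Printf("%q -> %q\n", in, canonicalVersionSuffix(in))
	}
}
```

Note that `V` is excluded from the first character class, which is what lets a spelling like `GPLV2` split into the name `GPL` and the version prefix `V`. With one rewrite like this, a single mapping entry per license and version can cover many spellings.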

@@ -8,6 +8,219 @@ import (
"github.com/aquasecurity/trivy/pkg/licensing"
)

func TestMap(t *testing.T) {
assert.Empty(t, licensing.InvalidMappingKeys((nil)))

Contributor

Suggested change
-assert.Empty(t, licensing.InvalidMappingKeys((nil)))
+assert.Empty(t, licensing.InvalidMappingKeys(nil))

Contributor Author

InvalidMappingKeys is no longer used in the latest commit.

@@ -453,3 +450,23 @@ func TestParseApkInfo(t *testing.T) {
})
}
}

func TestParseLicense(t *testing.T) {

Contributor

I think this is a redundant test.
We see this function working in TestParseApkInfo.

Contributor Author

TestLaxSplitLicense was moved to normalize_test in latest commit.

Comment on lines 644 to 656
func InvalidMappingKeys(licenseToNormalized map[string]expression.SimpleExpr) []string {
	if licenseToNormalized == nil {
		licenseToNormalized = mapping
	}
	var invalid []string
	for key := range licenseToNormalized {
		standardized := standardizeKeyAndSuffix(key)
		if standardized.License != key {
			invalid = append(invalid, key)
		}
	}
	return invalid
}

Contributor

What if we move this function to the test?
To get mapping, we can create a function like GetBuiltinRules() for secrets.

Contributor Author

normalize_test is now in package licensing to use internal mapping and functions.
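
The idea behind the key check discussed in this thread can be sketched as a fixed-point test: a mapping key is only reachable if standardizing it leaves it unchanged. The `standardize` function below is a simplified stand-in (upper-casing and whitespace collapsing only) and `invalidKeys` is a hypothetical name; the in-package test in the PR works against the real internal mapping and functions.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// standardize is a simplified stand-in for the package's key standardization.
func standardize(key string) string {
	return strings.ToUpper(strings.Join(strings.Fields(key), " "))
}

// invalidKeys returns, in sorted order, every key that standardization would
// change. Such keys could never be hit by a lookup on a standardized name.
func invalidKeys(mapping map[string]string) []string {
	var invalid []string
	for key := range mapping {
		if standardize(key) != key {
			invalid = append(invalid, key)
		}
	}
	sort.Strings(invalid)
	return invalid
}

func main() {
	mapping := map[string]string{
		"APACHE 2": "Apache-2.0",
		"mit":      "MIT",          // lowercase key
		"BSD  3":   "BSD-3-Clause", // double space
	}
	fmt.Println(invalidKeys(mapping))
}
```

Running the check in a unit test rather than at startup keeps the cost out of every Trivy invocation, which is the trade-off raised earlier in the review.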

Contributor

@DmitriyLewen DmitriyLewen left a comment

@pbaumard Thanks for work!
And sorry for delay.

LGTM.
Left 1 small comment.

Comment on lines 107 to 111
[]string{
"Apache+",
},
"Apache-2.0+",
"Apache-2.0",

Contributor

Can you add field names for better readability in these tests?
e.g.:

Suggested change
-[]string{
-"Apache+",
-},
-"Apache-2.0+",
-"Apache-2.0",
+licenses: []string{
+"Apache+",
+},
+normalized: "Apache-2.0+",
+normalizedKey: "Apache-2.0",

Contributor Author

Field names have been added in last commit.

Contributor

@DmitriyLewen DmitriyLewen left a comment

Hello @pbaumard
Thanks for your work and sorry for delays.

@knqyf263 take a look, when you have time, please.

Collaborator

@knqyf263 knqyf263 left a comment

It's an awesome contribution! I planned to add this kind of improvement but didn't find the time. Thanks for your great work and patience.

@knqyf263 knqyf263 added this pull request to the merge queue Sep 11, 2024
Merged via the queue into aquasecurity:main with commit 6472e3c Sep 11, 2024
12 checks passed
@pbaumard pbaumard deleted the feature/better-license-normalize branch September 11, 2024 13:06
Development

Successfully merging this pull request may close these issues.

feat(license): Improve license normalization
4 participants