Add unittest and fuzzing test for SBOM Conversion #13

Closed

Conversation

wei-deepbits
Collaborator

This pull request focuses on implementing unit and fuzz tests for SBOM conversion.

For the unit tests, the following steps are performed:

  • Automatic retrieval of a shared SBOM dataset containing 10,494 SBOM files in SPDX and CycloneDX formats.
  • Utilizing the sbom-convert tool to convert SBOMs from one format to another.
  • Verification that the conversion process proceeds without errors.
  • Comparison of the counts of Package URLs (PURLs) between the original and converted SBOMs.
  • Comparison of the counts of licenses in the original and converted SBOMs.
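The PURL count comparison in the steps above could be sketched roughly as follows. This is a minimal illustration, not the code in this pull request: `countPURLs` and `countPURLsInJSON` are hypothetical helpers, and the sketch assumes that any JSON string value starting with `pkg:` is a PURL, regardless of which SPDX or CycloneDX field holds it.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// countPURLs walks a decoded JSON value and counts every string value that
// starts with "pkg:". Hypothetical helper; treating the "pkg:" prefix as the
// PURL marker is an assumption of this sketch.
func countPURLs(v any) int {
	switch t := v.(type) {
	case string:
		if strings.HasPrefix(t, "pkg:") {
			return 1
		}
		return 0
	case []any:
		n := 0
		for _, item := range t {
			n += countPURLs(item)
		}
		return n
	case map[string]any:
		n := 0
		for _, item := range t {
			n += countPURLs(item)
		}
		return n
	default:
		return 0
	}
}

// countPURLsInJSON decodes raw SBOM JSON and returns the number of PURL strings.
func countPURLsInJSON(raw []byte) (int, error) {
	var doc any
	if err := json.Unmarshal(raw, &doc); err != nil {
		return 0, fmt.Errorf("decoding SBOM: %w", err)
	}
	return countPURLs(doc), nil
}

func main() {
	// Tiny made-up documents standing in for an original and a converted SBOM.
	spdx := []byte(`{"packages":[{"externalRefs":[{"referenceType":"purl","referenceLocator":"pkg:pypi/example@1.0.0"}]}]}`)
	cdx := []byte(`{"components":[{"purl":"pkg:pypi/example@1.0.0"}]}`)

	a, errA := countPURLsInJSON(spdx)
	b, errB := countPURLsInJSON(cdx)
	if errA != nil || errB != nil {
		fmt.Println("decode error:", errA, errB)
		return
	}
	fmt.Printf("original=%d converted=%d match=%t\n", a, b, a == b)
}
```

A format-agnostic walk like this keeps the comparison independent of either schema, but it would double count if a tool also stores the PURL in another string field, so a real test may need to look at specific fields instead.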

The fuzzing tests mutate seed inputs to generate new ones and check whether the binary fails on the resulting corner cases.

@jspeed-meyers

Cool! Do we have CI tests for this repo? I don't see anything running.

@wei-deepbits
Collaborator Author

@puerco @veramine @manifestori , I would greatly appreciate it if you could take some time to review the changes I've made.

@jspeed-meyers

@wei-deepbits, are you able to add these tests, or some subset of them, to the CI (i.e. put this in a GH action), so that they run on every PR?

Or did you intentionally not add these tests to GH actions?

@wei-deepbits
Collaborator Author

@jspeed-meyers , Okay, I will attempt to add them to the GitHub action. I believe I may not have the necessary permissions for the sbom-convert repository, but I will try it on my forked repository first.

@jspeed-meyers

Sounds good. And I should be able to help with permissions, so let me know what you think you need!

@jspeed-meyers

@wei-deepbits, I added you as a maintainer :)

@wei-deepbits
Collaborator Author

Thanks. I'll let you know once I've figured it out.

@wei-deepbits
Collaborator Author

Hi @jspeed-meyers, I have added the unit tests and fuzzing tests to the GitHub Action. The current protobom code fails both the unit tests and the fuzzing tests.

Regarding the unit tests, the current code of protobom cannot process all SPDX format SBOM files generated by Syft. It results in a runtime error: "panic: runtime error: invalid memory address or nil pointer dereference."

As for the fuzzing tests, the current code of protobom is unable to handle a simple automatically generated fuzzing input: {"spdxVersion": "SPDX-2.3"}

@jspeed-meyers

@wei-deepbits, this is really helpful. Thank you!

@puerco, @manifestori, @houdini91, @veramine: Do any of you have time to diagnose these bugs? We should probably fix these bugs first and then later merge this PR.

Nice work, @wei-deepbits, on finding what appear to be bugs.

@veramine

oh goodness! We should always gracefully fail on bad input, and not panic!

@wei-deepbits can you attach the SBOM files here that cause the panic? If I can reproduce, I will check in fixes!

@wei-deepbits
Collaborator Author

@veramine

veramine commented Sep 19, 2023

Alright, thanks. protobom reader.ParseFile() worked fine on the first one but panic'd on the second due to a missing spdxDoc.CreationInfo. I added a check before attempting to use it. protobom/protobom#108

Here's the protobom from the first one. Did that error for you? Or did only the second one error?

{
	"metadata": {
		"id": "DOCUMENT",
		"version": "0",
		"name": "/home/wei/code/repos/python/abhiTronix/vidgear",
		"date": {
			"seconds": 1689780012
		},
		"tools": [
			{
				"name": "syft-0.85.0"
			}
		],
		"authors": [
			{
				"name": "Anchore, Inc",
				"is_org": true
			}
		]
	},
	"node_list": {
		"nodes": [
			{
				"id": "Package-python-pyzmq-d345c09cbb0c1e23",
				"name": "pyzmq",
				"version": "24.0.1",
				"url_download": "NOASSERTION",
				"copyright": "NOASSERTION",
				"source_info": "acquired package info from installed python package manifest file: /setup.py",
				"identifiers": {
					"1": "pkg:pypi/pyzmq@24.0.1",
					"3": "cpe:2.3:a:pyzmq:pyzmq:24.0.1:*:*:*:*:*:*:*"
				}
			},
			{
				"id": "File-setup.py-c829033125510d5a",
				"type": 1,
				"name": "/setup.py",
				"license_concluded": "NOASSERTION",
				"hashes": {
					"2": "0000000000000000000000000000000000000000"
				}
			}
		],
		"edges": [
			{
				"type": 30,
				"from": "Package-python-pyzmq-d345c09cbb0c1e23",
				"to": [
					"File-setup.py-c829033125510d5a"
				]
			}
		],
		"root_elements": [
			"DOCUMENT"
		]
	}
}

@wei-deepbits
Collaborator Author

@veramine The first one also exhibits the panic error in my environment. Here is the log for reference:

wei@black:~/code/online/remote/sbom-convert$ dist/sbom-convert_linux_amd64_v1/sbom-convert /home/wei/Downloads/abhiTronix_vidgear_syft_spdx.json
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x70 pc=0x86bbe8]

goroutine 1 [running]:
github.com/bom-squad/protobom/pkg/native/serializers.(*SerializerCDX).Serialize(0xc00006e918?, {{0xde3660?, 0xc0001472c0?}, 0xc0001b5880?}, 0xc00020a2c0)
	github.com/bom-squad/[email protected]/pkg/native/serializers/serializer_cdx.go:83 +0x588
github.com/bom-squad/protobom/pkg/writer.(*defaultWriterImplementation).SerializeSBOM(0xc00002e900?, {{0x97faec?, 0x2a?}, 0xc0000106d0?}, {0xa372e0, 0xde3660}, 0xc0001fba20?, {0xa36dc0?, 0xc00006e030})
	github.com/bom-squad/[email protected]/pkg/writer/implementation.go:59 +0x5f
github.com/bom-squad/protobom/pkg/writer.(*Writer).WriteStream(0xc0001fba60, 0xc00020a2c0, {0xa36dc0, 0xc00006e030})
	github.com/bom-squad/[email protected]/pkg/writer/writer.go:40 +0xd5
github.com/bom-squad/go-cli/pkg/convert.(*Service).Convert(0xc00006e4f0, {0x0?, 0xc0001e20a0?}, {0xa37ba0?, 0xc00006e4e8}, {0xa36dc0, 0xc00006e030})
	github.com/bom-squad/go-cli/pkg/convert/convert.go:42 +0x11f
github.com/bom-squad/go-cli/cmd/cli.runConvert({0xa38a80, 0xde3660}, 0xc0001ca180, {0xc000039f10, 0x1, 0x0?})
	github.com/bom-squad/go-cli/cmd/cli/convert.go:96 +0x30e
github.com/bom-squad/go-cli/cmd/cli.ConvertCommand.func1(0xc000004600?, {0xc000039f10?, 0x4?, 0x969dba?})
	github.com/bom-squad/go-cli/cmd/cli/convert.go:41 +0x3c
github.com/spf13/cobra.(*Command).execute(0xc000004600, {0xc000039ee0, 0x1, 0x1})
	github.com/spf13/[email protected]/command.go:940 +0x87c
github.com/spf13/cobra.(*Command).ExecuteC(0xc000004300)
	github.com/spf13/[email protected]/command.go:1068 +0x3a5
github.com/spf13/cobra.(*Command).Execute(...)
	github.com/spf13/[email protected]/command.go:992
github.com/spf13/cobra.(*Command).ExecuteContext(...)
	github.com/spf13/[email protected]/command.go:985
github.com/bom-squad/go-cli/cmd/cli.Execute()
	github.com/bom-squad/go-cli/cmd/cli/main.go:36 +0x68
main.main()
	github.com/bom-squad/go-cli/main.go:8 +0xf

@veramine

Aha, that's in the serializing code path. I only tested unserializing. Okay I'll look, thanks!

@veramine

The CDX serializer was unexpectedly not including a root component in this test case. I added that check with protobom/protobom#109

% ./go-cli 2.spdx.json                         
serializing sbom: serializing SBOM to native format: Could not find SBOM root component

@wei-deepbits
Collaborator Author

@veramine , I appreciate your help in resolving these issues.

@veramine

No problem, thanks for fuzzing!

@wei-deepbits
Collaborator Author

wei-deepbits commented Sep 19, 2023

@veramine, using a CycloneDX format SBOM as the seed input for fuzzing can generate a simple JSON document:

{
    "bomFormat":"CycloneDX",
    "specVersion":"1.4",
    "":[]
}

that triggers a panic in unserializer_cdx14.go:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x28 pc=0x852617]

goroutine 1 [running]:
github.com/bom-squad/protobom/pkg/native/unserializers.(*UnserializerCDX14).ParseStream(0x8e1d60?, 0xc000171890?, {0xa34a00?, 0xc0001184a8})
        github.com/bom-squad/[email protected]/pkg/native/unserializers/unserializer_cdx14.go:40 +0x297
github.com/bom-squad/protobom/pkg/reader.(*defaultParserImplementation).ParseStream(0xc000204101?, {0xa36060?, 0xde3660?}, 0x2a?, {0xa34a00?, 0xc0001184a8?})
        github.com/bom-squad/[email protected]/pkg/reader/implementation.go:77 +0x38
github.com/bom-squad/protobom/pkg/reader.(*Reader).ParseStream(0xc000134630, {0x7f30a85f11e0?, 0xc0001184a8})
        github.com/bom-squad/[email protected]/pkg/reader/reader.go:52 +0x167
github.com/bom-squad/go-cli/pkg/convert.(*Service).Convert(0xc0001184b0, {0x0?, 0xc00012ffd0?}, {0xa37ba0?, 0xc0001184a8}, {0xa36dc0, 0xc00006e030})
        github.com/bom-squad/go-cli/pkg/convert/convert.go:34 +0x89
github.com/bom-squad/go-cli/cmd/cli.runConvert({0xa38a80, 0xde3660}, 0xc000171f20, {0xc00012fe40, 0x1, 0x0?})
        github.com/bom-squad/go-cli/cmd/cli/convert.go:96 +0x30e
github.com/bom-squad/go-cli/cmd/cli.ConvertCommand.func1(0xc0001f0300?, {0xc00012fe40?, 0x4?, 0x969dba?})
        github.com/bom-squad/go-cli/cmd/cli/convert.go:41 +0x3c
github.com/spf13/cobra.(*Command).execute(0xc0001f0300, {0xc00012fe10, 0x1, 0x1})
        github.com/spf13/[email protected]/command.go:940 +0x87c
github.com/spf13/cobra.(*Command).ExecuteC(0xc0001f0000)
        github.com/spf13/[email protected]/command.go:1068 +0x3a5
github.com/spf13/cobra.(*Command).Execute(...)
        github.com/spf13/[email protected]/command.go:992
github.com/spf13/cobra.(*Command).ExecuteContext(...)
        github.com/spf13/[email protected]/command.go:985
github.com/bom-squad/go-cli/cmd/cli.Execute()
        github.com/bom-squad/go-cli/cmd/cli/main.go:36 +0x68
main.main()
        github.com/bom-squad/go-cli/main.go:8 +0xf

Perhaps you could address this issue as well. Thanks!

@veramine

veramine commented Sep 19, 2023

Yep, added two more checks. protobom/protobom#110

@wei-deepbits
Collaborator Author

Hey @veramine, in pull request protobom/protobom#109 you implemented a check that returns an error if the native serializer doesn't return a valid root node. However, the native serializer is unable to handle all SPDX format SBOM files generated by Syft. Given this limitation, I suggest we find a way to handle these files correctly instead of simply returning an error.

@wei-deepbits
Collaborator Author

Hi @veramine , what are your thoughts on this?

@veramine

Ahhh let me check with @puerco about what to do with an sbom with no root. Could you attach a valid SBOM here that returns an error?

@veramine

veramine commented Oct 12, 2023

I tried the /home/wei/code/repos/python/abhiTronix/vidgear syft-generated SBOM and protobom was able to ingest that ok. But attach some specific troublesome examples here and I'll see if we can find a way to handle them. Thanks!

@wei-deepbits
Collaborator Author

@veramine, a Syft-generated SPDX SBOM won't trigger an error; instead, it will result in a "serializing SBOM to native format: No SBOM root component" message, halting the conversion to CycloneDX format. This renders our tool unusable for its purposes.

@veramine

Aha, I see that.

@veramine

In that sample there is a root node (it's below). I wonder why we think we don't have one. Investigating...

{
	"spdxVersion": "SPDX-2.3",
	"dataLicense": "CC0-1.0",
	"SPDXID": "SPDXRef-DOCUMENT",
	"name": "/home/wei/code/repos/python/abhiTronix/vidgear",
	"documentNamespace": "https://anchore.com/syft/dir/home/wei/code/repos/python/abhiTronix/vidgear-710e3edc-b86e-4c5a-b71a-2f81c876b318",
	"creationInfo": {
		"licenseListVersion": "3.21",
		"creators": [
			"Organization: Anchore, Inc",
			"Tool: syft-0.85.0"
		],
		"created": "2023-07-19T15:20:12Z"
	}

@veramine

We need to find a root node in this situation. I filed a protobom issue to look at it closer: protobom/protobom#126

@wei-deepbits
Collaborator Author

@veramine , could you please take a look at this pull request? Thanks.

return
}

SBOM_CONVERT_PATH := "../../dist/sbom-convert_linux_amd64_v1/sbom-convert"

@veramine veramine Dec 1, 2023

I got stuck here momentarily as I was testing on mac/darwin but figured it out.


@veramine veramine left a comment

This is fine with me. I was able to replicate unittest and fuzztest locally. I do wonder if unittest might be more suited as a conformance test on protobom directly - you might get more from your effort by joining forces with @puerco on that effort. But if you want to continue to use this repo for testing, that's fine with me. I'll approve this. I'm not sure if we are ready for a deluge of fuzzing related bugs yet 😳 but I don't want to hold this up any longer than I already have!! Thanks for the contribution!

@wei-deepbits
Collaborator Author

@veramine , I will add platform detection for the path of sbom-convert later and investigate whether we should test protobom directly. Thank you for your review.

@manifestori
Collaborator

Hey, sorry for being out for a while. Any updates, @wei-deepbits?

}

func DownloadSBOMs() {
fileURL := "https://drive.usercontent.google.com/download?id=1LgGlq3g_H02mhzkc94cUd0zzxy0JhFim&export=download&authuser=0&confirm=t&uuid=483eac07-f1af-4356-abeb-4ba254e32b86&at=APZUnTWjSNLUgCQ8wwFZjsLS7Y36:1694113089657"
Collaborator

This is scary. Who controls this content? Can it be changed without the end user's knowledge? I don't see a checksum or anything similar that we could use to confirm the target is what we expect.

Collaborator

Hi @wei-deepbits, I noticed that you are downloading a 100 MB SBOM archive, extracting it, converting the files, and drilling into the results. Is this supposed to be a unit test for the convert package?

I appreciate the effort you put in, but I'm afraid this doesn't qualify as a unit test. In unit tests, we test the functionality of the package itself, including the functions and methods, the constructor with happy/sad results, and cover as many cases as possible.

I argue that this is more of an e2e test, but unfortunately, the framework is not there yet. We should test the code state and not the binary we generated. Therefore, we should test the main entry point of the application, or at least the CLI.

Furthermore, we cannot download fixtures as an external file from Google Drive. Please commit a subset of the test cases into this repository so we know those files are secure. Put them under a testdata folder; that way the Go toolchain ignores them.

Collaborator

When comparing fixtures to results, you can use snapshots or go-cmp with filters. Complex testing code is hard to maintain. Those tools use reflection to compare structs, making most of the hand-written comparison functionality obsolete.

Collaborator

I do see the value of counting statistics. How about opening a PR that adds those statistics to the conversion results output? Maybe a subcommand or a flag, such as sbom-convert convert ... --show-diff or sbom-convert convert --benchmark?
We've discussed a few times having a way to show "what data is missing from the original SBOM".

Collaborator Author

Hi @manifestori, I appreciate your valuable suggestions and will try to address them.

f.Add(string(content))

f.Fuzz(func(t *testing.T, orig string) {
ParseStreamWrapper(orig)
Collaborator

So this is basically the test?

Collaborator

I like that, it looks neat. But what are we testing here? Can you explain more about how it works?

Collaborator Author

This is just the first step toward fuzz testing our tool. It generates new inputs for the ParseStream() function in protobom and checks whether they cause a crash. It has already helped find errors in unserializing. We can add more functions to test later.
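For readers unfamiliar with Go's native fuzzing, the mechanism can be sketched as below. This is an assumption-laden illustration, not the test in this PR: `parseStream` is a hypothetical stand-in with a deliberate nil-pointer bug (it does not call protobom), and `callSafely` is a hypothetical wrapper that turns a panic into an error so the behavior can be shown outside the fuzzer.

```go
package main

import (
	"encoding/json"
	"fmt"
	"testing"
)

// metadata and doc are hypothetical types standing in for a parsed SBOM.
type metadata struct {
	Name string `json:"name"`
}

type doc struct {
	Metadata *metadata `json:"metadata"`
}

// parseStream is a stand-in for the function under test. It dereferences an
// optional field without a nil check, mimicking the kind of bug fuzzing found.
func parseStream(input string) (string, error) {
	var d doc
	if err := json.Unmarshal([]byte(input), &d); err != nil {
		return "", err // rejecting malformed input with an error is fine
	}
	return d.Metadata.Name, nil // panics when "metadata" is absent
}

// callSafely runs parseStream and converts a panic into an error. Returned
// parse errors are ignored: only crashes count as failures here.
func callSafely(input string) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("panic on input %q: %v", input, r)
		}
	}()
	_, _ = parseStream(input)
	return nil
}

// FuzzParseStream is the fuzz target. It would live in a _test.go file and
// run with: go test -fuzz=FuzzParseStream
func FuzzParseStream(f *testing.F) {
	f.Add(`{"metadata":{"name":"seed"}}`) // seed corpus entry to mutate
	f.Fuzz(func(t *testing.T, orig string) {
		if err := callSafely(orig); err != nil {
			t.Fatal(err)
		}
	})
}

func main() {
	// Replays the minimal CycloneDX-style input reported earlier in this thread.
	fmt.Println(callSafely(`{"bomFormat":"CycloneDX","specVersion":"1.4","":[]}`))
}
```

The fuzzer already treats a panic inside the target as a failure, so the recover wrapper is not strictly needed; it is included so the same check can be replayed against a known crashing input as an ordinary regression test.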


github-actions bot commented Feb 4, 2024

This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days.

@github-actions github-actions bot added the Stale label Feb 4, 2024

This PR was closed because it has been stalled for 10 days with no activity.

@github-actions github-actions bot closed this Feb 15, 2024
Labels
enhancement New feature or request Stale