Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Relax validation of license name field #733

Closed

Conversation

justahero
Copy link
Contributor

The name field of the license type is specified differently in the schema files. The 1.5 JSON schema does not define any restriction or pattern for this field. On the other hand the 1.5 XML schema definition declares the name field a NormalizedString. The latter does not allow certain characters, e.g. \n, \t, to be present inside the string.

This change relaxes the validation of the license name field to allow validation for JSON loaded BOM files as well.

  • remove validation checks for license name field

The `name` field of the `license` type is specified differently in the
schema files. The [1.5 JSON schema](https://github.com/CycloneDX/specification/blob/master/schema/bom-1.5.schema.json#L753-L758) does not define any restriction or pattern for this field.
On the other hand the [1.5 XML schema definition](https://github.com/CycloneDX/specification/blob/master/schema/bom-1.5.xsd#L649-L653) declares the `name` field a `NormalizedString`. The latter does not allow certain characters, e.g. `\n`, `\t`, to be present inside the string.

This change relaxes the validation of the license name field to allow
validation for JSON loaded BOM files as well.

* remove validation checks for license name field

Signed-off-by: Sebastian Ziebell <[email protected]>
@justahero justahero requested a review from a team as a code owner June 24, 2024 12:23
@Shnatsel
Copy link
Contributor

What is the motivation here? Is the library found incompatible with some SBOMs found in the wild?

If we do this, we can potentially load more JSONs but then converting the SBOM to XML will fail, which is a footgun.

If the 1.6 draft schemas change this, I could see this being a bugfix we could just apply. But as it is, it looks like the specification contradicting itself to me, and I'd prefer to seek clarification upstream before doing anything. Are you in CycloneDX slack? I can add you if not.

@justahero justahero marked this pull request as draft June 24, 2024 13:11
@justahero
Copy link
Contributor Author

Hi @Shnatsel ,

@lfrancke stumbled upon a validation error after loading a BOM file in JSON format. When I checked the schema definitions for the name license field, I discovered that these were defined differently, the XML schema definition is stricter (expects a NormalizedString). Basically I prepared this PR in case that it's okay to relax the validation, but it's totally fine to close it.

Are you in CycloneDX slack? I can add you if not.

That would be good, I think Lars has moved the discussion to Slack.

@lfrancke
Copy link
Contributor

@Shnatsel
Copy link
Contributor

Conclusions from the Slack discussion:

  1. These restrictions only exist in XML, and including \n in JSON is valid. The presence of \n should not fail validation.
  2. The fully correct way to handle this is to turn NormalizedString into a newtype around String without any constraints, and make it replace the three forbidden characters with a space ONLY when serializing to XML or deserializing from XML. Ideally the XML implementation would take care of that for us, but xml-rs sadly doesn't implement that.

@justahero do you think this is feasible?

Relevant snippet of Slack discussion:

Screenshot from 2024-06-28 15 46 32

@justahero
Copy link
Contributor Author

Hi @Shnatsel ,

These restrictions only exist in XML, and including \n in JSON is valid. The presence of \n should not fail validation.

My understanding is other XML libraries know how to handle the normalizedstring type defined for strings in XML. When parsing a normalizedstring field the replacement happens immediately. After parsing the XML the value will only include the white space characters.

The fully correct way to handle this is to turn NormalizedString into a newtype around String without any constraints, and make it replace the three forbidden characters with a space ONLY when serializing to XML or deserializing from XML. Ideally the XML implementation would take care of that for us, but xml-rs sadly doesn't implement that.

You are right, the serialization code using xml-rs does not handle these strings correctly. One difficulty is all types defined in the /spec folder only use the String type, therefore the code at this point does not know about normalizedstring even though the XML specification expects them. We could introduce a NewType for all types defined in the /spec folder that acts like a normal String as you mentioned. The serialization code with serde_json should be straightforward to adapt. Then we could implement the FromXml & ToXml traits for this new string type, dealing with the replacement of char when objects are serialized or deserialized. This way the code will be closer to comparable XML parsers. Defining all fields with this new string type enforces to use the serialization logic, instead of a calling read_simple_tag alone.

To illustrate

pub struct NewStringType(pub String);

impl FromXml for NewStringType {
    fn read_xml_element<R: std::io::Read>(
        event_reader: &mut xml::EventReader<R>,
        element_name: &xml::name::OwnedName,
        _attributes: &[xml::attribute::OwnedAttribute],
    ) -> Result<Self, crate::errors::XmlReadError>
    where
        Self: Sized,
    {
        let read_string = read_simple_tag(event_reader, element_name)?;
        // replace forbidden chars
        Ok(NewStringType(read_string))
    }
}

// then to parse a `normalizedstring` using the `NewStringType`, applying replacement of forbidden chars
let vendor = NewStringType::read_xml_element(&event_reader, &name, &attributes)?;

Is this close to what you had in mind?

@justahero
Copy link
Contributor Author

justahero commented Jun 28, 2024

I feel this warrants its own GH issue, as it requires more work to fix the serialization code. @Shnatsel, your comments got me further thinking about the serialization issue. I'm now uncertain if we should use the NormalizedString in the general models in the /models folder at all. The NormalizedString type is specific to the XML specification, it's not present in the JSON schema afaict.

@Shnatsel
Copy link
Contributor

Is this close to what you had in mind?

Yes, that's exactly it. Plus a similar replacement pass in serialization too.

I'm now uncertain if we should use the NormalizedString in the general models in the /models folder at all. The NormalizedString type is specific to the XML specification, it's not present in the JSON schema afaict.

I agree. I think we should not use NormalizedString in the models.

@justahero
Copy link
Contributor Author

I would like to close this PR in favour of #737

@Shnatsel
Copy link
Contributor

SGTM

@Shnatsel Shnatsel closed this Jun 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants