Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose (opt-in) unknown members during deserialization #593

Open
airbreather opened this issue Mar 4, 2021 · 5 comments
Open

Expose (opt-in) unknown members during deserialization #593

airbreather opened this issue Mar 4, 2021 · 5 comments

Comments

@airbreather
Copy link
Contributor

See also: #529, which is essentially this same feature request, but it was closed without a resolution.

DeserializerBuilder.IgnoreUnmatchedProperties() is a bit limiting: either unmatched properties trigger an exception, or they get ignored. What I'd like to see is a way for the value being deserialized to opt-in to receiving the extra values.

The model would implement something like this:

public interface ICanBeExtended
{
    void Extend(string propertyName, YamlNode extensionData);
}

public class MyExtensibleObject : ICanBeExtended
{
    public string KnownValue1 { get; set; }

    public Dictionary<string, YamlNode> Extensions { get; } = new Dictionary<string, YamlNode>();

    void ICanBeExtended.Extend(string propertyName, YamlNode extensionData)
    {
        Extensions.Add(propertyName, extensionData);
    }
}

The DeserializerBuilder could get a fluent configuration method like .UnmatchedPropertiesAreExtensions() or something.

Implementation suggestion would be to change this block to something like:

if (!(value is ICanBeExtended canBeExtended))
{
    parser.SkipThisAndNestedEvents();
    continue;
}

YamlNode node;
if (parser.Accept(out Scalar scalar))
{
    node = new YamlScalarNode();
}
else if (parser.Accept(out MappingStart mappingStart))
{
    node = new YamlMappingNode();
}
else if (parser.Accept(out SequenceStart sequenceStart))
{
    node = new YamlSequenceNode();
}
else
{
    // maybe we should throw YamlException here instead?
    parser.SkipThisAndNestedEvents();
    continue;
}

((IYamlConvertible)node).Read(parser, null, null);
canBeExtended.Extend(propertyName.Value, node);

Additional context
The use case I have here is a YAML-based file format that's being created primarily by a centralized team, but where external teams may be able to use plugin interfaces to add their own processing at certain well-defined extension points.

As things stand right now, I'm faced with a choice between a few suboptimal solutions:

  1. Deserialize by hand / use the YamlStream model (this is actually how I started, but it's getting really tedious).
  2. Those external teams just aren't allowed to access any data from the input file, unless it fits into a slot that the centralized team already put onto the CLR type.
  3. Reimplement a significant fraction of ObjectNodeDeserializer and its dependencies, or an equivalent like implementing IYamlTypeConverter or IYamlConvertible to take complete control over the deserialization... this is obviously the most flexible option, but it's a really frustrating amount of work reinventing the wheel to implement this tiny bit of extra functionality.
@airbreather
Copy link
Contributor Author

As things stand right now, I'm faced with a choice between a few suboptimal solutions:

Another workaround can be to taint the YAML file a bit, which is probably what I'm going to do.

What I wanted:

prop1: xyz
prop2: zyx
extra1: abc
// defined in YamlDotNet:
public interface IHaveExtraData
{
    bool TryReadExtraData(IParser parser, string propertyName, ObjectDeserializer nestedObjectDeserializer);

    void WriteExtraData(IEmitter emitter, ObjectSerializer nestedObjectSerializer);
}

// defined in my code:
public sealed class MyModelThing : IHaveExtraData
{
    public string Prop1 { get; set; }

    public string Prop2 { get; set; }

    public YamlMappingNode ExtraData { get; } = new YamlMappingNode();

    bool IHaveExtraData.TryReadExtraData(IParser parser, string propertyName, ObjectDeserializer nestedObjectDeserializer)
    {
        YamlNode node;
        if (parser.Accept(out Scalar _))
        {
            node = new YamlScalarNode();
        }
        else if (parser.Accept(out SequenceStart _))
        {
            node = new YamlSequenceNode();
        }
        else if (parser.Accept(out MappingStart _))
        {
            node = new YamlMappingNode();
        }
        else
        {
            return false;
        }

        ((IYamlConvertible)node).Read(parser, null, null);
        this.ExtraData.Add(propertyName, node);
        return true;
    }

    void IHaveExtraData.WriteExtraData(IEmitter emitter, ObjectSerializer nestedObjectSerializer)
    {
        foreach ((YamlNode key, YamlNode val) in this.ExtraData)
        {
            ((IYamlConvertible)key).Write(emitter, null);
            ((IYamlConvertible)val).Write(emitter, null);
        }
    }
}

Since I do control the format, I can get around it like this:

prop1: xyz
prop2: zyx
extraData:
  extra1: abc
public sealed class MyModelThing
{
    public string Prop1 { get; set; }

    public string Prop2 { get; set; }

    public YamlMappingNode ExtraData { get; set; } = new YamlMappingNode();
}

It's annoying to push this implementation detail all the way out into the YAML files, so I don't want to close this request for first-class support in YamlDotNet, but I did want to mention this other workaround.

@rcdailey
Copy link

@EdwardCooke Could you provide some guidance on design and implementation that would make a PR for this likely to be accepted? I'd like to do the work, but don't want to waste my time putting up a PR that won't pass muster.

Personally, I'd approach this exactly like Newtonsoft.Json does with their JsonExtensionData attribute:

public record MyType
{
    public string ExpectedProperty1 { get; init; }
    public string ExpectedProperty2 { get; init; }

    [YamlExtensionData]
    public IDictionary<string, object> ExtraYaml { get; init; }
}

Above, any properties found in the YAML during deserialization that are not present in MyType would have their property name and value deserialized into ExtraYaml. The name of the property itself is inconsequential.

I dug into the code a bit, which I am entirely unfamiliar with. It looks like ObjectNodeDeserializer might be the place to implement this:

  1. Search the type for any property having the YamlExtensionData attribute. Ensure that type implements IDictionary.
  2. If found, set ignoreUnmatched to true to avoid exceptions thrown in TypeInspectorSkeleton.GetProperty().
  3. If a property was found from step 1, add the property name and value to the dictionary.

I'm sure I'm missing some important aspects of this, but I think this is enough to determine if the approach I had in mind is suitable, and if not, which approach the maintainers are willing to accept a PR for.

Thank you.

@airbreather
Copy link
Contributor Author

Personally, I'd approach this exactly like Newtonsoft.Json does with their JsonExtensionData attribute:

From my perspective, this looks like a good idea too, I would just request that it work if the value type is YamlNode instead of only allowing object, since that's a more explicit way for me to say "this is coming from a YAML file, and I don't know which of these 3 types it is, but it's definitely one of those" instead of "it could literally be anything" like object says.

@EdwardCooke
Copy link
Collaborator

I do like the proposed solution with a couple changes. I would probably not change the value of the ignoreunmatched and instead change the method signature on get property to pass a bool to override that behavior only if a property has the yaml extension data attribute. Object would probably better to match the rest of the resulting object types when deserializing to a dictionary. Another comment would be that there is no guarantee the key will be a string. It can actually be anything including another object. If the option to attempt type conversion to the real type is enabled it could be int, short, datetime, bool, etc.

The reason I don’t think we should do yamlnode as the required value type is because then the consumer would have to deal with converting it to the correct object type which would just be annoying and add unnecessary complexity to the consuming application. Maybe do a check if the value type is yamlnode and set it with that may be a good way to go?

I can respond in greater detail later when I’m at a computer to really dig more into the node deserializers. That’s some decently complex code with it having nested serializers and things like that.

@rcdailey
Copy link

rcdailey commented Mar 12, 2023

Another comment would be that there is no guarantee the key will be a string

In these scenarios, I would expect an exception to be thrown, as if the YamlExtensionData attribute did not exist. I very much expect the extension data to only work with "key value pairs" (or, in other words, only work with keys that are strings) as I believe is representative of most situations. Honestly I have personally never used YAML in such a way that the key wouldn't be a string; so I can't really visualize or understand this corner case.

EDIT: Actually it seems like in ObjectNodeDeserializer, it's already assumed the key is a string because it maps to a property name. The property name that is resolved is essentially what I would use as the key.

I would probably not change the value of the ignoreunmatched and instead change the method signature on get property to pass a bool to override that behavior only if a property has the yaml extension data attribute

An example would help here. The reason I suggested setting it to true in the attribute case is because that is the override, we just aren't modifying the signature of GetProperty() to do it. At first blush, I considered adding the logic we're talking about to GetProperty() but that doesn't seem the right place for it. Keeping in mind I'm unfamiliar with the code, I say it's not the right place because its purpose is to find a property and return it, not concern itself with adding to a collection.

However this isn't a strong opinion of mine. I just need some examples and/or further explanation to understand your intent.

Thanks for the quick feedback!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants