Dynamic type resolution during deserialization based on serialized content #605

Closed
nitz opened this issue Apr 11, 2021 · 6 comments

nitz commented Apr 11, 2021

Is your feature request related to a problem? Please describe.

This one is a bit of a doozy, and touches on several issues I've found that are already open or related. To tag them quickly: the workflow I'm looking to support is somewhat related to the feature added in #601, is partially described in (or would be affected by) the topics in #408 and #459, and is maybe tangentially related to #529 and #593.

Before I get started here as well: I am perfectly ready to be told "you dummy, it's super easy and you're just missing it!" So please do point out if I'm just glossing over features I shouldn't have!

Essentially, I've got some YAML I don't control. Nodes are untagged, but differ slightly in what they may contain. Other applications using this YAML react to each node by handling the node as a specific type based on the appearance of specific keys in direct children of the node. This is my goal as well.

As I understand the numerous ways I can modify the parse/deserialization process, it seems I'm very close to being able to react the way I'd like. But each method I've tried has had shortcomings, ranging from not having enough information to construct the proper type, to probably being able to handle it but being left to parse the entire object on my own. I've got a more concrete example in the section below.

Example problem

I apologize if this gets long winded or too verbose, just want to try and paint a complete picture. Please excuse any syntax errors, as I'm just typing this up all freeform! 🙂

I'll start with two example documents, and the data classes used to represent them.

First, my list of foos.

# My Foos
name: "Example A"
url: "http://example.com/a"
timeout: 30
max_tries: 3
# other properties follow...
---
name: "Example B"
url: "http://example.com/b"
# other properties follow...
---
name: "Example C"
ftp: "192.0.2.14"
# notice no `url` key, instead `ftp`.
# different properties follow...

Second, my foo settings (and other settings) file.

# My Foo Settings, each key under defaults would be 
# a specific concrete foo to hold all the related data to their types.
defaults:
  all:
    timeout: 30
  url:
    max_tries: 1
  ftp:
    timeout: 10
  # other keys here.
verbose: true
# other various settings.

And now, the .NET data model:

// I'd like to keep objects as immutable, but that can be a future problem ;)
interface IFoo 
{
    string Name { get; set; }
    int Timeout { get; set; }
}

class FooBase : IFoo
{
    /* assume public virtual IFoo impl. */
}

class UrlFoo : FooBase
{
    public virtual string Url { get; set; }
    public virtual int MaxTries { get; set; }
    /* other properties as needed */
}

class FtpFoo : FooBase
{
    public virtual string Ftp { get; set; }
    /* other properties as needed */
}

// other foo base impls to follow.

class Settings
{
    public Dictionary<string, IFoo> Defaults { get; set; }
    public bool Verbose { get; set; }
}

Then, using that model to deserialize the settings:

var deser = new DeserializerBuilder()
    .WithNamingConvention(UnderscoredNamingConvention.Instance)
    //.With{TypeResolver, NodeTypeResolver, ObjectFactory, TypeInspector, NodeDeserializer, TypeConverter}(...)
    .Build();
var defaultResult = deser.Deserialize<Settings>(settingsDocText);

And also using that model to deserialize the list of various foos:

var deser = new DeserializerBuilder()
    .WithNamingConvention(UnderscoredNamingConvention.Instance)
    //.With{TypeResolver, NodeTypeResolver, ObjectFactory, TypeInspector, NodeDeserializer, TypeConverter}(...)
    .Build();
var stream = new StringReader(fooDocText);
var parser = new Parser(stream);
var fooList = new List<IFoo>();

// Walk the stream one document at a time.
parser.Consume<StreamStart>();
while (parser.Accept<DocumentStart>(out var docStartEvent))
{
    var foo = deser.Deserialize<IFoo>(parser);
    fooList.Add(foo);
}

As it stands, each of the With{...} methods I've tried to use to augment the deserializing behavior falls short in the following ways:

  • TypeResolver: I thought at first this was the right path, but docs are a little scarce at the moment. Perhaps I'm just misreading or misusing it, but my custom type resolver never gets invoked.
  • NodeTypeResolver: This, I believe, is the closest to what I desire after seeing that TypeResolver probably wasn't what I wanted. The Resolve() method comes with the node event and the type it currently wants. So when I see something come in as an IFoo, I can react and provide the proper type. The problem, however, is that the node event is the MappingStart (which makes sense, as that's when the object would be created...), so I can't investigate any details about the child nodes, and therefore can't provide the proper Type here.
  • ObjectFactory: This made the second most sense to me, as it's requested to create the type IFoo, so I can handle that and proxy everything else to a fallback factory, like the doc examples do. It's not quite as elegant as providing the type in the NodeTypeResolver, because this all but guarantees I can't have my objects be immutable. It also suffers from the same issue the NodeTypeResolver does: the only information provided to the factory about the object the deserializer wishes to create is the type itself. I once again can't inspect child nodes to react properly.
  • TypeInspector: I may be missing something on this one given the brief or unfinished docs, but reading the code, it doesn't seem designed to solve what I'm aiming for and is more for matching up the actual YAML to the named properties. TypeInspector does seem like the proper route for mapping to get/init properties so that I can still assign them after object creation.
  • NodeDeserializer and IYamlTypeConverter: Both of these seem very related in what they accomplish, with just slight differences in API. NodeDeserializer felt the most capable as it comes with a nestedObjectDeserializer, as noted in "TypeConverters should be able to deserialize complex types" #459 (IYamlTypeConverter seems to fall short there). I implemented my NodeDeserializer as in the docs, bringing along an ObjectSerializer to pass on node types I was uninterested in handling. So now I'm in a place where I'm able to actually see the data, and know when I want to create an IFoo! The drawback now is that I have to manually parse my IFoo and everything underneath it. What makes this harder to manage is that I can't actually create the proper object type until I start parsing down, potentially going past one or more child nodes that don't help me determine the proper type. So now I'm left juggling parsed or partially parsed events until I can create the proper type, and then I have to fill in the object myself. (Unless I'm missing how to use the nestedObjectDeserializer or my inner deserializer to leverage the rest of the deserialization process while providing the proper object type.)
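To make the look-ahead problem in the list above concrete, here is a minimal sketch of a low-tech two-pass approach for a single document held as a string: parse once into an untyped dictionary to see which top-level keys exist, then parse the same text again with the concrete type. The TwoPassSketch class and its KeyToType registry are hypothetical names made up for illustration, and the model classes simply mirror the ones above. It parses every document twice, so it is a stopgap rather than a proper extension point.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using YamlDotNet.Serialization;
using YamlDotNet.Serialization.NamingConventions;

public interface IFoo
{
    string Name { get; set; }
    int Timeout { get; set; }
}

public class FooBase : IFoo
{
    public virtual string Name { get; set; }
    public virtual int Timeout { get; set; }
}

public class UrlFoo : FooBase
{
    public virtual string Url { get; set; }
    public virtual int MaxTries { get; set; }
}

public class FtpFoo : FooBase
{
    public virtual string Ftp { get; set; }
}

public static class TwoPassSketch
{
    // Hypothetical registry: discriminating key -> concrete type.
    private static readonly Dictionary<string, Type> KeyToType = new Dictionary<string, Type>
    {
        { "url", typeof(UrlFoo) },
        { "ftp", typeof(FtpFoo) },
    };

    public static IFoo DeserializeFoo(string yaml)
    {
        var deserializer = new DeserializerBuilder()
            .WithNamingConvention(UnderscoredNamingConvention.Instance)
            .Build();

        // Pass 1: read the document as an untyped map, only to see its top-level keys.
        var peek = deserializer.Deserialize<Dictionary<string, object>>(new StringReader(yaml));

        foreach (var entry in KeyToType)
        {
            if (peek.ContainsKey(entry.Key))
            {
                // Pass 2: parse the same text again, now with a concrete target type.
                return (IFoo)deserializer.Deserialize(new StringReader(yaml), entry.Value);
            }
        }

        // No discriminating key present: fall back to the base type.
        return deserializer.Deserialize<FooBase>(new StringReader(yaml));
    }
}
```

Calling TwoPassSketch.DeserializeFoo on the "Example C" document should yield an FtpFoo, since ftp is the only registered key it contains. This does not help with nested cases such as the Defaults dictionary in Settings, which is exactly where a resolver with access to child nodes would be needed.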

Okay, that was a lot of text. I tried to clean it up and make as much sense as I could, but it's still a mess. Please don't feel bad asking for clarification of my balderdash.

Describe the solution you'd like

I'd like a way (probably through the INodeTypeResolver or IObjectFactory mechanisms, but where is not incredibly important) to dynamically create a concrete .NET object that will populate an interface or base class in the resulting deserialized object. Ideally, this would allow automatic parsing of the object (and its children) after creating its concrete instance.

Describe alternatives you've considered

As I'm also consuming JSON, Json.NET has a mechanism that allows for this fairly easily. Because the Json.NET parser can represent the current deserialization state as a completely tokenized object graph through its JObject (et al.) class, I'm able to solve my problem via a small JsonConverter. It works because the converter is provided the entire node (and the entire hierarchy from that point downward), allowing the typical deserialization consumption functions to be leveraged.

Assuming my data structures are similar to the example from above, my json converter looks something like this:

using System;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

class JsonFooConverter : JsonConverter
{
    public override bool CanConvert(Type objectType)
        => typeof(IFoo).IsAssignableFrom(objectType);

    // This converter only handles reading; writing uses the default behavior.
    public override bool CanWrite => false;

    public override object ReadJson(JsonReader reader,
                                    Type objectType,
                                    object existingValue,
                                    JsonSerializer serializer)
    {
        // Load JObject from stream
        JObject jObject = JObject.Load(reader);

        // Create target object based on JObject
        IFoo deserialized = Create(jObject);

        // todo: provide contract resolver that will let me map to private setters/etc.

        // Populate the object's properties -- this is where the magic happens.
        serializer.Populate(jObject.CreateReader(), deserialized);

        return deserialized;
    }

    // JsonConverter is abstract, so WriteJson must be overridden even though CanWrite is false.
    public override void WriteJson(JsonWriter writer, object value, JsonSerializer serializer)
        => throw new NotSupportedException();

    private IFoo Create(JObject jObject)
    {
        /* todo: a foo registry of sorts */
        if (jObject.TryGetValue("url", out JToken _)) { return new UrlFoo(); }
        if (jObject.TryGetValue("ftp", out JToken _)) { return new FtpFoo(); }
        /* other foo types to follow */
        return new FooBase();
    }
}

Since this method works, my current (yikes) workaround is converting to JSON, and then deserializing using it. 😬
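For anyone curious, the YAML-to-JSON step of that workaround can be sketched roughly as follows, assuming YamlDotNet's JSON-compatible serializer is used for the conversion. YamlToJsonSketch and ToJson are hypothetical names for illustration, not part of either library.

```csharp
using System;
using YamlDotNet.Serialization;

public static class YamlToJsonSketch
{
    public static string ToJson(string yaml)
    {
        // Read the YAML into an untyped object graph (dictionaries, lists, scalars).
        var deserializer = new DeserializerBuilder().Build();
        object graph = deserializer.Deserialize<object>(yaml);

        // Write the same graph back out using the JSON-compatible emitter.
        var serializer = new SerializerBuilder()
            .JsonCompatible()
            .Build();
        return serializer.Serialize(graph);
    }
}
```

The resulting string can then be handed to JsonConvert.DeserializeObject<IFoo>(json, new JsonFooConverter()). Two caveats under these assumptions: scalars read into an untyped graph may come out as JSON strings, which Json.NET generally converts when populating numeric properties, and this only reads the first document of a multi-document stream.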

Additional context

Okay, so that was a lot. But no one has ever said I don't attempt to be thorough.

I appreciate any thoughts or feedback, and am also very welcome to being shown the right way to do things on the (very likely) chance I missed it.

Thanks for reading!

@atruskie
Contributor

@nitz, if I've understood you correctly, I've got a working solution to this using this set of modifications: https://gist.github.com/atruskie/bfb7e9ee3df954a29cbc17bdf12405f9

I can determine the concrete deserialization type based on the presence of a key/field name, or the value of a field.

Original issue: #343
My comment: #343 (comment)

@nitz
Author

nitz commented Apr 12, 2021

Woah, this... looks like it might work exactly like what I'm aiming for! Might be a bit wordier than I'd hope for, but is far and away cleaner than "converting to json, then parsing it with that" 😅

Also, I felt like I looked through the issues pretty good and didn't notice that one. Guess I glazed over when I read k8s.

I will give this a shot today, but I'm betting based on your explanation and how that thread sounds that this should be right where I want to be.

@nitz
Author

nitz commented Apr 15, 2021

And in usual me fashion, somehow 'later today' becomes 3 days later. 🙃

First, I wanted to say thanks @atruskie for not only writing this a while back, but for taking the time to come share it with me.

Anyways, I got a chance to implement this for the most part in my project, trying to genericize it as I did. That would be better for my use case, where the parser will be told at runtime about the types it needs to process, and I was thinking that if it ends up doing what I'd like, I'd clean it up and submit it as a PR or the like. (An aside: you didn't explicitly license the code in your gist, so I wanted to make sure to ask you about that!)

I was able to get it working with cases like you demonstrated in your example easily, which is really 90% of the lifting I needed! The one roadbump I'm hitting now is a bit of what would be an edge case for the way it's written. I started to "dumb fix" it before I realized I should take a step back and come ask for opinions about the best way to approach this edge case.

The case I'm hitting is essentially a multi-document YAML file with 0 to N nodes that would be best described as the AggregateExpectation case. But here's the rub: the documents are all individual mappings, so they don't have the segment_with scalar to switch off of. My first thought was to use a mapping in the aggregate that has an empty string as the target key, but I didn't quite make it that far today. I'm hitting an Expected 'SequenceStart' got 'MappingStart' error, I'm guessing because it's technically multi-document and not an array.

So once I get that worked out I'll circle back around to this bit. Super excited to see if I can get this going :) Thanks again!

@atruskie
Contributor

atruskie commented Apr 15, 2021

@nitz updated gist with MIT license.

You'll need some sort of discriminator to choose the type. Avoiding the specific names I used for my examples (that's confusing), can you identify what your discriminator is?

  • the presence of some mapping key
  • or the value of some common mapping key

I don't think the multi-document aspect of this should matter. Working off of your example you'd want something like the following:

    public class FooTypeResolver : ITypeDiscriminator
    {
        private readonly Dictionary<string, Type> typeLookup;

        public FooTypeResolver(INamingConvention namingConvention)
        {
            typeLookup = new Dictionary<string, Type>() {
                { "url", typeof(UrlFoo) },
                { "ftp" , typeof(FtpFoo) },
               // more here...
            };
        }

        public Type BaseType => typeof(FooBase);

        public bool TryResolve(ParsingEventBuffer buffer, out Type suggestedType)
        {
            if (buffer.TryFindMappingEntry(
                scalar => typeLookup.ContainsKey(scalar.Value),
                out Scalar key,
                out ParsingEvent _))
            {
                suggestedType = typeLookup[key.Value];
                return true;
            }

            suggestedType = null;
            return false;
        }
    }

I'm not sure what the Expected 'SequenceStart' got 'MappingStart' is due to. I'd need a stack trace.

If we're to continue this discussion further, it might be good to do so in the gist comments, so we don't fill this repo with discussion of code that is not currently in the repo.

@nitz
Author

nitz commented Apr 15, 2021

I'm not sure what the Expected 'SequenceStart' got 'MappingStart' is due to. I'd need a stack trace.

Yeah, I think it's because I'm trying to call deserialize with IFoo[] as a type parameter. I'm in bed at the moment, looking at how to handle multiple documents. I should hopefully fix that up in the morning! I'll take the bit about this code over to the gist!

@EdwardCooke
Collaborator

Since we added a feature a while back for this exact use case I’m closing this issue.
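The closing comment does not name the feature. Assuming it refers to the type-discriminating node deserializer available in later YamlDotNet releases, a sketch against the model from the original post might look like the following. DiscriminatorSketch and ReadFoos are hypothetical names, and the exact builder methods should be checked against the YamlDotNet version in use.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using YamlDotNet.Core;
using YamlDotNet.Core.Events;
using YamlDotNet.Serialization;
using YamlDotNet.Serialization.NamingConventions;

public interface IFoo
{
    string Name { get; set; }
    int Timeout { get; set; }
}

public class FooBase : IFoo
{
    public virtual string Name { get; set; }
    public virtual int Timeout { get; set; }
}

public class UrlFoo : FooBase
{
    public virtual string Url { get; set; }
    public virtual int MaxTries { get; set; }
}

public class FtpFoo : FooBase
{
    public virtual string Ftp { get; set; }
}

public static class DiscriminatorSketch
{
    public static List<IFoo> ReadFoos(string yaml)
    {
        var deserializer = new DeserializerBuilder()
            .WithNamingConvention(UnderscoredNamingConvention.Instance)
            .WithTypeDiscriminatingNodeDeserializer(options =>
            {
                // Pick the concrete type from whichever of these keys the mapping contains.
                options.AddUniqueKeyTypeDiscriminator<IFoo>(new Dictionary<string, Type>
                {
                    { "url", typeof(UrlFoo) },
                    { "ftp", typeof(FtpFoo) },
                });
            })
            .Build();

        // Same multi-document loop as in the original post.
        var parser = new Parser(new StringReader(yaml));
        var foos = new List<IFoo>();
        parser.Consume<StreamStart>();
        while (parser.Accept<DocumentStart>(out _))
        {
            foos.Add(deserializer.Deserialize<IFoo>(parser));
        }
        return foos;
    }
}
```

With this setup, a document containing neither url nor ftp has no concrete type to fall back on, so the sketch only covers documents that carry one of the registered keys.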
