Dynamic type resolution during deserialization based on serialized content #605

nitz · 2021-04-11T15:35:31Z

Is your feature request related to a problem? Please describe.

This one is a bit of a doozy, and it touches on several issues that are already open or related. To tag them quickly: the workflow I'm looking to support is somewhat related to the feature added in #601, is partially described by (or would be affected by) the topics in #408 and #459, and may be tangentially related to #529 and #593.

Before I get started here as well: I am perfectly ready to be told "you dummy, it's super easy and you're just missing it!" So please do point out if I'm just glazing over features I shouldn't have!

Essentially, I've got some YAML I don't control. Nodes are untagged, but differ slightly in what they may contain. Other applications that consume this YAML handle each node as a specific type, determined by which keys appear among the node's direct children. This is my goal as well.
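To make the goal concrete, here is a minimal, self-contained C# sketch of the selection rule on its own, independent of any YAML library: given the keys of a node's direct children, pick the first concrete type whose marker key is present, falling back to the base type. `KeyPresenceDiscriminator`, its rule table, and the stripped-down `FooBase`/`UrlFoo`/`FtpFoo` stand-ins are hypothetical illustrations, not YamlDotNet API.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for the model further down; only type identity matters here.
public class FooBase { }
public class UrlFoo : FooBase { }
public class FtpFoo : FooBase { }

public static class KeyPresenceDiscriminator
{
    // Ordered rules: the first marker key found among the direct children wins.
    private static readonly (string Key, Type Type)[] Rules =
    {
        ("url", typeof(UrlFoo)),
        ("ftp", typeof(FtpFoo)),
    };

    // childKeys: the keys of the mapping node's direct children only.
    public static Type Resolve(IEnumerable<string> childKeys)
    {
        var keys = new HashSet<string>(childKeys);
        foreach (var (key, type) in Rules)
        {
            if (keys.Contains(key)) return type;
        }
        return typeof(FooBase); // fallback when no marker key is present
    }
}
```

An ordered array rather than a dictionary makes the precedence explicit in case a node ever carries more than one marker key.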

With my understanding of the numerous ways I can customize the parse/deserialization process, it seems I'm very close to being able to react the way I'd like, but each approach I've tried has had shortcomings, ranging from not having enough information to construct the proper type, to probably being able to handle it but leaving me to parse the entire object on my own. I've got a more concrete example in the section below.

Example problem

I apologize if this gets long-winded or too verbose; I just want to paint a complete picture. Please excuse any syntax errors, as I'm typing this up freeform! 🙂

I'll start with two example documents, and the data classes used to represent them.

First, my list of foos.

# My Foos
name: "Example A"
url: "http://example.com/a"
timeout: 30
max_tries: 3
# other properties follow...
---
name: "Example B"
url: "http://example.com/b"
# other properties follow...
---
name: "Example C"
ftp: "192.0.2.14"
# notice no `url` key, instead `ftp`.
# different properties follow...

Second, my foo settings (and other settings) file.

# My Foo Settings. Each key under `defaults` would map to a specific
# concrete foo, holding all the data related to that type.
defaults:
  all:
    timeout: 30
  url:
    max_tries: 1
  ftp:
    timeout: 10
  # other keys here.
verbose: true
# other various settings.

And now, the .NET data model:

using System.Collections.Generic;

// I'd like to keep objects immutable, but that can be a future problem ;)
interface IFoo
{
    string Name { get; set; }
    int Timeout { get; set; }
}

class FooBase : IFoo
{
    public virtual string Name { get; set; }
    public virtual int Timeout { get; set; }
}

class UrlFoo : FooBase
{
    public virtual string Url { get; set; }
    public virtual int MaxTries { get; set; }
    /* other properties as needed */
}

class FtpFoo : FooBase
{
    public virtual string Ftp { get; set; }
    /* other properties as needed */
}

// other foo base impls to follow.

class Settings
{
    public Dictionary<string, IFoo> Defaults { get; set; }
    public bool Verbose { get; set; }
}

Then, using that model to deserialize the settings:

var deser = new DeserializerBuilder()
    .WithNamingConvention(UnderscoreNamingConvention.Instance)
    //.With{TypeResolver, NodeTypeResolver, ObjectFactory, TypeInspector, NodeDeserializer, TypeConverter}(...)
    .Build();
var defaultResult = deser.Deserialize<Settings>(settingsDocText);

And also using that model to deserialize the list of various foos:

var deser = new DeserializerBuilder()
    .WithNamingConvention(UnderscoreNamingConvention.Instance)
    //.With{TypeResolver, NodeTypeResolver, ObjectFactory, TypeInspector, NodeDeserializer, TypeConverter}(...)
    .Build();
var stream = new StringReader(fooDocText);
var parser = new Parser(stream);
var fooList = new List<IFoo>();

parser.Consume<StreamStart>();
// Each Deserialize(parser) call reads one document from the stream.
while (parser.Accept<DocumentStart>(out _))
{
    var foo = deser.Deserialize<IFoo>(parser);
    fooList.Add(foo);
}

As it stands, each of the With{...} methods I've tried to use to augment the deserializing behavior falls short in the following ways:

TypeResolver: I thought at first this was the right path, but the docs are a little scarce at the moment. Perhaps I'm just misreading or misusing it, but my custom type resolver never gets invoked.
NodeTypeResolver: This, I believe, is the closest to what I want, after seeing that TypeResolver probably wasn't it. The Resolve() method receives the node event and the type the deserializer currently wants, so when I see something come in as an IFoo, I can react and provide the proper type. The problem, however, is that the node event is the MappingStart (which makes sense, as that's when the object would be created), and I can't investigate any details about the child nodes, so I can't provide the proper Type here.
ObjectFactory: This made the second most sense to me, as it's asked to create the type IFoo, so I can handle that and proxy everything else to a fallback factory, like the doc examples do. It's not quite as elegant as providing the type in the NodeTypeResolver, because it all but guarantees I can't have my objects be immutable. It also suffers from the same issue as the NodeTypeResolver: the only information the factory receives about the object the deserializer wishes to create is the type itself, so once again I can't inspect child nodes to react properly.
TypeInspector: I may be missing something here given the brief or unfinished docs, but from reading the code, this one doesn't seem designed to solve what I'm aiming for; it's more for matching up the actual YAML to the named properties. TypeInspector does seem like the proper route for mapping to get/init properties so that I can still assign them after object creation.
NodeDeserializer and IYamlTypeConverter: Both of these seem very similar in what they accomplish, with only slight differences in API. NodeDeserializer felt the most capable, as it comes with a nestedObjectDeserializer, as noted in TypeConverters should be able to deserialize complex types #459 (IYamlTypeConverter seems to fall short there). I implemented my NodeDeserializer as in the docs, bringing along an ObjectSerializer to pass on node types I wasn't interested in handling. So now I'm able to actually see the data and know when I want to create an IFoo! The drawback is that I have to manually parse my IFoo and everything underneath it. What makes it harder to manage is that I can't create the proper object type until I start parsing down, potentially going past one or more child nodes that don't let me determine the type I need. So I'm left juggling parsed or partially parsed events until I can create the proper type, and then I have to fill in the object myself. (Unless I'm missing how to use the nestedObjectDeserializer or my inner deserializer to leverage the rest of the deserialization process while providing the proper object type.)
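The event juggling described in the NodeDeserializer item amounts to buffering a node's events until a discriminating key has been seen, then replaying the buffer through the normal deserialization path. The sketch below illustrates that idea with a deliberately simplified, hypothetical event model (`EventKind`, `Ev`) rather than YamlDotNet's real parser types: `LookaheadBuffer.Read` consumes one complete mapping, records the first marker key found at that mapping's top level (keys inside nested values are ignored), and returns every event it consumed so nothing is lost.

```csharp
using System.Collections.Generic;

// Hypothetical, simplified stand-in for a YAML parser's event stream.
public enum EventKind { MappingStart, MappingEnd, SequenceStart, SequenceEnd, Scalar }

public record Ev(EventKind Kind, string Value = null);

public static class LookaheadBuffer
{
    // Consumes exactly one complete mapping from `events` (the next event is expected
    // to be its MappingStart). Returns the first top-level key contained in
    // `discriminators` (or null), plus every consumed event in order, for replay.
    public static (string Discriminator, List<Ev> Buffered) Read(
        IEnumerator<Ev> events, ISet<string> discriminators)
    {
        var buffer = new List<Ev>();
        string found = null;
        int depth = 0;
        bool expectKey = false; // at depth 1, scalars alternate between key and value

        while (events.MoveNext())
        {
            var ev = events.Current;
            buffer.Add(ev);

            switch (ev.Kind)
            {
                case EventKind.MappingStart:
                case EventKind.SequenceStart:
                    depth++;
                    if (depth == 1) expectKey = true; // entered the target mapping
                    break;
                case EventKind.MappingEnd:
                case EventKind.SequenceEnd:
                    depth--;
                    if (depth == 1) expectKey = true; // a nested value just closed
                    break;
                case EventKind.Scalar when depth == 1:
                    if (expectKey && found == null && discriminators.Contains(ev.Value))
                        found = ev.Value;
                    expectKey = !expectKey;
                    break;
            }

            if (depth == 0) break; // the whole mapping is now buffered
        }

        return (found, buffer);
    }
}
```

Buffering the whole mapping is the same trade-off Json.NET makes with `JObject.Load`: memory proportional to the node's size, in exchange for never having to un-read events.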

Okay, that was a lot of text. I tried to clean it up and make as much sense as I could, but it's still a mess. Please don't feel bad asking for clarification of my balderdash.

Describe the solution you'd like

I'd like a way (probably through the INodeTypeResolver or IObjectFactory mechanisms, though where isn't terribly important) to dynamically create a concrete .NET object that will populate an interface or base class in the resulting deserialized object. Ideally, the object (and its children) would then be parsed automatically after its concrete instance is created.
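One way to read this request is as a two-step create-then-populate flow: choose and instantiate the concrete type first, then fill it in with generic machinery, much like Json.NET's `serializer.Populate`. The sketch below shows only the populate half, using plain reflection over values assumed to be already parsed into a dictionary with snake_case keys; `Populator`, `ToPascal`, and the model stand-ins are hypothetical and are not how YamlDotNet populates objects internally.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical stand-ins mirroring the model from the example problem.
public class FooBase
{
    public virtual string Name { get; set; }
    public virtual int Timeout { get; set; }
}

public class UrlFoo : FooBase
{
    public virtual string Url { get; set; }
    public virtual int MaxTries { get; set; }
}

public static class Populator
{
    // "max_tries" -> "MaxTries"; assumes the underscore naming used in the issue.
    private static string ToPascal(string key)
    {
        var result = "";
        foreach (var part in key.Split(new[] { '_' }, StringSplitOptions.RemoveEmptyEntries))
            result += char.ToUpperInvariant(part[0]) + part.Substring(1);
        return result;
    }

    // Copies already-parsed values onto an instance whose concrete type was chosen
    // beforehand. Keys without a matching public writable property are skipped.
    public static T Populate<T>(T target, IDictionary<string, object> values)
    {
        foreach (var pair in values)
        {
            PropertyInfo prop = target.GetType().GetProperty(
                ToPascal(pair.Key), BindingFlags.Public | BindingFlags.Instance);
            if (prop == null || !prop.CanWrite) continue;

            // Convert.ChangeType handles simple conversions such as "30" -> 30.
            prop.SetValue(target, Convert.ChangeType(pair.Value, prop.PropertyType));
        }
        return target;
    }
}
```

Because population runs against whatever instance it is handed, the type decision can live entirely in the factory step.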

Describe alternatives you've considered

As I'm also consuming JSON: Json.NET has a mechanism that makes this fairly easy. Because Json.NET can represent the current deserialization state as a fully tokenized object graph through its JObject (et al.) classes, I'm able to solve my problem with a small JsonConverter. It works this way because the converter is handed the entire node (and the entire hierarchy below it), which lets the typical deserialization functions be leveraged.

Assuming my data structures are similar to the example from above, my json converter looks something like this:

using System;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

class JsonFooConverter : JsonConverter
{
    public override bool CanConvert(Type objectType)
        => typeof(IFoo).IsAssignableFrom(objectType);

    public override object ReadJson(JsonReader reader,
                                    Type objectType,
                                    object existingValue,
                                    JsonSerializer serializer)
    {
        // Load JObject from stream
        JObject jObject = JObject.Load(reader);

        // Create target object based on JObject
        IFoo deserialized = Create(jObject);

        // todo: provide contract resolver that will let me map to private setters/etc.

        // Populate the object's properties -- this is where the magic happens.
        serializer.Populate(jObject.CreateReader(), deserialized);

        return deserialized;
    }

    // Read-only converter: with CanWrite false, Json.NET uses default serialization.
    public override bool CanWrite => false;

    public override void WriteJson(JsonWriter writer, object value, JsonSerializer serializer)
        => throw new NotSupportedException();

    private IFoo Create(JObject jObject)
    {
        /* todo: a foo registry of sorts */
        if (jObject.TryGetValue("url", out JToken _)) { return new UrlFoo(); }
        if (jObject.TryGetValue("ftp", out JToken _)) { return new FtpFoo(); }
        /* other foo types to follow */
        return new FooBase();
    }
}

Since this method works, my current (yikes) workaround is converting the YAML to JSON and then deserializing it with Json.NET. 😬

Additional context

Okay, so that was a lot. But no one has ever said I don't attempt to be thorough.

I appreciate any thoughts or feedback, and I'm also very open to being shown the right way to do things on the (very likely) chance I missed it.

Thanks for reading!


atruskie · 2021-04-12T01:45:53Z

@nitz, If I've understood you correctly, I've got a working solution to this using this set of modifications: https://gist.github.com/atruskie/bfb7e9ee3df954a29cbc17bdf12405f9

I can determine the concrete deserialization type based on the presence of a key/field name, or on the value of a field.

Original issue: #343
My comment: #343 (comment)

nitz · 2021-04-12T13:08:23Z

Woah, this... looks like it might work exactly like what I'm aiming for! Might be a bit wordier than I'd hope for, but is far and away cleaner than "converting to json, then parsing it with that" 😅

Also, I felt like I looked through the issues pretty well and didn't notice that one. I guess my eyes glazed over when I read k8s.

I will give this a shot today, but I'm betting based on your explanation and how that thread sounds that this should be right where I want to be.

nitz · 2021-04-15T03:56:04Z

And in usual me fashion, somehow 'later today' becomes 3 days later. 🙃

First, I wanted to say thanks @atruskie for not only writing this a while back, but for taking the time to come share it with me.

Anyways, I got a chance to implement this for the most part in my project, trying to genericize it as I went. That would be better for my use case, where the parser is told at runtime about the types it needs to process, and I was thinking that if it ends up doing what I'd like, I'd clean it up and submit it as a PR or the like. (An aside: you didn't explicitly license the code in your gist, so I wanted to make sure to ask you about that!)

I was able to get it working easily with cases like the ones in your example, which is really 90% of the lifting I needed! The one road bump I'm hitting now is what would be an edge case for the way it's written. I started to "dumb fix" it before I realized I should take a step back and ask for opinions on the best way to approach it.

The case I'm hitting is essentially a multi-document YAML file with 0 to N nodes that would best be described as the AggregateExpectation case. But here's the rub: the documents are all individual mappings, so they don't have the segment_with scalar to switch on. My first thought was to use a mapping in the aggregate with an empty string as the target key, but I didn't quite make it that far today: I'm hitting an Expected 'SequenceStart' got 'MappingStart' error, I'm guessing because it's technically multi-document and not an array.

Once I get that worked out, I'll circle back around to this bit. Super excited to see if I can get this going :) Thanks again!

atruskie · 2021-04-15T04:11:26Z

@nitz updated gist with MIT license.

You'll need some sort of discriminator to choose the type. Avoiding the specific names I used for my examples (that's confusing), can you identify what your discriminator is?

the presence of some mapping key
or the value of some common mapping key
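For the second option, the discriminator is the value of a shared key rather than which keys are present. The issue's sample YAML has no such field, so the `type` key below is a hypothetical example, as are `KeyValueDiscriminator` and its value-to-type table; the sketch assumes the node's direct scalar children have already been read into a dictionary.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins; only type identity matters here.
public class FooBase { }
public class UrlFoo : FooBase { }
public class FtpFoo : FooBase { }

public static class KeyValueDiscriminator
{
    // Hypothetical discriminator key; not present in the issue's sample YAML.
    public const string DiscriminatorKey = "type";

    private static readonly Dictionary<string, Type> ValueMappings = new Dictionary<string, Type>
    {
        { "url", typeof(UrlFoo) },
        { "ftp", typeof(FtpFoo) },
    };

    // mapping: the node's direct scalar children as key/value pairs.
    public static bool TryResolve(IReadOnlyDictionary<string, string> mapping, out Type type)
    {
        if (mapping.TryGetValue(DiscriminatorKey, out var value) &&
            ValueMappings.TryGetValue(value, out type))
        {
            return true;
        }

        type = null; // unknown or missing discriminator: let the caller fall back
        return false;
    }
}
```

Returning `false` instead of a default type leaves the fallback decision to the caller, matching the `TryResolve(..., out Type suggestedType)` shape of the resolver that follows in this comment.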

I don't think the multi-document aspect of this should matter. Working off of your example you'd want something like the following:

    public class FooTypeResolver : ITypeDiscriminator
    {
        private readonly Dictionary<string, Type> typeLookup;

        public FooTypeResolver(INamingConvention namingConvention)
        {
            typeLookup = new Dictionary<string, Type>() {
                { "url", typeof(UrlFoo) },
                { "ftp" , typeof(FtpFoo) },
                // more here...
            };
        }

        public Type BaseType => typeof(FooBase);

        public bool TryResolve(ParsingEventBuffer buffer, out Type suggestedType)
        {
            if (buffer.TryFindMappingEntry(
                scalar => typeLookup.ContainsKey(scalar.Value),
                out Scalar key,
                out ParsingEvent _))
            {
                suggestedType = typeLookup[key.Value];
                return true;
            }

            suggestedType = null;
            return false;
        }
    }

I'm not sure what the Expected 'SequenceStart' got 'MappingStart' is due to. I'd need a stack trace.

If we're to continue this discussion further, it might be good to do so in the gist comments, so we don't fill this repo with discussion unrelated to code that is not currently in the repo.

nitz · 2021-04-15T04:24:33Z

I'm not sure what the Expected 'SequenceStart' got 'MappingStart' is due to. I'd need a stack trace.

Yeah, I think it's because I'm trying to call Deserialize with IFoo[] as the type parameter. I'm in bed at the moment looking at how to handle multiple documents, and should hopefully fix that up in the morning! I'll take the discussion of this code over to the gist!

EdwardCooke · 2024-07-11T09:04:28Z

Since we added a feature a while back for this exact use case, I'm closing this issue.

aaubry mentioned this issue Apr 29, 2022

This project is on hold #690

Closed

ecooke-macu mentioned this issue Oct 25, 2022

Deserializing subclasses of abstact class #731

Closed

EdwardCooke mentioned this issue Jan 15, 2023

Question about generic serialization #36

Closed

JJ11teen mentioned this issue Feb 2, 2023

Buffered deserialisation #774

Merged

jas88 mentioned this issue Apr 26, 2023

ii review doesn't support loading AllowlistRules SMI/IsIdentifiable#312

Open

EdwardCooke closed this as completed Jul 11, 2024
