-
-
Notifications
You must be signed in to change notification settings - Fork 77
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
de/serialize fails for mixed type series #103
Comments
Thanks for reporting the issue. Data-Forge isn't very smart with that kind of thing. It only inspects the item in a series to determine the type because it would be too expensive to inspect every item in the series. I can think of a way to fix this actually without even having to record the type of the column. It could be implemented like how MongoDB deserializes a JSON file, if a value is `{ "$date": "..." }" then the sub-value is parsed as a date, and then the date replaces it. That could work nicely here. What do you think? I think this could be made to be backward compatible. Would you be interested in working on it? |
Thanks for your answer. In my use case I expect more diverse types than just string, number, boolean, date. More like what is covered by frictionless, e.g. duration or geopoint. I now also need to serialize custom classes containing DataFrames. This exceeds the scope of the DataFrame serialization scheme. Thanks for the offer to work on it, but it seems I will not use the DataFrame serialization part after all as my scope has changed. I leave it up to you if you want to close the issue or not. PS: Great library of yours, it was the best I found (for TypeScript). |
Thanks for the info. Frictionless is interesting. If you have written your own serialization code would you mind sharing it with me? I might be able to integrate it into DF or maybe it will just give me inspiration as to what to do next. I'll keep this issue open until DF supports mixed types in a series. |
I created a reduced version of my serializer that focuses on arrays, without class serialization support. |
@crystalgreen that looks awesome. Will you make this available in an npm library or a GitHub repo by itself? I'd love to clone the project and try it out - try and see how I could fit it into Data-Forge. |
I currently do not plan to create a public library but this may change in the future. My full blown library with serialization support for classes and other JS type is still in early stages of development and not yet stable. I also haven't made any performance tests yet but think about adding a JIT compiler like in the very fast and sophisticated serialization lib @deepkit/type which makes it presumably one of the fastest TS serializers out there but that unfortunately does not fit my needs. Those libs have an emphasis on schema validation while I prefer the data to describe itself, especially for mixed type arrays. If you like, you can use my code in the above link and adapt it to your needs. I'm open for ideas of improvement. |
Mixed types in a series are not correctly deserialized. See example below.
The reason seems that the detected column type is serialized as well (even if there doesn't seem to be a public API to get that type) and during deserialization that type is applied to all values.
Idea for a fix: Column type is the most frequently found type. For values of other types, store [type name, value] instead of the value. The deserializer would detect these arrays and correctly deserialize these other types.
PS: The current values format is quite verbose. More compact would be an array of values per row.
Of course, this would be a breaking change for those that manually parse the serialized format.
The text was updated successfully, but these errors were encountered: