-
-
Notifications
You must be signed in to change notification settings - Fork 482
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix JSON serialization for UTF-32 characters. #998
base: master
Are you sure you want to change the base?
Conversation
When serializing the data in JSON-compatible form, 4-byte UTF32 characters need to be split into two 2-byte code points.. This change fixes that by introducing new emitter setting `UseUtf16SurrogatePairs`, which is set when JSON-compatible builder is requested.
var obj = new { TestProperty = "Sea life \U0001F99E" }; | ||
|
||
SerializerBuilder.JsonCompatible().Build().Serialize(obj).Trim().Should() | ||
.Be(@"{""TestProperty"": ""Sea life \uD83E\uDD9E""}"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apparently, lobster in UTF32 is different from lobster in UTF16. 🤷
Can you add that to the staticserializerbuilder as well? It’s used by people who need ahead of time compilation. |
To remove a breaking change can you move the new emitter setting constructor parameter to the end? |
Write(code.ToString("X04", CultureInfo.InvariantCulture)); | ||
Write('\\'); | ||
Write('u'); | ||
Write(((ushort)value[index + 1]).ToString("X04", CultureInfo.InvariantCulture)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should you check to make sure this won’t go out of bounds?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure but do you need to advance the index by one to handle the case when utf32 character is in the middle of the string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't all of this already done on lines 1192 and 1208? The previous code worked somehow 🤷 we are just rendering it differently, not reading more or less than what was read before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’m not at a computer, just a phone, so reviewing prs can be a bit tricky. If it’s already done then great. Ignore this comment.
Moved to the end, but consumers of the constructor would have to recompile their code anyway, as this is, in fact, a breaking change (changing signature of the constructor). You can't replace old binary with a new one after this change no matter where the parameter is, they are not compatible, unfortunately. We could make it non-breaking by adding an overload, though. |
They won’t get compiler errors due to it being an optional parameter. That’s what I was getting at. They can just update the nuget package, build, deploy. No code change. |
Hey! Are we getting this merged for the next release? |
Yes. I’ve had some stuff come up and wasn’t able to work yamldotnet over the weekend. Hopefully in the next day or 2 |
When serializing the data in JSON-compatible form, 4-byte UTF32 characters need to be split into two 2-byte code points.
This change fixes that by introducing new emitter setting
UseUtf16SurrogatePairs
, which is set when JSON-compatible builder is requested.Fixes #997