Fix JSON serialization for UTF-32 characters. #998

nahk-ivanov · 2024-10-24T03:37:16Z

When serializing data in JSON-compatible form, characters outside the Basic Multilingual Plane (those that take 4 bytes in UTF-32) need to be written as two 2-byte UTF-16 code units, i.e. a surrogate pair.

This change fixes that by introducing a new emitter setting, `UseUtf16SurrogatePairs`, which is set when a JSON-compatible builder is requested.
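
For readers unfamiliar with the split, here is a minimal standalone sketch of the arithmetic. It is not the code from this PR; the `SurrogateSketch` class and `ToJsonEscape` method are hypothetical names used only for illustration.

```csharp
using System;
using System.Globalization;

public static class SurrogateSketch
{
    // Hypothetical helper (not part of YamlDotNet): renders one Unicode
    // code point as JSON-style \uXXXX escapes.
    public static string ToJsonEscape(int codePoint)
    {
        // Reject values outside the Unicode range and lone surrogate values.
        if (codePoint < 0 || codePoint > 0x10FFFF ||
            (codePoint >= 0xD800 && codePoint <= 0xDFFF))
        {
            throw new ArgumentOutOfRangeException(nameof(codePoint));
        }

        // Code points in the Basic Multilingual Plane fit in one escape.
        if (codePoint < 0x10000)
        {
            return "\\u" + codePoint.ToString("X4", CultureInfo.InvariantCulture);
        }

        // Supplementary code points: subtract 0x10000, then put the top
        // 10 bits in the high surrogate and the low 10 bits in the low one.
        var offset = codePoint - 0x10000;
        var high = 0xD800 + (offset >> 10);
        var low = 0xDC00 + (offset & 0x3FF);

        return "\\u" + high.ToString("X4", CultureInfo.InvariantCulture)
             + "\\u" + low.ToString("X4", CultureInfo.InvariantCulture);
    }
}
```

For U+1F99E this gives `\uD83E\uDD9E`, which is the value the new test expects.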

Fixes #997


nahk-ivanov · 2024-10-24T04:00:15Z

YamlDotNet.Test/Serialization/SerializationTests.cs

+            var obj = new { TestProperty = "Sea life \U0001F99E" };
+
+            SerializerBuilder.JsonCompatible().Build().Serialize(obj).Trim().Should()
+                .Be(@"{""TestProperty"": ""Sea life \uD83E\uDD9E""}");


Apparently, lobster in UTF32 is different from lobster in UTF16. 🤷

https://www.unicodecharacter.org/U+1F99E
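
The difference comes from how .NET stores strings: a `string` is a sequence of UTF-16 code units, so the single code point U+1F99E occupies two `char` values. A small sketch of that, using a hypothetical `LobsterSketch` class that is not part of this PR:

```csharp
using System;

public static class LobsterSketch
{
    // Hypothetical helper: summarizes how U+1F99E is stored in a .NET string.
    public static string Describe()
    {
        var lobster = "\U0001F99E";

        // Length counts UTF-16 code units, not code points, so it is 2 here.
        // char.ConvertToUtf32 recombines the pair into the original code point.
        return $"{lobster.Length} {(int)lobster[0]:X4} {(int)lobster[1]:X4} " +
               $"{char.ConvertToUtf32(lobster, 0):X}";
    }
}
```

`Describe()` returns `2 D83E DD9E 1F99E`: two code units, the high and low surrogate, and the code point they encode.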

EdwardCooke · 2024-10-24T15:29:44Z

Can you add that to the `StaticSerializerBuilder` as well? It’s used by people who need ahead-of-time compilation.

EdwardCooke · 2024-10-24T15:32:35Z

To avoid a breaking change, can you move the new emitter setting constructor parameter to the end?

EdwardCooke · 2024-10-24T15:36:13Z

YamlDotNet/Core/Emitter.cs

+                                        Write(code.ToString("X04", CultureInfo.InvariantCulture));
+                                        Write('\\');
+                                        Write('u');
+                                        Write(((ushort)value[index + 1]).ToString("X04", CultureInfo.InvariantCulture));


Should you check to make sure this won’t go out of bounds?

Not sure, but do you need to advance the index by one to handle the case where a UTF-32 character is in the middle of the string?

Isn't all of this already done on lines 1192 and 1208? The previous code worked somehow 🤷 we are just rendering it differently, not reading more or less than what was read before.

I’m not at a computer, just a phone, so reviewing PRs can be a bit tricky. If it’s already done then great. Ignore this comment.
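
The two concerns in this thread, reading `value[index + 1]` past the end of the string and advancing the index after consuming a pair, can be illustrated with a standalone sketch. This is an assumption-laden example, not the `Emitter.cs` code: `EscapeSketch` and `EscapeNonAscii` are hypothetical names, and the helper escapes only non-ASCII characters rather than performing full JSON escaping.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class EscapeSketch
{
    // Hypothetical helper: escapes every non-ASCII UTF-16 code unit as \uXXXX.
    // Quotes, backslashes, and control characters are deliberately not handled.
    public static string EscapeNonAscii(string value)
    {
        var sb = new StringBuilder();
        for (var index = 0; index < value.Length; index++)
        {
            var ch = value[index];
            if (ch < 0x80)
            {
                sb.Append(ch);
                continue;
            }

            // Bounds check before touching value[index + 1].
            if (char.IsHighSurrogate(ch)
                && index + 1 < value.Length
                && char.IsLowSurrogate(value[index + 1]))
            {
                AppendEscape(sb, ch);
                AppendEscape(sb, value[index + 1]);
                index++; // skip the low surrogate that was just written
                continue;
            }

            // BMP character, or an unpaired surrogate written as-is.
            AppendEscape(sb, ch);
        }
        return sb.ToString();
    }

    private static void AppendEscape(StringBuilder sb, char ch)
    {
        sb.Append("\\u")
          .Append(((ushort)ch).ToString("X4", CultureInfo.InvariantCulture));
    }
}
```

Because each code unit is escaped individually, the output here matches what a plain per-unit loop would produce; the explicit pair handling is shown only to make the bounds check and the index advance visible.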

nahk-ivanov · 2024-10-24T16:10:39Z

Moved to the end, but consumers of the constructor would have to recompile their code anyway, as this is, in fact, a breaking change (it changes the signature of the constructor). You can't replace the old binary with a new one after this change no matter where the parameter is; they are not binary-compatible, unfortunately.

We could make it non-breaking by adding an overload, though.

EdwardCooke · 2024-10-24T16:13:03Z

They won’t get compiler errors, since it is an optional parameter. That’s what I was getting at. They can just update the NuGet package, build, deploy. No code change.
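
The two options discussed above can be contrasted with a small sketch. The classes below are hypothetical stand-ins, not the real `EmitterSettings` signature; only the parameter name `useUtf16SurrogatePairs` mirrors the setting from this PR.

```csharp
// Option 1: trailing optional parameter.
// Existing call sites such as `new SettingsWithOptionalParameter(2)` still
// compile after a rebuild, but the old one-argument constructor no longer
// exists in the binary, so assemblies built against it must be recompiled.
public sealed class SettingsWithOptionalParameter
{
    public int BestIndent { get; }
    public bool UseUtf16SurrogatePairs { get; }

    public SettingsWithOptionalParameter(int bestIndent, bool useUtf16SurrogatePairs = false)
    {
        BestIndent = bestIndent;
        UseUtf16SurrogatePairs = useUtf16SurrogatePairs;
    }
}

// Option 2: additional overload.
// The original one-argument constructor is kept, so already-compiled
// callers keep working without a rebuild.
public sealed class SettingsWithOverload
{
    public int BestIndent { get; }
    public bool UseUtf16SurrogatePairs { get; }

    public SettingsWithOverload(int bestIndent)
        : this(bestIndent, false)
    {
    }

    public SettingsWithOverload(int bestIndent, bool useUtf16SurrogatePairs)
    {
        BestIndent = bestIndent;
        UseUtf16SurrogatePairs = useUtf16SurrogatePairs;
    }
}
```

Both shapes are source-compatible for existing callers; only the overload is also binary-compatible.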

nahk-ivanov · 2024-10-28T15:49:53Z

Hey! Are we getting this merged for the next release?

EdwardCooke · 2024-10-30T19:49:33Z

Yes. I’ve had some stuff come up and wasn’t able to work on YamlDotNet over the weekend. Hopefully in the next day or two.

nahk-ivanov mentioned this pull request Oct 24, 2024

Incorrect UTF-32 character JSON serialization #997

Open


Address comments

7333635
