Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix JSON serialization for UTF-32 characters. #998

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

nahk-ivanov
Copy link

@nahk-ivanov nahk-ivanov commented Oct 24, 2024

When serializing the data in JSON-compatible form, 4-byte UTF32 characters need to be split into two 2-byte code points.

This change fixes that by introducing new emitter setting UseUtf16SurrogatePairs, which is set when JSON-compatible builder is requested.

Fixes #997

When serializing the data in JSON-compatible form, 4-byte UTF32 characters need to be split into two 2-byte code points..

This change fixes that by introducing new emitter setting `UseUtf16SurrogatePairs`, which is set when JSON-compatible builder is requested.
var obj = new { TestProperty = "Sea life \U0001F99E" };

SerializerBuilder.JsonCompatible().Build().Serialize(obj).Trim().Should()
.Be(@"{""TestProperty"": ""Sea life \uD83E\uDD9E""}");
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently, lobster in UTF32 is different from lobster in UTF16. 🤷

https://www.unicodecharacter.org/U+1F99E

@EdwardCooke
Copy link
Collaborator

Can you add that to the staticserializerbuilder as well? It’s used by people who need ahead of time compilation.

@EdwardCooke
Copy link
Collaborator

To remove a breaking change can you move the new emitter setting constructor parameter to the end?

Write(code.ToString("X04", CultureInfo.InvariantCulture));
Write('\\');
Write('u');
Write(((ushort)value[index + 1]).ToString("X04", CultureInfo.InvariantCulture));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should you check to make sure this won’t go out of bounds?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure but do you need to advance the index by one to handle the case when utf32 character is in the middle of the string?

Copy link
Author

@nahk-ivanov nahk-ivanov Oct 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't all of this already done on lines 1192 and 1208? The previous code worked somehow 🤷 we are just rendering it differently, not reading more or less than what was read before.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m not at a computer, just a phone, so reviewing prs can be a bit tricky. If it’s already done then great. Ignore this comment.

@nahk-ivanov
Copy link
Author

nahk-ivanov commented Oct 24, 2024

Moved to the end, but consumers of the constructor would have to recompile their code anyway, as this is, in fact, a breaking change (changing signature of the constructor). You can't replace old binary with a new one after this change no matter where the parameter is, they are not compatible, unfortunately.

We could make it non-breaking by adding an overload, though.

@EdwardCooke
Copy link
Collaborator

They won’t get compiler errors due to it being an optional parameter. That’s what I was getting at. They can just update the nuget package, build, deploy. No code change.

@nahk-ivanov
Copy link
Author

Hey! Are we getting this merged for the next release?

@EdwardCooke
Copy link
Collaborator

Yes. I’ve had some stuff come up and wasn’t able to work yamldotnet over the weekend. Hopefully in the next day or 2

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Incorrect UTF-32 character JSON serialization
2 participants