Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

json_encode can use SIMD #17672

Open
mvorisek opened this issue Feb 2, 2025 · 12 comments · May be fixed by #17734
Open

json_encode can use SIMD #17672

mvorisek opened this issue Feb 2, 2025 · 12 comments · May be fixed by #17734

Comments

@mvorisek
Copy link
Contributor

mvorisek commented Feb 2, 2025

Description

The current json_encode implementation [1] iterates on every string character.

I belive there is a potential to utilize SIMD to copy multiple characters to the output string as long as they are not to-be-escaped.

Benchmark: https://3v4l.org/XeJTn/rfc#vgit.master

This can improve performace on applications that do a lot of JSON encoding.

[1] https://github.com/php/php-src/blob/php-8.4.3/ext/json/json_encoder.c#L411

@cmb69
Copy link
Member

cmb69 commented Feb 2, 2025

There is already https://pecl.php.net/package/simdjson.

@mvorisek
Copy link
Contributor Author

mvorisek commented Feb 2, 2025

That lib is for decoding JSON. This feature request is about encoding.

@cmb69
Copy link
Member

cmb69 commented Feb 2, 2025

That lib is for decoding JSON. This feature request is about encoding.

Oh, right.

@nielsdos
Copy link
Member

nielsdos commented Feb 2, 2025

I played a bit with this, and have this patch: https://gist.github.com/nielsdos/3b42ffaa4476bb5cb7eb498935072312 (somewhat dirty, would need some more work)

Without patch:

json_encode: 88.3 ms
serialize: 2.38 ms

Naive SSE2:

json_encode: 17.53 ms
serialize: 2.34 ms

Semi-naive SSE4.2:

json_encode: 8.88 ms
serialize: 2.37 ms

So 5x speedup on SSE2 and 9.9x speedup on SSE4.2. Not quite 20x (which would not be possible with SSE anyway because those vectors are 16 byte elements and you also have to account for the overhead); but still some win.
Maybe there's some obvious trick I'm missing in my patch especially regarding SSE2. Note also that this could maybe cause a slowdown when you have a lot of special characters next to each other that need escaping. Maybe there are also some articles about fast json encoding.

EDIT: by getting rid of the tail variable it's down to 6.94ms for SSE4.2

@nielsdos
Copy link
Member

nielsdos commented Feb 2, 2025

After further simplifying my patch (getting rid of tail handling and simplifying the range check)
I arrive at:

SSE4.2:

json_encode: 6.81 ms
serialize: 2.36 ms

SSE2:

json_encode: 14.87 ms
serialize: 2.34 ms

@nielsdos
Copy link
Member

nielsdos commented Feb 2, 2025

I also did a test with an adverserial input. The absolute worst case is a very long string containing only escape characters (or all non-printable ASCII). Because in that case, we have no benefit from the SIMD but do have its overhead.
Both the SSE2 & SSE4.2 have a comparable slowdown compared to not using SIMD in that case: 3.25 ± 0.06 times slower...

Test script for adverserial input:

$test = str_repeat('"', 64 * 100);
for ($i = 0; $i < 10000; $i++) {
    json_encode($test);
}

Ran using hyperfine.

@nielsdos
Copy link
Member

nielsdos commented Feb 2, 2025

BTW

I belive there is a potential to utilize SIMD to copy multiple characters to the output string as long as they are not to-be-escaped.

The copying itself happens via memcpy, which is optimized like crazy by the C library already. Still, this takes a lot of time as well but we can't really go around that unless we start doing some very advanced stuff like supporting rope strings everywhere, and even then we get the performance penalty once we need to materialize the rope.

@cmb69
Copy link
Member

cmb69 commented Feb 2, 2025

By the way, I still believe there is some general potential to improve the performance of smart_str(ing) by increasing by a factor (instead of increasing by a summand). I might be wrong, though.

@nielsdos
Copy link
Member

nielsdos commented Feb 2, 2025

Wouldn't affect this a lot here because we preallocate a buffer equal to the input length (well unless there's a lot of escaping, then it does matter).
But indeed, in general, using a factor should lead to a lower amortized time complexity. And ideally a factor lower than the golden ratio (e.g. something like 1.5) to have better memory chunk reusage.

We do increase per page though so the effect may only happen on very large string builds:

#define SMART_STR_NEW_LEN(len) \
(ZEND_MM_ALIGNED_SIZE_EX(len + SMART_STR_OVERHEAD, SMART_STR_PAGE) - SMART_STR_OVERHEAD)
ZEND_API void ZEND_FASTCALL smart_str_erealloc(smart_str *str, size_t len)
{
if (UNEXPECTED(!str->s)) {
str->a = len <= SMART_STR_START_LEN
? SMART_STR_START_LEN
: SMART_STR_NEW_LEN(len);

@mvorisek
Copy link
Contributor Author

mvorisek commented Feb 2, 2025

The speedup you nailed is awesome 😍 and will greatly improve apps that serve large blobs in JSON. Thank you!

slowdown

If I understand the impl. well, the computed "mask" is currently discarded once "to-be-escaped" character is found. What about reusing the computed knowledge for the remaining characters?

@nielsdos
Copy link
Member

nielsdos commented Feb 2, 2025

Indeed. What happens now is that we just move the "pos" to the first character that must be escaped.
In theory, we know based on the set bits in the "mask" which characters need escaping. How to leverage this efficiently is something I'd have to think about more sometime in the upcoming week.

@nielsdos
Copy link
Member

nielsdos commented Feb 3, 2025

By reusing the mask information I got the overhead of the worse case down to about 2.07x.
Development PR here if anyone's interested: nielsdos#120, still quite in a WIP state.

@nielsdos nielsdos linked a pull request Feb 8, 2025 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants