json_encode can use SIMD #17672
Comments
There is already https://pecl.php.net/package/simdjson.
That lib is for decoding JSON. This feature request is about encoding.
Oh, right.
I played a bit with this and have this patch: https://gist.github.com/nielsdos/3b42ffaa4476bb5cb7eb498935072312 (somewhat dirty, would need some more work).
Without patch:
Naive SSE2:
Semi-naive SSE4.2:
So that's a 5x speedup with SSE2 and a 9.9x speedup with SSE4.2. Not quite 20x, which would not be possible with SSE anyway because its vectors hold only 16 bytes and you also have to account for the overhead, but still a win. EDIT: by getting rid of the tail variable it's down to 6.94ms for SSE4.2.
After further simplifying my patch (getting rid of the tail handling and simplifying the range check):
SSE4.2:
SSE2:
I also did a test with an adversarial input. The absolute worst case is a very long string containing only characters that must be escaped (or only non-printable ASCII), because in that case we get no benefit from the SIMD but still pay its overhead. Test script for the adversarial input:
$test = str_repeat('"', 64 * 100);
for ($i = 0; $i < 10000; $i++) {
    json_encode($test);
}
Ran using hyperfine.
BTW
The copying itself happens via memcpy, which the C library already optimizes like crazy. It still takes a lot of time, but we can't really get around that unless we start doing some very advanced stuff like supporting rope strings everywhere, and even then we pay the performance penalty once we need to materialize the rope.
By the way, I still believe there is some general potential to improve the performance of
That wouldn't affect this much here because we preallocate a buffer equal to the input length (unless there's a lot of escaping, in which case it does matter). We do grow the buffer one page at a time though, so the effect may only show up on very large string builds: Lines 27 to 35 in 6ae1209
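To make the preallocate-then-grow-by-page behaviour described above concrete, here is a minimal sketch in C. It is not the php-src code: the `sketch_buf` type, the function names and the 4096-byte page size are all assumptions made for illustration.

```c
#include <stdlib.h>
#include <string.h>

#define SKETCH_PAGE 4096u /* assumed page size, for illustration only */

/* Hypothetical growable buffer; not the php-src string builder. */
typedef struct {
    char  *data;
    size_t len;
    size_t cap;
} sketch_buf;

/* Rounds n up to the next multiple of SKETCH_PAGE. */
static size_t sketch_round_up_page(size_t n)
{
    return (n + SKETCH_PAGE - 1) / SKETCH_PAGE * SKETCH_PAGE;
}

/* Ensures room for `extra` more bytes. Returns 0 on success, -1 on failure. */
int sketch_buf_reserve(sketch_buf *b, size_t extra)
{
    size_t need = b->len + extra;
    if (need <= b->cap) {
        return 0; /* already large enough, no reallocation */
    }
    size_t new_cap = sketch_round_up_page(need);
    char *p = realloc(b->data, new_cap);
    if (p == NULL) {
        return -1;
    }
    b->data = p;
    b->cap = new_cap;
    return 0;
}

/* Appends n bytes, growing in whole pages when needed. */
int sketch_buf_append(sketch_buf *b, const char *s, size_t n)
{
    if (sketch_buf_reserve(b, n) != 0) {
        return -1;
    }
    memcpy(b->data + b->len, s, n);
    b->len += n;
    return 0;
}
```

With this shape, reserving the input length once up front means an input with little escaping never reallocates, while an input that expands a lot triggers extra page-sized growth steps.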
The speedup you nailed is awesome 😍 and will greatly improve apps that serve large blobs in JSON. Thank you!
If I understand the impl. well, the computed " |
Indeed. What happens now is that we just move the "pos" to the first character that must be escaped.
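As a rough illustration of moving "pos" to the first character that must be escaped, here is a minimal SSE2 sketch in C. It is not the linked patch: the function names are hypothetical, and the escape set (control bytes, non-ASCII bytes, `"`, `\` and `/`) is an assumption, since the real set depends on the json_encode flags. On builds without SSE2 it falls back to a plain byte loop.

```c
#include <stddef.h>

#if defined(__SSE2__) && (defined(__GNUC__) || defined(__clang__))
#include <emmintrin.h>
#define SKETCH_HAVE_SSE2 1
#endif

/* Assumed escape set for this sketch: control bytes, non-ASCII bytes,
 * '"', '\\' and '/'. The real encoder's set depends on its flags. */
static int sketch_needs_slow_path(unsigned char c)
{
    return c < 0x20 || c >= 0x80 || c == '"' || c == '\\' || c == '/';
}

/* Returns the index of the first byte that needs escaping, or len if none. */
size_t sketch_find_first_escape(const char *s, size_t len)
{
    size_t pos = 0;
#ifdef SKETCH_HAVE_SSE2
    const __m128i quote  = _mm_set1_epi8('"');
    const __m128i bslash = _mm_set1_epi8('\\');
    const __m128i slash  = _mm_set1_epi8('/');
    const __m128i space  = _mm_set1_epi8(0x20);
    while (len - pos >= 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)(s + pos));
        /* Signed compare: bytes >= 0x80 are negative, so "v < 0x20"
         * flags both control bytes and non-ASCII bytes at once. */
        __m128i bad = _mm_cmplt_epi8(v, space);
        bad = _mm_or_si128(bad, _mm_cmpeq_epi8(v, quote));
        bad = _mm_or_si128(bad, _mm_cmpeq_epi8(v, bslash));
        bad = _mm_or_si128(bad, _mm_cmpeq_epi8(v, slash));
        int mask = _mm_movemask_epi8(bad); /* one bit per byte */
        if (mask != 0) {
            /* Lowest set bit = first offending byte in this block. */
            return pos + (size_t)__builtin_ctz((unsigned)mask);
        }
        pos += 16;
    }
#endif
    /* Scalar loop for the tail (or for everything without SSE2). */
    for (; pos < len; pos++) {
        if (sketch_needs_slow_path((unsigned char)s[pos])) {
            return pos;
        }
    }
    return len;
}
```

The caller can then copy `s[0..pos)` in one memcpy and hand only the byte at `pos` to the existing per-character escaping code.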
By reusing the mask information I got the overhead of the worst case down to about 2.07x.
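One way to read "reusing the mask information" is that every set bit of a block's mask is consumed before the next 16 bytes are loaded, so a block with several escapes is scanned only once. The sketch below shows that idea in C under stated assumptions; it is not the actual patch. The names are hypothetical and the escape set is simplified to control bytes, `"` and `\`, with non-ASCII bytes and `/` copied through unchanged.

```c
#include <stddef.h>
#include <string.h>

#if defined(__SSE2__) && (defined(__GNUC__) || defined(__clang__))
#include <emmintrin.h>
#define SKETCH_HAVE_SSE2 1
#endif

/* Simplified escape set (an assumption): control bytes, '"' and '\\'. */
static int sketch_must_escape(unsigned char c)
{
    return c < 0x20 || c == '"' || c == '\\';
}

/* Writes the escape sequence for c and returns its length (2 or 6). */
static size_t sketch_emit_escape(char *out, unsigned char c)
{
    static const char hex[] = "0123456789abcdef";
    out[0] = '\\';
    if (c == '"' || c == '\\') {
        out[1] = (char)c;
        return 2;
    }
    out[1] = 'u'; out[2] = '0'; out[3] = '0';
    out[4] = hex[c >> 4];
    out[5] = hex[c & 0x0f];
    return 6;
}

/* Escapes in[0..len) into out, which must hold 6 * len bytes.
 * Returns the number of bytes written (no NUL terminator). */
size_t sketch_escape(const char *in, size_t len, char *out)
{
    size_t i = 0, o = 0;
#ifdef SKETCH_HAVE_SSE2
    const __m128i quote    = _mm_set1_epi8('"');
    const __m128i bslash   = _mm_set1_epi8('\\');
    const __m128i max_ctrl = _mm_set1_epi8(0x1f);
    while (len - i >= 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)(in + i));
        /* Unsigned "v <= 0x1f" is equivalent to min(v, 0x1f) == v. */
        __m128i bad = _mm_cmpeq_epi8(_mm_min_epu8(v, max_ctrl), v);
        bad = _mm_or_si128(bad, _mm_cmpeq_epi8(v, quote));
        bad = _mm_or_si128(bad, _mm_cmpeq_epi8(v, bslash));
        unsigned mask = (unsigned)_mm_movemask_epi8(bad);
        size_t start = 0;
        /* Walk every set bit of the same mask instead of rescanning. */
        while (mask != 0) {
            size_t bit = (size_t)__builtin_ctz(mask);
            memcpy(out + o, in + i + start, bit - start); /* clean run */
            o += bit - start;
            o += sketch_emit_escape(out + o, (unsigned char)in[i + bit]);
            start = bit + 1;
            mask &= mask - 1; /* clear the lowest set bit */
        }
        memcpy(out + o, in + i + start, 16 - start); /* rest of block */
        o += 16 - start;
        i += 16;
    }
#endif
    /* Scalar loop for the tail (or for everything without SSE2). */
    for (; i < len; i++) {
        unsigned char c = (unsigned char)in[i];
        if (sketch_must_escape(c)) {
            o += sketch_emit_escape(out + o, c);
        } else {
            out[o++] = (char)c;
        }
    }
    return o;
}
```

For an input made only of quotes, each 16-byte load yields a full mask and all 16 escapes are emitted from it, which is the kind of adversarial case the thread measures.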
Description
The current json_encode implementation [1] iterates over every string character. I believe there is potential to use SIMD to copy multiple characters to the output string at once, as long as none of them need to be escaped.
Benchmark: https://3v4l.org/XeJTn/rfc#vgit.master
This can improve performance in applications that do a lot of JSON encoding.
[1] https://github.com/php/php-src/blob/php-8.4.3/ext/json/json_encoder.c#L411