feat: ability to directly write the gRPC frame to the ServerWritableStream #2652
Conversation
I think the biggest problem here is going to be that it is already valid to define a service handler that sends Buffers, which would then be passed to the serializer and framed as usual. This change would create a pitfall causing buffers with a specific byte pattern to go through a different code path and send the wrong message. I would prefer that we consider other approaches for optimizing. A CPU profile would be helpful, but I can see a few possible approaches:
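For reference, gRPC's length-prefixed framing puts a 1-byte compression flag and a 4-byte big-endian length in front of each serialized message. A rough sketch (the `frame` helper is mine, not library code) of why raw Buffers from a handler are ambiguous:

```typescript
// gRPC length-prefixed framing: 1-byte compressed flag + 4-byte
// big-endian payload length + payload bytes.
function frame(payload: Buffer, compressed = false): Buffer {
  const framed = Buffer.allocUnsafe(5 + payload.length);
  framed.writeUInt8(compressed ? 1 : 0, 0);
  framed.writeUInt32BE(payload.length, 1);
  payload.copy(framed, 5);
  return framed;
}

// A raw application Buffer that happens to start with a zero flag byte and
// a plausible length field is indistinguishable from an already-framed
// message — this byte pattern is exactly the pitfall described above:
const innocentPayload = Buffer.from([0, 0, 0, 0, 2, 0xca, 0xfe]);
const framed = frame(innocentPayload);
```

Here `innocentPayload` is a perfectly valid serializer output, yet its first five bytes also parse as a frame header (flag 0, length 2), so a byte-pattern check cannot safely route it down a different code path.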
As I understand gRPC principles, developers are free to choose the payload type and create a serializer for it themselves; the main thing is that the client and server agree on how to serialize and deserialize it. I understand that I am proposing not only to allow the developer to pass an arbitrary buffer as a payload, which still more or less satisfies the requirements of gRPC, but also to allow the developer to independently generate and transmit a gRPC frame. To abstract away from the problems in my application, I wrote a small service and a client for it and published them. Based on these, I ran some performance tests to confirm that my proposal makes sense. I implemented 3 ways of writing:
Could you please look at the benchmark conditions and results described in the readme? Thank you! P.S. If you want to dive into the code:
I'll take it as a given that you see a substantial performance improvement from caching framed messages vs. unframed serialized messages. However, there is still more potential nuance here, because there are two operations in framing that are potentially costly: allocating the buffer to contain the full framed message, and copying the serialized buffer into the framed message buffer. I would prefer to try out optimizations to that process before going in the direction of API changes. In addition, I don't think you quite understood my objection to this change. The freedom to choose the payload type is the cause of the problem, because …
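The two framing costs mentioned above can be made concrete side by side. A minimal sketch (helper names `frameByCopy`/`frameByHeader` are mine): the first approach pays a per-message allocation plus a full payload copy, the second allocates only a constant 5-byte header and leaves the payload untouched.

```typescript
// (a) Allocate a 5 + N byte buffer and copy the payload into it.
function frameByCopy(payload: Buffer): Buffer {
  const framed = Buffer.allocUnsafe(5 + payload.length); // cost 1: allocation scales with N
  framed.writeUInt8(0, 0);                                // compression flag: uncompressed
  framed.writeUInt32BE(payload.length, 1);                // payload length
  payload.copy(framed, 5);                                // cost 2: copy of the whole payload
  return framed;
}

// (b) Allocate only the 5-byte header; the payload buffer is reused as-is.
function frameByHeader(payload: Buffer): { infoBytes: Buffer; payload: Buffer } {
  const infoBytes = Buffer.allocUnsafe(5); // constant-size allocation, no payload copy
  infoBytes.writeUInt8(0, 0);
  infoBytes.writeUInt32BE(payload.length, 1);
  return { infoBytes, payload };
}

const msg = Buffer.from("cached serialized message");
const copied = frameByCopy(msg);
const split = frameByHeader(msg);
```

Both variants put identical bytes on the wire; they differ only in how much allocation and copying happens per message.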
I thought in this direction and seem to have found a suitable solution. Change:

```typescript
interface Http2ServerCallStream<RequestType, ResponseType> {
  // ...
  serializeMessage(value: ResponseType | Buffer): Buffer;
  write(chunk: Buffer): boolean | undefined;
  // ...
}
```

to:

```typescript
interface Http2ServerCallStream<RequestType, ResponseType> {
  // ...
  serializeMessage(value: ResponseType | Buffer): { infoBytes: Buffer; payload: Buffer };
  write(chunk: { infoBytes: Buffer; payload: Buffer }): boolean | undefined;
  // ...
}
```

In that case we can implement:

```typescript
class Http2ServerCallStream {
  // ...
  serializeMessage(value: ResponseType): { infoBytes: Buffer; payload: Buffer } {
    const payload = this.handler.serialize(value);
    const infoBytes = Buffer.allocUnsafe(5);
    infoBytes.writeUInt8(0, 0);                     // compression flag: uncompressed
    infoBytes.writeUInt32BE(payload.byteLength, 1); // payload length
    return { infoBytes, payload };
  }
  // ...
  write(chunk: { infoBytes: Buffer; payload: Buffer }) {
    if (this.checkCancelled()) {
      return;
    }
    const { infoBytes, payload } = chunk;
    if (
      this.maxSendMessageSize !== -1 &&
      payload.length + infoBytes.length > this.maxSendMessageSize
    ) {
      this.sendError({
        code: Status.RESOURCE_EXHAUSTED,
        details: `Sent message larger than max (${payload.length + infoBytes.length} vs. ${this.maxSendMessageSize})`,
      });
      return;
    }
    this.sendMetadata();
    this.emit('sendMessage');
    const d1 = this.stream.write(infoBytes);
    // Ignoring highWaterMark and the drain event, because for the library user
    // this must look like a single write.
    const d2 = this.stream.write(payload);
    return d1 && d2;
  }
  // ...
}
```

I implemented this solution as a hack in a 4th branch and ran performance tests. The combination of this solution with caching unframed serialized messages showed performance very close to the case with caching framed messages. If you have no objection to this approach, I will rename this pull request or create a new one with this solution. Thank you!
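With the proposed `{ infoBytes, payload }` shape, a handler can serialize each update once and fan the cached pair out to every subscriber stream. A hedged sketch of that caching pattern (the `frameCache`, `serialize`, and `getFramed` names are hypothetical, not part of this PR):

```typescript
interface FramedMessage {
  infoBytes: Buffer;
  payload: Buffer;
}

// Hypothetical per-update cache: serialize once, reuse for every subscriber.
const frameCache = new Map<string, FramedMessage>();
let serializations = 0;

// Stand-in for a protobufjs-generated serializer; counts invocations so the
// caching effect is observable.
function serialize(value: object): Buffer {
  serializations++;
  return Buffer.from(JSON.stringify(value));
}

function getFramed(key: string, value: object): FramedMessage {
  let framed = frameCache.get(key);
  if (!framed) {
    const payload = serialize(value);
    const infoBytes = Buffer.allocUnsafe(5);
    infoBytes.writeUInt8(0, 0);                     // uncompressed
    infoBytes.writeUInt32BE(payload.byteLength, 1); // payload length
    framed = { infoBytes, payload };
    frameCache.set(key, framed);
  }
  return framed;
}

// Simulate fanning the same update out to two watcher streams:
const update = { rev: 42, data: "..." };
const first = getFramed("rev-42", update);
const second = getFramed("rev-42", update);
```

Each watcher stream would then receive the same cached pair via the proposed `write(chunk)`, so serialization and header construction happen once per update rather than once per subscriber.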
P.S. Thanks for the clarification, it's clear to me now.
Yes, that is the kind of optimization I was talking about. I want to point out that #2650 removed and replaced …
We can continue in #2658.
Hello!
For some high-load purposes, I suggest implementing the ability to write a protobuf message or a gRPC frame directly to the ServerWritableStream.
My use case:
I have a service that stores data in memory, shares it with its clients, and keeps their caches consistent with update notifications, similar to etcd watch.
Its distinguishing feature is the thousands of clients that can frequently make a "watch" request (server streaming RPC) with a large first response followed by infrequent small responses.
Since data updates happen quite infrequently, it makes sense to cache gRPC response frames instead of recalculating them each time.
For now I have implemented this as a hack in my code, and it has saved me a lot of CPU time and RAM, especially `arrayBuffers`. Response time has been reduced tenfold for large messages. (Along with serialization and memory allocations, I got rid of the long iterative process of collecting the requested data from my cache.) A bit of code prompted me to create this pull request:
I used this extension of available types and created a function to skip serialization at runtime.
Using a serializer generated by protobufjs and the serializeMessage method on ServerWritableStream, I got this monster:
It has only one buffer allocation instead of the two needed when the gRPC frame serializer and the protobuf message serializer are separate.
So now we can change the serializer and write a buffer instead of a JS object.
So I tried to make it less hacky. I am not entirely sure whether it is worth allowing not only a gRPC frame but also a protobuf message (or any other format carried in a buffer) to be passed to the write method, since they are difficult to distinguish from each other: checking the first five bytes is not a reliable test.
Thanks!