From 4a98389b387942bfe4b21f0db4ca3424796e255d Mon Sep 17 00:00:00 2001
From: Lijun Wang <83639177+lijunwangs@users.noreply.github.com>
Date: Thu, 14 Nov 2024 12:36:09 -0800
Subject: [PATCH 1/6] Added SIMD for tpu vote using QUIC

---
 proposals/0034-tpu-vote-using-quic.md | 107 ++++++++++++++++++++++++++
 1 file changed, 107 insertions(+)
 create mode 100644 proposals/0034-tpu-vote-using-quic.md

diff --git a/proposals/0034-tpu-vote-using-quic.md b/proposals/0034-tpu-vote-using-quic.md
new file mode 100644
index 000000000..24b6766d0
--- /dev/null
+++ b/proposals/0034-tpu-vote-using-quic.md
@@ -0,0 +1,107 @@
+---
+simd: '0034'
+title: TPU Vote using QUIC
+authors:
+  - Lijun Wang
+category: Standard
+type: Core
+status: Draft
+created: 2024-11-13
+development:
+  - Anza - WIP
+  - Firedancer - Not started
+---
+
+## Summary
+
+Use QUIC for transporting TPU votes among Solana validators. This requires
+supporing receiving QUIC based vote TPU packets on the server side and sending
+QUIC-based TPU vote packets on the client side.
+
+
+## Motivation
+
+As timely vote credits are awarded to validators, they might be incentived to
+increase the TPU vote traffic to ensure their votes are received in a timely
+manner. This could cause congestions and impact overall TPU vote processing
+effectiveness. The concurrent UDP based TPU vote does not have any flow control
+mechanism.
+
+We propose to apply the pattern taken for TPU transaction processing to TPU vote
+processing -- by utlizing the flow control mechanism which were developed including
+built-in QUIC protocol level flow control, and application-level rate limiting on
+connections and packets.
+
+## Alternatives Considered
+
+There is no readily-available alternative to QUIC which addresses some of the
+requirements such as security (reliability when applying QOS), low latency and
+flow control. We could solve the security and flow control with TLS over TCP
+the concern is with the latency and head-of-line problems. We could also
+customize and build our own rate limiting mechanism based on the UDP directly,
+this is non-trivial and cannot solve the security problem without also rely on
+some sort of crypto handshaking.
+
+
+## Detailed Design
+
+On the server side, the validator will bind to a new QUIC endpoint. Its
+corresponding port will be published to the network in the ContactInfo via
+Gossip. The client side will use the TPU vote QUIC port published by the server
+to connect to the server.
+
+The TPU vote will be using the same QUIC implementation used by regular
+transaction transportation. The client and server both uses their validator's
+identity key to sign the certificate which is used to validate the validator's
+identity especially on the server side for the purpose of provding QOS based on
+the client's stakes by checking the client's Pubkey -- stake weighted QOS.
+
+Once a QUIC connection is established, the client can send vote transaction
+using QUIC UNI streams. In this design, a stream is used to send one single Vote
+transaction. After that the stream is closed.
+
+The server only supports connections from the nodes which has stakes who can vote.
+Connections from unstaked nodes are rejected with `disallowed` code.
+
+The following QOS mechanisms are employed:
+
+* Connection Rate Limiting from all clients
+* Connection Rate Limiting from a particular IpAddress
+* Total concurrent connections from all clients -- this is set to 2500
+* Max concurrent connections from a client Pubkey -- this is set to 1 for votes.
+* Max concurrent streams per connection -- this is allocated based on the ratio
+of the validator's stake over the total stakes of the network.
+* Maximum of vote transactions per unit time which is also stake weighted
+
+When the server processes a stream and its chunk, it may timeout and close the stream
+if it does not receive the data in configured timeout window (2s).
+
+The validator also uses gossip to pull votes from other validator. This proposed change
+does not change the transport for that which will remain to be UDP based. As the gossip
+based votes are pulled by the validator, the concern with increased votes traffic is
+lessened.
+
+## Impact
+
+ QUIC compared with UDP is connection based. There is an extra overhead to establish
+ the connections when sending a vote. To minimize this, the client side can employ
+ connection caching and pre-cache warmer mechanism based on the leader schedule.
+
+## Security Considerations
+
+The are no net new security vulnerability as QUIC TPU transaction has already been in-place.
+Similar DoS attack can be targeted against the new QUIC port used by TPU vote. The connection
+rate limiting is one tool to fend off such attacks.
+
+## Backwards Compatibility
+
+Care need to taken to ensure a smooth transition into using QUIC for TPU votes from UDP.
+
+Phase 1. The server side will support both UDP and QUIC for TPU votes. No clients send
+TPU votes via QUIC.
+
+Phase 2. After all staked nodes are upgraded with support of receiving TPU votes via QUIC,
+restart the validators with configuration to send TPU votes via QUIC.
+
+Phase 3. Turn off UDP based TPU votes listener on the server side once all staked nodes
+complete phase 2.
\ No newline at end of file

From f4378d5d50d41b14a52c30f525766f2a6300ba6b Mon Sep 17 00:00:00 2001
From: Lijun Wang <83639177+lijunwangs@users.noreply.github.com>
Date: Thu, 14 Nov 2024 12:44:52 -0800
Subject: [PATCH 2/6] updated for md formatting

---
 proposals/0034-tpu-vote-using-quic.md | 50 ++++++++++++++-------------
 1 file changed, 26 insertions(+), 24 deletions(-)

diff --git a/proposals/0034-tpu-vote-using-quic.md b/proposals/0034-tpu-vote-using-quic.md
index 24b6766d0..57c8b9fa6 100644
--- a/proposals/0034-tpu-vote-using-quic.md
+++ b/proposals/0034-tpu-vote-using-quic.md
@@ -28,9 +28,9 @@ effectiveness. The concurrent UDP based TPU vote does not have any flow control
 mechanism.
 
 We propose to apply the pattern taken for TPU transaction processing to TPU vote
-processing -- by utlizing the flow control mechanism which were developed including
-built-in QUIC protocol level flow control, and application-level rate limiting on
-connections and packets.
+processing -- by utlizing the flow control mechanism which were developed
+including built-in QUIC protocol level flow control, and application-level rate
+limiting on connections and packets.
 
 ## Alternatives Considered
 
@@ -60,8 +60,8 @@ Once a QUIC connection is established, the client can send vote transaction
 using QUIC UNI streams. In this design, a stream is used to send one single Vote
 transaction. After that the stream is closed.
 
-The server only supports connections from the nodes which has stakes who can vote.
-Connections from unstaked nodes are rejected with `disallowed` code.
+The server only supports connections from the nodes which has stakes who can
+vote. Connections from unstaked nodes are rejected with `disallowed` code.
 
 The following QOS mechanisms are employed:
 
@@ -73,35 +73,37 @@ The following QOS mechanisms are employed:
 of the validator's stake over the total stakes of the network.
 * Maximum of vote transactions per unit time which is also stake weighted
 
-When the server processes a stream and its chunk, it may timeout and close the stream
-if it does not receive the data in configured timeout window (2s).
+When the server processes a stream and its chunk, it may timeout and close the
+stream if it does not receive the data in configured timeout window (2s).
 
-The validator also uses gossip to pull votes from other validator. This proposed change
-does not change the transport for that which will remain to be UDP based. As the gossip
-based votes are pulled by the validator, the concern with increased votes traffic is
-lessened.
+The validator also uses gossip to pull votes from other validator. This proposed
+change does not change the transport for that which will remain to be UDP based.
+As the gossip based votes are pulled by the validator, the concern with
+increased votes traffic is lessened.
 
 ## Impact
 
- QUIC compared with UDP is connection based. There is an extra overhead to establish
- the connections when sending a vote. To minimize this, the client side can employ
- connection caching and pre-cache warmer mechanism based on the leader schedule.
+ QUIC compared with UDP is connection based. There is an extra overhead to
+ establish the connections when sending a vote. To minimize this, the client
+ side can employ connection caching and pre-cache warmer mechanism based on the
+ leader schedule.
 
 ## Security Considerations
 
-The are no net new security vulnerability as QUIC TPU transaction has already been in-place.
-Similar DoS attack can be targeted against the new QUIC port used by TPU vote. The connection
-rate limiting is one tool to fend off such attacks.
+The are no net new security vulnerability as QUIC TPU transaction has already
+been in-place. Similar DoS attack can be targeted against the new QUIC port used
+by TPU vote. The connection rate limiting is one tool to fend off such attacks.
 
 ## Backwards Compatibility
 
-Care need to taken to ensure a smooth transition into using QUIC for TPU votes from UDP.
+Care need to taken to ensure a smooth transition into using QUIC for TPU votes
+from UDP.
 
-Phase 1. The server side will support both UDP and QUIC for TPU votes. No clients send
-TPU votes via QUIC.
+Phase 1. The server side will support both UDP and QUIC for TPU votes. No
+clients send TPU votes via QUIC.
 
-Phase 2. After all staked nodes are upgraded with support of receiving TPU votes via QUIC,
-restart the validators with configuration to send TPU votes via QUIC.
+Phase 2. After all staked nodes are upgraded with support of receiving TPU votes
+via QUIC, restart the validators with configuration to send TPU votes via QUIC.
 
-Phase 3. Turn off UDP based TPU votes listener on the server side once all staked nodes
-complete phase 2.
\ No newline at end of file
+Phase 3. Turn off UDP based TPU votes listener on the server side once all
+staked nodes complete phase 2.
\ No newline at end of file

From 30d05b925e5b303e2863be74c04a60d253e45413 Mon Sep 17 00:00:00 2001
From: Lijun Wang <83639177+lijunwangs@users.noreply.github.com>
Date: Thu, 14 Nov 2024 12:46:40 -0800
Subject: [PATCH 3/6] updated for md formatting

---
 proposals/0034-tpu-vote-using-quic.md | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/proposals/0034-tpu-vote-using-quic.md b/proposals/0034-tpu-vote-using-quic.md
index 57c8b9fa6..ecb668f30 100644
--- a/proposals/0034-tpu-vote-using-quic.md
+++ b/proposals/0034-tpu-vote-using-quic.md
@@ -42,6 +42,9 @@ customize and build our own rate limiting mechanism based on the UDP directly,
 this is non-trivial and cannot solve the security problem without also rely on
 some sort of crypto handshaking.
 
+## New Terminology
+
+None
 
 ## Detailed Design
 

From c78b3990736b28b5468c421f0b48fb1bdea7a089 Mon Sep 17 00:00:00 2001
From: Lijun Wang <83639177+lijunwangs@users.noreply.github.com>
Date: Thu, 14 Nov 2024 12:47:50 -0800
Subject: [PATCH 4/6] updated for md formatting

---
 proposals/0034-tpu-vote-using-quic.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/proposals/0034-tpu-vote-using-quic.md b/proposals/0034-tpu-vote-using-quic.md
index ecb668f30..e5aa468d7 100644
--- a/proposals/0034-tpu-vote-using-quic.md
+++ b/proposals/0034-tpu-vote-using-quic.md
@@ -5,7 +5,7 @@ authors:
   - Lijun Wang
 category: Standard
 type: Core
-status: Draft
+status: Review
 created: 2024-11-13
 development:
   - Anza - WIP

From 8fe3640cd8a55b9f1d1f4b3b4833a983aeeca97b Mon Sep 17 00:00:00 2001
From: Lijun Wang <83639177+lijunwangs@users.noreply.github.com>
Date: Thu, 14 Nov 2024 17:50:27 -0800
Subject: [PATCH 5/6] change simd number

---
 ...{0034-tpu-vote-using-quic.md => 0195-tpu-vote-using-quic.md} | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
 rename proposals/{0034-tpu-vote-using-quic.md => 0195-tpu-vote-using-quic.md} (99%)

diff --git a/proposals/0034-tpu-vote-using-quic.md b/proposals/0195-tpu-vote-using-quic.md
similarity index 99%
rename from proposals/0034-tpu-vote-using-quic.md
rename to proposals/0195-tpu-vote-using-quic.md
index e5aa468d7..72355fea5 100644
--- a/proposals/0034-tpu-vote-using-quic.md
+++ b/proposals/0195-tpu-vote-using-quic.md
@@ -1,5 +1,5 @@
 ---
-simd: '0034'
+simd: '0195'
 title: TPU Vote using QUIC
 authors:
   - Lijun Wang

From 3928ca98bd999a2fa7dd46337c51eaa98bbe21f8 Mon Sep 17 00:00:00 2001
From: Lijun Wang <83639177+lijunwangs@users.noreply.github.com>
Date: Tue, 26 Nov 2024 15:19:55 -0800
Subject: [PATCH 6/6] changes some wording and removed implementation specifics

---
 proposals/0195-tpu-vote-using-quic.md | 24 ++++++++++++++----------
 1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/proposals/0195-tpu-vote-using-quic.md b/proposals/0195-tpu-vote-using-quic.md
index 72355fea5..190f2ae52 100644
--- a/proposals/0195-tpu-vote-using-quic.md
+++ b/proposals/0195-tpu-vote-using-quic.md
@@ -44,7 +44,9 @@ some sort of crypto handshaking.
 
 ## New Terminology
 
-None
+In this document we define the following,
+Server -- the validator receiving the TPU votes
+Client -- the validator sending the TPU votes.
 
 ## Detailed Design
 
@@ -53,8 +55,8 @@ corresponding port will be published to the network in the ContactInfo via
 Gossip. The client side will use the TPU vote QUIC port published by the server
 to connect to the server.
 
-The TPU vote will be using the same QUIC implementation used by regular
-transaction transportation. The client and server both uses their validator's
+The TPU vote can use the same QUIC implementation used by regular transaction
+transportation. The client and server both uses their validator's
 identity key to sign the certificate which is used to validate the validator's
 identity especially on the server side for the purpose of provding QOS based on
 the client's stakes by checking the client's Pubkey -- stake weighted QOS.
@@ -66,18 +68,18 @@ transaction. After that the stream is closed.
 The server only supports connections from the nodes which has stakes who can
 vote. Connections from unstaked nodes are rejected with `disallowed` code.
 
-The following QOS mechanisms are employed:
+The following QOS mechanisms can be employed by the server:
 
 * Connection Rate Limiting from all clients
 * Connection Rate Limiting from a particular IpAddress
-* Total concurrent connections from all clients -- this is set to 2500
-* Max concurrent connections from a client Pubkey -- this is set to 1 for votes.
+* Total concurrent connections from all clients
+* Max concurrent connections from a client Pubkey.
 * Max concurrent streams per connection -- this is allocated based on the ratio
 of the validator's stake over the total stakes of the network.
-* Maximum of vote transactions per unit time which is also stake weighted
+* Maximum of vote transactions per unit time which is also stake weighted.
 
 When the server processes a stream and its chunk, it may timeout and close the
-stream if it does not receive the data in configured timeout window (2s).
+stream if it does not receive the data in configurable timeout window.
 
 The validator also uses gossip to pull votes from other validator. This proposed
 change does not change the transport for that which will remain to be UDP based.
@@ -89,13 +91,15 @@ increased votes traffic is lessened.
  QUIC compared with UDP is connection based. There is an extra overhead to
  establish the connections when sending a vote. To minimize this, the client
  side can employ connection caching and pre-cache warmer mechanism based on the
- leader schedule.
+ leader schedule. Similarly the server side should maintain a sufficiently
+ large enough conneciton cache for actively used connections to reduce
+ connection churning and overall overhead.
 
 ## Security Considerations
 
 The are no net new security vulnerability as QUIC TPU transaction has already
 been in-place. Similar DoS attack can be targeted against the new QUIC port used
-by TPU vote. The connection rate limiting is one tool to fend off such attacks.
+by TPU vote. The connection rate limiting can be used to fend off such attacks.
 
 ## Backwards Compatibility