Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf(transport): don't pre-allocate mtu on max_datagram_size #2086

Merged
merged 2 commits into from
Sep 25, 2024

Conversation

mxinden
Copy link
Collaborator

@mxinden mxinden commented Sep 3, 2024

neqo_transport::Connection::max_datagram_size creates an Encoder, writes a packet header and a packet number and determines how many bytes of the mtu are left.

let encoder = Encoder::with_capacity(mtu);
let (_, mut builder) = Self::build_packet_header(
&path.borrow(),
cspace,
encoder,
tx,
&self.address_validation,
version,
false,
);
_ = Self::add_packet_number(
&mut builder,
tx,
self.loss_recovery
.largest_acknowledged_pn(PacketNumberSpace::ApplicationData),
);
let data_len_possible =
u64::try_from(mtu.saturating_sub(tx.expansion() + builder.len() + 1))?;

The Encoder only has to hold the packet header and the packet number. Yet it is initialized with Encoder::with_capacity(mtu), where mtu can be up to 65535 bytes.

let encoder = Encoder::with_capacity(mtu);

Note that PacketBuilder::short and PacketBuilder::long called by Self::build_packet_header read the Encoder::capacity through PacketBuilder::infer_limit.

pub fn short(mut encoder: Encoder, key_phase: bool, dcid: Option<impl AsRef<[u8]>>) -> Self {
let mut limit = Self::infer_limit(&encoder);
let header_start = encoder.len();
// Check that there is enough space for the header.
// 5 = 1 (first byte) + 4 (packet number)
if limit > encoder.len()
&& 5 + dcid.as_ref().map_or(0, |d| d.as_ref().len()) < limit - encoder.len()
{
encoder
.encode_byte(PACKET_BIT_SHORT | PACKET_BIT_FIXED_QUIC | (u8::from(key_phase) << 2));
if let Some(dcid) = dcid {
encoder.encode(dcid.as_ref());
}
} else {
limit = 0;
}
Self {
encoder,
pn: u64::MAX,
header: header_start..header_start,
offsets: PacketBuilderOffsets {
first_byte_mask: PACKET_HP_MASK_SHORT,
pn: 0..0,
len: 0,
},
limit,
padding: false,
}
}

pub fn long(
mut encoder: Encoder,
pt: PacketType,
version: Version,
mut dcid: Option<impl AsRef<[u8]>>,
mut scid: Option<impl AsRef<[u8]>>,
) -> Self {
let mut limit = Self::infer_limit(&encoder);
let header_start = encoder.len();
// Check that there is enough space for the header.
// 11 = 1 (first byte) + 4 (version) + 2 (dcid+scid length) + 4 (packet number)
if limit > encoder.len()
&& 11
+ dcid.as_ref().map_or(0, |d| d.as_ref().len())
+ scid.as_ref().map_or(0, |d| d.as_ref().len())
< limit - encoder.len()
{
encoder.encode_byte(PACKET_BIT_LONG | PACKET_BIT_FIXED_QUIC | pt.to_byte(version) << 4);
encoder.encode_uint(4, version.wire_version());
encoder.encode_vec(1, dcid.take().as_ref().map_or(&[], AsRef::as_ref));
encoder.encode_vec(1, scid.take().as_ref().map_or(&[], AsRef::as_ref));
} else {
limit = 0;
}
Self {
encoder,
pn: u64::MAX,
header: header_start..header_start,
offsets: PacketBuilderOffsets {
first_byte_mask: PACKET_HP_MASK_LONG,
pn: 0..0,
len: 0,
},
limit,
padding: false,
}
}

But PacketBuilder::infer_limit falls back to 2048 if the capacity is below 64, which will be the case when using Encoder::default() instead of Encoder::with_capacity(mtu). 2048 should be plenty enough for the packet header and the packet number.

fn infer_limit(encoder: &Encoder) -> usize {
if encoder.capacity() > 64 {
encoder.capacity()
} else {
2048
}
}

This commit prevents the wasted allocation by using Encoder::default() instead of Encoder::with_capacity(mtu). The former is backed by an empty Vec.


Feel free to ignore if you don't think the reduction in memory allocation is worth the complexity in reasoning described above.

The `neqo_transport::Connection::max_datagram_size` creates an `Encoder`, writes
a packet header and a packet number and determines how many bytes of the mtu are left.

https://github.com/mozilla/neqo/blob/28f60bd0ba3209ecba4102eec123859a3a8afd45/neqo-transport/src/connection/mod.rs#L3408-L3427

The `Encoder` only has to hold the packet header and the packet number. Yet it
is initialized with `Encoder::with_capacity(mtu)`.

https://github.com/mozilla/neqo/blob/28f60bd0ba3209ecba4102eec123859a3a8afd45/neqo-transport/src/connection/mod.rs#L3408

Note that `PacketBuilder::short` and `PacketBuilder::long` read the
`Encoder::capacity` through `PacketBuilder::infer_limit`. But
`PacketBuilder::infer_limit` falls back to `2048` if the capacity is below `64`,
which will be the case when using `Encoder::default()` instead of
`Encoder::with_capacity(mtu)`. `2048` should be plenty enough for the packet
header and the packet number.

https://github.com/mozilla/neqo/blob/28f60bd0ba3209ecba4102eec123859a3a8afd45/neqo-transport/src/packet/mod.rs#L152-L180

https://github.com/mozilla/neqo/blob/28f60bd0ba3209ecba4102eec123859a3a8afd45/neqo-transport/src/packet/mod.rs#L188-L225

https://github.com/mozilla/neqo/blob/28f60bd0ba3209ecba4102eec123859a3a8afd45/neqo-transport/src/packet/mod.rs#L135-L141

This commit prevents the wasted allocation by using `Encoder::default()` instead
of `Encoder::with_capacity(mtu)`.
Copy link

github-actions bot commented Sep 3, 2024

Failed Interop Tests

QUIC Interop Runner, client vs. server

neqo-latest as client

neqo-latest as server

All results

Succeeded Interop Tests

QUIC Interop Runner, client vs. server

neqo-latest as client

neqo-latest as server

Unsupported Interop Tests

QUIC Interop Runner, client vs. server

neqo-latest as client

neqo-latest as server

Copy link

codecov bot commented Sep 3, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.35%. Comparing base (d513712) to head (c3e5999).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2086   +/-   ##
=======================================
  Coverage   95.35%   95.35%           
=======================================
  Files         112      112           
  Lines       36335    36335           
=======================================
  Hits        34648    34648           
  Misses       1687     1687           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link

github-actions bot commented Sep 3, 2024

Benchmark results

Performance differences relative to d513712.

coalesce_acked_from_zero 1+1 entries: No change in performance detected.
       time:   [98.912 ns 99.221 ns 99.533 ns]
       change: [-0.6624% -0.2427% +0.2050%] (p = 0.28 > 0.05)

Found 11 outliers among 100 measurements (11.00%)
9 (9.00%) high mild
2 (2.00%) high severe

coalesce_acked_from_zero 3+1 entries: No change in performance detected.
       time:   [116.71 ns 117.06 ns 117.44 ns]
       change: [-7.1034% -2.5302% +0.1590%] (p = 0.28 > 0.05)

Found 15 outliers among 100 measurements (15.00%)
1 (1.00%) low severe
1 (1.00%) high mild
13 (13.00%) high severe

coalesce_acked_from_zero 10+1 entries: No change in performance detected.
       time:   [116.60 ns 117.17 ns 117.81 ns]
       change: [-0.5054% -0.0210% +0.4710%] (p = 0.94 > 0.05)

Found 17 outliers among 100 measurements (17.00%)
4 (4.00%) low severe
1 (1.00%) low mild
3 (3.00%) high mild
9 (9.00%) high severe

coalesce_acked_from_zero 1000+1 entries: No change in performance detected.
       time:   [97.471 ns 97.578 ns 97.698 ns]
       change: [-1.7988% -0.1792% +1.1365%] (p = 0.84 > 0.05)

Found 12 outliers among 100 measurements (12.00%)
3 (3.00%) high mild
9 (9.00%) high severe

RxStreamOrderer::inbound_frame(): Change within noise threshold.
       time:   [111.49 ms 111.61 ms 111.77 ms]
       change: [+0.2977% +0.4140% +0.5750%] (p = 0.00 < 0.05)

Found 13 outliers among 100 measurements (13.00%)
11 (11.00%) low mild
2 (2.00%) high severe

transfer/pacing-false/varying-seeds: No change in performance detected.
       time:   [26.516 ms 27.702 ms 28.914 ms]
       change: [-5.3154% +0.3008% +6.1247%] (p = 0.92 > 0.05)

Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild

transfer/pacing-true/varying-seeds: No change in performance detected.
       time:   [33.422 ms 35.001 ms 36.607 ms]
       change: [-12.155% -5.9755% +0.5737%] (p = 0.08 > 0.05)

Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild

transfer/pacing-false/same-seed: No change in performance detected.
       time:   [26.032 ms 26.871 ms 27.705 ms]
       change: [-4.2351% -0.0324% +4.5431%] (p = 0.99 > 0.05)

Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) low mild
1 (1.00%) high mild

transfer/pacing-true/same-seed: No change in performance detected.
       time:   [41.561 ms 43.521 ms 45.509 ms]
       change: [-2.4773% +4.1611% +11.082%] (p = 0.22 > 0.05)
1-conn/1-100mb-resp (aka. Download)/client: No change in performance detected.
       time:   [113.00 ms 113.54 ms 114.16 ms]
       thrpt:  [875.97 MiB/s 880.77 MiB/s 884.94 MiB/s]
change:
       time:   [-1.0371% -0.3431% +0.3115%] (p = 0.34 > 0.05)
       thrpt:  [-0.3106% +0.3442% +1.0480%]

Found 4 outliers among 100 measurements (4.00%)
3 (3.00%) low mild
1 (1.00%) high severe

1-conn/10_000-parallel-1b-resp (aka. RPS)/client: No change in performance detected.
       time:   [313.38 ms 317.07 ms 320.68 ms]
       thrpt:  [31.183 Kelem/s 31.538 Kelem/s 31.910 Kelem/s]
change:
       time:   [-2.0956% -0.4070% +1.2455%] (p = 0.63 > 0.05)
       thrpt:  [-1.2302% +0.4087% +2.1404%]
1-conn/1-1b-resp (aka. HPS)/client: No change in performance detected.
       time:   [33.693 ms 33.906 ms 34.133 ms]
       thrpt:  [29.297  elem/s 29.493  elem/s 29.680  elem/s]
change:
       time:   [-1.2479% -0.3761% +0.5256%] (p = 0.42 > 0.05)
       thrpt:  [-0.5228% +0.3775% +1.2637%]

Found 4 outliers among 100 measurements (4.00%)
3 (3.00%) high mild
1 (1.00%) high severe

Client/server transfer results

Transfer of 33554432 bytes over loopback.

Client Server CC Pacing Mean [ms] Min [ms] Max [ms] Relative
msquic msquic 122.5 ± 54.3 90.2 338.7 1.00
neqo msquic reno on 223.4 ± 14.8 203.4 248.4 1.00
neqo msquic reno 218.2 ± 15.4 205.3 258.9 1.00
neqo msquic cubic on 220.8 ± 13.5 206.8 244.5 1.00
neqo msquic cubic 231.4 ± 39.8 201.0 321.5 1.00
msquic neqo reno on 133.3 ± 65.2 83.3 338.4 1.00
msquic neqo reno 95.9 ± 17.5 83.3 152.9 1.00
msquic neqo cubic on 147.3 ± 75.6 84.4 344.2 1.00
msquic neqo cubic 101.9 ± 15.4 83.3 138.1 1.00
neqo neqo reno on 220.5 ± 126.8 135.3 536.0 1.00
neqo neqo reno 161.8 ± 63.5 122.9 400.1 1.00
neqo neqo cubic on 195.0 ± 77.9 125.9 416.3 1.00
neqo neqo cubic 170.8 ± 32.6 126.4 236.4 1.00

⬇️ Download logs

Copy link

github-actions bot commented Sep 3, 2024

Firefox builds for this PR

The following builds are available for testing. Crossed-out builds did not succeed.

@larseggert larseggert added this pull request to the merge queue Sep 25, 2024
Merged via the queue into mozilla:main with commit 198c006 Sep 25, 2024
59 of 62 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants