sync: uncontended mutex / batch_semaphore is slower compared to other implementations (eg. async-mutex) #2555
-
I usually don't care much about micro/synthetic benchmarks, but I am quite curious why the uncontended case is so much slower compared to other implementations (e.g. async-mutex). The async-mutex repo also contains various benchmarks, which produce the following results on my machine:
Specs:
EDIT: The huge difference in the contended case might be because tokio's batch_semaphore is basically a FIFO queue, which guarantees fairness. On the other hand, it seems like async-mutex's implementation also guarantees fairness through event-listener; however, I only took a quick glance through the code, so I am not sure.
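For concreteness, the uncontended case in these benchmarks boils down to a single task repeatedly locking and unlocking the mutex, so every acquisition succeeds immediately. A minimal sketch of that pattern with tokio's mutex (the iteration count and output format are illustrative, not taken from the async-mutex suite; assumes tokio with its rt and macros features enabled):

```rust
use std::time::Instant;

#[tokio::main]
async fn main() {
    // Uncontended case: one task locks and unlocks in a loop,
    // so every acquisition succeeds on the fast path.
    let mutex = tokio::sync::Mutex::new(0u64);
    let iterations: u64 = 1_000_000;

    let start = Instant::now();
    for _ in 0..iterations {
        let mut guard = mutex.lock().await;
        *guard += 1;
        // guard dropped here, releasing the lock
    }
    let elapsed = start.elapsed();

    println!(
        "{} uncontended lock/unlock cycles in {:?} ({:.1} ns/iter)",
        iterations,
        elapsed,
        elapsed.as_nanos() as f64 / iterations as f64
    );
}
```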
-
The order-of-magnitude difference in that kind of benchmark is indeed usually due to fairness. Tokio and futures-intrusive have fair locks, async-std and futures provide unfair locks, and I have no idea what async-mutex does. Note that the linked benchmark doesn't turn on Tokio's `parking_lot` feature.
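(For reference, that feature is opt-in via Cargo.toml; a minimal sketch, with the version number being illustrative:)

```toml
[dependencies]
# The "parking_lot" feature makes tokio use parking_lot's
# synchronization primitives internally instead of std::sync.
tokio = { version = "1", features = ["full", "parking_lot"] }
```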
-
Whoa. Since when does GitHub have this new discussions feature O.o
I can imagine the contended case being affected by this; however, the uncontended case is quite surprising to me. Results with parking-lot:
-
This micro benchmark is unrelated to how you would use an async mutex in practice. In this scenario, you would want to use a regular mutex. When the critical section does not span yield points, we direct users to use the `parking_lot` mutex directly. You should compare against that.

An async mutex only makes sense when the critical section spans yield points. In that case, you want fairness. Without fairness, under contention, you will easily end up with tasks that get blocked indefinitely. This results in very large latency distributions, which is something you really want to avoid in production.
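A minimal sketch of that distinction (the function names and data types are illustrative, not from any of the linked benchmarks): a synchronous `parking_lot` mutex is fine in async code as long as the guard never lives across an `.await`, while a guard that must be held across a yield point calls for the async mutex:

```rust
use std::sync::Arc;

// Critical section with no yield point: a synchronous mutex is fine
// (and faster), even inside async code, because the lock is always
// released before the task awaits anything.
async fn bump_counter(counter: Arc<parking_lot::Mutex<u64>>) {
    let mut guard = counter.lock();
    *guard += 1;
    // guard dropped here, before any .await
}

// Critical section spanning a yield point: the guard lives across an
// .await, so an async mutex is needed; holding a synchronous guard
// here could block the executor thread.
async fn append_entries(log: Arc<tokio::sync::Mutex<Vec<String>>>, line: String) {
    let mut guard = log.lock().await;
    guard.push(line);
    tokio::task::yield_now().await; // yield point while the lock is held
    guard.push(String::from("done"));
}
```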
-
Fairness also makes a difference in the uncontended case. Without fairness, an implementation can just grab the lock if it's available, which is a single atomic operation. With fairness that doesn't work anymore, since one also needs to check whether there are waiters that should get the lock first. This will always require multiple atomic operations.
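To illustrate, the fast path of an unfair lock can be a single compare-and-swap on one word. The sketch below is not tokio's or async-mutex's actual code, just the shape of the argument:

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Minimal unfair try-lock, for illustration only.
struct UnfairLock {
    locked: AtomicBool,
}

impl UnfairLock {
    const fn new() -> Self {
        Self { locked: AtomicBool::new(false) }
    }

    /// One atomic compare-and-swap: succeed immediately whenever the
    /// lock happens to be free, regardless of how long others have
    /// been waiting.
    fn try_lock(&self) -> bool {
        self.locked
            .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_ok()
    }

    fn unlock(&self) {
        self.locked.store(false, Ordering::Release);
    }
}
```

A fair lock cannot barge in like this: even when the lock word looks free, it must first check for queued waiters, which is where the extra atomic operations come from.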