sync: uncontended mutex / batch_semaphore is slower compared to other implementations (eg. async-mutex) #2555
-
I usually don't care much about micro/synthetic benchmarks, but I am quite curious why the uncontended case is so much slower compared to other implementations (e.g. async-mutex). The async-mutex repo also contains various benchmarks, which produce the following results on my machine:
Specs:
EDIT: The huge difference in the contended case might be because tokio's batch_semaphore is basically a FIFO queue, which guarantees fairness. On the other hand, it seems like async-mutex's implementation also guarantees fairness through event-listener; however, I only took a quick glance through the code, so I am not sure.
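For concreteness, the uncontended case in these benchmarks boils down to a single task repeatedly locking and unlocking the mutex, so every acquisition succeeds immediately. A minimal sketch of that pattern with tokio's mutex (the iteration count and output format are illustrative, not taken from the async-mutex suite; assumes tokio with its rt and macros features enabled):

```rust
use std::time::Instant;

#[tokio::main]
async fn main() {
    // Uncontended case: one task locks and unlocks in a loop,
    // so every acquisition succeeds on the fast path.
    let mutex = tokio::sync::Mutex::new(0u64);
    let iterations: u64 = 1_000_000;

    let start = Instant::now();
    for _ in 0..iterations {
        let mut guard = mutex.lock().await;
        *guard += 1;
        // guard dropped here, releasing the lock
    }
    let elapsed = start.elapsed();

    println!(
        "{} uncontended lock/unlock cycles in {:?} ({:.1} ns/iter)",
        iterations,
        elapsed,
        elapsed.as_nanos() as f64 / iterations as f64
    );
}
```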
-
The order-of-magnitude difference in that kind of benchmark is indeed usually due to fairness. Tokio and futures-intrusive have fair locks, async-std and futures provide unfair locks, and I have no idea what async-mutex does. Note that the linked benchmark doesn't turn on Tokio's `parking_lot` feature.
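(For reference, that feature is opt-in via Cargo.toml; a minimal sketch, with the version number being illustrative:)

```toml
[dependencies]
# The "parking_lot" feature makes tokio use parking_lot's
# synchronization primitives internally instead of std::sync.
tokio = { version = "1", features = ["full", "parking_lot"] }
```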
-
Whoa. Since when does GitHub have this new discussions feature O.o
I can imagine the contended case being affected by this; however, the uncontended case is quite surprising to me. Results with parking-lot:
-
This micro benchmark is unrelated to how you would use an async mutex in practice. In this scenario, you would want to use a regular mutex. When the critical section does not span yield points, we direct users to use the `parking_lot` mutex directly. You should compare against that.

An async mutex only makes sense when the critical section spans yield points. In that case, you want fairness. Without fairness, under contention, you will easily end up with tasks that get blocked indefinitely. This results in very large latency distributions, which is something you really want to avoid in production.
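A minimal sketch of that distinction (the function names and data types are illustrative, not from any of the linked benchmarks): a synchronous `parking_lot` mutex is fine in async code as long as the guard never lives across an `.await`, while a guard that must be held across a yield point calls for the async mutex:

```rust
use std::sync::Arc;

// Critical section with no yield point: a synchronous mutex is fine
// (and faster), even inside async code, because the lock is always
// released before the task awaits anything.
async fn bump_counter(counter: Arc<parking_lot::Mutex<u64>>) {
    let mut guard = counter.lock();
    *guard += 1;
    // guard dropped here, before any .await
}

// Critical section spanning a yield point: the guard lives across an
// .await, so an async mutex is needed; holding a synchronous guard
// here could block the executor thread.
async fn append_entries(log: Arc<tokio::sync::Mutex<Vec<String>>>, line: String) {
    let mut guard = log.lock().await;
    guard.push(line);
    tokio::task::yield_now().await; // yield point while the lock is held
    guard.push(String::from("done"));
}
```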
-
Fairness also makes a difference in the uncontended case. Without fairness, an implementation can just grab the lock if it's available, which is a single atomic operation. With fairness that doesn't work anymore, since one also needs to check whether there are waiters that should get the lock first. This will always require multiple atomic operations.
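To illustrate, the fast path of an unfair lock can be a single compare-and-swap on one word. The sketch below is not tokio's or async-mutex's actual code, just the shape of the argument:

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Minimal unfair try-lock, for illustration only.
struct UnfairLock {
    locked: AtomicBool,
}

impl UnfairLock {
    const fn new() -> Self {
        Self { locked: AtomicBool::new(false) }
    }

    /// One atomic compare-and-swap: succeed immediately whenever the
    /// lock happens to be free, regardless of how long others have
    /// been waiting.
    fn try_lock(&self) -> bool {
        self.locked
            .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_ok()
    }

    fn unlock(&self) {
        self.locked.store(false, Ordering::Release);
    }
}
```

A fair lock cannot barge in like this: even when the lock word looks free, it must first check for queued waiters, which is where the extra atomic operations come from.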