[LibOS] Use RW locks in the VMA tree #1795

Open
dimakuv wants to merge 1 commit into master from dimakuv/libos-vma-rwlock

@dimakuv dimakuv commented Mar 5, 2024

Description of the changes

Multi-threaded workloads with many syscalls stress the VMA subsystem of LibOS, because almost all syscalls verify their buffers for read/write access using the functions is_user_memory_readable(), is_user_memory_writable(), etc. All these functions end up in VMA-specific is_in_adjacent_user_vmas() that grabs a global VMA lock. On some multi-threaded apps like MongoDB, this lock contention becomes the performance bottleneck.

This commit tries to remove this bottleneck by switching from a spinlock to a read-write (RW) lock. The intuition is that most calls go to the read-only is_in_adjacent_user_vmas() function, which now takes only the read lock.
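To make the intuition concrete, here is a minimal sketch of reader-writer semantics. It is not Gramine's `libos_rwlock`: the type `toy_rwlock` and all `toy_*` functions are hypothetical names, and only non-blocking "try" variants are shown to keep the example short.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical toy RW lock (not the Gramine implementation).
 * state > 0: that many readers hold the lock; state == -1: one writer; state == 0: free. */
struct toy_rwlock {
    atomic_int state;
};

/* Succeeds unless a writer currently holds the lock; many readers may hold it at once. */
static bool toy_try_read_lock(struct toy_rwlock* l) {
    int cur = atomic_load(&l->state);
    while (cur >= 0) {
        /* On failure `cur` is refreshed with the current value and re-checked by the loop. */
        if (atomic_compare_exchange_weak(&l->state, &cur, cur + 1))
            return true;
    }
    return false;
}

static void toy_read_unlock(struct toy_rwlock* l) {
    int prev = atomic_fetch_sub(&l->state, 1);
    assert(prev > 0); /* must have been read-locked */
    (void)prev;
}

/* Succeeds only if there are no readers and no writer. */
static bool toy_try_write_lock(struct toy_rwlock* l) {
    int expected = 0;
    return atomic_compare_exchange_strong(&l->state, &expected, -1);
}

static void toy_write_unlock(struct toy_rwlock* l) {
    atomic_store(&l->state, 0);
}
```

With a spinlock, two threads that only validate buffers would serialize; in this sketch both `toy_try_read_lock()` calls succeed, and only a writer (for example an mmap-like update of the tree) has to wait for exclusive access.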

Fixes #1794.

How to test this PR?

CI for functionality testing. Manual benchmarks for perf testing.

My quick tests on Memcached, Blender and iperf show no visible change in performance. This makes sense: these workloads use no more than 4 threads, so almost no contention.

WIP: I asked to run big workloads.

UPDATE 1: @jkr0103 reported these results (thanks!):

  • MongoDB stress workload:
    • Gramine-SGX without this PR: 46,441 ops/sec
    • Gramine-SGX with this PR: 92,810 ops/sec
    • Gramine-SGX with check_invalid_pointers = false: 104,694 ops/sec
  • MySQL stress workload:
    • Gramine-SGX without this PR: 55,874 QPS (latency 22.9ms)
    • Gramine-SGX with this PR: 253,891 QPS (latency 5.04ms)
    • Gramine-SGX with check_invalid_pointers = false: 258,343 QPS (latency 4.95ms)

Author

@dimakuv dimakuv left a comment

Reviewable status: 0 of 1 files reviewed, all discussions resolved, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel)


libos/src/bookkeep/libos_vma.c line 106 at r1 (raw file):

static struct avl_tree vma_tree = {.cmp = vma_tree_cmp};
static struct libos_rwlock vma_tree_lock;
static bool vma_tree_lock_created = false;

Technically, all these variables should have the g_ prefix, but I didn't want to add that unrelated change to this PR.


libos/src/bookkeep/libos_vma.c line 108 at r1 (raw file):

static bool vma_tree_lock_created = false;

static inline void vma_rwlock_read_lock(struct libos_rwlock* l) {

It's important to use these wrappers because at the very startup, we don't have the lock because it wasn't yet created. But at startup we have only one thread, so the lock would be redundant anyway.

Note that we can't create the lock as the very first step, because creating the lock itself requires the memory subsystem (VMA) to be fully initialized. So we disable the locking first, init the VMA subsystem, then create the lock and only then the VMA can be used in thread-safe manner.
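The startup ordering described above can be sketched as follows. This is an illustration under stated assumptions, not the code in this PR: `toy_rwlock` is a hypothetical stand-in that only counts holders (so the sketch runs single-threaded), and the names `g_tree_lock`, `tree_read_lock()` and `tree_init()` are made up for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the real lock type; it only counts read holders. */
struct toy_rwlock {
    int readers;
};

static bool toy_rwlock_create(struct toy_rwlock* l) {
    l->readers = 0;
    return true; /* a real implementation could fail here, e.g. on allocation */
}

static void toy_rwlock_read_lock(struct toy_rwlock* l) {
    l->readers++;
}

static void toy_rwlock_read_unlock(struct toy_rwlock* l) {
    assert(l->readers > 0);
    l->readers--;
}

static struct toy_rwlock g_tree_lock;
static bool g_tree_lock_created = false;

/* Wrappers: no-ops until the lock exists, which is safe only because
 * startup runs on a single thread. */
static void tree_read_lock(void) {
    if (!g_tree_lock_created)
        return;
    toy_rwlock_read_lock(&g_tree_lock);
}

static void tree_read_unlock(void) {
    if (!g_tree_lock_created)
        return;
    toy_rwlock_read_unlock(&g_tree_lock);
}

static int tree_init(void) {
    /* Early bootstrap may already call the wrappers; they do nothing yet. */
    tree_read_lock();
    /* ... populate the initial tree here ... */
    tree_read_unlock();

    /* Only now create the lock and switch the wrappers on. */
    if (!toy_rwlock_create(&g_tree_lock))
        return -ENOMEM;
    g_tree_lock_created = true;
    return 0;
}
```

The design relies on the flag flipping exactly once, before any second thread exists; a lock/unlock pair must never straddle that flip.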


libos/src/bookkeep/libos_vma.c line 146 at r1 (raw file):

#endif

/* VMA code is supposed to use the vma_* wrappers of RW lock; hide the actual RW lock funcs */

Not sure if these define tricks are needed -- I wanted to make sure that future developers won't accidentally use rwlock_ functions but only wrappers.
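For readers unfamiliar with this kind of trick, one possible way to get the same effect on GCC and Clang is `#pragma GCC poison`. This is a sketch of an alternative technique and may differ from the macros this PR actually uses; `toy_rwlock` and the `tree_*` wrappers are hypothetical names.

```c
#include <assert.h>

/* Hypothetical stand-in for the raw lock API; it only counts read holders. */
struct toy_rwlock {
    int readers;
};

static void toy_rwlock_read_lock(struct toy_rwlock* l) {
    l->readers++;
}

static void toy_rwlock_read_unlock(struct toy_rwlock* l) {
    assert(l->readers > 0);
    l->readers--;
}

static struct toy_rwlock g_tree_lock;

/* The only sanctioned entry points; they are defined BEFORE the poison pragma. */
static void tree_read_lock(void) {
    toy_rwlock_read_lock(&g_tree_lock);
}

static void tree_read_unlock(void) {
    toy_rwlock_read_unlock(&g_tree_lock);
}

/* From here on, any use of the raw functions in this file is a compile error,
 * so later code has to go through the wrappers above. */
#pragma GCC poison toy_rwlock_read_lock toy_rwlock_read_unlock
```

Code below the pragma can still call `tree_read_lock()` and `tree_read_unlock()`, while a stray `toy_rwlock_read_lock(&g_tree_lock)` fails to compile.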


libos/src/bookkeep/libos_vma.c line 1267 at r1 (raw file):

    vma_rwlock_read_lock(&vma_tree_lock);
    bool is_continuous = _traverse_vmas_in_range(begin, end, adj_visitor, &ctx);

FYI: This is the main perf optimization (hopefully).

Contributor

@vasanth-intel vasanth-intel left a comment

Reviewable status: 0 of 1 files reviewed, 1 unresolved discussion, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @dimakuv)

a discussion (no related file):
We don't see any problems with the PR and it improves perf for some workloads (MySQL and MariaDB) and doesn't degrade perf for other workloads like NginX, Tensorflow, SpecPower, Tensorflow Serving and Openvino(Latency).


Author

@dimakuv dimakuv left a comment

Dismissed @vasanth-intel from a discussion.
Reviewable status: 0 of 1 files reviewed, all discussions resolved, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel)

a discussion (no related file):

Previously, vasanth-intel wrote…

We don't see any problems with the PR and it improves perf for some workloads (MySQL and MariaDB) and doesn't degrade perf for other workloads like NginX, Tensorflow, SpecPower, Tensorflow Serving and Openvino(Latency).

Thank you @vasanth-intel for the performance evaluation!

We're done with internal testing (on the Intel side); this PR is moved from Draft to Ready for Review.


@dimakuv dimakuv marked this pull request as ready for review March 8, 2024 09:32
Author

dimakuv commented Mar 8, 2024

Jenkins, test this please (just for sanity)

@dimakuv dimakuv force-pushed the dimakuv/libos-vma-rwlock branch from c208c3c to b27f24e Compare July 30, 2024 12:43
Contributor

@kailun-qin kailun-qin left a comment

Reviewed 1 of 1 files at r2, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @dimakuv)


libos/src/bookkeep/libos_vma.c line 108 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

It's important to use these wrappers because at the very startup, we don't have the lock because it wasn't yet created. But at startup we have only one thread, so the lock would be redundant anyway.

Note that we can't create the lock as the very first step, because creating the lock itself requires the memory subsystem (VMA) to be fully initialized. So we disable the locking first, init the VMA subsystem, then create the lock and only then the VMA can be used in thread-safe manner.

can we add these explanations into the comments?

Author

@dimakuv dimakuv left a comment

Reviewable status: 0 of 1 files reviewed, 1 unresolved discussion, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @kailun-qin)


libos/src/bookkeep/libos_vma.c line 108 at r1 (raw file):

Previously, kailun-qin (Kailun Qin) wrote…

can we add these explanations into the comments?

Done.

kailun-qin previously approved these changes Jul 31, 2024
Contributor

@kailun-qin kailun-qin left a comment

Reviewed 1 of 1 files at r3, all commit messages.
Reviewable status: all files reviewed, all discussions resolved, not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners

Author

@dimakuv dimakuv left a comment

Reviewable status: all files reviewed, 1 unresolved discussion, not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners

a discussion (no related file):
@jkr0103 Could you point to the instructions on how to run those MongoDB and/or MySQL experiments that you did?


Contributor

jkr0103 commented Oct 1, 2024

Instructions for MySQL:

sudo chown -R $USER:$USER /var/lib/mysql-files
sudo systemctl stop mysql.service
sudo mkdir /var/run/mysqld && sudo chown -R $USER:$USER /var/run/mysqld
sudo mkdir /var/run/mysql-data && sudo chown -R $USER:$USER /var/run/mysql-data
mysqld --initialize-insecure --datadir=/var/run/mysql-data

sudo ln -s /etc/apparmor.d/usr.sbin.mysqld /etc/apparmor.d/disable/
sudo apparmor_parser -R /etc/apparmor.d/usr.sbin.mysqld
Check status if MySQL is still loaded: sudo aa-status

sudo vim /etc/security/limits.conf

* soft nofile 65535
* hard nofile 65535

gramine-sgx mysqld --skip-log-bin --datadir /var/run/mysql-data
numactl -N 0,1 -l gramine-sgx mysqld --skip-log-bin --datadir /var/run/mysql-data

Sysbench Run:

sudo apt install -y sysbench
sudo mysqladmin -h 127.0.0.1 -P 3306 create sbtest

sysbench --db-driver=mysql --mysql-host=127.0.0.1 --mysql-port=3306 --mysql-user=root --mysql-db=sbtest --time=90 \
    --report-interval=5 oltp_read_write --tables=2 --table_size=100000 prepare

sysbench --db-driver=mysql --mysql-host=127.0.0.1 --mysql-port=3306 --mysql-user=root --mysql-db=sbtest --time=300 \
    --report-interval=5 oltp_read_write --tables=2 --table_size=100000 --threads=64 run

======================================================================
MongoDB Run:

git clone https://github.com/mongodb/mongo-perf.git
cd mongo-perf

wget https://repo.mongodb.org/apt/ubuntu/dists/focal/mongodb-org/5.0/multiverse/binary-amd64/mongodb-org-shell_5.0.21_amd64.deb

sudo dpkg -i mongodb-org-shell_5.0.21_amd64.deb

echo "deb http://security.ubuntu.com/ubuntu focal-security main" | sudo tee /etc/apt/sources.list.d/focal-security.list
sudo apt-get update

sudo apt-get install libssl1.1

gramine-sgx mongod --nounixsocket --dbpath /var/run/db

Benchmark Run:

~/examples/mongodb/mongo-perf$ python3 benchrun.py -f testcases/complex_update.js -t 64

Author

@dimakuv dimakuv left a comment

@jkr0103 Where can we get the MySQL and MongoDB makefiles/manifests for Gramine?

Reviewable status: all files reviewed, 1 unresolved discussion, not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners

Contributor

jkr0103 commented Oct 1, 2024

Contributor

@chiache chiache left a comment

Reviewable status: all files reviewed, 4 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @dimakuv)


libos/src/bookkeep/libos_vma.c line 144 at r3 (raw file):

    rwlock_write_unlock(l);
}

Since only vma_tree_lock will be locked and unlocked here, why not remove the parameter to these functions and dedicate them to vma_tree_lock?


libos/src/bookkeep/libos_vma.c line 149 at r3 (raw file):

    if (!vma_tree_lock_created)
        return true;
    return rwlock_is_read_locked(l);

To me, if a thread is holding the write lock, it should be able to perform a read-only operation atomically. So, IMO, the condition should be `rwlock_is_read_locked(l) or rwlock_is_write_locked(l)`.


libos/src/bookkeep/libos_vma.c line 158 at r3 (raw file):

}
#endif

Shouldn't vma_rwlock_is_read_locked and vma_rwlock_is_write_locked be defined even if DEBUG is not?

Member

@mkow mkow left a comment

Reviewable status: all files reviewed, 4 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @chiache and @dimakuv)


libos/src/bookkeep/libos_vma.c line 149 at r3 (raw file):

Previously, chiache (Chia-Che Tsai) wrote…

To me, if a thread is holding the write lock, it should be able to perform a read-only operation atomically. So, IMO, the condition should be `rwlock_is_read_locked(l) or rwlock_is_write_locked(l)`.

If we decide so, then the function name will require changing, otherwise it's misleading.


libos/src/bookkeep/libos_vma.c line 158 at r3 (raw file):

Previously, chiache (Chia-Che Tsai) wrote…

Shouldn't vma_rwlock_is_read_locked and vma_rwlock_is_write_locked be defined even if DEBUG is not?

No, these functions are inherently racy and should be used only inside asserts (i.e., only in debug builds). I don't think there's any legitimate production use for them.
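The two points in this thread (accept either lock mode, and keep the check debug-only) could be combined roughly as below. This is a sketch, not the PR's code: `toy_rwlock`, `tree_is_read_or_write_locked()` and `tree_count_entries()` are hypothetical, and standard `NDEBUG` stands in for the project's `DEBUG` macro.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the real lock; it just records who holds it. */
struct toy_rwlock {
    int readers;
    bool writer;
};

static struct toy_rwlock g_tree_lock;
static int g_entries = 3;

#ifndef NDEBUG
/* Inherently racy: the answer may be stale as soon as it is returned.
 * It exists only in debug builds and is meant only for use inside assert().
 * The name says "read or write" because a write holder may also read. */
static bool tree_is_read_or_write_locked(void) {
    return g_tree_lock.readers > 0 || g_tree_lock.writer;
}
#endif

/* Read-only operation; the caller must hold the lock in either mode. */
static int tree_count_entries(void) {
    /* With NDEBUG defined, assert() discards its argument entirely,
     * so the missing helper is never referenced in release builds. */
    assert(tree_is_read_or_write_locked());
    return g_entries;
}
```

Because the helper does not exist in release builds, production code cannot start depending on it by accident.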

Contributor

@kailun-qin kailun-qin left a comment

Reviewed 1 of 1 files at r4, all commit messages.
Reviewable status: all files reviewed, 4 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ), "fixup! " found in commit messages' one-liners (waiting on @chiache and @mkow)


libos/src/bookkeep/libos_vma.c line 144 at r3 (raw file):

Previously, chiache (Chia-Che Tsai) wrote…

Since only vma_tree_lock will be locked and unlocked here, why not remove the parameter to these functions and dedicate them to vma_tree_lock?

Done.


libos/src/bookkeep/libos_vma.c line 149 at r3 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

If we decide so, then the function name will require changing, otherwise it's misleading.

Please check whether this is what you'd expect.

@kailun-qin kailun-qin force-pushed the dimakuv/libos-vma-rwlock branch from 6e40925 to e79d27d Compare January 7, 2025 08:43
Contributor

@kailun-qin kailun-qin left a comment

Reviewable status: 0 of 1 files reviewed, 4 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @chiache and @mkow)


a discussion (no related file):
I have to rebase to revive the CI (hopefully this won't disrupt your reviews).

@kailun-qin kailun-qin force-pushed the dimakuv/libos-vma-rwlock branch from e79d27d to 143fb04 Compare January 7, 2025 09:01
Multi-threaded workloads with many syscalls stress the VMA subsystem of
LibOS, because almost all syscalls verify their buffers for read/write
access using the functions `is_user_memory_readable()`,
`is_user_memory_writable()`, etc. All these functions end up in
VMA-specific `is_in_adjacent_user_vmas()` that grabs a global VMA lock.
On some multi-threaded apps like MongoDB, this lock contention becomes
the performance bottleneck.

This commit tries to remove this bottleneck by switching from a spinlock
to the Read-Write (RW) lock. The intuition is that most of the time,
a read-only `is_in_adjacent_user_vmas()` func is called, which now uses
the read lock.

Signed-off-by: Dmitrii Kuvaiskii <[email protected]>
@kailun-qin kailun-qin force-pushed the dimakuv/libos-vma-rwlock branch from 143fb04 to 50d5246 Compare January 7, 2025 09:20
Contributor

@efu39 efu39 left a comment

Reviewable status: 0 of 1 files reviewed, 6 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @chiache, @dimakuv, @kailun-qin, and @mkow)


libos/src/bookkeep/libos_vma.c line 697 at r6 (raw file):

    assert(1 + idx == ARRAY_SIZE(init_vmas));

    vma_rwlock_write_lock();

rwlock_create() is not yet done here.

Code quote:

vma_rwlock_write_lock();

libos/src/bookkeep/libos_vma.c line 775 at r6 (raw file):

        return -ENOMEM;
    }
    vma_tree_lock_created = true;

Should these be moved up, before line 697, where vma_rwlock_write_lock() is first invoked?

Code quote:

    if (!rwlock_create(&vma_tree_lock)) {
        return -ENOMEM;
    }
    vma_tree_lock_created = true;

Contributor

@kailun-qin kailun-qin left a comment

Reviewable status: 0 of 1 files reviewed, 6 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (2 more required, approved so far: ) (waiting on @chiache, @dimakuv, @efu39, and @mkow)


libos/src/bookkeep/libos_vma.c line 697 at r6 (raw file):

Previously, efu39 (Erica Fu) wrote…

rwlock_create() is not yet done here.

Yes, I think this is what Dmitrii's comments were describing:

* It is important to use the below wrappers instead of raw `rwlock_*_lock()` functions. This is
* because at LibOS startup, the lock `vma_tree_lock` is not yet created. Fortunately, at LibOS
* startup there is only one thread, so the lock would be redundant anyway.
*
* We cannot create `vma_tree_lock` at the very beginning of LibOS startup, because creating this
* lock itself requires the memory subsystem (VMA) to be fully initialized. So we start with VMA
* locking disabled first, then init the VMA subsystem, and only then create the lock. At this point
* the VMA subsystem can be used in thread-safe manner.

Well, upon rereading this, I don't understand the limitation and don't recall why we cannot create vma_tree_lock at the very beginning of the LibOS startup or why creating this lock requires the memory subsystem to be fully initialized. I'll need to double-check.

Labels
Projects
Status: Working on it
Development

Successfully merging this pull request may close these issues.

[LibOS] Use RW locks in the VMA tree
7 participants