with cache full (but clean), seeing very high number of pass-through requests #1566
Comments
Hi @aneagoe! Could you please send the output of |
I've switched all nodes to a single ioclass (unclassified) and the serviced request ratio went to 95%+. Disabling sequential cutoff achieved 100% (just experimental, doesn't make sense to leave on). This is consistent across all nodes in the dev cluster and I've done the same to one node in prod cluster with same results.
|
Hi @aneagoe, thanks for the quick response. Could you please rerun your workload on a new cache instance with your original ioclass config and cleaning policy set to NOP ( |
@mmichal10 I can adjust on one of the nodes as recommended. But can you please clarify the scope of this experiment? In my understanding, if we switch to nop we're basically not cleaning the cache, so once it gets full, we can expect it to switch to write-through, or? |
@aneagoe once the cache is 100% dirty the write requests will be redirected into pass-through and flushing of a small portion of dirty blocks will be triggered. We want to make sure that cache gets 100% dirty before the writes are pass-through'ed and we want to see occupancy and how many writes got pass-through'ed for each ioclass |
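A minimal sketch of the behaviour described above, written as a toy model and not taken from the Open CAS code. It assumes the NOP cleaning policy (no background cleaning), single-block writes that each dirty a new block, and a hypothetical `flush_batch` parameter standing in for the "small portion of dirty blocks" that gets flushed:

```python
def route_write(dirty_blocks, cache_blocks, flush_batch):
    """Route one single-block write in a toy write-back cache.

    Returns (route, new_dirty_blocks). All names here are illustrative.
    """
    if dirty_blocks >= cache_blocks:
        # Cache is 100% dirty: the write bypasses the cache and a small
        # flush is triggered, freeing up to `flush_batch` blocks.
        return "pass-through", max(0, dirty_blocks - flush_batch)
    # Otherwise the write is serviced by the cache and dirties one block.
    return "cached", dirty_blocks + 1


def simulate(writes, cache_blocks, flush_batch):
    """Replay `writes` consecutive writes against an initially clean cache."""
    dirty = 0
    routes = []
    for _ in range(writes):
        route, dirty = route_write(dirty, cache_blocks, flush_batch)
        routes.append(route)
    return routes


if __name__ == "__main__":
    print(simulate(writes=6, cache_blocks=4, flush_batch=1))
```

In this model pass-through writes appear only after the dirty count reaches the cache size, which is the condition the NOP experiment is meant to confirm or rule out.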
I've attached snapshots taken at 5 minute intervals. I triggered some pool resizes to get more traffic going, and it seems the cache never reached 100% dirty (only about 99.8%) before the flushing happened. The output is perhaps not very easy to go through; if you need a different format/layout, I'm happy to re-run this experiment and accommodate, just let me know. |
Hi @aneagoe, just a quick question for you: I'm trying to reproduce what you're seeing on our side and I am seeing some passthrough traffic, but not as much as in your stats. Can you share a bit more info on your testing setup? For me it's ((HDD->single SSD CAS cache)*12osds)*3nodes and on top of that I create 60 RBD images, fill the caches with large writes and run some read-intensive workload or rwmix (IO done by fio with librbd). Maybe if we'd align a bit more it would help us see exactly the same issues you're seeing. |
Hey @jfckm. I have a dev cluster where I tend to test any changes before I apply them to the prod cluster. The dev cluster is built on top of PROXMOX with its own CEPH cluster. There are 3 OSD VMs, each with 6 x 1TB volumes and also 2 x NVMe drives to be partly used by opencas (shared with the OS, we assign a 43G LV to be used by opencas). From the same NVMe drive, we also carve two LVs to be used as direct NVMe OSDs. On the NVMe backed storage we place things like the rgw index bucket and cephfs_metadata pools. Some lsblk below to show the overall setup:
On top of it, we configure RBD (consumed by a dev k8s cluster), CEPHFS (some dev workloads) and object storage (also used by some dev workloads). The cluster is barely used as the workloads are minimal. The behaviour was observed naturally (and also shows up on our prod cluster), and during the last test I had to trigger some pool reorg (bumped the number of PGs up/down for some pools). OSD df below:
I can definitely run a more aligned test by spinning up workloads in k8s pods that will consume RBDs. I can take a similar approach of filling the cache and then running a more mixed workload of both reads and writes. What I'm really trying to understand here is why I am seeing this large number of PT writes when the cache is literally clean and it should just evict. |
Thanks! This will be helpful for sure. For now the data you've provided should suffice, no need for additional experiments - it's me who wanted to reproduce your environment as closely as I can. We're taking a two-pronged approach here: @robertbaldyga and @mmichal10 are trying to come up with something with static code analysis and in the meantime I'm trying to reproduce it on a live system. We will keep you posted. |
Thanks a lot for your help. I'm now working on aligning all PROD nodes to the latest opencas (we observe the same behaviour on the PROD cluster across various master releases). On the PROD cluster I had to use the master branch and rebuild nodes one by one as we've switched to EL9 (CentOS Stream 9). We start to run into performance issues only when the cache gets full and unfortunately there's no way to force evict (I need to basically stop OSDs, stop and remove the cache, then force init the cache). |
Hi @aneagoe, You're experiencing the issue on CentOS 9 with OpenCAS v22.6.3 and v24.09, is that correct? Did the problem appear on CentOS 8? We tried to reproduce the problem but unfortunately the only PT requests we observed were due to the cache being 100% dirty. I've prepared a script that collects cache stats in CSV format. Could you please run it for an hour or two on a node that is affected by the excessive pass-through rate and then share the output?
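The script referred to above is not reproduced in this thread. As a rough sketch of the same idea only, the following polls a stats source at a fixed interval and appends timestamped snapshots to a file. It assumes `casadm -P -i <cache-id> -o csv` prints the cache statistics as CSV; the helper names (`casadm_stats_csv`, `collect`) and the output layout are hypothetical:

```python
import subprocess
import time
from datetime import datetime, timezone


def casadm_stats_csv(cache_id):
    """Return one stats snapshot as CSV text.

    Assumes `casadm -P -i <id> -o csv` is available on the node.
    """
    result = subprocess.run(
        ["casadm", "-P", "-i", str(cache_id), "-o", "csv"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def collect(fetch, out, samples, interval_s, sleep=time.sleep):
    """Write `samples` snapshots from `fetch()` to `out`, `interval_s` apart."""
    for i in range(samples):
        stamp = datetime.now(timezone.utc).isoformat()
        # Each snapshot is preceded by a comment line carrying its UTC timestamp.
        out.write(f"# {stamp}\n{fetch().rstrip()}\n")
        if i + 1 < samples:
            sleep(interval_s)


if __name__ == "__main__":
    # Example: one snapshot per minute for two hours, cache id 1.
    with open("cas_stats.csv", "a") as f:
        collect(lambda: casadm_stats_csv(1), f, samples=120, interval_s=60)
```

Passing the fetch function in as an argument keeps the polling loop independent of `casadm`, so it can be dry-run against canned output before being used on an affected node.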
|
I will run this on a production cluster node against v24.09 (release). It's hard to say for sure if the problem was present on CentOS8, but I'm inclined to say yes. I will try to dig up some data from Prometheus and see if I can prove that. |
It looks like the issue doesn't manifest itself quite like before. I've collected stats from one of the nodes, where the serviced requests rate fluctuates between 60% and 95%+. Some dashboard stats below: It's quite hard to draw conclusions now. The prod servers that were most severely affected by this have been upgraded to the latest release version (v24.09). Also, if I go back in time for this particular host/cache instance, it looks like before the 17th of October it used to have consistently low rates of serviced requests. A few more screenshots below also show the OS and OpenCAS version timelines. On the 17th of October the ioclass settings were changed (basically removing the classes and leaving a single one), which led to 100% serviced requests (though this also resulted in the cache never being cleaned properly). This was kept only temporarily.
During cluster upgrades and when the issue manifested itself the most, we've gone through massive rebalances in the cluster, as each server was upgraded from CentOS8 to CentOS9.
|
@aneagoe how did you solve this problem? It seems related to the eviction issue we've also seen in #1588 (comment) |
@jvinolas I did not quite solve it. It looks like it depends on the workload: under normal circumstances it seems to behave, but when there are a lot of rebalances in the cluster, it starts to act up. I have not had a lot of time on my hands to try and fully reproduce this, but I hope to get back to it early next year. If you are able to reproduce it easily, then please post all information here, I'm sure it will help get to the bottom of this. |
Question
On a dev ceph cluster, I'm using opencas in front of the HDDs. I went through some upgrades (centos8 to centos9, upgrade to latest opencas 24.09.0.0900.release) and noticed that the rate of serviced requests is extremely low (i.e. below 10%). This seems to happen when the cache gets full, despite the fact that the cache is clean. With the current settings in place, I cannot explain why the requests are not serviced and go directly to the core.
The issue is very similar to #1397 which for some reason became stale.
Note that in the output below (from one of the servers), the seq-cutoff has been configured to never (set to full on the other nodes) in an attempt to see if it would change something. It had no effect unfortunately.
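For reference, a small sketch of how the serviced-request rate discussed here can be expressed. It assumes the rate is serviced requests divided by total requests, where total is serviced plus pass-through; the function name and arguments are illustrative and not tied to any particular stats field:

```python
def serviced_rate(serviced, pass_through):
    """Percentage of requests handled by the cache rather than passed through."""
    total = serviced + pass_through
    if total == 0:
        # No traffic yet: report 0.0 instead of dividing by zero.
        return 0.0
    return 100.0 * serviced / total


if __name__ == "__main__":
    # 1 serviced request for every 9 pass-through requests.
    print(serviced_rate(1, 9))
```

Under this definition, a rate below 10% means more than nine out of ten requests bypass the cache and go straight to the core device.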
Motivation
I need to improve performance and then replicate to our prod cluster. Performance is now degraded as most operations skip the cache entirely.
Your Environment