Read prefetched buffers from L2ARC #15451
Conversation
Prefetched buffers are currently read from L2ARC if, and only if, l2arc_noprefetch is set to the non-default value of 0. This means that a streaming read which could be served from L2ARC will instead engage the main pool. For example, consider what happens when a file is sequentially read:

- the application requests contiguous data, engaging the prefetcher;
- ARC buffers are initially marked as prefetched but, as the calling application consumes data, the prefetch tag is cleared;
- these "normal" buffers become eligible for L2ARC and are copied to it;
- re-reading the same file will *not* engage L2ARC even if it contains the required buffers;
- the main pool has to suffer another sequential read load, which (because most NCQ-enabled HDDs prefer sequential loads) can dramatically increase latency for uncached random reads.

In other words, the current behavior is to write data to L2ARC (wearing it) without using that very same cache when reading the same data back. This was probably useful many years ago to preserve L2ARC read bandwidth but, with current SSD speed/size/price, it is vastly sub-optimal. Setting l2arc_noprefetch=0, while enabling L2ARC to serve these reads, means that even prefetched but unused buffers will be copied into L2ARC, further increasing wear and load for potentially not-useful data.

This patch enables prefetched buffers to be read from L2ARC even when l2arc_noprefetch=1 (the default), increasing sequential read speed and reducing load on the main pool without polluting L2ARC with not-useful (i.e. unused) prefetched data. Moreover, it clears up user confusion about L2ARC size increasing while not serving any I/O during sequential reads.

Signed-off-by: Gionatan Danti <[email protected]>
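For readers less familiar with the ARC code, here is a minimal, standalone sketch modeling the decision this patch changes. The names (`struct buf_state`, `read_from_l2arc()`) are hypothetical stand-ins invented for the example; the real OpenZFS code works on `arc_buf_hdr_t` flags inside `arc_read()` and the exact condition there is more involved.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the ARC buffer header flags consulted by
 * arc_read(); the real code uses arc_buf_hdr_t and HDR_* macros. */
struct buf_state {
	bool has_l2hdr;   /* a copy of the buffer exists on the L2ARC */
	bool l2_writing;  /* currently being written out to the L2ARC */
	bool l2_evicted;  /* already evicted from the L2ARC           */
	bool prefetch;    /* buffer was brought in by the prefetcher  */
};

/* Models the L2ARC read gate: 'patched' selects the post-patch behavior. */
static bool
read_from_l2arc(const struct buf_state *b, bool l2arc_noprefetch,
    bool patched)
{
	if (!b->has_l2hdr || b->l2_writing || b->l2_evicted)
		return false;
	if (!patched && l2arc_noprefetch && b->prefetch)
		return false;  /* old: prefetched reads bypass the L2ARC  */
	return true;           /* new: serve them from the L2ARC as well  */
}

int
main(void)
{
	struct buf_state b = { .has_l2hdr = true, .prefetch = true };

	printf("before patch: %d, after patch: %d\n",
	    read_from_l2arc(&b, true, false),
	    read_from_l2arc(&b, true, true));
	return 0;
}
```

On Linux the tunable itself is exposed as the module parameter /sys/module/zfs/parameters/l2arc_noprefetch; the sketch above only models the read decision and is not the OpenZFS implementation.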
This makes good sense to me.
I'd say it makes sense in some cases, but in general "it depends". L2ARC is not expected to be, and often is not, faster than the main pool on sequential, deeply prefetched reads. I still think sending those reads to the main pool may be beneficial. Ideally it would be best to balance depending on the load at a given time, but that is more complicated. I agree that it is a waste to write to L2ARC and not read from it, but I was thinking about the opposite approach -- not marking as L2ARC-eligible buffers that were speculatively prefetched and for which the prefetch read completed before the demand request (i.e. an ARC "hit", not an "iohit" in my new terms), same as is now done if the demand request never arrives. I.e. if the speculative prefetcher did its job well the first time, hopefully it will be able to do it again next time and we do not need L2ARC. But if not, then reading from L2ARC may make sense even if the buffer is prefetched. It may actually give us an interesting effect where the first blocks of each file will be written to and read from L2ARC, but by the time the prefetcher ramps up, the following data can be read from the main pool. :)
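A minimal standalone model of the eligibility rule proposed above, using the "hit" vs. "iohit" distinction from the comment. The enum, function name, and structure are invented for the illustration and are not OpenZFS code; the real eligibility logic lives in the ARC state handling.

```c
#include <stdbool.h>
#include <stdio.h>

/* Possible outcomes for a speculatively prefetched buffer. */
enum demand_outcome {
	DEMAND_NEVER_ARRIVED, /* prefetched, never consumed by a demand read */
	DEMAND_HIT,           /* prefetch I/O finished before the demand     */
	DEMAND_IOHIT          /* demand arrived while prefetch I/O in flight */
};

/* Proposed rule: only cache prefetched buffers on the L2ARC when the
 * prefetcher was *not* fast enough on its own. */
static bool
l2arc_eligible(bool prefetched, enum demand_outcome outcome)
{
	if (!prefetched)
		return true;           /* ordinary demand read               */
	switch (outcome) {
	case DEMAND_NEVER_ARRIVED:
		return false;          /* already skipped today              */
	case DEMAND_HIT:
		return false;          /* proposal: skip these too           */
	case DEMAND_IOHIT:
	default:
		return true;           /* prefetcher was too slow: cache it  */
	}
}

int
main(void)
{
	printf("hit eligible?   %d\n", l2arc_eligible(true, DEMAND_HIT));
	printf("iohit eligible? %d\n", l2arc_eligible(true, DEMAND_IOHIT));
	return 0;
}
```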
Well, sure, it depends on pool geometry. However, a single SATA SSD is 4-8x faster (depending on request size, LBA and fragmentation) than an HDD for sequential reads. A single NVMe drive is 20-50x faster. Moreover, a mirrored vdev is not going to double sequential read speed. How often does one have a 12-vdev main pool backed by a single SATA drive? Or a 50-vdev main pool with only a single NVMe drive? My personal experience is that large main pools command large (both in number and in size) L2ARC devices. Even smaller pools generally follow this rule (i.e. I have multiple 4-6 vdev systems with 2x SATA SSD L2ARC or 1x NVMe drive).
It would be an interesting approach, useful for very large main pools with limited L2ARC capabilities, but I don't think it repays the added complexity. HDDs are so slow that reads should not touch them if the data exists on fast SSD storage. It is not only about peak read speed, where HDDs are somewhat competitive. It really is about most HDDs prioritizing sequential (and easy-to-coalesce) reads at the expense of both writes and random reads (whose latency spikes greatly). So leaving the main pool free to serve writes and uncached reads leads to better performance and should be the main target of L2ARC. Regards.
L2ARC is always limited in capacity and/or performance, otherwise it would be primary storage. Most L2ARC-class devices are simply unable to sustain a full-speed write stream for years. The fact that ZFS throttles L2ARC writes saves their life. But it means we have to prioritize which data are worth writing to L2ARC and which are not. Maybe we could use different policies depending on how full L2ARC is -- blind, unprioritized overwriting of an already-full L2ARC makes no sense. My proposal should not be so hard to implement. Unless somebody else gets to it first, I may do it one day. At that point the condition in arc_read() would indeed not make much sense -- if we never write trash to L2ARC then we should have no trash hits there. From that perspective, hoping for further development, I approve this commit.
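Purely as an illustration of the fullness-dependent prioritization idea mentioned above -- nothing like this exists in OpenZFS today; `buf_class`, `l2arc_should_write()` and the thresholds are invented for the sketch -- a policy could get pickier about what it writes as the device fills up:

```c
#include <stdbool.h>

/* Hypothetical classes of buffers being considered for an L2ARC write. */
enum buf_class {
	BUF_DEMAND,         /* read on demand, prefetch not involved        */
	BUF_PREFETCH_IOHIT, /* prefetched, but the application had to wait  */
	BUF_PREFETCH_HIT    /* prefetched and ready before the demand read  */
};

/* Illustrative fullness-aware write policy: the fuller the L2ARC,
 * the more selective it becomes about new writes. */
static bool
l2arc_should_write(enum buf_class c, double l2_fill_ratio)
{
	if (l2_fill_ratio < 0.5)
		return true;                   /* plenty of room: take everything */
	if (l2_fill_ratio < 0.9)
		return c != BUF_PREFETCH_HIT;  /* skip the cheap-to-refetch data  */
	return c == BUF_DEMAND;                /* nearly full: demand reads only  */
}

int
main(void)
{
	/* At 95% full, only demand-read buffers would still be written. */
	return l2arc_should_write(BUF_DEMAND, 0.95) ? 0 : 1;
}
```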
First, thanks to both you and @behlendorf for accepting this commit.
My point is that the current behavior is the worst possible, because:
- data is written to L2ARC, wearing it;
- that very same data is then never read back from L2ARC.
This patch fixes the "not used for reads" part without increasing the write load on L2ARC. My personal experience is that L2ARC is often abundantly fast - it is a cache (and not primary storage) mainly due to limited capacity and resiliency. I understand that big HDD pools can sustain high streaming performance, but so can 2-4 NVMe cache drives. Regards.
Prefetched buffers are currently read from L2ARC if, and only if, l2arc_noprefetch is set to the non-default value of 0. This means that a streaming read which could be served from L2ARC will instead engage the main pool. For example, consider what happens when a file is sequentially read:

- the application requests contiguous data, engaging the prefetcher;
- ARC buffers are initially marked as prefetched but, as the calling application consumes data, the prefetch tag is cleared;
- these "normal" buffers become eligible for L2ARC and are copied to it;
- re-reading the same file will *not* engage L2ARC even if it contains the required buffers;
- the main pool has to suffer another sequential read load, which (because most NCQ-enabled HDDs prefer sequential loads) can dramatically increase latency for uncached random reads.

In other words, the current behavior is to write data to L2ARC (wearing it) without using that very same cache when reading the same data back. This was probably useful many years ago to preserve L2ARC read bandwidth but, with current SSD speed/size/price, it is vastly sub-optimal. Setting l2arc_noprefetch=0, while enabling L2ARC to serve these reads, means that even prefetched but unused buffers will be copied into L2ARC, further increasing wear and load for potentially not-useful data.

This patch enables prefetched buffers to be read from L2ARC even when l2arc_noprefetch=1 (the default), increasing sequential read speed and reducing load on the main pool without polluting L2ARC with not-useful (i.e. unused) prefetched data. Moreover, it clears up user confusion about L2ARC size increasing while not serving any I/O during sequential reads.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Gionatan Danti <[email protected]>
Closes openzfs#15451
IMHO this is perfectly sensible and it's how I've run my pools for years...
Hi, thanks for your work. Has this PR been added to the 2.2.7 release? I see it in the patch-set but not in the release notes. Thanks.