Increase L2ARC write rate and headroom #15457
Conversation
This patch is in draft form not because it does anything complex (it just changes two constants used for default values), but because I would like to get feedback from others. I deployed the increased values on some of my KVM hosts: VMs running on such servers "feel" much more like being SSD-backed, even after a host reboot. However, these KVM hosts only have 64-192 GB of RAM, so I don't really know how these settings would behave on machines with much more RAM. Thanks.
I think we've already discussed that before. L2ARC was designed to cache data that are going to be evicted from ARC. Headroom controls how much more data we expect to be evicted from ARC per second that L2ARC should care about. If the data first in line for eviction belong to some other pool or are not L2ARC-eligible (IIRC we've discussed that during ARC warmup too much data was ineligible due to ongoing prefetch), then L2ARC does not need to write anything and should not look deeper; it should stop. This logic is still valid when ARC is warm. It is debatable how good an idea it is to write to L2ARC while ARC is not full yet, and what we should do with prefetched data and headroom in that case, but the fix here would likely not be a blind headroom disable, rather some changes to the code logic. That is your feedback from me.
Just at the level of ideas: in case persistent L2ARC is enabled, while ARC is still cold and L2ARC is not full, L2ARC could write only MFU buffers and without a headroom limit. It would give persistent L2ARC a boost of the most useful data in case of reboot. After ARC has warmed up, operation could return to the original algorithm, including headroom.
I find a plain TBW value to be overly pessimistic, as the cache device is not going to write at full speed all the time. At the current 8 MB/s, a worst-case estimate is 8 * 86400 * 365 = ~240 TB/year, while the SSDs of one KVM server (2x 500 GB Samsung 850 EVO) are 6 years old and each has written a total of ~60.5 TB (only ~10 TB/year). Since the last reboot, 9 days ago:
As a side note, on this server
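For reference, a minimal sketch of the worst-case endurance arithmetic above, assuming the cache device really does write at l2arc_write_max every second of the year (the observed figure is the one reported in this comment):

```c
#include <stdio.h>

int main(void)
{
	/* Worst case: the cache device writes at l2arc_write_max continuously. */
	double write_max_mb_s   = 8.0;               /* current default, MB/s */
	double seconds_per_year = 86400.0 * 365.0;
	double tb_per_year      = write_max_mb_s * seconds_per_year / 1e6;

	/* Figure reported above: ~60.5 TB written over ~6 years. */
	double observed_tb_per_year = 60.5 / 6.0;

	/* ~252 TB/year in decimal units (~240 TiB/year binary) vs ~10 TB/year observed. */
	printf("worst case: ~%.0f TB/year\n", tb_per_year);
	printf("observed:   ~%.1f TB/year\n", observed_tb_per_year);
	return (0);
}
```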
I share that concern, even though on these 64-192 GB servers I did not see anything wrong. Maybe because I am using a 128K recordsize? Anyway, anything scanning 1-4 GB of ARC should be OK.
Maybe in #15201?
Is this the current logic? I don't remember the feed thread doing that (stopping after some ineligible buffers are found).
I agree. At the same time, I remember this very useful comment #15201 (comment) stating that the ARC sublists only contain eligible buffers, so the feed thread should not really scan the entire ARC. Thanks.
The feed thread scans up to headroom, but skips ineligible buffers. If none of the scanned buffers are eligible, nothing will be written.
The sublists contain buffers eligible for eviction. That does not mean they are all eligible for L2ARC: some may already be in L2ARC, some may belong to a different pool, some are from a dataset with secondarycache disabled, some are prefetches.
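To illustrate the checks being described, here is a simplified sketch of the per-buffer eligibility test, not the actual OpenZFS code (the real logic lives around l2arc_write_eligible() in module/zfs/arc.c; the field and helper names below are approximations):

```c
#include <stdint.h>

/* Illustrative stand-in for an ARC buffer header. */
typedef struct buf_hdr {
	int      in_l2arc;        /* already has an L2ARC copy              */
	uint64_t spa_guid;        /* pool this buffer belongs to            */
	int      secondarycache;  /* dataset allows L2ARC caching           */
	int      is_prefetch;     /* speculative prefetch, not yet demanded */
} buf_hdr_t;

static int
l2arc_eligible_sketch(const buf_hdr_t *hdr, uint64_t feeding_spa_guid,
    int noprefetch)
{
	if (hdr->in_l2arc)                      /* already cached in L2ARC   */
		return (0);
	if (hdr->spa_guid != feeding_spa_guid)  /* belongs to another pool   */
		return (0);
	if (!hdr->secondarycache)               /* secondarycache disabled   */
		return (0);
	if (noprefetch && hdr->is_prefetch)     /* skip prefetched buffers   */
		return (0);
	return (1);
}
```

In other words, ineligible buffers are skipped but still consume the headroom budget, which is why a tail full of ineligible buffers produces little or no L2ARC writing.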
Ok, sure, I misunderstood the previous post.
You are right. I agree that completely disabling the headroom limit can be too much. At the same time, I am somewhat surprised that I never saw the feed thread cause any significant load, even on the servers where I deployed these settings. What about setting a higher (but not unlimited) headroom value instead? Thanks.
With the new write limit it would mean up to 1 GB/s of scanned buffers, or up to 4 GB/s considering the boosts due to compressed and cold ARC, or up to 16 GB/s considering all traversed lists. Sure, such write speeds are reachable in real life, but not by every system. Also, not every system has that much ARC in general. This value would not be completely insane, but it feels quite aggressive. Before that I would prefer some code review/cleanup to be done there. I don't see the point of l2arc_headroom_boost these days. I think in the case of compressed ARC we should just measure the headroom in terms of HDR_GET_PSIZE(), not HDR_GET_LSIZE(). That would match both how much we write to the L2ARC and how much we evict from ARC. With better math we could reduce headroom by dropping the compression boost and only adjusting the general one.
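A back-of-the-envelope version of that arithmetic (the headroom value being discussed was elided above; 32 is assumed here only because it is the value that reproduces the 1 GB/s figure):

```c
#include <stdio.h>

int main(void)
{
	/* l2arc_write_max = 32M (the value proposed in this PR). */
	unsigned long long write_max = 32ULL << 20;
	/* Assumed headroom multiplier: 32 (not stated explicitly above). */
	unsigned long long headroom = 32;

	unsigned long long scanned   = write_max * headroom; /*  1 GB/s scanned per feed */
	unsigned long long boosted   = scanned * 2 * 2;      /*  4 GB/s: x2 compressed boost, x2 cold ARC */
	unsigned long long all_lists = boosted * 4;          /* 16 GB/s across all traversed lists */

	printf("scanned:   %llu MB/s\n", scanned >> 20);
	printf("boosted:   %llu MB/s\n", boosted >> 20);
	printf("all lists: %llu MB/s\n", all_lists >> 20);
	return (0);
}
```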
Yes, it would remain quite aggressive. Maybe a safer approach is the simpler one: since I increased l2arc_write_max by 4x, apply the same 4x factor to l2arc_headroom as well.
I think the general idea was "if compression is enabled, consider a 2x data reduction rate". Better math would be fine, but as a hand-wave rule I find it quite reasonable. As the current values are so undersized, I am updating this PR accordingly. Thanks.
If ARC is compressed, then we write the data to L2ARC exactly as they are in ARC. We do not need to guess, we know the exact physical size.
I have no objections.
I just updated the man page.
EDIT: no, I'm wrong.
Current L2ARC write rate and headroom parameters are very conservative: l2arc_write_max=8M and l2arc_headroom=2 (ie: a full L2ARC writes at 8 MB/s, scanning 16/32 MB of ARC tail each time; a warming L2ARC runs at 2x these rates). These values were selected 15+ years ago based on then-current SSDs size, performance and endurance. Today we have multi-TB, fast and cheap SSDs which can sustain much higher read/write rates. For this reason, this patch increases l2arc_write_max to 32M and l2arc_headroom to 8 (4x increase for both). Signed-off-by: Gionatan Danti <g.danti@assyoma.it>
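For context, the change itself boils down to two default constants, roughly as sketched below (paraphrased from the commit message, not the literal diff; the macro names follow module/zfs/arc.c, but check the actual patch for the authoritative version):

```c
#include <stdint.h>

/* Sketch of the new defaults described in the commit message. */
#define	L2ARC_WRITE_SIZE	(32 * 1024 * 1024)	/* was 8 * 1024 * 1024 */
#define	L2ARC_HEADROOM		8			/* was 2 */

uint64_t l2arc_write_max = L2ARC_WRITE_SIZE;	/* default max write size per feed */
uint64_t l2arc_headroom = L2ARC_HEADROOM;	/* default scan headroom multiplier */
```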
I see some CI tests failing... can the failures be related to this patch?
It looks like it could be due to the pool layouts for some of the test cases. I do see a warning in the CI console logs before the failures; based on the log message, though, it should have capped this to something safe.
Interesting. Do you think it is an issue with the test suite, or should I implement a cap for the increased write size? Thanks.
It's surprising. I've resubmitted those CI runs; let's see how reproducible it is.
PR #15457 exposed weird logic in L2ARC write sizing. If it appeared bigger than the device size, instead of limiting the write it reset all the system-wide tunables to their defaults. Aside from being excessive, it did not actually help with the problem, still allowing an infinite loop to happen. This patch removes the tunable-reverting logic and instead limits L2ARC writes (or at least eviction/trim) to 1/4 of the capacity. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: George Amanakis <gamanakis@gmail.com> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #15519
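A rough sketch of the sizing behaviour that commit describes (illustrative only; the real code is in l2arc_write_size() in module/zfs/arc.c and differs in detail, and the names below are simplified stand-ins):

```c
#include <stdint.h>

/*
 * Post-#15519 idea: if the computed target write size exceeds what the
 * cache device can hold, clamp it to 1/4 of the device capacity instead
 * of resetting the global tunables.
 */
static uint64_t
l2arc_write_size_sketch(uint64_t write_max, uint64_t write_boost,
    int dev_is_warming, uint64_t dev_capacity)
{
	uint64_t size = write_max;

	if (dev_is_warming)		/* L2ARC not filled yet: apply the boost */
		size += write_boost;

	if (size > dev_capacity)	/* never ask for more than the device holds */
		size = dev_capacity / 4;

	return (size);
}
```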
Current L2ARC write rate and headroom parameters are very conservative: l2arc_write_max=8M and l2arc_headroom=2 (ie: a full L2ARC writes at 8 MB/s, scanning 16/32 MB of ARC tail each time; a warming L2ARC runs at 2x these rates). These values were selected 15+ years ago based on then-current SSDs size, performance and endurance. Today we have multi-TB, fast and cheap SSDs which can sustain much higher read/write rates. For this reason, this patch increases l2arc_write_max to 32M and l2arc_headroom to 8 (4x increase for both). Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Signed-off-by: Gionatan Danti <g.danti@assyoma.it> Closes openzfs#15457
IMHO this is not only sensible but perhaps even still too conservative. But a definite start!