Add a diagnostic kstat for obtaining pool status #16484
Conversation
module/nvpair/nvpair.c (outdated)

    #define JPRINTF(start, end, ...)                                    \
    do {                                                                \
            if (start < end)                                            \
                    start += snprintf(start, end - start, __VA_ARGS__); \
    } while (0)
To be safe, we need to check that the return value is not negative here.
Thanks, updated.
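A minimal sketch of what such a guard might look like (an illustration, not the exact code from the PR); it assumes the usual C snprintf() semantics, where a negative return value indicates an encoding error:

    #define JPRINTF(start, end, ...)                                \
    do {                                                            \
            if (start < end) {                                      \
                    /* snprintf() can fail and return a negative */ \
                    /* value; only advance the cursor on success. */\
                    int _n = snprintf(start, end - start,           \
                        __VA_ARGS__);                               \
                    if (_n > 0)                                     \
                            start += _n;                            \
            }                                                       \
    } while (0)

With this guard, a failed snprintf() leaves the start cursor unchanged instead of corrupting the pointer arithmetic on subsequent calls.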
This kstat output does not require taking the spa_namespace lock, as is the case for 'zpool status'. It can be used for investigations when pools are in a hung state while holding the global locks required for a traditional 'zpool status' to proceed. This kstat is not safe to use in conditions where pools are in the process of configuration changes (i.e., adding/removing devices). Therefore, this kstat is not intended to be a general replacement for, or alternative to, 'zpool status'.

Sponsored-by: Wasabi Technology, Inc.
Sponsored-by: Klara Inc.
Co-authored-by: Don Brady <don.brady@klarasystems.com>
Signed-off-by: Don Brady <don.brady@klarasystems.com>
This commit updates the kstat for pool status and simplifies it by creating an nvlist that contains the pool status. This nvlist is then printed to the provided buffer in JSON format. The redundant parts of the code have also been removed.

Signed-off-by: Umer Saleem <usaleem@ixsystems.com>
@usaleem-ix is the main goal of this to get the JSON into a kstat so it's lockless? Or was there another use case for wanting the JSON specifically in the kstats? I ask because I'm working on some prototype code that would remove …
@tonyhutter yes, the main goal of this is to get …. Also, instead of executing …
@tonyhutter For TrueNAS, we have a few primary reasons for wanting zpool status in procfs.
EDIT: …
@usaleem-ix @yocalebo thanks for the info.
$ sudo cat /proc/spl/kstat/zfs/tank/status.json | jq
{
"status_json_version": 4,
"scl_config_lock": true,
"scan_error": 2,
"scan_stats": {
"func": "NONE",
"state": "NONE"
},
...

It also API-ifies the config nvlist, which is something I think we should avoid.
Overall, I think putting the JSON functionality in libzfs may be the better route, even if it also means fixing whatever thread-safety issues we have in the library. It's just nice to have all the JSON generation done in userspace. Alternatively, if you want to go the fork+exec …
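For that fork+exec approach, a minimal userspace sketch might look like the following (hedged: it assumes a zpool binary whose 'status -j' flag emits JSON; the flag and the command line here are illustrative, not something this PR defines):

    /* Sketch: capture 'zpool status' output from a long-running
     * process by forking a child via popen(), which runs the
     * command through the shell and hands back a read pipe. */
    #include <stdio.h>
    #include <stdlib.h>

    int
    main(void)
    {
            FILE *fp = popen("zpool status -j", "r");
            if (fp == NULL)
                    return (EXIT_FAILURE);

            char buf[4096];
            size_t n;
            while ((n = fread(buf, 1, sizeof (buf), fp)) > 0)
                    (void) fwrite(buf, 1, n, stdout);

            /* pclose() reaps the child and returns its exit status. */
            return (pclose(fp) == 0 ? EXIT_SUCCESS : EXIT_FAILURE);
    }

The cost being weighed here is exactly this fork+exec per query, which is why a long-lived consumer might prefer a library call or a kstat read instead.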
This is great. Removing that lock for zpool status will benefit everyone (not just selfish developers like myself 😄 )
I tend to agree this isn't that big of a deal. The Linux kernel does YAML formatting for certain NFS statistics, and we actually found it wasn't escaping characters properly and was producing invalid YAML. Not quite the same argument you're making about special chars, over/underflowing, etc., but procfs is pretty resilient and has a ton of information from all kinds of esoteric subsystems these days.
I agree with you on this point. I see no benefit in exposing this type of information. I was under the assumption this would essentially mirror the …
One of the biggest gripes I have is that libzfs isn't versioned and is hard to utilize 😄 I've got no strong opinion on this side of the argument though.
Yeah, this is a problem and I agree with you 100% here. Some subtle differences are okay but not matching at a large percentage is bad.
Agree with you 100% on this too. No reason to add unnecessary complexity like this.
That's a valid point.
Yeah, I'm all for improving libzfs. We've had a couple of community members try to use our Python libzfs bindings, and they have run into issues with non-reentrant calls being made in libzfs (and IIRC, there were some global memory objects wreaking havoc at some point). Anyways, one of the community members actually tried to help improve thread-safety by opening a PR here. There were quite a few follow-up commits to make the lib thread-safe. Maybe it is thread-safe (for the most part) and we just need to test it.
We already do this with a process pool. However, eventually the child processes get reaped and new ones get forked. This all becomes moot if the library is indeed thread-safe. More tests need to be done on our side, I guess.
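A crude smoke test for that thread-safety question might look like the sketch below: several threads, each with its own libzfs handle, repeatedly opening and closing a pool. The pool name "tank", the thread count, and the iteration count are arbitrary placeholders:

    #include <pthread.h>
    #include <stdio.h>
    #include <libzfs.h>

    static void *
    worker(void *arg)
    {
            (void) arg;
            for (int i = 0; i < 100; i++) {
                    /* One handle per thread; sharing a handle across
                     * threads is where past races were reported. */
                    libzfs_handle_t *hdl = libzfs_init();
                    if (hdl == NULL)
                            continue;
                    zpool_handle_t *zhp = zpool_open(hdl, "tank");
                    if (zhp != NULL)
                            zpool_close(zhp);
                    libzfs_fini(hdl);
            }
            return (NULL);
    }

    int
    main(void)
    {
            pthread_t tids[8];
            for (int i = 0; i < 8; i++)
                    (void) pthread_create(&tids[i], NULL, worker, NULL);
            for (int i = 0; i < 8; i++)
                    (void) pthread_join(&tids[i], NULL);
            (void) printf("no crash\n");
            return (0);
    }

Surviving a loop like this doesn't prove thread-safety, of course, but crashing quickly would disprove it.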
@tonyhutter I don't see a big problem in points 2 and 5, since it would be code not requiring much maintenance: once written, it is forgotten. The rest of your points are valid to me. The question is how much work it would be to handle those now and maintain it later. I see there ~300 lines of …
@amotin another issue is that I'm not convinced the pool kstats are taking the proper locks to deal with device removal/export. When I tested this back in September, I was able to panic the kernel by running these two commands in parallel:

…

Maybe this has been fixed - I haven't re-tested it since then.
I wonder if this issue is not specific to the pool status, but is a general bug in the Linux kstat implementation: not waiting for ongoing calls to complete before returning from destruction.
I had been working on addressing the review comments, mainly trying to make the kstat output look similar to ….

When the kstat output gets large for a fairly large zpool, given the number of times we construct the output and go back and forth allocating a buffer of suitable size to contain it, it actually becomes slower than ….

While all these problems can be looked into and fixed, this increases the effort required to address them and the issues highlighted above in earlier comments. After evaluating the cost and the value this brings, we have decided not to pursue this further. Anybody who is interested in this work is welcome to try and continue it in the future.
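For context on the buffer problem described above, the pattern in question is roughly the following (a hedged illustration; render_status_json is a hypothetical stand-in for the real kstat formatting path, not a function from the PR):

    #include <stdio.h>
    #include <stdlib.h>

    /* Hypothetical renderer: fills buf and returns the number of
     * bytes the full output needs, excluding the terminating NUL. */
    extern size_t render_status_json(char *buf, size_t buflen);

    char *
    status_json_alloc(void)
    {
            size_t len = 4096;
            for (;;) {
                    char *buf = malloc(len);
                    if (buf == NULL)
                            return (NULL);
                    size_t need = render_status_json(buf, len);
                    if (need < len)
                            return (buf);   /* output fit this pass */
                    /* Too small: discard the work and rebuild the
                     * entire output with a larger buffer. */
                    free(buf);
                    len = need + 1;
            }
    }

Each retry regenerates the whole status output from scratch, which is why large pools make this noticeably slower than a single-pass userspace approach.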
Motivation and Context
This PR is an updated version of the previous #16026
In the original PR, JSON was written into the buffer directly and nvlists were also converted to JSON, which was redundant.
Description
This PR creates an output nvlist, which is later printed in JSON format to the provided buffer. Spares and l2cache devices were also not showing up with the previous #16026. This is also fixed now.
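As a userspace illustration of that flow (a sketch only: it uses libnvpair's nvlist_print_json() and writes to stdout, whereas the kernel code targets the kstat buffer; the keys added here are merely examples):

    #include <stdio.h>
    #include <libnvpair.h>

    int
    main(void)
    {
            nvlist_t *nvl;

            /* Build an nvlist holding (example) status fields... */
            if (nvlist_alloc(&nvl, NV_UNIQUE_NAME, 0) != 0)
                    return (1);
            fnvlist_add_string(nvl, "state", "ONLINE");
            fnvlist_add_uint64(nvl, "status_json_version", 4);

            /* ...then render the whole nvlist as JSON in one call. */
            (void) nvlist_print_json(stdout, nvl);
            (void) printf("\n");

            nvlist_free(nvl);
            return (0);
    }

Building the nvlist first and serializing it once is what removes the redundant hand-written JSON generation from the original PR.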
How Has This Been Tested?
Manually tested in different pool configurations.