WIP: Remove spa_namespace_lock from zpool status #16507
spa_t *
spa_lookup_lite(const char *name)
{
	static spa_t search;	/* spa_t is large; don't allocate on stack */
Doesn't this burn the metaphorical house down on running multiple calls to it at once, or do I not understand how static declarations work in the kernel?
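The question can be made concrete. A function-scope static has static storage duration, so every call to the function, from any thread, writes the same object. The userspace sketch below is hypothetical (pool_t and both functions are illustrative, not kernel code). It shows a second call overwriting the key written by the first. It also shows one way to avoid a large key altogether, under the assumption made here that the name is the struct's first member, so a plain string can serve as the search key in the comparator.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical userspace stand-in for spa_t: large, like the real one.
 * Assumption for this sketch only: the pool name is the first member.
 */
typedef struct pool {
	char name[256];
	char payload[8192];
} pool_t;

/*
 * The pattern under review: one function-scope static key. It has static
 * storage duration, so every call (from any thread) writes the same object.
 */
static const pool_t *
fill_static_key(const char *name)
{
	static pool_t search;

	assert(strlen(name) < sizeof (search.name));
	(void) strcpy(search.name, name);
	return (&search);
}

/*
 * One way to avoid a large key: because name is the first member, a
 * pointer to a pool_t also points at its name, so a plain string can be
 * compared directly against tree entries.
 */
static int
pool_compare_name(const void *a, const void *b)
{
	return (strcmp((const char *)a, (const char *)b));
}
```

Two back-to-back calls return the same address, and the second name replaces the first. Under an exclusive lock that is harmless, because the first caller has finished with the key; callers that hold a lock only as reader can be inside the function at the same time.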
I suppose, less obliquely.
What does a second lock buy you that just making spa_namespace_lock a rwlock doesn't? I get "it's too big to fix all at once", but if you just declare them all writers except for the codepaths you're only taking lite on here, it seems to produce the same outcomes, no?
And if not, I would predict similar confusion on people trying to consume this interface.
> Doesn't this burn the metaphorical house down on running multiple calls to it at once
It's ok in this case since it's protected by spa_namespace_lite_lock.
> declare them all writers except for the codepaths you're only taking lite on here, it seems to produce the same outcomes, no?
Unfortunately, if one of the pools gets hosed up while holding spa_namespace_lock as writer, zpool status will hang trying to acquire the reader lock.
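A userspace sketch with POSIX read-write locks illustrates the point: while one thread holds a lock as writer, a reader cannot get in, so turning spa_namespace_lock into an rwlock would still leave readers stuck behind a writer that never finishes. A separate lock that nobody is holding is unaffected. big_lock, lite_lock, and the helper names are hypothetical; pthread_rwlock_tryrdlock() is used so the probe reports EBUSY instead of hanging.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical userspace analogues of spa_namespace_lock (as an rwlock)
 * and spa_namespace_lite_lock. */
static pthread_rwlock_t big_lock = PTHREAD_RWLOCK_INITIALIZER;
static pthread_rwlock_t lite_lock = PTHREAD_RWLOCK_INITIALIZER;

/* Stand-in for a "zpool status" lookup: try to take the lock as reader
 * without blocking; report 0 on success or EBUSY if a writer holds it. */
static void *
try_reader(void *arg)
{
	pthread_rwlock_t *lock = arg;
	int err = pthread_rwlock_tryrdlock(lock);

	if (err == 0)
		(void) pthread_rwlock_unlock(lock);
	return ((void *)(intptr_t)err);
}

/*
 * Hold big_lock as writer, the way a stuck pool operation might, and probe
 * `lock` as a reader from a second thread. Returns the probe's result,
 * or -1 if a pthread call fails.
 */
static int
probe_while_big_lock_held(pthread_rwlock_t *lock)
{
	pthread_t tid;
	void *ret = NULL;
	int rc = -1;

	assert(lock != NULL);
	if (pthread_rwlock_wrlock(&big_lock) != 0)
		return (-1);
	if (pthread_create(&tid, NULL, try_reader, lock) == 0 &&
	    pthread_join(tid, &ret) == 0)
		rc = (int)(intptr_t)ret;
	(void) pthread_rwlock_unlock(&big_lock);
	return (rc);
}
```

Probing big_lock reports EBUSY no matter why the writer is stuck; probing lite_lock reports 0. The second result is the property the separate lite lock relies on, provided its own writers only hold it briefly.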
Isn't that still true on the lite lock, then?
Now that you're making me think about it, we may not actually need a separate spa_namespace_lite_avl tree. We may just need the new lock. The two separate trees make it conceptually easier to understand, but I don't know if it's functionally needed, since the trees must always be exactly the same.
> Isn't that still true on the lite lock, then?
Technically yes, but realistically no. The lite lock is only held as writer when adding/removing from the AVL tree (which is a very short, non-blocking operation):
rw_enter(&spa_namespace_lite_lock, RW_WRITER);
avl_remove(&spa_namespace_avl, spa);
rw_exit(&spa_namespace_lite_lock);
...
rw_enter(&spa_namespace_lite_lock, RW_WRITER);
avl_add(&spa_namespace_avl, spa);
rw_exit(&spa_namespace_lite_lock);
All the other times the lite lock is taken as reader.
I don't think the two separate trees save you anything, since I think the only time they'd be useful is if you wanted to access the ro tree while the rw tree had a lock held, and that isn't allowed with this dance anyway.
I removed the lite tree and my local testing worked. The lite tree is removed in my latest push.
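With the lite tree gone, the scheme is one tree guarded by two locks: code that modifies the tree holds spa_namespace_lock (assumed here, as for the existing add/remove paths) and also takes the lite lock as writer around the brief update, while lookups take only the lite lock as reader. Below is a userspace sketch under those assumptions. pool_t, big_lock, lite_lock, and the namespace_* functions are hypothetical, and a POSIX tsearch(3) tree stands in for the kernel AVL tree.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <pthread.h>
#include <search.h>
#include <string.h>

/*
 * Hypothetical userspace stand-ins: pool_t for spa_t, a tsearch(3) tree
 * for spa_namespace_avl, big_lock for spa_namespace_lock, and lite_lock
 * for spa_namespace_lite_lock.
 */
typedef struct pool {
	char name[64];
} pool_t;

static void *namespace_root;	/* the single tree shared by both locks */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_rwlock_t lite_lock = PTHREAD_RWLOCK_INITIALIZER;

static int
pool_cmp(const void *a, const void *b)
{
	return (strcmp(((const pool_t *)a)->name, ((const pool_t *)b)->name));
}

/*
 * Writers: assumed to be called with big_lock already held. The lite lock
 * is taken as writer only around the short, non-blocking tree update.
 */
static void
namespace_add_locked(pool_t *p)
{
	assert(p != NULL);
	(void) pthread_rwlock_wrlock(&lite_lock);
	(void) tsearch(p, &namespace_root, pool_cmp);
	(void) pthread_rwlock_unlock(&lite_lock);
}

static void
namespace_remove_locked(pool_t *p)
{
	assert(p != NULL);
	(void) pthread_rwlock_wrlock(&lite_lock);
	(void) tdelete(p, &namespace_root, pool_cmp);
	(void) pthread_rwlock_unlock(&lite_lock);
}

/*
 * Readers (the zpool status path): take only lite_lock as reader and never
 * touch big_lock. The key lives on the caller's stack, so concurrent
 * readers do not share it. Only the tree is protected, not the pool.
 */
static pool_t *
namespace_lookup_lite(const char *name)
{
	pool_t key;
	pool_t **node;

	assert(strlen(name) < sizeof (key.name));
	(void) strcpy(key.name, name);

	(void) pthread_rwlock_rdlock(&lite_lock);
	node = tfind(&key, &namespace_root, pool_cmp);
	(void) pthread_rwlock_unlock(&lite_lock);
	return (node != NULL ? *node : NULL);
}
```

Because readers never touch big_lock, a lookup succeeds even while big_lock is held, which is the behavior the PR is after for zpool status.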
f701b68 to 9df9d5e
This commit removes spa_namespace_lock from the zpool status codepath. This means that zpool status will not hang if a pool fails while holding the spa_namespace_lock.
Background: The spa_namespace_lock was originally meant to protect the spa_namespace_avl AVL tree. The spa_namespace_avl tree held the mappings from pool names to spa_t's. So if you wanted to look up the spa_t for the "tank" pool, you would do an AVL search for "tank" while holding spa_namespace_lock.
Over time, though, the spa_namespace_lock was re-purposed to protect other critical codepaths in the spa subsystem as well. In many cases we don't know what the original authors meant to protect with it, or if they needed it for read-only or read-write protection. It is simply "too big and risky to fix properly".
The workaround is to add a new lightweight version of the spa_namespace_lock called spa_namespace_lite_lock. spa_namespace_lite_lock only protects the AVL tree, and nothing else. It can be used for read-only access to the AVL tree without requiring the spa_namespace_lock. Calls to spa_lookup_lite() and spa_next_lite() only need to acquire a reader lock on spa_namespace_lite_lock; they do not need to also acquire the old spa_namespace_lock. This allows us to still run zpool status even if the zfs module has spa_namespace_lock held. Note that these AVL tree locks only protect the tree, not the actual spa_t contents.
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Testing is showing that not all users of the
The code pattern looks like this:

// protect the AVL tree... and in our imagination, also protect the spa_t we get back...
mutex_enter(&spa_namespace_lock)
spa = spa_lookup(poolname)
< possibly call spa_config_enter(spa, ...) to lock certain fields in spa, possibly not... >
< do stuff with spa >
< possibly call spa_config_exit(spa, ...), possibly not... >
mutex_exit(&spa_namespace_lock)

So we're going to need more than simply the addition of the

// Protect the AVL tree from modification
rw_enter(&spa_namespace_lite_lock, RW_READER)
// lookup our spa_t
spa = spa_lookup_lite(poolname)
// protect our spa_t
rw_enter(&spa->namespace_legacy_lock, RW_READER/RW_WRITER)
// our spa_t is protected, remove the lock on the AVL tree
rw_exit(&spa_namespace_lite_lock)
< do stuff with spa >
// all done with our spa_t
rw_exit(&spa->namespace_legacy_lock)

I'll try to put a prototype of this together and see how it shakes out.
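The proposed pattern is a lock handoff: the tree lock is held only long enough to find the pool and take that pool's own lock, and then it is dropped. A userspace sketch of that handoff follows. pool_t, pool_register(), pool_hold(), pool_rele(), and the tsearch(3) tree are hypothetical stand-ins; namespace_legacy_lock mirrors the per-spa lock proposed in the comment above. The sketch assumes, without showing it, that removing a pool takes the pool's lock as writer first, so a held pool cannot disappear.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <pthread.h>
#include <search.h>
#include <string.h>

/* Hypothetical stand-in for spa_t carrying the proposed per-pool lock. */
typedef struct pool {
	char name[64];
	pthread_rwlock_t namespace_legacy_lock;
} pool_t;

static void *namespace_root;	/* tsearch(3) tree standing in for the AVL tree */
static pthread_rwlock_t lite_lock = PTHREAD_RWLOCK_INITIALIZER;

static int
pool_cmp(const void *a, const void *b)
{
	return (strcmp(((const pool_t *)a)->name, ((const pool_t *)b)->name));
}

static void
pool_register(pool_t *p)
{
	(void) pthread_rwlock_wrlock(&lite_lock);
	(void) tsearch(p, &namespace_root, pool_cmp);
	(void) pthread_rwlock_unlock(&lite_lock);
}

/*
 * Lock handoff: find the pool under the tree lock (as reader), take the
 * pool's own lock, then drop the tree lock. On success the caller holds
 * only the pool lock; NULL means no pool by that name.
 */
static pool_t *
pool_hold(const char *name, int as_writer)
{
	pool_t key;
	pool_t **node;
	pool_t *p = NULL;

	assert(strlen(name) < sizeof (key.name));
	(void) strcpy(key.name, name);

	(void) pthread_rwlock_rdlock(&lite_lock);
	node = tfind(&key, &namespace_root, pool_cmp);
	if (node != NULL) {
		p = *node;
		if (as_writer)
			(void) pthread_rwlock_wrlock(&p->namespace_legacy_lock);
		else
			(void) pthread_rwlock_rdlock(&p->namespace_legacy_lock);
	}
	(void) pthread_rwlock_unlock(&lite_lock);
	return (p);
}

static void
pool_rele(pool_t *p)
{
	assert(p != NULL);
	(void) pthread_rwlock_unlock(&p->namespace_legacy_lock);
}
```

Once pool_hold() returns, another thread can take the tree lock as writer to add or remove other pools while this caller keeps working on its own pool.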
It's going to be some time before I'll be able to get back to this. I'll close it for now so it's not a zombie PR.
Motivation and Context
Prevent zpool status from hanging if the zfs module is holding the spa_namespace_lock.
Description
This commit removes spa_namespace_lock from the zpool status codepath. This means that zpool status will not hang if a pool fails while holding the spa_namespace_lock.
Background:
The spa_namespace_lock was originally meant to protect the spa_namespace_avl AVL tree. The spa_namespace_avl tree holds the mappings from pool names to their spa_t. So if you wanted to look up the spa_t for the "tank" pool, you would do an AVL search for "tank" while holding spa_namespace_lock.
Over time, though, the spa_namespace_lock was repurposed to protect other critical codepaths in the spa subsystem. In many cases we don't know what the original authors meant to protect with it, or whether they needed it for read-only or read-write protection. It is simply "too big and risky to fix properly".
The workaround is to add a new lightweight version of the spa_namespace_lock called spa_namespace_lite_lock. spa_namespace_lite_lock only protects the AVL tree, and nothing else. It can be used for read-only access to the AVL tree without requiring the spa_namespace_lock. Calls to spa_lookup_lite() and spa_next_lite() only need to acquire a reader lock on spa_namespace_lite_lock; they do not need to also acquire the old spa_namespace_lock. This allows us to still run zpool status even if the zfs module has spa_namespace_lock held. Note that these AVL tree locks only protect the tree, not the actual spa_t contents.
How Has This Been Tested?
I added a new module param spa_namespace_delay_ms to introduce an artificial delay right after acquiring the spa_namespace_lock. I added a one-second delay and could see zpool create, zpool destroy, and zpool import take at least one second longer. At the same time, I could see zpool status, zpool get, and zpool list run instantaneously. I added a test case for this as well.
Marking this as WIP since I want to do some more manual testing.
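The testing idea (sleep right after taking the big lock, then check which operations slow down) can be mimicked in userspace. In this hedged sketch all names are hypothetical: delay_ms plays the role of spa_namespace_delay_ms, POSIX threads stand in for the kernel locks, and the barrier only guarantees the probe starts after the lock is taken. It is not the PR's actual test case. A probe that needs the big lock waits out the delay; a probe that needs only the lite lock as reader does not.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical userspace analogues of the kernel locks and module param. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_rwlock_t lite_lock = PTHREAD_RWLOCK_INITIALIZER;
static pthread_barrier_t held;
static long delay_ms;

static double
now_ms(void)
{
	struct timespec ts;

	(void) clock_gettime(CLOCK_MONOTONIC, &ts);
	return (ts.tv_sec * 1e3 + ts.tv_nsec / 1e6);
}

/* Take big_lock, signal that it is held, then sleep delay_ms before
 * releasing it: the artificial delay right after acquiring the lock. */
static void *
holder(void *arg)
{
	struct timespec ts = {
		.tv_sec = delay_ms / 1000,
		.tv_nsec = (delay_ms % 1000) * 1000000L
	};

	(void) arg;
	(void) pthread_mutex_lock(&big_lock);
	(void) pthread_barrier_wait(&held);
	(void) nanosleep(&ts, NULL);
	(void) pthread_mutex_unlock(&big_lock);
	return (NULL);
}

/* A path that needs big_lock (like zpool create) and one that needs
 * only lite_lock as reader (like zpool status after this change). */
static void
probe_big(void)
{
	(void) pthread_mutex_lock(&big_lock);
	(void) pthread_mutex_unlock(&big_lock);
}

static void
probe_lite(void)
{
	(void) pthread_rwlock_rdlock(&lite_lock);
	(void) pthread_rwlock_unlock(&lite_lock);
}

/* Run `probe` while another thread holds big_lock for `ms` milliseconds;
 * return how long the probe took in milliseconds, or -1 on failure. */
static double
time_probe(void (*probe)(void), long ms)
{
	pthread_t tid;
	double start, elapsed;

	assert(ms > 0);
	delay_ms = ms;
	if (pthread_barrier_init(&held, NULL, 2) != 0)
		return (-1.0);
	if (pthread_create(&tid, NULL, holder, NULL) != 0) {
		(void) pthread_barrier_destroy(&held);
		return (-1.0);
	}
	(void) pthread_barrier_wait(&held);	/* big_lock is now held */
	start = now_ms();
	probe();
	elapsed = now_ms() - start;
	(void) pthread_join(tid, NULL);
	(void) pthread_barrier_destroy(&held);
	return (elapsed);
}
```

With a 200 ms delay, the big-lock probe takes roughly the full delay, while the lite-lock probe returns almost immediately.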