vdev expansion doesn't work #808
Found it. There are blatant mistakes in [...]
Work in progress: https://github.com/dechamps/zfs/compare/expand

Even with these fixes, I'm still getting errors: [...]
This means the [...]. Also, [...]
I looked for another EBUSY problem (read through issue #250, issue #440, and also issue #4) some months ago, but without success either. You may try this with files as vdevs; it seems to work then, at least for replacing a vdev with a spare via `zpool replace`. It may be related. It would be great if there were a solution for both cases.
This might be caused by the fact that ZFS opens all of the block devices `O_EXCL` to prevent accidental concurrent access. Several other open issues, as mentioned by pyavdr, may share the same root cause. A good solution for this needs to be found. As for autoexpand not working: that's because we don't have a zeventd in place yet to handle the expand event and invoke [...]. Still, your initial fixes look like a good start!
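For illustration, a minimal sketch of the kind of exclusive open described above, assuming a direct open(2) call; this is not the actual ZFS code path:

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>

/*
 * Illustrative sketch only, not the actual ZFS code: opening a block
 * device with O_EXCL makes the kernel refuse the open with EBUSY if
 * anyone else already holds an exclusive reference (ZFS itself
 * included), which is one plausible source of the EBUSY errors
 * discussed in this thread.
 */
int
open_disk_exclusive(const char *path)
{
	int fd = open(path, O_RDWR | O_EXCL);

	if (fd < 0 && errno == EBUSY)
		(void) fprintf(stderr, "%s: device is busy "
		    "(exclusively held)\n", path);

	return (fd);
}
```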
It's worse than that, actually: no matter if [...]. I just wrote a working fix which moves the [...]. It seems I'm hitting yet another issue though: the algorithm in [...]
Mmm... regarding [...]: I really have no idea why it thinks [...]. In the meantime, I'm making good progress on [...]
It looks as if zprop_name_to_prop() will use the column name as an alias for the property name when called in user space by the zpool command. See propname_match(). That's likely why both `autoexpand` and `expand` work. As for getting [...]
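A minimal sketch of that aliasing behavior; the type and function names here are hypothetical, not the actual libzfs code:

```c
#include <string.h>

/*
 * Hypothetical sketch of the matching described above: in user space,
 * a property lookup matches either the property's canonical name or
 * its column (header) name, so "autoexpand" and its column alias
 * "expand" both resolve to the same property.
 */
typedef struct prop_desc {
	const char *pd_name;	/* canonical property name */
	const char *pd_colname;	/* column header, usable as an alias */
} prop_desc_t;

static int
propname_matches(const prop_desc_t *pd, const char *name, int userspace)
{
	if (strcmp(name, pd->pd_name) == 0)
		return (1);
	/* Only user-space callers accept the column name as an alias. */
	if (userspace && pd->pd_colname != NULL &&
	    strcmp(name, pd->pd_colname) == 0)
		return (1);
	return (0);
}
```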
The error handling code around zpool_relabel_disk() is either nonexistent or wrong: the function call itself is not checked, and zpool_relabel_disk() generates error messages from an uninitialized buffer.

Before:

```
# zpool online -e homez sdb; echo $?
cannot relabel 'sdb1': unable to open device: 2
0
```

After:

```
# zpool online -e homez sdb; echo $?
cannot expand sdb: cannot relabel 'sdb1': unable to open device: 2
1
```

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#808
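The shape of the fix is roughly the following; a sketch under the assumption of a simplified zpool_relabel_disk() signature, not the literal patch:

```c
#include <stdio.h>
#include <libzfs.h>

/*
 * Sketch of the fix described above: check the result of
 * zpool_relabel_disk() and only build the error message once a
 * failure has actually happened, instead of printing an
 * uninitialized buffer and exiting 0. The signature of
 * zpool_relabel_disk() is simplified here for illustration.
 */
static int
expand_disk(libzfs_handle_t *hdl, const char *disk_path, const char *vdev)
{
	char errbuf[1024];

	if (zpool_relabel_disk(hdl, disk_path) != 0) {
		(void) snprintf(errbuf, sizeof (errbuf),
		    "cannot expand %s: relabel of '%s' failed",
		    vdev, disk_path);
		(void) fprintf(stderr, "%s\n", errbuf);
		return (1);	/* make the failure visible to the caller */
	}
	return (0);
}
```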
Currently, zpool_vdev_online() calls zpool_relabel_disk() with a short partition device name, which is obviously wrong because (1) zpool_relabel_disk() expects a full, absolute path to use with open() and (2) efi_write() must be called on an opened disk device, not a partition device. With this patch, zpool_relabel_disk() gets called with a full disk device path. The path is determined using the same algorithm as zpool_find_vdev().

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#808
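A sketch of the idea, with the bare "/dev" prefix as an assumption for the example (the real code resolves paths the same way zpool_find_vdev() does):

```c
#include <stdio.h>

/*
 * Illustrative sketch: turn a short vdev name such as "sdb" into the
 * absolute whole-disk path that zpool_relabel_disk() needs for
 * open() and efi_write().
 */
static void
disk_path_from_name(const char *name, char *path, size_t len)
{
	if (name[0] == '/')
		(void) snprintf(path, len, "%s", name);	/* already absolute */
	else
		(void) snprintf(path, len, "/dev/%s", name);
}
```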
Commit e5dc681 changed EFI_NUMPAR from 9 to 128. This means that the on-disk EFI label has efi_nparts = 128 instead of 9. The index of the reserved partition, however, is still 8. This breaks efi_use_whole_disk(), which uses efi_nparts - 1 as the index of the reserved partition.

This commit fixes efi_use_whole_disk() for the case where the index of the reserved partition is not efi_nparts - 1. It rewrites the algorithm and makes it more robust by using the order of the partitions instead of their numbering: it assumes that the last non-empty partition is the reserved partition, and that the non-empty partition before that is the data partition.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#808
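The order-based scan the commit describes looks roughly like this; a simplified sketch, not the libefi code (which works on dk_gpt/dk_part structures):

```c
/*
 * Walk the partition table and treat the last non-empty slot as the
 * reserved partition and the non-empty slot before it as the data
 * partition, instead of assuming the reserved partition sits at
 * index efi_nparts - 1.
 */
typedef struct part {
	unsigned long long p_size;	/* 0 means the slot is empty */
} part_t;

static int
find_data_and_reserved(const part_t *parts, int nparts,
    int *datap, int *resvp)
{
	int i, last = -1, prev = -1;

	for (i = 0; i < nparts; i++) {
		if (parts[i].p_size != 0) {
			prev = last;
			last = i;
		}
	}
	if (last < 0 || prev < 0)
		return (-1);	/* need at least a data + reserved pair */

	*datap = prev;	/* data partition precedes the reserved one */
	*resvp = last;
	return (0);
}
```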
When the zettacache does an index merge, it also adds the new and changed entries to the index entry cache. In certain configurations, manipulating the index entry cache can be very time consuming and can have a big impact on the performance of concurrent zettacache activities. This is especially noticeable when the zettacache doesn't have a lot of data in it, e.g. when initially filling up. Additionally, the current data structure used for the index entry cache is not very memory efficient; a lot of memory is used by its internal overheads.

This commit changes the data structure used by the index entry cache to a sharded, 16-way associative, roughly-LRU cache. Each entry can be stored in any of 16 "slots", which are searched when doing a lookup. When inserting and all 16 slots are full, the slot whose IndexValue has the oldest Atime is evicted. Each shard of the index is locked separately, allowing concurrent access to the overall entry cache.

This improves performance in several ways:

- The index entry cache can be updated concurrently with lookups, so zettacache lookup/insert performance is not impacted as much by merging. On a workload of random reads causing inserts to the zettacache via sibling block ingestion, without this commit a merge causes insertion performance to drop to ~45% (420,000 -> 190,000 inserts/sec).
- The time to update the index entry cache is reduced, so the overall time to do a merge is reduced. The time to perform a merge when the index size is small is reduced to 20% (a 5x improvement, 93 -> 19 seconds).
- The number of entries that can be cached in the given RAM budget is roughly tripled. The new memory usage per entry is 37% of the previous (65 -> 24 bytes per entry; the IndexEntry size is 23 bytes).
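As a rough illustration of that layout, here is a toy, single-shard version in C (the zettacache itself is Rust, and these structures are invented for the example):

```c
#include <stdint.h>

#define	WAYS	16

typedef struct slot {
	uint64_t s_key;
	uint64_t s_value;
	uint64_t s_atime;	/* 0 means the slot is empty */
} slot_t;

typedef struct cache_set {
	slot_t cs_slots[WAYS];
} cache_set_t;

/*
 * Insert into a 16-way set: use the first empty slot, otherwise evict
 * the slot with the oldest access time (rough LRU), as described in
 * the commit message above. Sharding and per-shard locking are
 * omitted for brevity; this is a toy model, not the zettacache code.
 */
static void
set_insert(cache_set_t *set, uint64_t key, uint64_t value, uint64_t now)
{
	slot_t *victim = &set->cs_slots[0];
	int i;

	for (i = 0; i < WAYS; i++) {
		slot_t *s = &set->cs_slots[i];

		if (s->s_atime == 0) {		/* empty slot: take it */
			victim = s;
			break;
		}
		if (s->s_atime < victim->s_atime)
			victim = s;		/* oldest atime so far */
	}
	victim->s_key = key;
	victim->s_value = value;
	victim->s_atime = now;
}
```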
…ndex run (openzfs#845)

The zettacache index cache is updated as part of merging the PendingChanges into the on-disk index. The merge task sends the updates to the checkpoint task as part of a `MergeProgress` message. The index cache updates are then made from a spawned blocking (CPU-bound) task. The updates are completed (waited for) before the next checkpoint completes.

During the merge, it's expected that lookups can see IndexEntry's from the old index, either from reading the old index itself or from the index entry cache. These stale entries are "corrected" by either `PendingChanges::update()`'s call to `Remap::remap()`, or `MergeState::entry_disposition()`'s check of `PendingChanges::freeing()`. When the `MergeMessage::Complete` is received, it calls `Locked::rotate_index()`, which deletes the old on-disk index, and calls `PendingChanges::set_remap(None)` and `Locked::merge.take()`. This ends the stale entry "corrections" mentioned above, which are no longer necessary because we can no longer see stale entries from the old on-disk index.

The problem occurs when the `MergeMessage::Complete` is received and processed before the spawned blocking task completes. In that case, we end the stale entry "corrections", but we can still see stale entries from the index cache. This PR addresses the problem by waiting for the index cache updates to complete before processing the `MergeMessage::Complete`.

The problem was introduced by openzfs#808.
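The ordering fix amounts to a drain-before-finalize pattern; a hypothetical C/pthreads sketch of that pattern (the actual code is Rust/tokio, and these names are invented):

```c
#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t drained = PTHREAD_COND_INITIALIZER;
static int updates_in_flight;

/* Called by each cache-update task as it finishes. */
void
cache_update_done(void)
{
	pthread_mutex_lock(&lock);
	if (--updates_in_flight == 0)
		pthread_cond_broadcast(&drained);
	pthread_mutex_unlock(&lock);
}

/*
 * Before acting on the merge-complete message, wait until all
 * in-flight index cache updates have drained, so no stale entries
 * can be served after the old index's corrections are torn down.
 */
void
handle_merge_complete(void)
{
	pthread_mutex_lock(&lock);
	while (updates_in_flight > 0)
		pthread_cond_wait(&drained, &lock);
	pthread_mutex_unlock(&lock);

	/* ...now it is safe to rotate the index and drop remaps... */
}
```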
Using latest SPL/ZFS from master on Debian squeeze (6.0.5) with Linux 3.2.22, neither `autoexpand=on`/`expand=on` nor `zpool online -e` works:

(...expanded sdb from 5G to 10G...)

Note the serious WTF with `zpool online -e` behaving differently depending on whether I am running it from `/dev` or not. A comment from 10 months ago seems to indicate that it worked at the time for at least one person, so I guess this is a regression?