forked from openzfs/zfs
Commit
DLPX-84701 DOSE Migration: Add Object-Store vdev in running ZFS Pool (openzfs#796)

= Description

This commit allows us to add an object store bucket as a vdev in an existing pool. It is the first part of the DOSE Migration project.

= Note: Forcing Addition From `zpool add`

Attempting to add an object-store vdev without `-f` yields the following error message:

```
$ sudo zpool add -o object-endpoint=etc.. testpool s3 cloudburst-data-2
invalid vdev specification
use '-f' to override the following errors:
mismatched replication level: pool uses disk and new vdev is objectstore
```

This is done on purpose for now. Adding an object-store vdev to a pool is an irreversible operation and should be handled with caution.

= Note: Syncing Labels & The Uberblock

When we start from a block-based pool and add an object-store vdev, there is a point where we have an object-store vdev in our config but that vdev is not yet accepting allocations, and therefore we cannot sync the config to it. That point is the exact TXG where the vdev is added, and we need to sync its config changes to the labels & uberblock of our block-based vdevs. For this reason, I adjusted all the codepaths under `vdev_config_sync()` to be able to handle the update of the labels and uberblock of the local devices even when there is an object-store vdev. This way, if the next TXG fails we have the new vdev somewhere in our config. For all TXGs from that point on, we always sync the object store's config first. This is also the config that we always look at first when opening the pool.

The above changes in `vdev_config_sync()` change the behavior of existing pure (i.e. non-hybrid) object-store pools: they now occasionally update the labels of their slog devices (e.g. every time we dirty the pool's config). This should not have any negative effect on existing pure object-store pools. On the contrary, it should keep their labels up to date and potentially fix some extreme corner cases in pool import.

= Note: ZIL allocations

When the pool is in a hybrid state (i.e. backed by both an object store and block devices) with no slog devices, we could make ZIL allocations fall back to the embedded slog or normal class. I left that functionality as future work. This is not a prerequisite for DOSE migration, as customers are expected to add zettacache (+ slog) devices as the first part of migration, and therefore their VMs will always have at least one slog device when the object store vdev is added.

= Note: Storage Pool Checkpoint

Pure object-based pools (i.e. not hybrid ones) do the checkpoint rewinding process in the object agent. This is a different mechanism from the storage pool checkpoint in block-based pools. Until we need to make those two mechanisms work well with each other, we avoid any migrations to the object store while a zpool checkpoint is in effect. See `spa_ld_checkpoint_rewind()` usage in `spa_load_impl()` for more info.

= Note: Ordering of import paths

To import a hybrid pool we need to specify two import paths: (1) the path of the local block devices (e.g. `/dev/..etc`) and (2) the name of the bucket in the object store. Unfortunately, given how `zpool_find_import_agent()` is implemented, importing hybrid pools only works if we specify (2) first and then (1), not the other way around. Doing the opposite results in the zpool command hanging (again, this is because of the current XXX in the aforementioned function).

= Note: Testing Losing Power Mid-Addition of the Object Store vdev

I tested that by manually introducing panics like so:

```
diff --git a/module/zfs/spa.c b/module/zfs/spa.c
index 5b55bb275..a82fab841 100644
--- a/module/zfs/spa.c
+++ b/module/zfs/spa.c
@@ -6969,7 +6969,9 @@ spa_vdev_add(spa_t *spa, nvlist_t *nvroot)
 	 * if we lose power at any point in this sequence, ...
 	 * steps will be completed the next time we load ...
 	 */
+	ASSERT(B_FALSE); // <--- panic before config sync
 	(void) spa_vdev_exit(spa, vd, txg, 0);

+	ASSERT(B_FALSE); // <--- panic before config sync
 	mutex_enter(&spa_namespace_lock);
 	spa_config_update(spa, SPA_CONFIG_UPDATE_POOL);
```

Importing the pool after the first panic, we come back without the object-store vdev, as expected. Importing the pool after the second panic, we come back with the object-store vdev but neither allocate from it nor change the spa_pool_type until we create its metaslabs on disk.

= Next Steps

I'm planning to implement the augmented device removal logic, which evacuates data from the block devices to the object store. Once that is done, I'm planning to work on avoiding the insertion of all the evacuated data in the zettacache.

Signed-off-by: Serapheim Dimitropoulos <serapheim@delphix.com>
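
As an illustration of the import-path ordering constraint noted above (bucket name first, local block devices second), a caller could normalize its path list before handing it to the import code. The following is only a minimal sketch and not part of this commit: `is_local_path()` and `order_import_paths()` are hypothetical helpers, and the rule "anything starting with `/` is a local device path" is an assumption made for the example.

```c
#include <stddef.h>

/*
 * Hypothetical heuristic (an assumption for this sketch): a path that
 * starts with '/' names a local block device or directory; anything
 * else is treated as an object-store bucket name.
 */
static int
is_local_path(const char *path)
{
	return (path[0] == '/');
}

/*
 * Reorder `paths` in place so that bucket names come before local
 * paths, keeping the relative order inside each group (a stable
 * partition). Returns the number of bucket names found.
 */
static size_t
order_import_paths(const char **paths, size_t n)
{
	size_t nbuckets = 0;

	for (size_t i = 0; i < n; i++) {
		if (is_local_path(paths[i]))
			continue;

		/* Shift the local paths in [nbuckets, i) right by one slot. */
		const char *bucket = paths[i];
		for (size_t j = i; j > nbuckets; j--)
			paths[j] = paths[j - 1];
		paths[nbuckets++] = bucket;
	}
	return (nbuckets);
}
```

Given the list `/dev/sdb`, `cloudburst-data-2`, `/dev/sdc` (the device names are made up), the helper moves `cloudburst-data-2` to the front and leaves the two device paths in their original order. This only works around the ordering requirement on the caller's side; it does not address the XXX in `zpool_find_import_agent()`.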
Showing 27 changed files with 815 additions and 91 deletions.