-
Notifications
You must be signed in to change notification settings - Fork 30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rebuild related changes based on metadata #28
Conversation
mynktl
commented
Mar 26, 2018
- Handling of rebuild IO from other replica according to metadata - Removing code based on txg number Signed-off-by: mayank <mayank.patel@cloudbyte.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Mayank,
I have not looked in to changes yet. Just want to check if we missing data API will work on given range ?
Idea is that dowgrade replica may pass <io_seq, start offset of scan, len/end offset of scan> to healthy replica,
asking healthy replica to just scan data between and <len/end offset of scan>.
This is required where we want to rebuild a downgraded replica from multiple healthy replicas for different ranges in LUN.
One usecase where it can make sense is assume, a new replica has been added to cluster in that case, instead of rebuilding from one healthy replica, we can partition LUN in smaller ranges/chunks and can rebuild from different healthy replicas.
I am not sure if but if fresh replica added, we may need an optimized way of copying data from healthy replicas to this newly added replica by avoiding some of the checks.
lib/libzpool/uzfs_io.c
Outdated
len, (void **)&chunk_io); | ||
if (!count) | ||
goto exit_with_error; | ||
if ((zv->zv_status & ZVOL_REBUILDING_IN_PROGRESS) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
zv_status is a variable to maintain state not flags. For improving readability better to use "=="/"logical equal" to operator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
Hi @satbirchhikara In case of fresh replica, if we allows write IO's in fresh replica then we need to check for overlaps in uzfs_write_data. This may consumes more times since we are allowing writes. |
Snapshot created with provided metadata will be removed if (offset + length) == (end of metadata) - Replaced dsl_pool_hold with dmu_object_own to hold snapshot in uzfs_get_io_diff Since, it was causing deadlock - Functionality test for partial read from uzfs_get_io_diff Signed-off-by: mayank <mayank.patel@cloudbyte.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good as usual
include/sys/uzfs_zvol.h
Outdated
uint64_t rebuild_bytes; | ||
avl_tree_t *incoming_io_tree; /* incoming io tree */ | ||
} zvol_rebuild_data_t; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this structure be removed? (there is just one number inside and stale comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing out this.
zvol_rebuild_data_t
was meant to hold all rebuild related variables. So better to keep as it is.
As of now, there are two variables we are using for rebuild purpose rebuild_bytes
and zv_rebuild_status
. I will move zv_rebuild_status
to zvol_rebuild_data_t
structure.
I will also remove stale comment.
lib/libzpool/uzfs_rebuilding.c
Outdated
last_lun_offset = lun_offset; | ||
} | ||
diff_count++; | ||
} else if (!ret) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should never happen in our code path
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not required if the single write has single io number.
I will merge this case with on_disk_io_number > rebuild io number.
lib/libzpool/uzfs_rebuilding.c
Outdated
metadatasize; | ||
ret = compare_blk_metadata(rd_metadata, metadata); | ||
if (ret == -1) { | ||
// old io number < new io number |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
on_disk io number < incoming IO number
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. that makes sense. I will update comment.
lib/libzpool/uzfs_rebuilding.c
Outdated
ret = compare_blk_metadata(rd_metadata, metadata); | ||
if (ret == -1) { | ||
// old io number < new io number | ||
if (diff_count == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need of curly braces..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
include/uzfs_rebuilding.h
Outdated
* than w_metadata then that part of IO's will be discarded else it will be | ||
* added to list. | ||
*/ | ||
int uzfs_search_nonoverlapping_io(zvol_state_t *zv, uint64_t offset, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
uzfs_get_condensed_old_ondisk_blks..?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
lib/libzpool/uzfs_io.c
Outdated
@@ -310,3 +308,52 @@ uzfs_zvol_get_rebuild_status(zvol_state_t *zv) | |||
{ | |||
return (zv->zv_rebuild_status); | |||
} | |||
|
|||
/* | |||
* This assumes snapshot, so, no need of range lock |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this doesn't seems to be the case all the time
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes. Actually, range lock for metadata is not needed since all the operation on metadata will go through uzfs_read_data
and uzfs_write_data
.
In uzfs_get_io_diff, we are calling uzfs_read_metadata but that is on snapshot so no need of range lock.
I will remove range lock for metadata.
lib/libzpool/uzfs_io.c
Outdated
uint64_t blocksize = zv->zv_volmetablocksize; | ||
uint64_t len_in_first_aligned_block, bytes, read = 0; | ||
uint64_t end = offset + len; | ||
uint64_t metaobjectsize = P2ALIGN(zv->zv_volsize, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this doesn't seems to be correct.. we need to care of metadatasize also..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes. I will update this.
bytes = metaobjectsize - offset; | ||
|
||
ret = dmu_read(zv->zv_objset, ZVOL_META_OBJ, offset, bytes, | ||
buf + read, 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we also have a check that we are not writing more than allocated space?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think - not required.. this should be good
lib/libzpool/uzfs_rebuilding.c
Outdated
|
||
int | ||
uzfs_get_io_diff(zvol_state_t *zv, blk_metadata_t *low, | ||
uzfs_get_io_diff_cb_t *func, off_t zvol_offset, size_t zvol_len, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make zvol_offset, zvol_len as lun_offset, lun_len
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok.
lib/libzpool/uzfs_rebuilding.c
Outdated
} | ||
|
||
strfree(snap_name); | ||
ret = dmu_objset_own(dataset, DMU_OST_ANY, B_TRUE, snap_zv, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about using uzfs_open_dataset_init, which creates and initializes zv..?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes. we can use it. I will try that and update changes.
lib/libzpool/uzfs_rebuilding.c
Outdated
if (end > metaobjectsize) | ||
end = metaobjectsize; | ||
|
||
ret = get_metadata_snapshot_info(zv, low, &snap_zv); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
get_snapshot_zv
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok. I will make this function generic as get_snapshot_zv(zv, snap_name, snap_zv)
lib/libzpool/uzfs_rebuilding.c
Outdated
last_index = i; | ||
} | ||
diff_count++; | ||
if (last_md != NULL && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check on last_md is not required. add some comments for this logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
include/uzfs_io.h
Outdated
/* | ||
* API to read metadata | ||
*/ | ||
extern int uzfs_read_metadata(void *zv, char *buf, uint64_t offset, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make zv as zvol_state_t
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as these APIs are being used in production also, having strict check on arguments will be good
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
include/sys/uzfs_zvol.h
Outdated
typedef int (uzfs_txg_diff_traverse_cb_t)(off_t offset, size_t len, | ||
uint64_t blkid, void *arg); | ||
typedef int (uzfs_get_io_diff_cb_t)(off_t offset, size_t len, | ||
blk_metadata_t *metadata, objset_t *obj, void *arg); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about passing zvol_state_t and using uzfs_read_data rather than dmu_read in callback implementation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
overall, looks good.
metaobjectsize issue in uzfs_read_metadata should have caught in unit testing code. probably, higher active data size may be required.
- code changes according to review comments Signed-off-by: mayank <mayank.patel@cloudbyte.com>
@@ -48,14 +49,9 @@ | |||
|
|||
#define WRITE_METADATA(zv, metablk, mdata, tx) \ | |||
do { \ | |||
rl_t *mrl; \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this will be good if offset and len are multiple of zv_metavolblocksize. And, uzfs also recevies IOs in same way from tgt.
So, please add VERIFY in read and write APIs for offset & len to be multiple of zv_metavolblocksize
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
dataset = kmem_asprintf("%s@%s", strchr(zv->zv_name, '/') + 1, | ||
snap_name); | ||
|
||
ret = uzfs_open_dataset(zv->zv_spa, dataset, snap_zv); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
uzfs_open_dataset calls uzfs_open_dataset_init.. please change uzfs_open_dataset_init as uzfs_own_dataset
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
lib/libzpool/uzfs_rebuilding.c
Outdated
// same as last one. | ||
diff_count++; | ||
|
||
if (compare_blk_metadata((blk_metadata_t *) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
diff_count++ should go into 'else' case, and, last_lun_offset/last_md/last_index need to be set in 'if' after EXECUTE_... macro
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
lun_offset += zv->zv_metavolblocksize; | ||
} | ||
|
||
if (diff_count) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
&&!ret
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is not needed here
- changes according to review comments - uzfs_open_dataset_init changed to uzfs_own_dataset for better readability Signed-off-by: mayank <mayank.patel@cloudbyte.com>
- setting metavolblocksize to 512 bytes when creating volume through CLI Signed-off-by: mayank <mayank.patel@cloudbyte.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good
Due to conflicts on this PR, same changes raised in PR31. Hence, closing this. |
Due to conflicts on this PR, same changes raised in PR31. Hence, closing this. |