Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rebuild related changes based on metadata #28

Closed
wants to merge 7 commits into from

Conversation

mynktl
Copy link
Member

@mynktl mynktl commented Mar 26, 2018

- API to get modified data from metadata
- Handling of rebuild IO from other replica according to metadata
- Removing old rebuilding code (txg base rebuilding)

- Handling of rebuild IO from other replica according to metadata
- Removing code based on txg number

Signed-off-by: mayank <mayank.patel@cloudbyte.com>
Copy link
Collaborator

@satbirchhikara satbirchhikara left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Mayank,

I have not looked in to changes yet. Just want to check if we missing data API will work on given range ?

Idea is that dowgrade replica may pass <io_seq, start offset of scan, len/end offset of scan> to healthy replica,
asking healthy replica to just scan data between and <len/end offset of scan>.

This is required where we want to rebuild a downgraded replica from multiple healthy replicas for different ranges in LUN.

One usecase where it can make sense is assume, a new replica has been added to cluster in that case, instead of rebuilding from one healthy replica, we can partition LUN in smaller ranges/chunks and can rebuild from different healthy replicas.

I am not sure if but if fresh replica added, we may need an optimized way of copying data from healthy replicas to this newly added replica by avoiding some of the checks.

len, (void **)&chunk_io);
if (!count)
goto exit_with_error;
if ((zv->zv_status & ZVOL_REBUILDING_IN_PROGRESS) &&
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

zv_status is a variable to maintain state not flags. For improving readability better to use "=="/"logical equal" to operator.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

@mynktl
Copy link
Member Author

mynktl commented Mar 27, 2018

Hi @satbirchhikara
Current implementation of API doesn't support reading missing data from specified range of volume. I will add this support in API to get missing data from specified range only.

In case of fresh replica, if we allows write IO's in fresh replica then we need to check for overlaps in uzfs_write_data. This may consumes more times since we are allowing writes.
But if we don't allow write IO's in fresh replica then we can use snapshot from healthy replica and then using rebuild, we can get fresh replica to healthy state.

  Snapshot created with provided metadata will be removed if (offset + length) == (end of metadata)
- Replaced dsl_pool_hold with dmu_object_own to hold snapshot in uzfs_get_io_diff
  Since, it was causing deadlock
- Functionality test for partial read from uzfs_get_io_diff

Signed-off-by: mayank <mayank.patel@cloudbyte.com>
Copy link

@gila gila left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good as usual

uint64_t rebuild_bytes;
avl_tree_t *incoming_io_tree; /* incoming io tree */
} zvol_rebuild_data_t;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this structure be removed? (there is just one number inside and stale comment)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing out this.
zvol_rebuild_data_t was meant to hold all rebuild related variables. So better to keep as it is.

As of now, there are two variables we are using for rebuild purpose rebuild_bytes and zv_rebuild_status. I will move zv_rebuild_status to zvol_rebuild_data_t structure.

I will also remove stale comment.

last_lun_offset = lun_offset;
}
diff_count++;
} else if (!ret) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should never happen in our code path

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not required if the single write has single io number.
I will merge this case with on_disk_io_number > rebuild io number.

metadatasize;
ret = compare_blk_metadata(rd_metadata, metadata);
if (ret == -1) {
// old io number < new io number

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on_disk io number < incoming IO number

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. that makes sense. I will update comment.

ret = compare_blk_metadata(rd_metadata, metadata);
if (ret == -1) {
// old io number < new io number
if (diff_count == 0) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need of curly braces..

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

* than w_metadata then that part of IO's will be discarded else it will be
* added to list.
*/
int uzfs_search_nonoverlapping_io(zvol_state_t *zv, uint64_t offset,
Copy link

@vishnuitta vishnuitta Mar 31, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uzfs_get_condensed_old_ondisk_blks..?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

@@ -310,3 +308,52 @@ uzfs_zvol_get_rebuild_status(zvol_state_t *zv)
{
return (zv->zv_rebuild_status);
}

/*
* This assumes snapshot, so, no need of range lock

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this doesn't seems to be the case all the time

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. Actually, range lock for metadata is not needed since all the operation on metadata will go through uzfs_read_data and uzfs_write_data.
In uzfs_get_io_diff, we are calling uzfs_read_metadata but that is on snapshot so no need of range lock.
I will remove range lock for metadata.

uint64_t blocksize = zv->zv_volmetablocksize;
uint64_t len_in_first_aligned_block, bytes, read = 0;
uint64_t end = offset + len;
uint64_t metaobjectsize = P2ALIGN(zv->zv_volsize,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this doesn't seems to be correct.. we need to care of metadatasize also..

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. I will update this.

bytes = metaobjectsize - offset;

ret = dmu_read(zv->zv_objset, ZVOL_META_OBJ, offset, bytes,
buf + read, 0);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we also have a check that we are not writing more than allocated space?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think - not required.. this should be good


int
uzfs_get_io_diff(zvol_state_t *zv, blk_metadata_t *low,
uzfs_get_io_diff_cb_t *func, off_t zvol_offset, size_t zvol_len,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make zvol_offset, zvol_len as lun_offset, lun_len

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok.

}

strfree(snap_name);
ret = dmu_objset_own(dataset, DMU_OST_ANY, B_TRUE, snap_zv,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about using uzfs_open_dataset_init, which creates and initializes zv..?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. we can use it. I will try that and update changes.

if (end > metaobjectsize)
end = metaobjectsize;

ret = get_metadata_snapshot_info(zv, low, &snap_zv);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_snapshot_zv

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok. I will make this function generic as get_snapshot_zv(zv, snap_name, snap_zv)

last_index = i;
}
diff_count++;
if (last_md != NULL &&

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check on last_md is not required. add some comments for this logic.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

/*
* API to read metadata
*/
extern int uzfs_read_metadata(void *zv, char *buf, uint64_t offset,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make zv as zvol_state_t

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as these APIs are being used in production also, having strict check on arguments will be good

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

typedef int (uzfs_txg_diff_traverse_cb_t)(off_t offset, size_t len,
uint64_t blkid, void *arg);
typedef int (uzfs_get_io_diff_cb_t)(off_t offset, size_t len,
blk_metadata_t *metadata, objset_t *obj, void *arg);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about passing zvol_state_t and using uzfs_read_data rather than dmu_read in callback implementation?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

Copy link

@vishnuitta vishnuitta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall, looks good.
metaobjectsize issue in uzfs_read_metadata should have caught in unit testing code. probably, higher active data size may be required.

mayank and others added 2 commits April 2, 2018 18:27
- code changes according to review comments

Signed-off-by: mayank <mayank.patel@cloudbyte.com>
@@ -48,14 +49,9 @@

#define WRITE_METADATA(zv, metablk, mdata, tx) \
do { \
rl_t *mrl; \

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will be good if offset and len are multiple of zv_metavolblocksize. And, uzfs also recevies IOs in same way from tgt.
So, please add VERIFY in read and write APIs for offset & len to be multiple of zv_metavolblocksize

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

dataset = kmem_asprintf("%s@%s", strchr(zv->zv_name, '/') + 1,
snap_name);

ret = uzfs_open_dataset(zv->zv_spa, dataset, snap_zv);
Copy link

@vishnuitta vishnuitta Apr 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uzfs_open_dataset calls uzfs_open_dataset_init.. please change uzfs_open_dataset_init as uzfs_own_dataset

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

// same as last one.
diff_count++;

if (compare_blk_metadata((blk_metadata_t *)
Copy link

@vishnuitta vishnuitta Apr 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

diff_count++ should go into 'else' case, and, last_lun_offset/last_md/last_index need to be set in 'if' after EXECUTE_... macro

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

lun_offset += zv->zv_metavolblocksize;
}

if (diff_count) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

&&!ret

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is not needed here

mayank added 3 commits April 3, 2018 12:48
- changes according to review comments
- uzfs_open_dataset_init changed to uzfs_own_dataset for better
  readability
Signed-off-by: mayank <mayank.patel@cloudbyte.com>
- setting metavolblocksize to 512 bytes when creating volume through CLI

Signed-off-by: mayank <mayank.patel@cloudbyte.com>
Copy link

@vishnuitta vishnuitta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good

@vishnuitta
Copy link

Due to conflicts on this PR, same changes raised in PR31. Hence, closing this.

@vishnuitta vishnuitta closed this Apr 5, 2018
@vishnuitta vishnuitta reopened this Apr 5, 2018
@vishnuitta
Copy link

Due to conflicts on this PR, same changes raised in PR31. Hence, closing this.

@vishnuitta vishnuitta closed this Apr 5, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants