When triggering tree repairs - repair each vnode once #26

Merged: 1 commit into nhse-develop on Mar 28, 2024

Conversation

martinsumner

Each vnode keeps track of the ID of the last repair it triggered. If that ID changes, it assumes it has been triggered again and will prompt repair (as long as it is not the filter vnode on the query).

This means that for each call to riak_kv_ttaaefs_manager:trigger_tree_repairs/0, each vnode should only repair the tree once.

If any faulty segment is not repaired - the next time this node performs a full-sync, the repair will be re-triggered, and each vnode should repair once (and once only) again.
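As a rough illustration of the mechanism, a per-vnode check might look something like the sketch below (hypothetical module, record and function names, not the actual riak_kv_vnode code):

```erlang
%% Minimal sketch of the repair-once-per-trigger check.
%% Names here are illustrative only, not part of riak_kv.
-module(repair_once_sketch).
-export([maybe_repair/3]).

%% The vnode state holds the ID of the last repair it triggered.
-record(state, {last_repair_id = undefined :: undefined | term()}).

%% Called when a fetch_clocks query arrives carrying a repair ID.
%% Repair is prompted only if the ID has changed since the last
%% repair, and this vnode is not the filter vnode on the query.
maybe_repair(RepairID, IsFilterVnode, State = #state{last_repair_id = LastID}) ->
    case {RepairID =/= LastID, IsFilterVnode} of
        {true, false} ->
            prompt_tree_repair(),
            {repaired, State#state{last_repair_id = RepairID}};
        _ ->
            {skipped, State}
    end.

prompt_tree_repair() ->
    %% Placeholder for the work that repairs the AAE tree.
    ok.
```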

Note that with N nodes in M clusters, when there is a faulty segment there will be N x M fetch_clocks_nval queries for every local full-sync event (and hence trigger of read repairs). Should sync_state become true after a locally-prompted full-sync, repairs will be disabled.

To force repairs as an operator - call riak_kv_ttaaefs_manager:trigger_tree_repairs() from remote_console. This will force a read repair only once for each vnode (unless it is disabled by a local full-sync where sync_state=true). Do NOT set the riak_kv/aae_fetchclocks_repair environment variable directly.
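For example, from the node's attached shell (the node name in the prompt and the return value shown are illustrative):

```erlang
%% From `riak remote_console` on the node that should prompt repairs:
(riak@node1)1> riak_kv_ttaaefs_manager:trigger_tree_repairs().
ok
```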

The additional environment variable riak_kv/aae_fetchclocks_repair_force is no longer used - a sync_state=true full-sync will always disable local tree repair.

@martinsumner (Author)

#25

@martinsumner (Author)

OpenRiak/riak_test#14

@martinsumner martinsumner merged commit daa619b into nhse-develop Mar 28, 2024
3 checks passed
@martinsumner martinsumner deleted the nhse-d32-nhskv.i25 branch March 28, 2024 15:32