Add schema to allow further controls of tictacaae #46

Open · wants to merge 1 commit into base: openriak-3.2

Conversation

martinsumner

This allows:

  • setting of log_level for the AAE backend;
  • setting of log_level for TictacAAE overall (hidden);
  • setting of the compression method (supporting a switch to the better-performing zstd).

Previously all AAE backends had the database ID 2^16 - 1. They now have an ID of 2^16 + the partition number (e.g. with a ring size of 1024, the partition numbers will be 0 to 1023). This allows leveled logs in parallel-mode AAE to be distinguished from leveled vnode logs, and also allows them to be distinguished between vnodes.
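
For illustration, a riak.conf sketch of the kind of entries such a schema exposes; the setting names and values here are assumptions for the purpose of the example, not copied from the PR's actual schema:

```
## Hypothetical riak.conf entries for the controls described above.
## Names and values are illustrative assumptions, not the PR's schema.

## Log level for the leveled backend used by parallel-mode AAE
tictacaae_backendloglevel = warn

## Overall TictacAAE log level (hidden; must be set explicitly)
tictacaae_loglevel = info

## Compression method for the AAE backend, allowing a switch to zstd
tictacaae_compressionmethod = zstd
```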

@martinsumner (Author)

As part of this change, the rebuild wait/delay defaults have also been increased.

There was never any science behind the initial setting. The purpose is risk mitigation for an incredibly unlikely series of events that might occur strung out over a long period of time (i.e. data is silently corrupted on all three nodes without detection, despite CRC checks, and without the data being read at any time between the first and the third corruption). On production systems, even with ephemeral disks, data corruption is incredibly rare, so rebuilding all trees every month to protect against it is unnecessary work.

Rebuilding every 3 months seems more proportionate to the risks. The operator still has the option of choosing a more aggressive schedule should they see increased corruption within their environment.
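
As a sketch of what the relaxed schedule could look like in riak.conf, using the existing tictacaae_rebuildwait/tictacaae_rebuilddelay setting names; the units and values are assumptions illustrating a roughly three-month cycle, not the PR's exact defaults:

```
## Illustrative values only, not the PR's actual defaults.
## Interval between tree rebuilds, assumed to be in hours (~90 days):
tictacaae_rebuildwait = 2160
## Maximum random stagger across vnodes, assumed to be in seconds (~4 days):
tictacaae_rebuilddelay = 345600
```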
