Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow options to be passed to GlobalIndexUidAggregator #1091

Conversation

brianloss
Copy link
Contributor

Currently, the only way to change the max number of UIDs kept in the
GlobalIndexUidAggregator's protocol buffer is to change the code and
recompile. It would be nice to be able to configure this on the iterator
using the shell.

This changes modifies PropogatingIterator [sic] to support passing
options to the configured combiner(s).

Note that there is a side-effect of this change that stems from the
fact that the aggregators used by PropogatingIterator aren't really
used as Combiners (init is never called, for example) but rather as
the old removed Aggregator. The side effect is that if you wish to
configure the max UID list size on GlobalIndexUidAggregator then you
must also set the "all" option on GlobalIndexUidAggregator since it is
a Combiner (even though the all option won't actually be used, the
options validation requires that it or a specific column list be set).

Another consideration for anyone using this feature is the impact of
changing the max UID list size after data has been loaded. Decreasing
the max size will likely cause UID lists to be purged as they now
exceed the max UID count. Increasing also won't quite behave as you'd
expect for any lists that had already exceeded the previous max count
and had their lists cleared. Really the main use for this is when
setting up a new cluster.

Currently, the only way to change the max number of UIDs kept in the
GlobalIndexUidAggregator's protocol buffer is to change the code and
recompile. It would be nice to be able to configure this on the iterator
using the shell.

This changes modifies PropogatingIterator [sic] to support passing
options to the configured combiner(s).

Note that there is a side-effect of this change that stems from the
fact that the aggregators used by PropogatingIterator aren't really
used as Combiners (init is never called, for example) but rather as
the old removed Aggregator. The side effect is that if you wish to
configure the max UID list size on GlobalIndexUidAggregator then you
must also set the "all" option on GlobalIndexUidAggregator since it is
a Combiner (even though the all option won't actually be used, the
options validation requires that it or a specific column list be set).

Another consideration for anyone using this feature is the impact of
changing the max UID list size after data has been loaded. Decreasing
the max size will likely cause UID lists to be purged as they now
exceed the max UID count. Increasing also won't quite behave as you'd
expect for any lists that had already exceeded the previous max count
and had their lists cleared. Really the main use for this is when
setting up a new cluster.
keith-ratcliffe
keith-ratcliffe previously approved these changes Mar 4, 2021
@ivakegg
Copy link
Collaborator

ivakegg commented Mar 24, 2021

My gut reaction here is that it will become too easy to totally screw up an existing index on a system. I am wondering if we can update the aggregator to not be destructive: if there are already N UIDs in the list, and the configuration is M where M < N, then the UIDs will remain for that list.

@brianloss
Copy link
Contributor Author

My gut reaction here is that it will become too easy to totally screw up an existing index on a system. I am wondering if we can update the aggregator to not be destructive: if there are already N UIDs in the list, and the configuration is M where M < N, then the UIDs will remain for that list.

Sure, that could be done, but it would lead to some inconsistencies. Any new terms would "follow the rules" and their protocol buffers would stop keeping UIDs at the new max count whereas existing terms whose count were over the new max would have more UIDs. I suppose old terms whose count was below the new max would also behave like new terms.

Just trying to think of other options. What if you had another parameter that has to be set to true in order for the maxUIDs parameter to be honored? It's like a confirmation at that point, and not really different from "droptable shard". :) Of course there are plenty of ways to screw up a system if you don't know what you're doing. "deleterows -t shard -f" doesn't require confirmation...

@ivakegg ivakegg changed the base branch from master to feature/accumulo-2.0 July 15, 2021 19:06
@ivakegg ivakegg deleted the branch NationalSecurityAgency:integration January 31, 2024 18:03
@ivakegg ivakegg closed this Jan 31, 2024
@ivakegg ivakegg reopened this Feb 2, 2024
@ivakegg ivakegg changed the base branch from feature/accumulo-2.0 to integration February 2, 2024 15:19
@ivakegg ivakegg dismissed keith-ratcliffe’s stale review February 2, 2024 15:19

The base branch was changed.

@hlgp hlgp closed this Oct 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants