Simplify MultiSnapshot#SeqNoset #27547

Merged
merged 8 commits into from
Dec 3, 2017

Conversation

Member

@dnhatn dnhatn commented Nov 27, 2017

Today, we maintain two sets in a SeqNoSet: ongoing sets and completed sets. We can remove the completed sets and use only the ongoing sets by releasing the internal bitset of a CountedBitSet when all its bits are set. This behaves like the two sets but is simpler. This commit also makes CountedBitSet a drop-in replacement for BitSet.

Relates #27268
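
For readers following along, the idea in the description can be sketched roughly as below. This is an illustration under stated assumptions, not the code from this PR: `java.util.BitSet` stands in for the Lucene `FixedBitSet` that appears in the diff, and the class name `CountedBitSetSketch` and the helper `isInternalBitsetReleased` are hypothetical names chosen for the example.

```java
import java.util.BitSet;

// Sketch only: java.util.BitSet stands in for the Lucene FixedBitSet used in the PR.
final class CountedBitSetSketch {
    // Fixed capacity, stored separately because java.util.BitSet#length()
    // reports the highest set bit + 1 rather than the capacity.
    private final int numBits;
    private int onBits;     // how many bits are currently set
    private BitSet bitset;  // backing storage; null once every bit is set

    CountedBitSetSketch(int numBits) {
        if (numBits <= 0) {
            throw new IllegalArgumentException("numBits must be positive, got " + numBits);
        }
        this.numBits = numBits;
        this.bitset = new BitSet(numBits);
    }

    boolean get(int index) {
        checkIndex(index);
        // A released bitset means that all bits are on.
        return bitset == null || bitset.get(index);
    }

    void set(int index) {
        checkIndex(index);
        if (bitset == null || bitset.get(index)) {
            return; // already on, nothing to count
        }
        bitset.set(index);
        onBits++;
        if (onBits == numBits) {
            bitset = null; // every bit is on: release the storage
        }
    }

    int cardinality() {
        return onBits;
    }

    int length() {
        return numBits;
    }

    // Hypothetical helper so the release can be observed from outside.
    boolean isInternalBitsetReleased() {
        return bitset == null;
    }

    private void checkIndex(int index) {
        if (index < 0 || index >= numBits) {
            throw new IndexOutOfBoundsException("index " + index + " out of range [0, " + numBits + ")");
        }
    }
}
```

Once the last missing bit is set, the sketch keeps only the two `int` fields, which is how a single structure can play the role of both the ongoing and the completed sets.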

Contributor

@bleskes bleskes left a comment

The basics look good. I left one comment suggesting a slight variant on the approach.

@@ -99,60 +100,62 @@ public void close() throws IOException {

boolean getAndSet(int index) {
Contributor

I don't think we need the getAndSet semantics any more. We can just have set, and if people want to pre-get, they can. This way it can inherit from BitSet and be used as a drop-in replacement for it (although we could implement clear() and the other methods, I think it's fine to throw an UnsupportedOperationException until we need them).

Member Author

This way this can inherit from BitSet and be used as a drop-in replacement for it.

I agree. I moved this class to the common package and implemented the methods from BitSet.

I don't think we need the getAndSet semantics any more. We can just have set and if people want to pre-get, they can.

The method getAndSet is optimized to avoid looking up an entry twice, but I am fine with removing it.
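
To make the "pre-get, then set" variant concrete, here is a minimal sketch of how a caller such as the SeqNoSet could keep its own getAndSet on top of a plain get/set bit set. It is an assumption-laden illustration: plain `java.util.BitSet` stands in for CountedBitSet, the class name `SeqNoSetSketch` is made up, and the value of `BIT_SET_SIZE` is hypothetical (the thread does not state it).

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Sketch only: a per-range map of bit sets, with getAndSet built from get + set.
final class SeqNoSetSketch {
    static final int BIT_SET_SIZE = 1024; // hypothetical value, not taken from the PR

    // One bit set per contiguous range of BIT_SET_SIZE sequence numbers.
    private final Map<Long, BitSet> bitSets = new HashMap<>();

    /** Marks seqNo as seen and returns true if it had been seen before. */
    boolean getAndSet(long seqNo) {
        if (seqNo < 0) {
            throw new IllegalArgumentException("seqNo must be non-negative, got " + seqNo);
        }
        final long key = seqNo / BIT_SET_SIZE;
        final int index = Math.toIntExact(seqNo % BIT_SET_SIZE);
        final BitSet bitset = bitSets.computeIfAbsent(key, k -> new BitSet(BIT_SET_SIZE));
        final boolean wasOn = bitset.get(index); // the "pre-get"
        bitset.set(index);                       // plain set, no combined operation needed
        return wasOn;
    }
}
```

The cost of this shape is the second lookup inside the bit set that the combined getAndSet avoided; the benefit is that the bit set itself only needs the standard get and set operations.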

Contributor

bleskes commented Nov 30, 2017 via email

Member Author

dnhatn commented Nov 30, 2017

@bleskes, I have addressed your suggestion. Could you please take another look? Thank you.

@dnhatn dnhatn changed the title from "Simplify MultiSnapshot#SeqNoSet" to "Move CountedBitSet to the common package" Nov 30, 2017
Member

@jasontedor jasontedor left a comment

I left a question.

* when all bits are set to reduce memory usage. This structure can work well for sequence numbers
* from translog as these numbers are likely to form contiguous ranges (eg. filling all bits).
*/
public final class CountedBitSet extends BitSet {
Member

This only has one consumer; does it really need to be in a common package? What's the motivation for moving this out of the translog package?

Member Author

Boaz suggested making it a drop-in replacement for BitSet. I thought about the LocalCheckpointTracker and moved it to the common package. I am happy to move it back to the translog package.

Member

First, the description of the PR is "Move CountedBitSet to the common package" yet it appears the changes are larger than that; it's best to keep the summary and the changes aligned. Also, I think moving and changing the implementation should be separate PRs, and I think the reason for the move should be explained in the commit message.

With that out of the way: if the intention is to reuse this in the local checkpoint tracker, I still do not think this needs to be in common; it can be in the sequence number package org.elasticsearch.index.seqno since the only uses relate to sequence numbers. It's a shame this class has to be public in that package even.

My resistance is that the common package is already a bit of a dumping ground, and at least in the sequence number package it's slightly clearer that it's an internal implementation rather than for public consumption (e.g., in plugins). When we modularize (JDK modules) we will be in a better position to make clear what is part of the public API and what is not.

Member Author

First, the description of the PR is "Move CountedBitSet to the common package" yet it appears the changes are larger than that; it's best to keep the summary and the changes aligned.

My bad, I should have updated the description more carefully when addressing feedback.

Also, I think moving and changing the implementation should be separate PRs

I agree. I will move this class back to the translog package in this PR. If we agree to use it in the LocalCheckpointTracker, we can make another PR to move it to the seqno package later.

Member Author

I pushed 0df0474

Contributor

@bleskes bleskes left a comment

Left a minor suggestion. LGTM.


@Override
public int length() {
return bitset == null ? onBits : bitset.length();
Contributor

smart

}
final int index = Math.toIntExact(value % BIT_SET_SIZE);
final boolean wasOn = bitset.get(index);
bitset.set(index);
return wasOn;
}

// For testing
long completeSetsSize() {
Contributor

Now that we abstract the underlying storage away, I don't think we care about whether underlying sets are completed or not. All we care about is functionality. I think we can remove completedSetSize and ongoingSetSize and their tests?

Member Author

I pushed 14aca62

final CountedBitSet countedBitSet = new CountedBitSet((short) numBits);

final int iterations = iterations(1000, 20000);
for (int i = 0; i < iterations; i++) {
Contributor

This is a great test, but I think we can simplify it by randomly setting keys and checking at the end that the result is the same for all positions (it will also show that nothing changes for keys after they were set).
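
A rough sketch of the simplified test shape suggested here: set random indices on both the implementation under test and a reference `java.util.BitSet`, then compare every position once at the end. The class and method names (`RandomSetCheck`, `matchesReference`) are hypothetical, and the functional-interface parameters are just one convenient way to plug in any bit set implementation; the actual test in the PR may look different.

```java
import java.util.BitSet;
import java.util.Random;
import java.util.function.IntConsumer;
import java.util.function.IntPredicate;

// Sketch only: compares an arbitrary set/get pair against java.util.BitSet.
final class RandomSetCheck {
    /**
     * Sets `iterations` random indices through `setter` and on a reference BitSet,
     * then returns true only if `getter` agrees with the reference at every position.
     */
    static boolean matchesReference(int numBits, int iterations, long seed,
                                    IntConsumer setter, IntPredicate getter) {
        final Random random = new Random(seed);
        final BitSet expected = new BitSet(numBits);
        for (int i = 0; i < iterations; i++) {
            final int index = random.nextInt(numBits);
            expected.set(index); // repeats are fine: setting twice must change nothing
            setter.accept(index);
        }
        for (int i = 0; i < numBits; i++) {
            if (expected.get(i) != getter.test(i)) {
                return false;
            }
        }
        return true;
    }
}
```

Because the random indices repeat, a single final comparison also covers the case where setting an already-set key must leave the result unchanged.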

Member Author

I pushed 0e6a31e

Contributor

bleskes commented Dec 3, 2017

retest this please.

Contributor

bleskes commented Dec 3, 2017

Test failure is unrelated. The last commits look good to me. Let's give @jasontedor some time to respond on whether the expected usage in LocalCheckpointTracker justifies putting this in common.

Member

jasontedor commented Dec 3, 2017

Let's give @jasontedor some time to respond on whether the expected usage in LocalCheckpointTracker justifies putting this in common.

Thanks @bleskes, I responded inline where the discussion is taking place.

@dnhatn dnhatn changed the title from "Move CountedBitSet to the common package" to "Simplify MultiSnapshot#SeqNoset" Dec 3, 2017
Member Author

dnhatn commented Dec 3, 2017

Thanks @bleskes and @jasontedor for the reviews.

@dnhatn dnhatn merged commit 49df50f into elastic:master Dec 3, 2017
@dnhatn dnhatn deleted the seqno-set branch December 3, 2017 20:21
Member

@jasontedor jasontedor left a comment

I left a few observations.

private FixedBitSet bitset;

public CountedBitSet(short numBits) {
assert numBits > 0;
Member

I think this should be a hard illegal argument exception.


@Override
public long ramBytesUsed() {
throw new UnsupportedOperationException("Not implemented yet");
Member

I don't think we need an implementation for this method; however, I think this could be RamUsageEstimator.shallowSizeOfInstance(CountedBitSet.class) + (bitset == null ? 0 : bitset.ramBytesUsed());. You could even fold RamUsageEstimator.shallowSizeOfInstance(CountedBitSet.class) into a static final constant.


@Override
public void clear(int startIndex, int endIndex) {
throw new UnsupportedOperationException("Not implemented yet");
Member

I don't think "Not implemented yet" adds anything other than the exception type (and could be misleading if we never intend to implement).

throw new UnsupportedOperationException("Not implemented yet");
}

// Exposed for testing
Member

I personally think these comments do not add anything over the fact that we can use the IDE to see where a method is used. If it is used in tests and only tests, then we already know it is exposed for tests.

private FixedBitSet bitset;

CountedBitSet(short numBits) {
assert numBits > 0;
Member

Should this be a hard illegal argument exception?

public boolean get(int index) {
assert 0 <= index && index < this.length();
assert bitset == null || onBits < bitset.length() : "Bitset should be released when all bits are set";

Member

This empty line can go.


// Ignore set when bitset is full.
if (bitset != null) {
boolean wasOn = bitset.getAndSet(index);
Member

This can (and should) be final.

Member Author

dnhatn commented Dec 3, 2017

Sorry @jasontedor, I missed your comments.

dnhatn added a commit to dnhatn/elasticsearch that referenced this pull request Dec 3, 2017
This commit addresses the missed comments from elastic#27547.
dnhatn added a commit that referenced this pull request Dec 3, 2017
Today, we maintain two sets in a SeqNoSet: ongoing sets and completed
sets. We can remove the completed sets and use only the ongoing sets by
releasing the internal bitset of a CountedBitSet when all its bits are
set. This behaves like two sets but simpler. This commit also makes
CountedBitSet as a drop-in replacement for BitSet.

Relates #27268
dnhatn added a commit that referenced this pull request Dec 4, 2017
This commit addresses the missed comments from #27547.
@clintongormley clintongormley added the :Distributed/Distributed and :Distributed/Engine labels and removed the :Translog and :Distributed/Distributed labels Feb 13, 2018
@jimczi jimczi added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019
Labels
:Distributed/Engine >enhancement v6.2.0 v7.0.0-beta1