-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reduce memory pressure when sending large terms queries. #21776
Conversation
Here are two charts showing GC activity over a 8-minutes period before and after the change, when running a query that includes many parts, but in particular a terms query over ~32k longs. Both charts have been created under similar conditions. In the first case (master), major GCs are more frequent and minor GCs often take about 100ms while in the 2nd case (this PR), most of them run in less than 20ms. |
48ca6b6
to
bd02197
Compare
…rch. When users send large `terms` query to Elasticsearch, every value is stored in an object. This change does not reduce the amount of created objects, but makes sure these objects die young by optimizing the list storage in case all values are either non-null instances of Long objects or BytesRef objects, which seems to help the JVM significantly.
bd02197
to
985ec64
Compare
Following @s1monw 's advice, I tried another approach that does the same thing on top of the Stream(In/Out)put layer and results look even better for a similar load: |
@s1monw could you have a look? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left some suggestions but this looks way more contained... I like the change since it also applied to the values read from XContent on the coordinating node not just to the ones written via node to node communication. I think that might be the improvements we see?
} | ||
|
||
@Override | ||
public Object remove(int i) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it should be mutable?
if (o instanceof BytesRef) { | ||
b = (BytesRef) o; | ||
} else { | ||
builder.copyChars((String) o); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we just for safety call o.toString()
instead of the cast?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My reasoning was that it was better to get an exception rather than generate weird terms if something else than a string or a bytesref would end up here, but I don't mind going with toString.
@@ -185,43 +192,108 @@ public String fieldName() { | |||
} | |||
|
|||
public List<Object> values() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we might be able to make this pkg private - it's only used for testing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it needs to remain public since it is part of the public API of this class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fair enough not sure anybody needs to access this list :)
private static final Set<Class<?>> STRING_TYPES = new HashSet<>( | ||
Arrays.asList(BytesRef.class, String.class)); | ||
|
||
private static List<?> convert(Iterable<?> values) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we get some java docs on this just to make sure we don't loose the info why we did all this?
@@ -159,7 +166,7 @@ public TermsQueryBuilder(String fieldName, Iterable<?> values) { | |||
throw new IllegalArgumentException("No value specified for terms query"); | |||
} | |||
this.fieldName = fieldName; | |||
this.values = convertToBytesRefListIfStringList(values); | |||
this.values = convert(values); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we add some dedicated tests to TermsQueryBuilderTest that stresses this entire convertion a bit? also with mixed value lists like floats / longs etc mixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for having a look. I pushed a new commit.
@@ -185,43 +192,108 @@ public String fieldName() { | |||
} | |||
|
|||
public List<Object> values() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it needs to remain public since it is part of the public API of this class?
if (o instanceof BytesRef) { | ||
b = (BytesRef) o; | ||
} else { | ||
builder.copyChars((String) o); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My reasoning was that it was better to get an exception rather than generate weird terms if something else than a string or a bytesref would end up here, but I don't mind going with toString.
Actually the previous change also applied to xcontent parsing so I am not totally sure how to explain this improvement. Maybe the fact that all Long objects become unreachable at once rather than one by one, not sure. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@@ -185,43 +192,108 @@ public String fieldName() { | |||
} | |||
|
|||
public List<Object> values() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fair enough not sure anybody needs to access this list :)
When users send large `terms` query to Elasticsearch, every value is stored in an object. This change does not reduce the amount of created objects, but makes sure these objects die young by optimizing the list storage in case all values are either non-null instances of Long objects or BytesRef objects, which seems to help the JVM significantly.
When users send large `terms` query to Elasticsearch, every value is stored in an object. This change does not reduce the amount of created objects, but makes sure these objects die young by optimizing the list storage in case all values are either non-null instances of Long objects or BytesRef objects, which seems to help the JVM significantly.
When users send large
terms
query to Elasticsearch, every value is stored inan object. This change does not reduce the amount of created objects, but makes
sure these objects die young by optimizing the list storage in case all values
are either non-null instances of Long objects or BytesRef objects, which seems
to help the JVM significantly.