Reduce memory pressure when sending large terms queries. #21776

Merged: 2 commits, Nov 30, 2016

@jpountz jpountz commented Nov 24, 2016

When users send a large `terms` query to Elasticsearch, every value is stored in
an object. This change does not reduce the number of objects created, but it makes
sure these objects die young by optimizing the list storage when all values
are either non-null Long objects or BytesRef objects, which seems
to help the JVM significantly.
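
To make the idea concrete, here is a minimal sketch, not the code from this PR: a read-only list that copies the values into a primitive `long[]` and boxes them again only on access. The class name `PackedLongList` and the helper `pack` are hypothetical. Once the copy is made, the stored list no longer references the boxed `Long` instances created while parsing, so they can be collected while still young.

```java
import java.util.AbstractList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration: a read-only List view over a primitive long[]. */
final class PackedLongList extends AbstractList<Object> {
    private final long[] values;

    private PackedLongList(long[] values) {
        this.values = values;
    }

    /** Copies the longs out of the boxed input so the Long objects are not retained. */
    static PackedLongList pack(List<Long> boxed) {
        long[] packed = new long[boxed.size()];
        for (int i = 0; i < packed.length; i++) {
            packed[i] = boxed.get(i); // auto-unboxing; a null element throws NullPointerException
        }
        return new PackedLongList(packed);
    }

    @Override
    public Object get(int index) {
        return values[index]; // boxed on demand, only for the caller that asks
    }

    @Override
    public int size() {
        return values.length;
    }

    public static void main(String[] args) {
        List<Object> terms = PackedLongList.pack(Arrays.asList(1L, 2L, 3L));
        System.out.println(terms.size() + " values, first = " + terms.get(0));
    }
}
```

Because the class only overrides `get` and `size`, the mutators inherited from `AbstractList` (`add`, `remove`, `set`) throw `UnsupportedOperationException`, so the list is effectively immutable.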

@jpountz jpountz added the :Core/Infra/Core, >enhancement, and v5.2.0 labels Nov 24, 2016

jpountz commented Nov 24, 2016

Here are two charts showing GC activity over an 8-minute period before and after the change, when running a query that includes many parts, in particular a terms query over ~32k longs. Both charts were created under similar conditions. In the first case (master), major GCs are more frequent and minor GCs often take about 100ms, while in the second case (this PR), most of them run in less than 20ms.

[Chart: baseline (master)]
[Chart: patch (this PR)]

@jpountz jpountz force-pushed the less_garbage branch 3 times, most recently from 48ca6b6 to bd02197 November 29, 2016 14:02

jpountz commented Nov 29, 2016

Following @s1monw's advice, I tried another approach that does the same thing on top of the Stream(In/Out)put layer, and results look even better for a similar load:
[Chart: GC activity for the Stream(In/Out)put-based approach]

jpountz commented Nov 29, 2016

@s1monw could you have a look?

@s1monw s1monw left a comment

I left some suggestions, but this looks way more contained... I like the change since it also applies to the values read from XContent on the coordinating node, not just to the ones written via node-to-node communication. I think that might explain the improvements we see?

}

@Override
public Object remove(int i) {
Contributor

I don't think it should be mutable?

if (o instanceof BytesRef) {
b = (BytesRef) o;
} else {
builder.copyChars((String) o);
Contributor

can we just for safety call o.toString() instead of the cast?

Contributor Author

My reasoning was that it is better to get an exception than to generate weird terms if something other than a String or a BytesRef ended up here, but I don't mind going with toString.
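
The trade-off can be seen in a small standalone sketch in plain Java, independent of the code under review; the class and method names are hypothetical. The cast fails fast on an unexpected type, whereas `toString()` accepts anything and may quietly produce a term nobody intended.

```java
final class CastVersusToString {
    /** Strict variant: anything other than a String raises ClassCastException. */
    static String viaCast(Object o) {
        return (String) o;
    }

    /** Lenient variant: any non-null object is accepted and rendered with toString(). */
    static String viaToString(Object o) {
        return o.toString();
    }

    public static void main(String[] args) {
        Object unexpected = new int[] {1, 2, 3};

        // An array has no useful toString(), so this yields an identity-style string
        // that would make for an odd search term.
        System.out.println("toString gives: " + viaToString(unexpected));

        try {
            viaCast(unexpected);
        } catch (ClassCastException e) {
            System.out.println("cast rejected the value");
        }
    }
}
```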

@@ -185,43 +192,108 @@ public String fieldName() {
}

public List<Object> values() {
Contributor

we might be able to make this pkg private - it's only used for testing

Contributor Author

I think it needs to remain public since it is part of the public API of this class?

Contributor

fair enough, not sure anybody needs to access this list :)

private static final Set<Class<?>> STRING_TYPES = new HashSet<>(
Arrays.asList(BytesRef.class, String.class));

private static List<?> convert(Iterable<?> values) {
Contributor

can we get some javadocs on this just to make sure we don't lose the info why we did all this?

@@ -159,7 +166,7 @@ public TermsQueryBuilder(String fieldName, Iterable<?> values) {
throw new IllegalArgumentException("No value specified for terms query");
}
this.fieldName = fieldName;
-this.values = convertToBytesRefListIfStringList(values);
+this.values = convert(values);
Contributor

can we add some dedicated tests to TermsQueryBuilderTest that stress this entire conversion a bit? also with mixed value lists, e.g. floats and longs together
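
A check along these lines could look roughly like the sketch below. It is an assumption-laden stand-in, not the PR's code or its tests: `ConvertSketch.convert` is a hypothetical simplification that packs the values into a `long[]` only when every element is a non-null `Long`, and otherwise keeps the values as they are. The property worth stressing is that the observable contents stay the same whichever storage is chosen.

```java
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class ConvertSketch {
    /** Hypothetical stand-in for the conversion discussed in this review. */
    static List<Object> convert(Iterable<?> values) {
        List<Object> copy = new ArrayList<>();
        boolean allLongs = true;
        for (Object value : values) {
            copy.add(value);
            allLongs &= value instanceof Long; // false for null, Float, String, ...
        }
        if (!allLongs) {
            return Collections.unmodifiableList(copy); // mixed input: no packing
        }
        final long[] packed = new long[copy.size()];
        for (int i = 0; i < packed.length; i++) {
            packed[i] = (Long) copy.get(i);
        }
        // Read-only view over the primitive array; values are boxed on access.
        return new AbstractList<Object>() {
            @Override
            public Object get(int index) {
                return packed[index];
            }

            @Override
            public int size() {
                return packed.length;
            }
        };
    }

    public static void main(String[] args) {
        List<Object> mixed = Arrays.<Object>asList(1L, 2.5f, 3L);
        List<Object> longs = Arrays.<Object>asList(1L, 2L, 3L);
        // Both comparisons should hold regardless of the internal representation.
        System.out.println("mixed preserved: " + convert(mixed).equals(mixed));
        System.out.println("longs preserved: " + convert(longs).equals(longs));
    }
}
```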

@jpountz jpountz left a comment

Thanks for having a look. I pushed a new commit.

jpountz commented Nov 30, 2016

> I like the change since it also applied to the values read from XContent on the coordinating node not just to the ones written via node to node communication

Actually, the previous change also applied to XContent parsing, so I am not totally sure how to explain this improvement. Maybe it is the fact that all Long objects become unreachable at once rather than one by one; not sure.

@s1monw s1monw left a comment

LGTM

@jpountz jpountz merged commit a3ef674 into elastic:master Nov 30, 2016
@jpountz jpountz deleted the less_garbage branch November 30, 2016 12:35
jpountz added a commit that referenced this pull request Nov 30, 2016
jpountz added a commit that referenced this pull request Dec 2, 2016
@jpountz jpountz added the v5.1.1 label Dec 2, 2016