-
Notifications
You must be signed in to change notification settings - Fork 25.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Significantly Lower Monitoring HttpExport Memory Footprint #48854
Significantly Lower Monitoring HttpExport Memory Footprint #48854
Conversation
The `HttpExportBulk` exporter is using a lot more memory than it needs to by allocating buffers for serialization and IO: * Remove copying of all bytes when flushing, instead use the stream wrapper * Remove copying step turning the BAOS into a `byte[]` * This also avoids the allocation of a single huge `byte[]` and instead makes use of the internal paging logic of the `BytesStreamOutput` * Don't allocate a new BAOS for every document, just keep appending to a single BAOS
Pinging @elastic/es-core-features (:Core/Features/Monitoring) |
|
||
return BytesReference.toBytes(out.bytes()); | ||
} catch (Exception e) { | ||
logger.warn((Supplier<?>) () -> new ParameterizedMessage("failed to render document [{}], skipping it [{}]", doc, name), e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is hard to model with streaming IO, but to me catch and appending an empty byte array seems bogus here and not worth retaining (there's no test covering this case and we should simply make sure out documents that we have full control over serialise shouldn't we?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple questions...
...onitoring/src/main/java/org/elasticsearch/xpack/monitoring/exporter/http/HttpExportBulk.java
Outdated
Show resolved
Hide resolved
...onitoring/src/main/java/org/elasticsearch/xpack/monitoring/exporter/http/HttpExportBulk.java
Show resolved
Hide resolved
...onitoring/src/main/java/org/elasticsearch/xpack/monitoring/exporter/http/HttpExportBulk.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM , thanks for addressing this.
npnp @jakelandis thanks for reviewing! |
…8854) The `HttpExportBulk` exporter is using a lot more memory than it needs to by allocating buffers for serialization and IO: * Remove copying of all bytes when flushing, instead use the stream wrapper * Remove copying step turning the BAOS into a `byte[]` * This also avoids the allocation of a single huge `byte[]` and instead makes use of the internal paging logic of the `BytesStreamOutput` * Don't allocate a new BAOS for every document, just keep appending to a single BAOS
…48966) The `HttpExportBulk` exporter is using a lot more memory than it needs to by allocating buffers for serialization and IO: * Remove copying of all bytes when flushing, instead use the stream wrapper * Remove copying step turning the BAOS into a `byte[]` * This also avoids the allocation of a single huge `byte[]` and instead makes use of the internal paging logic of the `BytesStreamOutput` * Don't allocate a new BAOS for every document, just keep appending to a single BAOS
Even with changes from elastic#48854 we're still seeing significant (as in tens and hundreds of MB) buffer usage for bulk exports in some cases which destabilizes master nodes. Since we need to know the serialized length of the bulk body we can't do the serialization in a streaming manner. (also it's not easily doable with the HTTP client API we're using anyway). => let's at least serialize on heap in compressed form and decompress as we're streaming to the http connection. For small requests this adds negligible overhead but for large requests this reduces the size of the payload field by about an order of magnitude (empirically determined) which is a massive reduction in size when considering O(100MB) bulk requests.
Even with changes from #48854 we're still seeing significant (as in tens and hundreds of MB) buffer usage for bulk exports in some cases which destabilizes master nodes. Since we need to know the serialized length of the bulk body we can't do the serialization in a streaming manner. (also it's not easily doable with the HTTP client API we're using anyway). => let's at least serialize on heap in compressed form and decompress as we're streaming to the HTTP connection. For small requests this adds negligible overhead but for large requests this reduces the size of the payload field by about an order of magnitude (empirically determined) which is a massive reduction in size when considering O(100MB) bulk requests.
Even with changes from elastic#48854 we're still seeing significant (as in tens and hundreds of MB) buffer usage for bulk exports in some cases which destabilizes master nodes. Since we need to know the serialized length of the bulk body we can't do the serialization in a streaming manner. (also it's not easily doable with the HTTP client API we're using anyway). => let's at least serialize on heap in compressed form and decompress as we're streaming to the HTTP connection. For small requests this adds negligible overhead but for large requests this reduces the size of the payload field by about an order of magnitude (empirically determined) which is a massive reduction in size when considering O(100MB) bulk requests.
Even with changes from #48854 we're still seeing significant (as in tens and hundreds of MB) buffer usage for bulk exports in some cases which destabilizes master nodes. Since we need to know the serialized length of the bulk body we can't do the serialization in a streaming manner. (also it's not easily doable with the HTTP client API we're using anyway). => let's at least serialize on heap in compressed form and decompress as we're streaming to the HTTP connection. For small requests this adds negligible overhead but for large requests this reduces the size of the payload field by about an order of magnitude (empirically determined) which is a massive reduction in size when considering O(100MB) bulk requests.
The
HttpExportBulk
exporter is using a lot more memory than it needs toby allocating buffers for serialization and IO:
byte[]
byte[]
and instead makes use of the internal paging logic of theBytesStreamOutput