[Heartbeat] Fix excessive memory usage when parsing bodies #15639

andrewvc · 2020-01-17T15:17:13Z

In 7.5 we introduced a regression where we would allocate a 100MiB buffer per HTTP request. This change uses a growable byte buffer instead.

I'm actually a bit surprised this caused a noticeable increase in heap usage, since I'd have assumed that the GC would recover the 100MiB buffer almost instantaneously, but that's clearly incorrect.

In my testing of various configs I saw a dramatic drop, with memory not exceeding 50MiB where before it would be consistently > than 500MiB

In 7.5 we introduced a regression where we would allocate a 100MiB buffer per HTTP request. This change uses a growable byte buffer instead.

elasticmachine · 2020-01-17T15:17:16Z

Pinging @elastic/uptime (:uptime)

shahzad31 · 2020-01-17T15:32:18Z

heartbeat/monitors/active/http/respbody.go

+	prefixBytes := prefixBuf.Bytes()
+	if utf8.Valid(prefixBytes) {
+		prefix = string(prefixBytes)
 	}


Wouldn't it make more sense to validate utf in for loop , chunk by chunk? and even if one chunk is invalid , break the loop and discard it. That way, it won't have to read remaining body.

I'm pretty sure that would break on multi-byte char boundaries. Thinking about it, even the current code is probably broken on multi-byte unicode > 100MiB where the 100MiB limit falls in the middle of a char.

Looking into it, our unicode detection does nothing here, I've removed it in this PR to keep things simple. I've opened #15647 to follow-up on this.

urso · 2020-01-17T17:26:46Z

heartbeat/monitors/active/http/respbody.go

-	prefixBuf := make([]byte, maxPrefixSize)
-	prefixRemainingBytes := maxPrefixSize
-	prefixWriteOffset := 0
+	var prefixBuf bytes.Buffer


tip: consider strings.Builder. strings.Builder also implements an io.Reader. But when calling (bytes.Buffer).String() you create a copy, while (strings.Builder).String() returns you the buffer as is (no extra copy).

Thanks! Switched to a strings.Builder

urso · 2020-01-17T17:32:48Z

heartbeat/monitors/active/http/respbody.go

@@ -128,9 +123,5 @@ func readPrefixAndHash(body io.ReadCloser, maxPrefixSize int) (respSize int, pre
 		}
 	}


I wonder if the whole loop can be replaced using the io package only. e.g.

hash := sha256.New() var prefixBuf strings.Builder n, err := io.Copy(&prefixBuf, io.TeeReader(io.LimitReader(body, maxPrefixSize), hash))

This reads up to maxPrefixSize bytes from the body and writes every single byte to hash and prefixBuf.

Ooooooh, that's nice, will give it a shot.

I'm actually wondering now, the only reason we do the prefix reading in the first place is to give something for the response checkers to read. The problem before was that if you used a JSON checker and a grep checker, they'd share the same IO, so one would use up the stream before the other could get to work.

Maybe we should just use the tee reader in the first place to split the stream between all the users, then we can always read the full body and not worry about reading some fixed quantity.

So, I think that's a bit out of scope here. I'd like to keep the code as-is.

I tried to get things cleaner with the TeeReader, but that really requires us to have more of a limitWriter than a limitReader. The code sample you posted unfortunately only hashes the prefix, whereas the current code hashes the full body.

For the hash it should be enough to add another io.Copy(body, hash) if the first io.Copy did not reach EOF. It's still the same reader, and readers are stateful. The second copy should continue appending to the hash from the last known position. Like:

hash := sha256.New() var prefixBuf strings.Builder n, err := io.Copy(&prefixBuf, io.TeeReader(io.LimitReader(body, maxPrefixSize), hash)) if err == nil { // finish streaming into hash if the body has not been fully consumed yet var m int m, err = io.Copy(hash, body) n += m }

Alternatively we could introduce a LimitedWriter and a TeeWriter (as you've already noticed). They don't exist in our code base, so we would need to implement these ourselves :(. Then the operation would become:

io.Copy(TeeWriter(LimitedWriter(prefixBuf, ...), hash), body)

The idea is similar to my first sample code. Let's try to reduce code handling buffers, indices, offsets and size limits in loops by replacing them using more declarative (reusable?) reader/writer constructors.

Unfortunately parsers like JSON or regexp support are pull based. Meaning they require an io.Reader in order to "pull" new bytes. Constructing pipelines using a mix of pull and push (reader and writer) can still be somewhat tricky.

At least for JSON I have an alternative. In Beats we ues go-structform for spooling to disk and in our outputs. Interestingly the JSON parser has a push based interface: https://github.com/elastic/go-structform/blob/master/json/parse.go#L154

I've fixed the code to use io.Copy, much cleaner, thanks!

Thinking about it, I think I'm fine with the buffer approach for now. Chances are that users dealing with large files don't want to use either JSON or regexp matching. I think I'm fine waiting for that feature request to come in.

andrewvc · 2020-01-21T17:43:31Z

@urso if you're able to leave a final review giving this an LGTM (or not!) I think it's ready.

urso · 2020-01-21T17:55:16Z

heartbeat/monitors/active/http/respbody.go

-	// We discard the body if it is not valid UTF-8
-	if utf8.Valid(prefixBuf[:prefixWriteOffset]) {
-		prefix = string(prefixBuf[:prefixWriteOffset])
+	if err != nil {


Change in semantics. The original code used to break if err == io.EOF.

This should fix it:
if err != nil && err != io.EOF {

Good catch, fixed and pushed. I think there are no other outstanding issues.

…y-usage

…5639) * [Heartbeat] Fix excessive memory usage when parsing bodies In 7.5 we introduced a regression where we would allocate a 100MiB buffer per HTTP request. This change fixes that to stream data instead. (cherry picked from commit 080dedb)

…15763) * [Heartbeat] Fix excessive memory usage when parsing bodies In 7.5 we introduced a regression where we would allocate a 100MiB buffer per HTTP request. This change fixes that to stream data instead. (cherry picked from commit 080dedb)

…15762) * [Heartbeat] Fix excessive memory usage when parsing bodies In 7.5 we introduced a regression where we would allocate a 100MiB buffer per HTTP request. This change fixes that to stream data instead. (cherry picked from commit 080dedb)

[Heartbeat] Fix excessive memory usage when parsing bodies

f5b2816

In 7.5 we introduced a regression where we would allocate a 100MiB buffer per HTTP request. This change uses a growable byte buffer instead.

andrewvc added bug Heartbeat Team:obs-ds-hosted-services Label for the Observability Hosted Services team v7.6.0 labels Jan 17, 2020

andrewvc requested a review from a team as a code owner January 17, 2020 15:17

andrewvc self-assigned this Jan 17, 2020

More precise byte accounting

50e4218

shahzad31 reviewed Jan 17, 2020

View reviewed changes

Stop checking for UTF-8 in body

7f044df

urso reviewed Jan 17, 2020

View reviewed changes

Switch string builder

cae4335

urso reviewed Jan 17, 2020

View reviewed changes

andrewvc added 2 commits January 17, 2020 13:55

gofmt

03c5110

Use io.Copy

16af47d

urso reviewed Jan 21, 2020

View reviewed changes

andrewvc added 5 commits January 21, 2020 22:45

Properly handle EOF

2ee596b

Add changelog entry

eb65da3

Fix changelog

a820f1d

Merge remote-tracking branch 'origin/master' into fix-excessive-memor…

2de86c8

…y-usage

Fix changelog merge

0b6a5af

urso approved these changes Jan 22, 2020

View reviewed changes

andrewvc merged commit 080dedb into elastic:master Jan 22, 2020

andrewvc deleted the fix-excessive-memory-usage branch January 23, 2020 13:34

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Heartbeat] Fix excessive memory usage when parsing bodies #15639

[Heartbeat] Fix excessive memory usage when parsing bodies #15639

andrewvc commented Jan 17, 2020 •

edited

Loading

elasticmachine commented Jan 17, 2020

shahzad31 Jan 17, 2020 •

edited

Loading

andrewvc Jan 17, 2020

urso Jan 17, 2020

andrewvc Jan 17, 2020

urso Jan 17, 2020 •

edited

Loading

andrewvc Jan 17, 2020

andrewvc Jan 17, 2020

andrewvc Jan 17, 2020

urso Jan 18, 2020

andrewvc Jan 21, 2020

andrewvc commented Jan 21, 2020

urso Jan 21, 2020

andrewvc Jan 22, 2020

		@@ -128,9 +123,5 @@ func readPrefixAndHash(body io.ReadCloser, maxPrefixSize int) (respSize int, pre
		}
		}

[Heartbeat] Fix excessive memory usage when parsing bodies #15639

[Heartbeat] Fix excessive memory usage when parsing bodies #15639

Conversation

andrewvc commented Jan 17, 2020 • edited Loading

elasticmachine commented Jan 17, 2020

shahzad31 Jan 17, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

urso Jan 17, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

andrewvc commented Jan 21, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

andrewvc commented Jan 17, 2020 •

edited

Loading

shahzad31 Jan 17, 2020 •

edited

Loading

urso Jan 17, 2020 •

edited

Loading