fix: message body size unset after parsed which leads to large io throughputs #1008
Conversation
Better to check that there is no extra buffer space after message_ex has been constructed, i.e., at the end of parse_request_body_v0/v1, check that msg->header->body_length is equal to msg->buffers[1].length().
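In simplified form, the suggested check might look like this (a sketch with stand-in types, not the actual rdsn code):

```cpp
// Sketch of the suggested post-parse check (stand-in types, not rdsn code):
// after parse_request_body_v0/v1 builds the message, the body buffer must be
// exactly body_length bytes, with no extra receive-buffer space attached.
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct blob {
    std::string data;
    std::size_t length() const { return data.size(); }
};

struct message_header {
    uint32_t body_length = 0;
};

struct message_ex {
    message_header *header = nullptr;
    std::vector<blob> buffers; // buffers[0]: header, buffers[1]: body
};

void check_no_extra_space(const message_ex *msg) {
    assert(msg->header->body_length == msg->buffers[1].length());
}
```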
I raised a new issue to describe this bug: apache/incubator-pegasus#866. Could you please fill it in according to your pull request?
Done.
LGTM
@levy5307 please re-check it.
Could you add a unit test for this bug?
I meant to add a new test case, but then I found an old case that already covers this scenario. For some unknown reason, the author truncated the buffer before the next message was parsed. I added a new line to check whether the buffer is fully consumed. Before this patch the case failed on that assert, and after this patch it passed. So I believe it works.
Do you mean: the 1KB request tasks piled up in the queue as follows, and then the mutation log flush sizes will be 1KB, 1KB+1KB, 1KB+1KB+1KB, 1KB+1KB+1KB+1KB, ...?
Yes. Actually I printed the buffer sizes out and used awk & sort & uniq to analyze the distribution. The buffer size went up to 10000+, and the numbers in a way conformed to an arithmetic sequence; not exactly, but they followed some kind of rule. I also wrote a script to analyze all the slog files, and it turned out that a certain key pair appears multiple times in adjacent blocks, but when I dumped by
The following is a part of
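(For reference: with n queued requests of s bytes each, flush sizes of s, 2s, 3s, ..., ns add up to s·n(n+1)/2 bytes written instead of s·n, i.e., quadratic amplification, which would match the arithmetic-sequence pattern described above.)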
Issue
See apache/incubator-pegasus#866.
Description
We have noticed that pegasus 2.0 uses much more network/disk bandwidth than pegasus 1.12.3. After weeks of debugging, we finally found that this is a bug introduced in #255, where a new thrift message parser, compatible with both the old and the new format, was refactored to replace the old one. Now we are trying to fix this.
The main cause is that when parsing the message body, instead of returning exactly the size of the body, the parser actually returns the whole buffer that holds all the received messages. Thus, when many write requests pile up, from the moment a write request arrives to the moment it is written to the mutation log and sent to other nodes, the throughput is significantly amplified.
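In simplified form, the buggy behavior is roughly the following (a minimal sketch with hypothetical names, not the actual thrift_message_parser code):

```cpp
// Minimal sketch of the bug (hypothetical names, not the actual parser code):
// the body blob attached to the message is the whole remaining receive
// buffer rather than exactly body_length bytes.
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct blob {
    std::string data;
    std::size_t length() const { return data.size(); }
    blob range(std::size_t offset, std::size_t len) const {
        return blob{data.substr(offset, len)};
    }
};

struct message {
    uint32_t body_length = 0;
    std::vector<blob> buffers;
};

// Buggy: a 1KB request that arrives while megabytes of later requests sit in
// the same receive buffer carries all of those bytes with it to the mutation
// log and to the other replicas.
void parse_body_buggy(message &msg, const blob &recv_buffer) {
    msg.buffers.push_back(recv_buffer); // whole buffer, not just the body
}
```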
To fix this, we merely add one line to set the message body size before calling create_message_from_request_blob().
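The effect of that line can be pictured with the sketch above (hypothetical names, not the actual diff): with the body size set, the message carries exactly body_length bytes.

```cpp
// Effect of the fix: exactly body_length bytes are sliced out of the receive
// buffer before the blob reaches create_message_from_request_blob().
void parse_body_fixed(message &msg, const blob &recv_buffer) {
    msg.buffers.push_back(recv_buffer.range(0, msg.body_length));
}
```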
We also noticed that when a message header is found to be invalid, the bad message is not consumed and discarded; instead it stays in the buffer forever. We think this is a bug. To fix it, we consume the buffer before calling create_message_from_request_blob().
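A sketch of this second fix, built on the types above (again hypothetical names, not the actual code):

```cpp
// Sketch of the second fix (hypothetical names): the parsed bytes are
// consumed whether or not the header validates, so a malformed message can
// no longer sit in the receive buffer and be re-parsed on every receive.
struct reader_t {
    blob buffer;
    void consume(std::size_t n) {
        buffer = buffer.range(n, buffer.length() - n);
    }
};

bool try_parse(reader_t &reader, std::size_t header_size, bool header_valid) {
    reader.consume(header_size); // consume the header bytes in both cases
    if (!header_valid)
        return false; // bad message discarded instead of lingering forever
    // ... slice out exactly body_length bytes and build the message ...
    return true;
}
```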