Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow ByteBuffer.position() to be used for the data length. #1906

Merged
merged 10 commits into from
Nov 18, 2021

Conversation

carloseltuerto
Copy link
Contributor

@carloseltuerto carloseltuerto commented Nov 2, 2021

Today, when sending the request body, the provided ByteBuffer is read entirely - its internal pointers such as the position and the limit are ignored. For the Cronvoy implementation, this is unfortunate. The lifecycle of the ByteBuffers used to send data are fully controlled by Cronet. Therefore, Cronet creates a ByteBuffer without knowing how many bytes the user will put in it. This can be made to work, but before sending the data, a new ByteBuffer needs to be created with the actual size that was put in the original ByteBuffer.

This PR attempts to solve this. By default, and this won't change, ByteBuffer.capacity() is used to determine the length when invoking EnvoyHTTPStream.sendData(). When configuring a new Stream, the user can now specify what should be used for the "length". StreamPrototype.setUseByteBufferPosition is the new method allowing to rather use the ByteBuffer.position() as the length. Then, for Cronet, no need to copy the ByteBuffer with this strategy.

Technical note: the general expectation with a ByteBuffer is to rely on its limit and its position when accessing the data. Relying solely on its capacity is somewhat unorthodox. The proposed implementation with this PR is a bit more in line with the ByteBuffer intended usage, but is not "by the book" either: Normally, when reading, the ByteBuffer must be "flipped", and then each time bytes are consumed, the position must be increased accordingly. This implementation leaves the ByteBuffer untouched when sending. Trying to perfectly mimic this behavior might be tricky: the C++ layer would need to update the ByteBuffer's position once some data has been consumed. For Cronet, this lack of conformity is not concerning: the ByteBuffer lifecycle is fully controlled by Cronet, not by the user - by contract, the user only puts the data it wants to send in the Cronet provided ByteBuffer, strictly when asks to do so (through callback), nothing else.

Description: Add a mechanism that avoids Cronet to copy ByteBuffers when sending the request body.
Risk Level: Small: the business logic is the same when the feature is not used, which is the default.
Testing: CI, Unit Test added
Docs Changes: N/A
Release Notes: N/A
Signed-off-by: Charles Le Borgne cleborgne@google.com

Signed-off-by: Charles Le Borgne <cleborgne@google.com>
Signed-off-by: Charles Le Borgne <cleborgne@google.com>
Copy link
Contributor

@goaway goaway left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. I'd even be fine with dropping the option to use capacity and just use position by default, but it doesn't hurt to have the option to choose. Thanks @carloseltuerto!

@goaway goaway merged commit ab51d68 into envoyproxy:main Nov 18, 2021
@goaway
Copy link
Contributor

goaway commented Nov 18, 2021

Note: we actually do have a hook that we could use to "flip" the Java byte buffer - the C++ buffer API, will "drain" the buffer when it's done processing the data, and provides a callback we've already wired up to the bridge layer. Forwarding it up to Java to flip the passed ByteBuffer should be quite doable.

@@ -109,6 +113,11 @@ jobject native_map_to_map(JNIEnv* env, envoy_map map) {
}

envoy_data buffer_to_native_data(JNIEnv* env, jobject j_data) {
size_t data_length = static_cast<size_t>(env->GetDirectBufferCapacity(j_data));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@carloseltuerto I believe we need to fix this. I am seeing malloc aborts thanks to data_length being obscenely high. For instance on a debug branch of the Lyft app I am seeing cases where the returned value here is 18446744073709551615

cc @goaway

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also need better testing in these utility functions. They influence a lot of other behavior. So even if the larger feature in this PR was opt in, changing the utility functions in the bridge layer has a wider impact.

Copy link
Contributor

@Reflejo Reflejo Nov 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So stating the obvious here but 18446744073709551615 is 2^64-1 which is -1. I think the bug here is that for some reason GetDirectBufferCapacity is returning -1? And one difference is we are now static casting to size_t. Only reason I can think that could happen is either j_data is not an initialized buffer or the JVM on that case for some reason doesn't support JNI access to buffers? The latter seems strange. Just theorizing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So stating the obvious here but 18446744073709551615 is 2^64-1 which is -1

Wasn't so obvious to me when I was looking at this over the weekend 🤦. Thanks for having sharper eyes than mine @Reflejo.

@carloseltuerto we synced internally and @goaway is going to own helping with this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm betting this is happening when the buffer is not a direct buffer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty sure I see the issue, will post a fix shortly.

goaway added a commit that referenced this pull request Nov 30, 2021
Description: #1906 introduced path whereby the check for whether a buffer is direct was skipped, then treated the -1 value returned by the capacity check as an unsigned size for allocation. This fix ensures the check always happens.
Risk Level: Low
Testing: Kotlin integration tests need to be re-enabled.

Signed-off-by: Mike Schore <mike.schore@gmail.com>
junr03 pushed a commit that referenced this pull request Nov 30, 2021
junr03 pushed a commit that referenced this pull request Nov 30, 2021
…1906)"

This reverts commit ab51d68.

Signed-off-by: Jose Nino <jnino@lyft.com>
junr03 pushed a commit that referenced this pull request Nov 30, 2021
…ength. (#1906)""

This reverts commit 8da335b.

Signed-off-by: Jose Nino <jnino@lyft.com>
@carloseltuerto carloseltuerto deleted the cronvoy045 branch February 15, 2022 19:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants