tls: improve write performance by reducing copying #14053
Conversation
This change tries to heuristically decide what block size to write. 16kb blocks are most efficient in the TLS layer, but there were cases where Envoy would memcpy nearly all the data in order to create blocks of this size. This change improves performance by sometimes writing smaller blocks in order to reduce memcpy.
Signed-off-by: Greg Greenway <ggreenway@apple.com>
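For readers skimming the thread, a minimal standalone sketch of the kind of heuristic described above. The deque-of-strings buffer model and the nextWriteBlock name are simplifications for illustration only, not the OwnedImpl::maybeLinearize implementation added in this PR:

#include <algorithm>
#include <cstdint>
#include <deque>
#include <string>
#include <string_view>

// Simplified stand-in for a buffer made of discontiguous slices.
using Slices = std::deque<std::string>;

// Return the block of data that should be handed to SSL_write next.
std::string_view nextWriteBlock(Slices& slices, uint32_t max_size, uint32_t desired_min_size) {
  if (slices.empty()) {
    return {};
  }
  // Big enough already: skip the memcpy and accept a slightly smaller record.
  if (slices.front().size() >= desired_min_size) {
    return slices.front();
  }
  // The next slice will already yield a full-size record, so don't copy; just
  // emit the small front slice and let the following call use the big one.
  if (slices.size() >= 2 && slices[1].size() >= max_size) {
    return slices.front();
  }
  // Otherwise, coalesce leading slices into the front slice, up to max_size.
  while (slices.size() >= 2 && slices.front().size() < max_size) {
    std::string& next = slices[1];
    const size_t take = std::min<size_t>(max_size - slices.front().size(), next.size());
    slices.front().append(next, 0, take);  // This append is the memcpy being traded off.
    next.erase(0, take);
    if (next.empty()) {
      slices.erase(slices.begin() + 1);
    }
  }
  return slices.front();
}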
@@ -266,6 +266,30 @@ void* OwnedImpl::linearize(uint32_t size) {
  return slices_.front()->data();
}

RawSlice OwnedImpl::maybeLinearize(uint32_t max_size, uint32_t desired_min_size) {
This implementation only looks at the first and second slices in the buffer. Would it make more sense to return a linearized group of whole slices, stopping just before it exceeds max_size?
Possibly. This is very much a heuristic, and there are a lot of ways it could potentially be improved. This version improved performance in the case I was benchmarking, didn't show any degradation for low-throughput (smaller request/response) traffic patterns, and is simple enough to reason about and predict.
Yeah, but I suggested the above because then we don't need the second parameter while keeping pretty much the same behavior otherwise.
Can you elaborate on what you mean by "a linearized group of whole slices"? I don't understand what you're suggesting.
like:

uint64_t size = 0;
for (const auto& slice : slices_) {
  if (size + slice->dataSize() > max_size) {
    break;
  }
  size += slice->dataSize();
}
return {linearize(size), size};
At some threshold the memcpy of linearization exceeds the overhead of the extra TLS record. For instance, if all slices were 16383 bytes (1 less than a full record), I don't believe it's faster to linearize everything to 16384; emitting slightly smaller records will be faster.
I made a wild guess at picking 25% of a record as the threshold at which we should definitely linearize, and perf tests indicated that helped. But maybe I need to come up with a small benchmark to try to quantify this relationship.
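To make that degenerate case concrete, here is a tiny standalone simulation (an illustration under the stated assumptions, not the PR's benchmark) that counts how many bytes would be memcpy'd if 16383-byte slices were always coalesced into full 16384-byte records; essentially all of the payload ends up being copied:

#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <deque>

// Count bytes memcpy'd when a queue of equally sized slices is always
// coalesced into full 16384-byte records before each write.
int main() {
  constexpr uint64_t record_size = 16384;
  constexpr uint64_t slice_size = 16383;  // One byte short of a full record.
  constexpr int num_slices = 1000;

  std::deque<uint64_t> slices(num_slices, slice_size);
  uint64_t copied = 0;
  while (!slices.empty()) {
    uint64_t block = 0;
    bool coalesced = false;
    while (!slices.empty() && block < record_size) {
      const uint64_t take = std::min(record_size - block, slices.front());
      if (take < slices.front()) {
        slices.front() -= take;
      } else {
        slices.pop_front();
      }
      if (block > 0) {
        coalesced = true;  // Data pulled from a second slice => memcpy.
      }
      block += take;
    }
    if (coalesced) {
      copied += block;  // In this model, linearize copies the whole coalesced block.
    }
  }
  const uint64_t total = uint64_t{num_slices} * slice_size;
  std::printf("copied %llu of %llu bytes (%.1f%%)\n",
              static_cast<unsigned long long>(copied),
              static_cast<unsigned long long>(total), 100.0 * copied / total);
  return 0;
}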
Right, but here we're comparing emitting many small TLS records vs combining many small buffers into a single TLS record, not copying data between slices to completely fill records.
In the example I was considering, I was asking about slices in the 4-8kb range. In that range it's not clear to me whether the memcpy cost will be more than the overhead of generating a TLS record.
The difference between 2x 8KiB vs 1x 16KiB is probably negligible, but I'm pretty sure that memcpy overhead is smaller than writing additional TLS record(s) to the wire.
...but that's an educated guess, feel free to benchmark this (e.g. by comparing proxy throughput, not userland microbenchmarks).
@@ -84,6 +84,10 @@ ContextImpl::ContextImpl(Stats::Scope& scope, const Envoy::Ssl::ContextConfig& c
  int rc = SSL_CTX_set_app_data(ctx.ssl_ctx_.get(), this);
  RELEASE_ASSERT(rc == 1, Utility::getLastCryptoError().value_or(""));

  constexpr uint32_t mode = SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER;
  rc = SSL_CTX_set_mode(ctx.ssl_ctx_.get(), mode);
When is this needed? maybeLinearize returns either an unmodified slice or a linearized slice, but I don't think there is a case where an unmodified slice would later be linearized (assuming desired_min_size <= max_size), so the buffer shouldn't move... or am I missing something?
It probably isn't strictly necessary, and with the current implementation I don't think the buffer will move. But the maybeLinearize interface doesn't guarantee this property, and I didn't want to keep the existing book-keeping to ensure the write buffer doesn't change. It doesn't look to me like anything is more expensive in BoringSSL when this mode is set; it just removes a check that isn't gaining anything for how we use the API.
Sure, but if the maybeLinearize implementation changes enough to require moving buffers, then we can enable this mode. Right now, it removes the default sanity check for no reason.
Huh, looking into this I realize I lost part of the change (I did a bunch of moving code between branches to chop an originally big change into manageable pieces). To simplify this code, I removed bytes_to_retry_. A subsequent call can then end up with a larger buffer from maybeLinearize if data was added to the buffer since the last attempt at SSL_write. That's why I was setting this option here.
I'm trying to remember why I made that change originally; I think it may have been to make it easier to read and reason about. I don't think it had a measurable performance impact. Any preference on whether I make that change or not?
A subsequent call can then end up with a larger buffer from maybeLinearize if data was added to the buffer since the last attempt at SSL_write. That's why I was setting this option here.

I don't believe that SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER allows the buffer to grow between retries. AFAIK, the buffer data has to stay the same, but it can be available at a different address than before.
The docs say it is allowed:
// In TLS, a non-blocking |SSL_write| differs from non-blocking |write| in that
// a failed |SSL_write| still commits to the data passed in. When retrying, the
// caller must supply the original write buffer (or a larger one containing the
// original as a prefix). By default, retries will fail if they also do not
// reuse the same |buf| pointer. This may be relaxed with
// |SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER|, but the buffer contents still must be
// unchanged.
The last sentence literally says the buffer contents still must be unchanged.
...but there is also "(or a larger one containing the original as a prefix)", hmm. Maybe it's allowed after all.
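For concreteness, a hedged sketch of the retry contract being quoted, assuming an already-connected non-blocking SSL* whose SSL_CTX had SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER set (an illustration of the documented rules, not Envoy's doWrite):

#include <openssl/ssl.h>

#include <string>

// Attempt to write `data`; on WANT_WRITE/WANT_READ the caller may retry later.
// With SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER set, the retry buffer may live at a
// different address and may be larger, but it must start with the same bytes
// that were passed to the failed call (the original data as a prefix).
int tryWrite(SSL* ssl, const std::string& data) {
  const int rc = SSL_write(ssl, data.data(), static_cast<int>(data.size()));
  if (rc > 0) {
    return rc;  // rc bytes were written.
  }
  const int err = SSL_get_error(ssl, rc);
  if (err == SSL_ERROR_WANT_WRITE || err == SSL_ERROR_WANT_READ) {
    return 0;  // Retry later; keep (at least) the same bytes at the front.
  }
  return -1;  // Fatal error.
}

// Usage note: after a 0 return, it is fine to append more bytes to the pending
// buffer before retrying, because the original write remains a prefix.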
  // The next slice will already be of the desired size, so don't copy and
  // return the front slice.
  if (slices_.size() >= 2 && slices_[1]->dataSize() >= max_size) {
Worth considering a generalization of this logic so that we refuse to copy whenever the next slice is larger than some copy threshold? That way a buffer containing {1, 1, 1, 1, 1, 16kb} would only end up copying 5 bytes when called with parameters like linearize(16kb, 4000).
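A standalone sketch of that generalization (simplified deque-of-strings slices and a hypothetical copy_threshold parameter, not the OwnedImpl code):

#include <algorithm>
#include <cstdint>
#include <deque>
#include <string>

// Coalesce leading slices into the front slice, but stop as soon as the next
// slice is itself at least `copy_threshold` bytes; never exceed `max_size`.
// For {1, 1, 1, 1, 1, 16kb} with max_size=16384 and copy_threshold=4000, only
// a few bytes from the tiny leading slices are appended into the front slice,
// and the 16kb slice is never touched.
void coalesceSmallPrefix(std::deque<std::string>& slices, uint32_t max_size,
                         uint32_t copy_threshold) {
  while (slices.size() >= 2 && slices[1].size() < copy_threshold &&
         slices.front().size() < max_size) {
    std::string& next = slices[1];
    const size_t take =
        std::min<size_t>(max_size - slices.front().size(), next.size());
    slices.front().append(next, 0, take);
    next.erase(0, take);
    if (next.empty()) {
      slices.erase(slices.begin() + 1);
    }
  }
}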
@@ -248,15 +248,17 @@ Network::IoResult SslSocket::doWrite(Buffer::Instance& write_buffer, bool end_st
  while (bytes_to_write > 0) {
    // TODO(mattklein123): As it relates to our fairness efforts, we might want to limit the number
    // of iterations of this loop, either by pure iterations, bytes written, etc.
    const auto slice = write_buffer.maybeLinearize(16384, 4096);
Some of the concerns about resuming from a different region could be addressed by skipping the call to linearize when doing a retry. When retrying, we know that the first slice contains roughly bytes_to_retry_ bytes.
I recommend setting the copy threshold to 4000 bytes instead of 4096. This is related to the buffer's default slice size being 4096 - sizeof(OwnedSlice), which is about 4032 bytes. Setting it to 4096 will result in a lot of spurious copies. See also #14054 (comment)
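A small compile-time check of that arithmetic; the 64-byte per-slice overhead is an assumed value chosen only to match the quoted "about 4032 bytes":

#include <cstdint>

constexpr uint32_t kDefaultReservation = 4096;
constexpr uint32_t kAssumedOwnedSliceOverhead = 64;  // Assumption for illustration.
constexpr uint32_t kUsableSliceSize = kDefaultReservation - kAssumedOwnedSliceOverhead;

// With a 4096-byte threshold, even a completely full default slice (~4032
// bytes) fails the check and gets copied; a 4000-byte threshold does not.
static_assert(kUsableSliceSize < 4096, "a full default slice would still be copied");
static_assert(kUsableSliceSize >= 4000, "a 4000-byte threshold avoids the spurious copy");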
Signed-off-by: Greg Greenway <ggreenway@apple.com>
Here are the results of the benchmark code I just pushed. The test params are: full_linearize/short_slice_size/num_short_slices. full_linearize==0 means use the new maybeLinearize behavior. The short slices were to try to force degenerate behavior. Please review the test code and make sure it's measuring what we want to measure.

One thing I went back and forth on was whether to use a real socket, or use a mem BIO to communicate between client and server. The real sockets capture a real cost (kernel mode transitions), but I had some inconsistency between test runs, and I wonder if the syscalls contributed.
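For context on the mem BIO option, this is roughly the shape of wiring two in-memory SSL endpoints together with a BIO pair (a hedged sketch assuming the contexts, handshake pumping, and error handling live elsewhere; not the benchmark's actual code):

#include <openssl/bio.h>
#include <openssl/ssl.h>

// Wire a client SSL and a server SSL together through an in-memory BIO pair so
// the benchmark measures TLS record processing without kernel socket I/O.
bool wireWithBioPair(SSL* client, SSL* server) {
  BIO* client_io = nullptr;
  BIO* server_io = nullptr;
  // 0 selects the default internal write-buffer size for each end of the pair.
  if (BIO_new_bio_pair(&client_io, 0, &server_io, 0) != 1) {
    return false;
  }
  // Each SSL uses its end of the pair for both reading and writing.
  SSL_set_bio(client, client_io, client_io);
  SSL_set_bio(server, server_io, server_io);
  return true;
}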
@antoniovicente The benchmark in this PR might help measure effects in #14111
    num_writes++;
  }

  state.counters["writes_per_iteration"] = num_writes;
Could we capture a count of the number of times linearize did something?
Good idea!
These results were run under bazel, so ignore timing. But the num_times_linearize_did_something is deterministic.
----------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
----------------------------------------------------------------------------------
testThroughput/0/0/0 330 us 328 us 1 num_times_linearize_did_something=10 throughput=548.771M/s writes_per_iteration=11
testThroughput/1/0/0 276 us 275 us 1 num_times_linearize_did_something=10 throughput=655.961M/s writes_per_iteration=11
testThroughput/0/1/1 285 us 285 us 1 num_times_linearize_did_something=10 throughput=633.111M/s writes_per_iteration=12
testThroughput/1/1/1 342 us 341 us 1 num_times_linearize_did_something=11 throughput=528.143M/s writes_per_iteration=11
testThroughput/0/128/1 290 us 289 us 1 num_times_linearize_did_something=9 throughput=623.632M/s writes_per_iteration=12
testThroughput/1/128/1 294 us 293 us 1 num_times_linearize_did_something=10 throughput=614.242M/s writes_per_iteration=11
testThroughput/0/4096/1 257 us 256 us 1 num_times_linearize_did_something=1 throughput=704.526M/s writes_per_iteration=12
testThroughput/1/4096/1 285 us 285 us 1 num_times_linearize_did_something=11 throughput=633.298M/s writes_per_iteration=11
testThroughput/0/1/2 283 us 283 us 1 num_times_linearize_did_something=11 throughput=637.848M/s writes_per_iteration=11
testThroughput/1/1/2 288 us 287 us 1 num_times_linearize_did_something=11 throughput=628.116M/s writes_per_iteration=11
testThroughput/0/128/2 316 us 316 us 1 num_times_linearize_did_something=7 throughput=571.135M/s writes_per_iteration=12
testThroughput/1/128/2 271 us 270 us 1 num_times_linearize_did_something=10 throughput=667.464M/s writes_per_iteration=11
testThroughput/0/4096/2 272 us 271 us 1 num_times_linearize_did_something=1 throughput=664.256M/s writes_per_iteration=13
testThroughput/1/4096/2 282 us 282 us 1 num_times_linearize_did_something=11 throughput=640.216M/s writes_per_iteration=11
testThroughput/0/1/3 283 us 282 us 1 num_times_linearize_did_something=11 throughput=639.23M/s writes_per_iteration=11
testThroughput/1/1/3 275 us 274 us 1 num_times_linearize_did_something=11 throughput=656.697M/s writes_per_iteration=11
testThroughput/0/128/3 281 us 280 us 1 num_times_linearize_did_something=5 throughput=642.652M/s writes_per_iteration=12
testThroughput/1/128/3 272 us 271 us 1 num_times_linearize_did_something=10 throughput=664.379M/s writes_per_iteration=11
testThroughput/0/4096/3 261 us 260 us 1 num_times_linearize_did_something=1 throughput=693.156M/s writes_per_iteration=14
testThroughput/1/4096/3 273 us 273 us 1 num_times_linearize_did_something=11 throughput=660.858M/s writes_per_iteration=11
num_times_linearize_did_something looks high in the cases where maybeLinearize is used. The number of copies done by both implementations is the same most of the time.
"if (slices_.size() >= 2 && slices_[1]->dataSize() >= max_size) {" should be using desired_min_size
Results after this change:
Benchmark Time CPU Iterations UserCounters...
testThroughput/0/0/0 60.5 us 60.5 us 11572 num_times_linearize_did_something=0 throughput=2.97744G/s writes_per_iteration=12
testThroughput/1/0/0 62.9 us 62.9 us 11116 num_times_linearize_did_something=10 throughput=2.86485G/s writes_per_iteration=11
testThroughput/0/1/1 62.1 us 62.1 us 11279 num_times_linearize_did_something=0 throughput=2.90119G/s writes_per_iteration=13
testThroughput/1/1/1 64.0 us 64.0 us 10919 num_times_linearize_did_something=11 throughput=2.81688G/s writes_per_iteration=11
testThroughput/0/128/1 64.4 us 64.4 us 10880 num_times_linearize_did_something=0 throughput=2.79695G/s writes_per_iteration=14
testThroughput/1/128/1 63.3 us 63.3 us 11091 num_times_linearize_did_something=10 throughput=2.84636G/s writes_per_iteration=11
testThroughput/0/4096/1 62.6 us 62.6 us 11181 num_times_linearize_did_something=0 throughput=2.87997G/s writes_per_iteration=13
testThroughput/1/4096/1 63.5 us 63.5 us 11035 num_times_linearize_did_something=11 throughput=2.83919G/s writes_per_iteration=11
testThroughput/0/1/2 60.9 us 60.9 us 11496 num_times_linearize_did_something=1 throughput=2.96085G/s writes_per_iteration=12
testThroughput/1/1/2 63.9 us 63.8 us 10956 num_times_linearize_did_something=11 throughput=2.82267G/s writes_per_iteration=11
testThroughput/0/128/2 63.3 us 63.2 us 11093 num_times_linearize_did_something=1 throughput=2.85011G/s writes_per_iteration=13
testThroughput/1/128/2 63.1 us 63.1 us 11066 num_times_linearize_did_something=10 throughput=2.85719G/s writes_per_iteration=11
testThroughput/0/4096/2 64.4 us 64.4 us 10890 num_times_linearize_did_something=0 throughput=2.80025G/s writes_per_iteration=14
testThroughput/1/4096/2 63.5 us 63.5 us 11029 num_times_linearize_did_something=11 throughput=2.83725G/s writes_per_iteration=11
testThroughput/0/1/3 61.0 us 60.9 us 11461 num_times_linearize_did_something=1 throughput=2.957G/s writes_per_iteration=12
testThroughput/1/1/3 64.0 us 64.0 us 10950 num_times_linearize_did_something=11 throughput=2.81811G/s writes_per_iteration=11
testThroughput/0/128/3 63.8 us 63.8 us 11085 num_times_linearize_did_something=1 throughput=2.82643G/s writes_per_iteration=13
testThroughput/1/128/3 63.1 us 63.0 us 11112 num_times_linearize_did_something=10 throughput=2.85861G/s writes_per_iteration=11
testThroughput/0/4096/3 64.7 us 64.7 us 10804 num_times_linearize_did_something=1 throughput=2.78513G/s writes_per_iteration=14
testThroughput/1/4096/3 63.3 us 63.3 us 11051 num_times_linearize_did_something=11 throughput=2.84711G/s writes_per_iteration=11
When I try to run tls_throughput_benchmark, I get:
Signed-off-by: Antonio Vicente <avd@google.com>
Does it also fail when you run it under bazel?
That passes, but runs with the test flags to reduce the number of iterations. I think the binary ran successfully once I cd'd to the directory where the binary lives.
Signed-off-by: Greg Greenway <ggreenway@apple.com>
  }

  // The next slice will already be of the desired size, so don't copy and
  // return the front slice.
There's an error in the next line, should be:
if (slices_.size() >= 2 && slices_[1]->dataSize() >= desired_min_size) {
Given that it's a heuristic, I wouldn't say it's an error, just a choice. If the next slice is slightly larger than 4k, and the current slice is 1 byte, what's the best behavior?
It turns out the slice sizes are terrible, as you've noted, due to the inline storage of the OwnedSlice. The second slice contains just slightly less than 16k (I think it's 64 bytes less), which results in a bunch of copies on subsequent slices.
I think the next step is to remove the inline-storage from the slice (#14111), then re-evaluate this PR.
Sorry, my read of the comment made me think that you intended to compare against desired_min_size.
I think an interesting case is write behavior for HTTP/2, which involves a 9 byte data frame header followed by up to 16kb of data. The change in #14111 will have some consequences for how such writes interact with linearize, but I think it would result in little to no performance difference, since both versions of the buffer class would end up copying about the same amount of data during linearize.
Sorry, my read of the comment made me think that you intended to compare against desired_min_size.
I see the confusion now. That comment was written when the parameter had a different (less clear) name. I'll clarify the comment.
This pull request has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!
This pull request has been automatically closed because it has not had activity in the last 37 days. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!
Signed-off-by: Greg Greenway <ggreenway@apple.com>
Commit Message: This change tries to heuristically decide what block size to
write. 16kb blocks are most efficient in the TLS layer, but there were
cases where Envoy would memcpy nearly all the data in order to create
blocks of this size. This change improves performance by sometimes
writing smaller blocks in order to reduce memcpy.
Additional Description:
Risk Level: Medium
Testing: Added UT, all existing tests pass
Docs Changes: none
Release Notes: not needed; no functional change
Platform Specific Features: none