Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

clientv3 requires two copies of byte data for a Put() #7249

Closed
smarterclayton opened this issue Jan 30, 2017 · 8 comments
Closed

clientv3 requires two copies of byte data for a Put() #7249

smarterclayton opened this issue Jan 30, 2017 · 8 comments

Comments

@smarterclayton
Copy link
Contributor

ClientV3 OptPut requires a value in string - for clients that set binary output, if OpPut is not inlined then at least one copy is required (one to turn []byte -> string and one to turn it back into []byte). This causes significant additional garbage collection in write heavy workloads.

I am not seeing OpPut inlined without an explicit call in kubernetes - for very small keys and values the overhead is 10% of total space:

         .          .    145:func OpPut(key, val string, opts ...OpOption) Op {
         .          .    146:	ret := Op{
         .          .    147:		t:   tPut,
      81MB       81MB    148:		key: []byte(key),
     175MB      175MB    149:		val: []byte(val),
    1.56GB     1.56GB    150:	}
         .          .    151:	ret.applyOpts(opts)

and is high for objects:

         .          .    145:func OpPut(key, val string, opts ...OpOption) Op {
         .          .    146:	ret := Op{
         .          .    147:		t:   tPut,
   5308497    5308497    148:		key: []byte(key),
  11468975   11468975    149:		val: []byte(val),
  10461141   10461141    150:	}

Since key and val are private it is not possible to adjust the client library without dropping into raw protobuf.

@smarterclayton
Copy link
Contributor Author

@wojtek-t

@xiang90
Copy link
Contributor

xiang90 commented Jan 30, 2017

@smarterclayton

We probably can use buffer pool or pre allocate some bytes array to avoid the frequent allocation.

Since key and val are private it is not possible to adjust the client library without dropping into raw protobuf.

Can you clarify on this? I do not quite get it.

@smarterclayton
Copy link
Contributor Author

Generally interfaces that deal with binary data would accept []byte to avoid the copy that the compiler adds (to keep the string immutable).

To the second point Op.key and Op.val are private, so it's not possible for an advanced client to add a "WithOptions" func that set those directly, although that might still result in a copy if the compiler can't be certain the values don't escape.

@smarterclayton
Copy link
Contributor Author

I.e. OpPutBytes(key, value []bytes, ... WithOptions) Op would also work for this case.

@heyitsanthony
Copy link
Contributor

@smarterclayton is this an end-to-end profile or a microbenchmark? Is there a major performance penalty in terms of latency/throughput? The overhead of doing a write to etcd is already very expensive compared to an allocation; it's not clear if saving the alloc here is a big win for overall system performance.

@smarterclayton
Copy link
Contributor Author

smarterclayton commented Jan 31, 2017

In this case I was demonstrating micro, but in my tests for high range transactions it has come up. I've been prototyping changes for Kubernetes that allow high cardinality range selection from a Txn (hundreds of key subsets) and the 3x penalty on keys contributes materially to the construction of the query. In embedded cases, this is a non trivial amount of allocations / copies and becomes more noticeable (investigating local indices with embedded that mirror a primary server under a different cluster).

While it may not be a huge issue in most cases, having an API that requires those allocations and copies for a kv store is unusual.

@heyitsanthony
Copy link
Contributor

@smarterclayton the embedded case is still sort of pricey without #4709 since the client will still need to go through a socket (e.g., unix socket) to get to the embedded server. Ideally the embedded client would have implementations of KV/Txn/etc that make direct method calls into EtcdServer without a socket.

Anyway, the reasoning behind the copy, just to clarify that some thought was put behind it, is strings communicate/enforce constness while []byte doesn't. It's also nicer to write "abc" instead of []byte{"abc"}, which seems to be a common case. Unfortunately, grpc / the go runtime's io expects []byte arguments so the client then has to eat a recast/realloc at some point. clientv3 erred on the side of a stricter API interface by using strings, but OpPutBytes or something similar seems OK if it turns out to be a serious show-stopper perf issue.

@stale
Copy link

stale bot commented Apr 7, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Apr 7, 2020
@stale stale bot closed this as completed Apr 28, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

3 participants