RFE: Native Client3 - no socket for embedded servers. #4435

Closed
timothysc opened this issue Feb 5, 2016 · 16 comments

Comments

@timothysc

Currently, the etcd client interfaces all assume they are talking over a client socket, which imposes a number of performance constraints if you wish to embed etcd into a master/worker.

The main premise is to eliminate the encode/decode/transport steps, security handshakes, etc. from the critical path of distributed systems.

In the case of Kubernetes, this would mean embedding etcd into the api-server, very similar to the Borg master described in the Borg paper. It would also allow the api-server to simply reference the client library and still use an external etcd if needed.
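To make the premise concrete, here is a minimal Go sketch, assuming a hypothetical `KV` interface (not etcd's actual client API). Higher-level code depends only on the interface, while the deployment decides whether calls go straight to an in-process store or through an encode, transport, decode round trip. The `embeddedKV` and `remoteKV` types and the JSON wire format are illustrative stand-ins, and a `bytes.Buffer` plays the role of the socket.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"sync"
)

// KV is a hypothetical client interface; higher-level code depends only on it.
type KV interface {
	Put(key, value string) error
	Get(key string) (value string, found bool, err error)
}

// embeddedKV stands in for a store in the same process: plain method calls,
// no encoding, no transport.
type embeddedKV struct {
	mu   sync.RWMutex
	data map[string]string
}

func newEmbeddedKV() *embeddedKV { return &embeddedKV{data: map[string]string{}} }

func (e *embeddedKV) Put(key, value string) error {
	e.mu.Lock()
	defer e.mu.Unlock()
	e.data[key] = value
	return nil
}

func (e *embeddedKV) Get(key string) (string, bool, error) {
	e.mu.RLock()
	defer e.mu.RUnlock()
	v, ok := e.data[key]
	return v, ok, nil
}

// wireReq and wireResp are a hypothetical wire format for the remote path.
type wireReq struct{ Op, Key, Value string }
type wireResp struct {
	Value string
	Found bool
}

// remoteKV simulates a socket client: every call is encoded, decoded, executed, and sent back.
type remoteKV struct{ server *embeddedKV }

func (r *remoteKV) roundTrip(req wireReq) (wireResp, error) {
	var wire bytes.Buffer // stands in for the network connection
	if err := json.NewEncoder(&wire).Encode(req); err != nil { // client encode
		return wireResp{}, err
	}
	var in wireReq
	if err := json.NewDecoder(&wire).Decode(&in); err != nil { // server decode
		return wireResp{}, err
	}
	var out wireResp
	switch in.Op {
	case "put":
		if err := r.server.Put(in.Key, in.Value); err != nil {
			return wireResp{}, err
		}
	case "get":
		out.Value, out.Found, _ = r.server.Get(in.Key)
	default:
		return wireResp{}, fmt.Errorf("unknown op %q", in.Op)
	}
	wire.Reset()
	if err := json.NewEncoder(&wire).Encode(out); err != nil { // server encode
		return wireResp{}, err
	}
	var resp wireResp
	err := json.NewDecoder(&wire).Decode(&resp) // client decode
	return resp, err
}

func (r *remoteKV) Put(key, value string) error {
	_, err := r.roundTrip(wireReq{Op: "put", Key: key, Value: value})
	return err
}

func (r *remoteKV) Get(key string) (string, bool, error) {
	resp, err := r.roundTrip(wireReq{Op: "get", Key: key})
	return resp.Value, resp.Found, err
}

func main() {
	store := newEmbeddedKV()
	// Same KV interface, two wirings: write "over the wire", read in-process.
	var remote, embedded KV = &remoteKV{server: store}, store
	if err := remote.Put("/registry/pods/a", "v1"); err != nil {
		panic(err)
	}
	v, ok, _ := embedded.Get("/registry/pods/a")
	fmt.Println(v, ok) // prints "v1 true"
}
```

Because both wirings satisfy the same interface, choosing an embedded or an external store becomes an initialization decision rather than a change to calling code.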

/cc @wojtek-t @hongchaodeng @xiang90

@xiang90
Contributor

xiang90 commented Feb 5, 2016

I talked with @bgrant0607 about embedding etcd inside the api server to avoid the extra network/CPU path.

But @bgrant0607 mentioned this is not a good way to go. He might want to share his opinions.

@timothysc
Author

@xiang90 If it's hidden behind the client library, it shouldn't matter. It's a deployment constraint.

@xiang90
Contributor

xiang90 commented Feb 5, 2016

@timothysc I understand that part. I am just not sure if k8s would actually embed etcd...

@timothysc
Author

Upstream may not, but that doesn't prevent anyone downstream from doing so.

@xiang90
Contributor

xiang90 commented Feb 5, 2016

@timothysc I am fine with this idea. The etcd server's/client's API layer is fairly decoupled from the other components, so it is quite easy to add support. But this will not be a high priority until we make v3 stable.

@timothysc
Author

ack.

@bgrant0607

Long-lived sessions could reduce handshake overhead.

There are multiple disadvantages of linking etcd into the apiserver in real clusters (would be great for hyperkube, though):

  • Bugs in apiserver business logic could make the store unavailable, which would make debugging, backups, state repair, etc. harder
  • It would constrain the scaling of apiserver to the etcd quorum size
  • apiserver replicas would presumably only talk to their embedded etcd instances, which would defeat leader-oriented optimizations in etcd, if any

Borg uses a Paxos log, not a key-value store. It has a highly compressed representation and supports multi-object transactions, but assumes that the state can only be written and read by the elected master, that all the state is reconstructed in memory before it can serve requests (i.e., post-crash recovery takes a while), and doesn't support Watch.

In Omega, we split out the store.

@timothysc
Author

"Bugs in apiserver business logic could make the store unavailable, which would make debugging, backups, state repair, etc. harder"

I don't believe this is any different from today for OpenShift deployments.

"It would constrain the scaling of apiserver to the etcd quorum size"

Yup, that would be a trade-off.

"apiserver replicas would presumably only talk to their embedded etcd instances, which would defeat leader-oriented optimizations in etcd, if any"

The strategy outlined would "allow" for the embedding, not force it; it's dealer's choice. Depending on the deployment, we can allow for multiple strategies.

@timothysc
Author

/cc @jeremyeder

@timothysc
Author

@xiang90 Do the caches make any sense in the native client case? Wouldn't etcd hold the copy in memory in this case anyway?

@xiang90
Contributor

xiang90 commented Feb 11, 2016

@timothysc etcd will hold all recent data in memory. It does not make a lot of sense to do client-side caching in this case unless you are going to have a very tight loop.

@jeremyeder

Keep them separated, but do kernel bypass for loopback connections... this is called tcp_friends, and we failed, repeatedly, to get it upstream 4 years ago. Solaris and Windows can do it. @netoptimizer, any thoughts from Spain?

On-box/hot-path etcd<-->kube via Unix domain socket... can we do that now?

@timothysc
Author

@jeremyeder It's still overhead + extra caching. Making the client native makes little difference to the higher-level code. We've already smashed the executables into a monolith as is, so why not make the simple transition and stop pretending in the smashed deployment config? It "should be" a single initialization config change.

As it is today, it's like talking over cell phones even though you're in the same room: buffering + encode + decode + latency.

@jeremyeder

Ah, if it's that simple, then great. tcp_friends was meant to accelerate apps that weren't so simply dealt with.

@netoptimizer

We (unfortunately) don't have any plans for tcp_friends... I'm fixing other layers first. Maybe we can find someone else to work on it(?)

@xiang90
Contributor

xiang90 commented Nov 11, 2016

@timothysc I am closing this in favor of #4709.

This is a duplicate of #4709, and #4709 provides a more convincing reason to do it. It would benefit you the same way if implemented.

@xiang90 xiang90 closed this as completed Nov 11, 2016