grpc_proxy: Request Caching Documentation #7597

Closed
davissp14 opened this issue Mar 24, 2017 · 7 comments

Comments

@davissp14
Contributor

davissp14 commented Mar 24, 2017

The gRPC proxy documentation states:

The gRPC proxy caches responses for requests when it does not break consistency requirements. This can protect the etcd server from abusive clients in tight for loops.

The v3 API docs state:

serializable sets the range request to use serializable member-local reads. Range requests are linearizable by default; linearizable requests have higher latency and lower throughput than serializable requests but reflect the current consensus of the cluster. For better performance, in exchange for possible stale reads, a serializable range request is served locally without needing to reach consensus with other nodes in the cluster.

I feel the gRPC proxy documentation is pretty misleading when it refers to "consistency requirements". I suspect the only requests that are cached are serializable requests, right? In that case we would also be susceptible to stale data, which again would make the gRPC proxy docs inaccurate.

Am I missing something?
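For reference, a minimal Go clientv3 sketch of the two read modes the v3 API docs describe (the endpoint and key are placeholders, and the import path depends on the etcd release):

```go
package main

import (
	"context"
	"fmt"
	"time"

	"github.com/coreos/etcd/clientv3" // newer releases: go.etcd.io/etcd/client/v3
)

func main() {
	// Placeholder endpoint; it could point at an etcd member or a grpc-proxy.
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"127.0.0.1:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()

	// Default Get: linearizable, reflects the current consensus of the cluster.
	lin, err := cli.Get(ctx, "foo")
	if err != nil {
		panic(err)
	}

	// Serializable Get: served from local state (and cacheable by the proxy),
	// lower latency but possibly stale.
	ser, err := cli.Get(ctx, "foo", clientv3.WithSerializable())
	if err != nil {
		panic(err)
	}

	fmt.Println("linearizable rev:", lin.Header.Revision, "serializable rev:", ser.Header.Revision)
}
```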

@heyitsanthony
Contributor

What's the inaccuracy, exactly?

@davissp14
Contributor Author

davissp14 commented Mar 24, 2017

I guess the big question is what the "consistency requirements" are in the context of the gRPC proxy, and how "possible stale reads" fit into those requirements.

@heyitsanthony
Contributor

If it's serialized, the etcd server can return stale data too, but it'll still return data in order. Whatever consistency you'd expect from etcd should also be respected by the proxy. Is there a case where this is being violated?

Only serialized requests are serviced by the cache, but that doesn't mean the proxy only caches serialized requests. For example, it will also cache keys from successful puts and linearized requests.
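(For illustration only, a toy sketch of that rule with made-up types; the real logic lives in etcd's proxy/grpcproxy package and differs in detail:)

```go
package sketch

import "sync"

// Toy model: only serializable range requests are answered from the cache,
// but any successful read (serializable or linearizable) and successful puts
// can populate it. Types and names here are made up for illustration.
type rangeReq struct {
	Key          string
	Serializable bool
}

type rangeResp struct {
	Value    string
	Revision int64
}

type readCache struct {
	mu    sync.RWMutex
	resps map[string]rangeResp
}

// lookup only ever hits for serializable reads; linearizable reads always
// go through to the cluster.
func (c *readCache) lookup(req rangeReq) (rangeResp, bool) {
	if !req.Serializable {
		return rangeResp{}, false
	}
	c.mu.RLock()
	defer c.mu.RUnlock()
	r, ok := c.resps[req.Key]
	return r, ok
}

// store is called after any successful read and after a successful put for a
// key, so later serializable reads can be served locally.
func (c *readCache) store(key string, resp rangeResp) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.resps == nil {
		c.resps = make(map[string]rangeResp)
	}
	c.resps[key] = resp
}
```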

@banks

banks commented Mar 24, 2017

Please correct me if I'm wrong, but I think there is an unstated assumption that any single client will only ever connect to one proxy node.

Example:

Imagine a setup with 3 etcd server nodes and 2 proxy nodes, where a client randomly connects to one or the other proxy for load balancing/fault tolerance.

In this case, if it reads values with serializable consistency, it might get a more recent version from one proxy node and then, after a reconnect, see an older version from the other proxy node.

That invalidates serializability.

I think there might be further cases even if the client only connects to one proxy node: if that proxy loses its connection to the etcd server and randomly picks another that happens to be further behind, it would be possible, if very unlikely, for the client to read an older value than it previously read from the same proxy...

The second one may be almost non-existent in practice but the first should really be documented (unless I'm wrong - please correct).

I should also say I realise this is probably not the deployment topology grpc-proxy is designed for, but that is not stated anywhere, and it will certainly be used in some cases.

@heyitsanthony
Contributor

heyitsanthony commented Mar 24, 2017

Sure, but this will also happen with an etcd serialized request.

Take an etcd cluster with five members and let A and B be behind, with A at rev=10 and B at rev=20 (e.g., they're partitioned from the rest of the cluster but not from the client). If the client submits a serialized request to B and then a serialized request to A, it will see rev=20 followed by rev=10.

(There are ways to mitigate this on the client-side since all responses are tagged with a revision, but as of now it'll let cases like that through.)
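(A hypothetical sketch of that client-side mitigation: since every v3 response header carries a revision, a client can track the highest revision it has seen and reject anything older. The names here are illustrative, not an existing etcd API:)

```go
package guard

import (
	"fmt"
	"sync"
)

// revGuard rejects responses whose revision is older than one already
// observed, e.g. after failing over to a lagging member or proxy.
type revGuard struct {
	mu      sync.Mutex
	lastRev int64
}

func (g *revGuard) check(respRev int64) error {
	g.mu.Lock()
	defer g.mu.Unlock()
	if respRev < g.lastRev {
		return fmt.Errorf("stale read: revision %d is older than previously seen %d", respRev, g.lastRev)
	}
	g.lastRev = respRev
	return nil
}
```

A caller would then check `resp.Header.Revision` against the guard after each read and, on an error, retry against another endpoint or fall back to a linearizable read.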

@heyitsanthony
Contributor

One proxy connecting to many etcd servers seems worse. There are cases where, if the proxy caches over multiple endpoints, it will deliver weaker ordering for serialized requests than a single etcd server would. The pattern is:

  1. Serve Rev=456 as the current revision
  2. Reboot / invalidate with txn comparison / evict with LRU
  3. Connect to partitioned server
  4. Fetch and serve Rev=123 as the current revision

This also interacts poorly with using linearized requests as a barrier. If the expectation is that a single grpcproxy will deliver in the same order as a single etcdserver, then this needs to be fixed. It shouldn't be too difficult, and it would solve the client/multi-endpoint problem as well.
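(A hypothetical sketch of that fix, again with made-up types rather than the actual grpcproxy code: the proxy remembers the highest revision it has ever served and upgrades to a linearizable fetch whenever a newly connected endpoint would move it backwards:)

```go
package proxysketch

import (
	"context"
	"sync"
)

// Toy types for illustration; the real proxy works on pb.RangeRequest/Response.
type rangeReq struct{ Key string }
type rangeResp struct {
	Value    string
	Revision int64
}

type proxy struct {
	mu        sync.Mutex
	maxServed int64 // highest revision this proxy has ever returned to a client

	fetchSerializable func(context.Context, rangeReq) (rangeResp, error)
	fetchLinearizable func(context.Context, rangeReq) (rangeResp, error)
}

func (p *proxy) serveRange(ctx context.Context, req rangeReq) (rangeResp, error) {
	resp, err := p.fetchSerializable(ctx, req)
	if err != nil {
		return rangeResp{}, err
	}

	p.mu.Lock()
	behind := resp.Revision < p.maxServed
	p.mu.Unlock()
	if behind {
		// The backend is older than what this proxy has already exposed;
		// upgrade to a linearizable read rather than letting the served
		// revision go backwards.
		if resp, err = p.fetchLinearizable(ctx, req); err != nil {
			return rangeResp{}, err
		}
	}

	p.mu.Lock()
	if resp.Revision > p.maxServed {
		p.maxServed = resp.Revision
	}
	p.mu.Unlock()
	return resp, nil
}
```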

/cc @xiang90

@stale

stale bot commented Apr 6, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Apr 6, 2020
@stale stale bot closed this as completed Apr 28, 2020