-
Notifications
You must be signed in to change notification settings - Fork 255
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Don't check StreamingCallSink is alive at close #211
base: master
Are you sure you want to change the base?
Conversation
There is nothing in place which guarantees that sending the last message on the Sink, and closing the Sink, happen atomically. There is currently a race condition where if the server responds incredibly quickly, its response may be received between the last streaming message being sent, and close being called. It isn't obvious to me what value this check_alive call is bringing; if there is one, we should try to guarantee its success by either: 1. Implementing a custom send_all implementation such that the last send and close are called atomically, getting rid of the window where the race can be lost. 2. Establishing a happens-before relationship between StreamingCallSink.close and whatever code receives the server response (I think Call.start_recv_message?). I think this would either need a pretty coarse-grained lock, or a pretty complex restructuring of code. But as the check doesn't appear to provide any useful guarantees (and indeed mostly appears to make losing this race condition problematic), I figured I'd try just removing it...
This should be an expected behavior. I don't think I understand you completely. It would be helpful if you can explain more details about your problem. |
My assumption is that once the server has sent a message, further sends from the client cease to be allowed. What I believe we're seeing is that the server is responding so quickly that the receiving half of the channel is closed before we start trying to close the sending half, and so we get an (I can reproduce this semi-deterministically - about 20% of attempts with the right server behaviour set up on a localhost server, and can deterministically not reproduce it after this PR is applied) |
This includes tikv/grpc-rs#211 which stops us seeing RpcFinished errors when the server responds very quickly to requests. Unfortunately, this will slow down building pants, as you'll need to checkout the git repository, but that should be a once-per-machine cost.
This includes tikv/grpc-rs#211 which stops us seeing RpcFinished errors when the server responds very quickly to requests. Unfortunately, this will slow down building pants, as you'll need to checkout the git repository, but that should be a once-per-machine cost.
commit f239399 Author: Eric Arellano <earellano@foursquare.com> Date: Thu Aug 16 22:34:25 2018 -0700 Remove stray pdb import commit 3babfb9 Author: Eric Arellano <earellano@foursquare.com> Date: Thu Aug 16 22:23:19 2018 -0700 Add back Py3 tests that now work commit a46b827 Author: Eric Arellano <earellano@foursquare.com> Date: Thu Aug 16 22:22:52 2018 -0700 Fix backend python issues commit 88fa18a Author: Eric Arellano <earellano@foursquare.com> Date: Thu Aug 16 20:08:42 2018 -0700 Fix backend project_info issues commit f4d4a2d Author: Eric Arellano <earellano@foursquare.com> Date: Thu Aug 16 19:54:39 2018 -0700 Actually fix console separator issues commit d0c83aa Author: Eric Arellano <earellano@foursquare.com> Date: Thu Aug 16 19:38:39 2018 -0700 Fix issues with backend native commit 9a1b3b7 Author: Eric Arellano <earellano@foursquare.com> Date: Thu Aug 16 19:33:48 2018 -0700 Fix most issues with backend jvm commit 62bc2a4 Author: Eric Arellano <earellano@foursquare.com> Date: Thu Aug 16 14:47:42 2018 -0700 Fix logging issue with passing object to format() commit a4bf759 Author: Eric Arellano <earellano@foursquare.com> Date: Thu Aug 16 08:05:07 2018 -0700 Decode test's execute_task() output commit b81cd24 Author: Eric Arellano <earellano@foursquare.com> Date: Thu Aug 16 08:03:57 2018 -0700 Fix issues with backend graph_info commit 5d28b9a Author: Eric Arellano <earellano@foursquare.com> Date: Sun Aug 12 23:13:06 2018 -0700 Fix console_task's console_separator unicode issues commit a8674f9 Author: Eric Arellano <earellano@foursquare.com> Date: Sun Aug 12 22:48:42 2018 -0700 Fix backend codegen unicode vs bytes issues commit c893f8c Author: Eric Arellano <earellano@foursquare.com> Date: Sun Aug 12 22:47:04 2018 -0700 Return unicode from stdout in pants integration test commit f4ab07c Author: Benjy Weinberger <benjyw@gmail.com> Date: Thu Aug 16 19:43:41 2018 -0600 Make requirements on codegen products optional. (pantsbuild#6357) We declare product requirements in various places to force codegen to run before, e.g., dependency resolution. However currently these requirements are mandatory, so deregistering all codegen backends will cause the round engine to find no producers of these products, and fail. This commit changes these requirements to optional, so that if there are no codegen backends, we don't attempt to depend on their products. Also, minor unrelated documentation nits that weren't worth a separate pull request. commit b0a2469 Author: Eric Arellano <ericarellano@me.com> Date: Thu Aug 16 12:08:29 2018 -0700 Python 3 - fixes to get green contrib (pantsbuild#6340) All contrib folders should now be green on Python 2 and Python 3, excluding `googlejavaformat` and `python`. This fixes multiple issues causing Py3 failures + adds Py3 to Travis. commit 070a5c5 Author: Daniel Wagner-Hall <dawagner@gmail.com> Date: Wed Aug 15 19:30:15 2018 -0700 Run clippy with nightly rust on CI (pantsbuild#6347) This enables all default lints, and a selection of pedantic ones. It also fixes all code to be clean with respect to those lints. Each commit is independently reviewable. It's a shame that the lint selecting can't be shared across creates in some way. commit 8ce6297 Author: Daniel Wagner-Hall <dawagner@gmail.com> Date: Tue Aug 14 20:04:02 2018 -0700 Use --entry-point not -c when building pex (pantsbuild#6349) I deterministically get this error: RuntimeError: Ambiguous script specification pants matches multiple entry points:pants = pants.bin.pants_loader:main pants = pants.bin.pants_loader:main when building with -c. pantsbuild#6267 changed this behaviour, but seems broken - this commit allows me to successfully build pexes. commit 30878d8 Author: Daniel Wagner-Hall <dawagner@gmail.com> Date: Tue Aug 14 19:11:19 2018 -0700 Fix formatting of store.rs (pantsbuild#6350) It looks like pantsbuild#6336 merged with bad formatting. commit 5f2ad63 Author: Daniel Wagner-Hall <dawagner@gmail.com> Date: Tue Aug 14 18:40:32 2018 -0700 Allow pex download path to be overridden (pantsbuild#6348) commit b7c607f Author: Stu Hood <stuhood@twitter.com> Date: Tue Aug 14 17:40:34 2018 -0700 Recover from cancelled remote execution RPCs (pantsbuild#6188) ### Problem It's possible for an individual RPC to be cancelled while an entire Operation continues, but this is not currently handled. See the problem description on pantsbuild#6100. ### Solution Recover from `RpcStatus::Cancelled` by retrying the same operation. ### Result Without the Fixes pantsbuild#6100. commit a96001e Author: Ity Kaul <ity@users.noreply.github.com> Date: Tue Aug 14 16:19:31 2018 -0700 Download Directory recursively from remote CAS (pantsbuild#6336) ### Problem We dont currently have the ability to download a `Directory`, and its contents, recursively. ### Solution Add `ensure_local_has_recursive_directory()` to be able to download a `Directory`, and its contents, recursively. commit e391cd1 Author: Daniel Wagner-Hall <dawagner@gmail.com> Date: Tue Aug 14 16:11:22 2018 -0700 Process execution: Create symlink to JDK on demand (pantsbuild#6346) This creates a symlink `.jdk` pointing at the passed JDK, if one was passed. This is a hack to allow for a remote execution platform to provide a pointer at its pre-installed JDK without needing to pollute absolute paths into the action definition. Long-term, we want the JDK to be part of the input directory for the action, but that's a lot of work, and we want to experiment with pre-installed JDKs in the mean time. commit d79b7c3 Author: Daniel Wagner-Hall <dawagner@gmail.com> Date: Mon Aug 13 21:26:08 2018 -0700 Simplify ExecuteProcessRequest construction (pantsbuild#6345) Remove create_from_snapshot and create_with_empty_snapshot - they don't pull their weight. Allow default values for all defaultable values. commit 33674ce Author: Daniel Wagner-Hall <dawagner@gmail.com> Date: Mon Aug 13 18:57:20 2018 -0700 Use forked version of grpcio (pantsbuild#6344) This includes tikv/grpc-rs#211 which stops us seeing RpcFinished errors when the server responds very quickly to requests. Unfortunately, this will slow down building pants, as you'll need to checkout the git repository, but that should be a once-per-machine cost. commit ccdf523 Author: Daniel Wagner-Hall <dawagner@gmail.com> Date: Mon Aug 13 18:36:28 2018 -0700 ci.sh uses positive rather than negative flags (pantsbuild#6342) This makes it much easier to reason about what's happening, and keeps .travis.yml more concise. commit 68f4b59 Author: Daniel Wagner-Hall <dawagner@gmail.com> Date: Mon Aug 13 16:12:44 2018 -0700 Merge directories with identical files (pantsbuild#6343) This is useful for merging Snapshots for Targets if multiple targets own the same file. This should be rare, but does happen (e.g. multiple overlapping resource targets). commit e189acd Author: Daniel Wagner-Hall <dawagner@gmail.com> Date: Mon Aug 13 09:50:23 2018 -0700 Set chunk size in process_executor (pantsbuild#6337) commit 20d267f Author: Paul <yaup@cs.washington.edu> Date: Sun Aug 12 14:41:26 2018 -0700 added fullpath to fix path concat issue with files when not in git root (pantsbuild#6331) ### Problem As described in pantsbuild#6301 , when the git root and the build root is not the same, `changed` functionalities can be messed up because `git` returns path relative to the working directory, which produces bad paths when it is [concatenated with the git root](https://github.com/pantsbuild/pants/blob/c9f3af460a7504e082931a4b5a668b66f0cd4ed0/src/python/pants/scm/git.py#L161). ### Solution By adding the `--full-name` tag to [`git ls-files`](https://github.com/pantsbuild/pants/blob/c9f3af460a7504e082931a4b5a668b66f0cd4ed0/src/python/pants/scm/git.py#L178), the file concatenation is done properly ### Result Doing `./pants list --changed...` will concatenate file names correctly when it is not run from the git root.
This includes tikv/grpc-rs#211 which stops us seeing RpcFinished errors when the server responds very quickly to requests. Unfortunately, this will slow down building pants, as you'll need to checkout the git repository, but that should be a once-per-machine cost.
Can you add a test case for it? |
There is nothing in place which guarantees that sending the last
message on the Sink, and closing the Sink, happen atomically.
There is currently a race condition where if the server responds
incredibly quickly, its response may be received between the last
streaming message being sent, and close being called.
It isn't obvious to me what value this check_alive call is bringing; if
there is one, we should try to guarantee its success by either:
send and close are called atomically, getting rid of the window
where the race can be lost.
StreamingCallSink.close and whatever code receives the server
response (I think Call.start_recv_message?). I think this would
either need a pretty coarse-grained lock, or a pretty complex
restructuring of code.
But as the check doesn't appear to provide any useful guarantees (and
indeed mostly appears to make losing this race condition problematic),
I figured I'd try just removing it...