
Race condition when closing a stream via grpcbox_client:close_and_recv/1 #16

Open
davisp opened this issue Aug 20, 2019 · 3 comments

davisp commented Aug 20, 2019

I've got a bit of a conundrum trying to chase down a spurious error in a test suite. I'm occasionally getting `timeout` back from `grpcbox_client:close_and_recv/1` instead of the expected `stream_finished`.

The code between grpcbox and chatterbox is a bit hard to follow, but as near as I can tell, what's happening is that we receive the eos message at [1] and then there's a race to get to [2]: most of the time `is_process_alive/1` returns false, but occasionally it returns true, which leads to us getting a `timeout`.

At this point I think I'm just going to remove my assertion on the return value of `grpcbox_client:close_and_recv/1`, because I can't figure out an appropriate patch given how we demonitor when receiving the eos message. The only thing I can think of is a new `grpcbox_client_stream:recv_end/2` call that keeps returning any messages in the mailbox until the process is dead. That didn't seem quite right either, so I figured I'd open the issue to see if anyone has better ideas.

[1] https://github.com/tsloughter/grpcbox/blob/v0.9.1/src/grpcbox_client.erl#L197-L199
[2] https://github.com/tsloughter/grpcbox/blob/master/src/grpcbox_client_stream.erl#L117-L122
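
For anyone skimming, here is a minimal sketch of the shape of the race as I understand it. The module, function, and message names are illustrative only, not the actual grpcbox/chatterbox internals:

```erlang
-module(race_sketch).
-export([demo/0]).

%% Illustrative only: the "owner" receives the eos message, demonitors the
%% stream process, and then does a 0-timeout recv. Whether the stream
%% process has died yet at that point is pure scheduling luck, so the
%% result flips between stream_finished and timeout.
demo() ->
    Owner = self(),
    Stream = spawn(fun() -> Owner ! {eos, self()} end),
    Ref = monitor(process, Stream),
    receive
        {eos, Stream} ->
            demonitor(Ref, [flush]),
            recv_msg(Stream, 0)
    end.

recv_msg(Stream, Timeout) ->
    case is_process_alive(Stream) of
        false ->
            stream_finished;
        true ->
            receive
                {data, Stream, Msg} -> {ok, Msg}
            after Timeout ->
                timeout  %% the spurious result seen in the test suite
            end
    end.
```

Running `race_sketch:demo/0` in a loop usually yields `stream_finished`, but occasionally `timeout`, depending on whether the spawned process has exited by the time `is_process_alive/1` runs.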

davisp added a commit to cloudant-labs/ateles that referenced this issue Aug 20, 2019
tsloughter (Owner) commented

Hey, I'll dig into this soon. I've been wanting/needing to work on the client some more, but I'll get back to you on this specific issue first. The client is quirky: I don't like how it sends messages to the process that made the request. That should maybe be an optional setting, if receiving messages this way is really desired.

tsloughter (Owner) commented

I think there is likely a hack fix for this that I can do before a full refactoring of the client.

Since the stream has ended, the code assumes the process will be gone by the time it checks whether the process is alive. I can change that recv with a 0 timeout into either a different function, or one that takes an attribute signaling it is being called after the stream has ended.
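
Something along these lines is one way that separate end-of-stream recv could look. Instead of the 0-timeout `is_process_alive/1` check, it keeps a monitor on the stream process and treats the `'DOWN'` message as the definitive end signal; monitoring an already-dead process delivers `'DOWN'` with reason `noproc` immediately, so the race disappears. Names here are illustrative, not the actual `grpcbox_client_stream` API:

```erlang
-module(fix_sketch).
-export([recv_end/2]).

%% Illustrative sketch: called only after eos has been seen. Rather than
%% polling is_process_alive/1 with a 0 timeout, monitor the stream process;
%% if it has already exited, 'DOWN' (reason noproc) arrives right away.
recv_end(Stream, Timeout) ->
    Ref = monitor(process, Stream),
    receive
        {data, Stream, Msg} ->
            %% a late in-flight message; hand it back before finishing
            demonitor(Ref, [flush]),
            {ok, Msg};
        {'DOWN', Ref, process, Stream, _Reason} ->
            stream_finished
    after Timeout ->
        demonitor(Ref, [flush]),
        {error, timeout}
    end.
```

With this shape, the caller still gets a real timeout if the stream process somehow hangs, but the common case resolves deterministically regardless of scheduling.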

davisp (Author) commented Sep 4, 2019

Yeah, my only thought was to have a separate function there that adds a timeout to the recv, since we're expecting the process to be dead. I just don't know enough of the details to know how much of `recv_msg` would normally be expected to execute in those circumstances, and what would more likely constitute an error.
