Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Two important fixes (and possibly more improvements) #188

Merged
merged 5 commits into from
Mar 19, 2018

Conversation

MrMage
Copy link
Collaborator

@MrMage MrMage commented Mar 19, 2018

Fixed two critical memory leaks in the C shim, as well as another flake in GRPCTests.

MrMage added 2 commits March 19, 2018 16:23
…would all GRPC call objects to never get released, which in turn caused their completion queues and associated file descriptors to never get released. This became apparent after ~3-5k requests on macOS.
@MrMage
Copy link
Collaborator Author

MrMage commented Mar 19, 2018

Trying to re-run CI...

@MrMage MrMage closed this Mar 19, 2018
@MrMage MrMage reopened this Mar 19, 2018
MrMage added 2 commits March 19, 2018 17:42
…ue to the queue itself. This is needed because it appears that otherwise, the underlying completion queue gets deallocated during its spinloop, which it doesn't like.
@MrMage
Copy link
Collaborator Author

MrMage commented Mar 19, 2018

Please don't merge this yet. I might have introduced new errors.

@MrMage
Copy link
Collaborator Author

MrMage commented Mar 19, 2018

Actually, I think it's fine ;-)

@timburks timburks merged commit dd4d1cb into grpc:master Mar 19, 2018
@MrMage MrMage deleted the even-more-improvements branch March 19, 2018 18:46
Copy link
Contributor

@cvanderschuere cvanderschuere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These test changes reverts the critical check of statusCode to mask an existing bug.

}

// We need to return status OK in both cases, as it seems like the server might never send out the last few messages
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a huge problem. We need to assert that the server sends an appropriate status code and if it's not currently then tests should be failing

@@ -157,23 +166,32 @@ func callUnary(channel: Channel) throws {
try call.start(.unary, metadata: metadata, message: message) {
response in
// verify the basic response from the server
XCTAssertEqual(response.statusCode, (i % 2 == 0) ? evenStatusCode : oddStatusCode)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means we never check for non .ok status codes which was the original bug filed in #144.

@cvanderschuere
Copy link
Contributor

@timburks @MrMage Why was this merged? It invalidates work done for #144 in #147 & #183

@MrMage
Copy link
Collaborator Author

MrMage commented Mar 19, 2018

The tests you provided regularly had "unexpected response" messages (which I converted into XCTFail calls) and even occasional crashes (due to forcibly unwrapped optionals in the tests). It appears that if code returns a non-OK status, one can not rely on the gRPC library to still send all sent messages to the client. It took me several hours to figure this out. If you want tests that verify that a non-OK status code is sent, make sure to not send any messages back, as those might get dropped.

Also notice that we still have tests for this — it is now covered (and I'd say in a much cleaner way) by ServerThrowingTests.

@MrMage
Copy link
Collaborator Author

MrMage commented Mar 19, 2018

P.S.: I agree that we might want to raise the issue of the server not sending back all remaining messages when it is given a non-OK status to return with the gRPC-Core team. However, do keep in mind that this is now tested (with much less wrapping code that can cause unrelated flakes) in ServerThrowingTests, and I specifically left the part in that tests for custom status messages, given that status codes and messages are always sent together.

@MrMage MrMage mentioned this pull request Mar 20, 2018
@cvanderschuere
Copy link
Contributor

It appears that if code returns a non-OK status, one can not rely on the gRPC library to still send all sent messages to the client. It took me several hours to figure this out. If you want tests that verify that a non-OK status code is sent, make sure to not send any messages back, as those might get dropped.

This doesn't make any sense. Servers regularly return non-OK statuses and continue to operate. I think this is a problem specific to the swift implementation.

My concern is the haste in which these changes were made and the lack of debate over them. It doesn't inspire much faith in the project and quality users can expect from it.

@timburks
Copy link
Member

@cvanderschuere Thanks for submitting tests and raising your concerns. Please consider everything since our last tagged version tentative.

@MrMage
Copy link
Collaborator Author

MrMage commented Mar 28, 2018

FYI, I have filed grpc/grpc#14842 with the gRPC project to inquire about the issue with non-OK status codes.

@MrMage
Copy link
Collaborator Author

MrMage commented Apr 3, 2018

Received an update on grpc/grpc#14842 (comment), with a link to https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/grpc-io/OwZZq27awUQ/uNRL0_7ICQAJ:

It is guaranteed if the status code of the RPC is OK. If the RPC terminates with any non-OK status, it is not guaranteed that the client will see all messages sent by the server.

The currently-supported behavior is that service-side abortion of an RPC (terminating an RPC with any status code other than OK) has "drop everything on the floor" semantics. The abortion renders the response message semantically meaningless, so the server-side gRPC runtime will not bother sending it if it hasn't already done so, and upon learning of the RPC abortion, the invocation-side gRPC runtime will not bother delivering the response (if it even received it) to the local application.

So I think the server will need to keep returning .ok, or the tests need to be rewritten. We still have ServerThrowingTests to check that non-OK status codes are sent, though.

@MrMage MrMage mentioned this pull request Apr 3, 2018
36 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants