PickFirstLeafLoadBalancer does not emit TRANSIENT_FAILURE states #11082
Comments
@larry-safran You might be the best person to investigate this one. |
I cannot reproduce it in a unit test yet. The existing unit tests cover some basic situations in which the channel reports TRANSIENT_FAILURE after CONNECTING, e.g. with one address and with multiple addresses, once the initial iteration is complete for all the addresses. |
Thank you for looking into this, much appreciated - I'll see if I can find the time to dig into what's different between those grpc-java UT's and the behavior I'm seeing in the pekko-grpc integration test. Might have to wait a couple of weeks, though - busy period here. |
Are you providing multiple addresses or only a single address? Is it only in CONNECTING for a limited time or remains that way permanently? |
I was providing 2 addresses (that IIRC both would respond with 'connection refused'). It seemed to remain in CONNECTING.
|
I created a possible unit test reproducer in https://github.com/grpc/grpc-java/compare/master...raboof:grpc-java:test-for-PickFirstLeafLoadBalancer-11082?expand=1 - I'm not too familiar with the grpc-java codebase, so it is possible that I'm misunderstanding something and not accurately reproducing the issue, but it might be a good starting point for further analysis. The behaviour does look similar to what I'm seeing in the pekko-grpc failure, where `isPassComplete` keeps returning `false` (because `addressIndex.isValid()` remains `true`). |
1.64.0 seems to work fine in the Apache Pekko gRPC tests. - apache/pekko-grpc#311 |
I'm OK with closing this, as the Pekko gRPC tests suggest the problem is gone on 1.64. My reproducer at https://github.com/grpc/grpc-java/compare/master...raboof:grpc-java:test-for-PickFirstLeafLoadBalancer-11082?expand=1 still fails when rebased, but I guess that's likely a problem with the test rather than with the implementation. |
We'd expect 1.64 to be fixed, because we disabled PickFirstLeafLoadBalancer (by default). But we'll be turning it on again in the future, so it is appropriate to keep this open for us to address. |
Ah 😄
Makes sense, LMK if you want me to test anything! |
@raboof Looking at your test code for reproducing, you are sending the subchannel state twice to the same subchannel which is why isPassComplete keeps returning false. Changing the logic for the second onSubchannelState call makes the test case pass.
Is it possible that the Pekko test that was failing isn't actually providing a failure on the second subchannel? |
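To make the point about the reproducer concrete, here is a toy model in plain Java. It is a sketch under stated assumptions, not the grpc-java implementation: it assumes a connection pass counts as complete only once every address has reported a failure, and that TRANSIENT_FAILURE is emitted at that moment. The class `FailurePassModel`, its methods, and the address names are hypothetical; only the name `isPassComplete` is borrowed from the discussion.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: this is NOT the grpc-java implementation.
// Assumption: a pass is complete once every address has failed at least once.
final class FailurePassModel {
    private final List<String> addresses;
    private final Set<String> failedAddresses = new HashSet<>();
    private final List<String> emittedStates = new ArrayList<>();

    FailurePassModel(List<String> addresses) {
        this.addresses = new ArrayList<>(addresses);
        emittedStates.add("CONNECTING"); // the model starts out connecting
    }

    // Record a connection failure for one address (hypothetical method).
    void onFailure(String address) {
        if (!addresses.contains(address)) {
            throw new IllegalArgumentException("unknown address: " + address);
        }
        failedAddresses.add(address);
        String last = emittedStates.get(emittedStates.size() - 1);
        // Emit TRANSIENT_FAILURE once, when the pass over all addresses is done.
        if (isPassComplete() && !last.equals("TRANSIENT_FAILURE")) {
            emittedStates.add("TRANSIENT_FAILURE");
        }
    }

    // True only when every known address has reported a failure.
    boolean isPassComplete() {
        return failedAddresses.containsAll(addresses);
    }

    List<String> emittedStates() {
        return new ArrayList<>(emittedStates);
    }

    public static void main(String[] args) {
        // Failure reported twice for the same address: the pass never completes.
        FailurePassModel botched = new FailurePassModel(Arrays.asList("addr-1", "addr-2"));
        botched.onFailure("addr-1");
        botched.onFailure("addr-1");
        System.out.println(botched.emittedStates()); // [CONNECTING]

        // Failure reported once per address: the pass completes.
        FailurePassModel fixed = new FailurePassModel(Arrays.asList("addr-1", "addr-2"));
        fixed.onFailure("addr-1");
        fixed.onFailure("addr-2");
        System.out.println(fixed.emittedStates()); // [CONNECTING, TRANSIENT_FAILURE]
    }
}
```

In this model, a test that reports the failure twice on the same address stays in CONNECTING no matter how the balancer behaves, so it cannot by itself confirm or rule out the behaviour seen in the integration test.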
ah, sorry for botching the reproducer ;)
hmm, that seems unlikely, as I'm not failing subchannels explicitly there (just passing a service with two invalid addresses). |
@raboof There was another change that went into 1.64 related to load balancers that might have also had an impact on the problem. Could you try 1.64 with GRPC_EXPERIMENTAL_ENABLE_NEW_PICK_FIRST=true and see if the problem is still there? |
Possible reproducer for grpc#11082 . I'm not too familiar with the grpc-java codebase, so it is possible that I'm misunderstanding something and not accurately reproducing the issue, but it might be a good starting point for further analysis. The behaviour does look similar to what I'm seeing in the pekko-grpc failure, where `isPassComplete` keeps returning `false` (because `addressIndex.isValid()` remains `true`).
OK, I can confirm our test still fails on 1.64 with In the pekko test we're passing a list of two Is that invalid input or something you'd like to support/handle? |
Nice! Indeed pekko-grpc updated to v1.66.0 which should have this fix and default to |
When using `channel.notifyWhenStateChanged` and trying to connect to addresses that don't accept connections, the `PickFirstLoadBalancer` emits `TRANSIENT_FAILURE` states, while the `PickFirstLeafLoadBalancer` just stays in `CONNECTING`.
What version of gRPC-Java are you using?
I noticed this after updating from 1.62.2 to 1.63.0. I can also reproduce it on 1.62.2 when I set the `GRPC_EXPERIMENTAL_ENABLE_NEW_PICK_FIRST` environment variable to `true` (which has become the default in 1.63.0).
What is your environment?
Linux (NixOS unstable), Oracle Java 1.8.0_362
What did you expect to see?
Alternating between `CONNECTING` and `TRANSIENT_FAILURE` states
What did you see instead?
Silence after entering the `CONNECTING` state
Steps to reproduce the bug
I don't have a particularly minimal reproducer, but can reliably show the problem with the "NonBalancingIntegrationSpecNetty" test in Akka gRPC (apache/pekko-grpc#271 (comment))