SignalR, No Connection with that ID #9917

Closed
rezanid opened this issue May 2, 2019 · 21 comments
Labels
area-signalr Includes: SignalR clients and servers

Comments

@rezanid

rezanid commented May 2, 2019

I have an ASP.NET Core app that uses SignalR to report metrics that are visualized in the browser. It had been working fine for several weeks, but today I noticed that SignalR very often fails to connect, and even when it connects, it disconnects shortly after. I'm using LongPolling, and I can see in Chrome's network log that the first request, to "negotiate", always succeeds:

{"connectionId":"Vg4Mieg0ya3aD17-f8kMxw","availableTransports":[{"transport":"LongPolling","transferFormats":["Text","Binary"]}]}

but the second one, a POST to "notify", fails 49 out of 50 times with 404 Not Found:

No Connection with that ID

I sometimes get the following as well.

{"error":"Handshake was canceled."}

The app works flawlessly when hosted locally (using localhost). The problem only occurs when the app is hosted (IT/FT/..) in Azure as a Web App (App Service), and all the other resources of the application are downloaded successfully. Only the SignalR requests from the browser fail, and the server-side logs don't show anything wrong.
Target framework: .NET Core 2.2
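To make the failure mode concrete: the negotiate response hands the browser a connection id, and every later request for that connection carries the id back to whichever server receives it, which then looks it up in its own in-memory table. The sketch below parses the sample negotiate response and builds the follow-up URL. The hub path (`/notify`), the `id` query-string parameter and the host name are assumptions for illustration, not taken from the reporter's code.

```typescript
// Shape of the negotiate response, inferred from the sample JSON in the report.
interface AvailableTransport {
  transport: string;
  transferFormats: string[];
}

interface NegotiateResponse {
  connectionId: string;
  availableTransports: AvailableTransport[];
}

// Builds the URL that follow-up long-polling requests are sent to.
// Assumptions: the hub is mapped at ".../notify" and the connection id travels
// in an `id` query-string parameter, as the 2.x JavaScript client does.
function buildConnectionUrl(hubUrl: string, negotiateBody: string): string {
  const response = JSON.parse(negotiateBody) as NegotiateResponse;
  const offersLongPolling = response.availableTransports.some(
    (t) => t.transport === "LongPolling",
  );
  if (!offersLongPolling) {
    throw new Error("Server did not offer the LongPolling transport");
  }
  const separator = hubUrl.includes("?") ? "&" : "?";
  return `${hubUrl}${separator}id=${encodeURIComponent(response.connectionId)}`;
}

const negotiateBody =
  '{"connectionId":"Vg4Mieg0ya3aD17-f8kMxw","availableTransports":[{"transport":"LongPolling","transferFormats":["Text","Binary"]}]}';
// The host name is a placeholder.
console.log(buildConnectionUrl("https://example.azurewebsites.net/notify", negotiateBody));
// https://example.azurewebsites.net/notify?id=Vg4Mieg0ya3aD17-f8kMxw
```

If that request lands on a server instance that did not handle the negotiate call, the id is unknown there, which is exactly the "No Connection with that ID" 404.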

@BrennanConroy BrennanConroy added the area-signalr Includes: SignalR clients and servers label May 2, 2019
@BrennanConroy
Member

Are you using a multi-instance web app? If so do you have ARR Affinity (sticky sessions) enabled?

@rezanid
Author

rezanid commented May 3, 2019

My colleague just checked: we have 2 instances and ARR Affinity is on. I'm going to deploy to a local IIS (as opposed to IIS Express, which I use for debugging) with a production build and a similar URL (virtual folders) to see if I can reproduce the issue on my dev box.

@BrennanConroy
Member

Could you collect a network trace of the failing connections? https://docs.microsoft.com/en-us/aspnet/core/signalr/diagnostics?view=aspnetcore-2.2#network-traces

@rezanid
Author

rezanid commented May 8, 2019

Thanks for the suggestion. I did, and to be sure I also asked my colleague to reduce the instance count to 1 to limit the number of things that can go wrong. After reducing the instances, it's now working fine apart from random Gateway Timeout errors that can happen a few times a day. When that happens, I think SignalR stops polling permanently until I refresh the page. Is there a retry policy + circuit breaker in it that I can configure?
[Screenshot: wallboard-gatewaytimeout-signalr]

@rezanid rezanid reopened this May 8, 2019
@BrennanConroy
Member

After reducing the instances, it's now working fine

This strongly indicates that sticky sessions are not enabled or not working correctly. Network traces (with 2 instances) would be useful to see what's going on.

Is there any retry-policy + circuit breaker in it that I can configure?

In 3.0 we're adding support for automatic reconnect, but for now you can do something like https://docs.microsoft.com/en-us/aspnet/core/signalr/javascript-client?view=aspnetcore-2.2#reconnect-clients
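A minimal sketch of the manual reconnect approach for the 2.x client: restart the connection from the close callback, backing off exponentially and giving up after a fixed number of attempts. This is a simple bounded retry, not a full circuit breaker. `StartableConnection` is a hypothetical interface modelling only the `start()` and `onclose()` members of a `HubConnection`; the attempt count and delays are assumptions to tune. In the 3.0+ client, `HubConnectionBuilder.withAutomaticReconnect()` covers this case.

```typescript
// Only the two members of a SignalR HubConnection this sketch relies on.
// Assumption: start() returns a Promise and onclose() registers a callback
// invoked when the connection closes.
interface StartableConnection {
  start(): Promise<void>;
  onclose(callback: (error?: Error) => void): void;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Tries start() up to maxAttempts times, doubling the delay between attempts
// (capped at maxDelayMs). Resolves true on success, false once it gives up.
async function startWithRetry(
  connection: StartableConnection,
  maxAttempts = 5,
  baseDelayMs = 1000,
  maxDelayMs = 30000,
): Promise<boolean> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      await connection.start();
      return true;
    } catch (err) {
      if (attempt === maxAttempts) break; // no point sleeping after the last try
      const delay = Math.min(baseDelayMs * 2 ** (attempt - 1), maxDelayMs);
      console.warn(`start() failed on attempt ${attempt}; retrying in ${delay} ms`, err);
      await sleep(delay);
    }
  }
  return false;
}

// Re-runs the retry loop whenever the connection drops (e.g. after a Gateway Timeout).
function reconnectOnClose(connection: StartableConnection): void {
  connection.onclose(() => {
    void startWithRetry(connection).then((connected) => {
      if (!connected) console.error("Giving up; refresh the page to reconnect.");
    });
  });
}

// Usage with a hypothetical fake connection that fails twice before succeeding.
class FlakyConnection implements StartableConnection {
  calls = 0;
  constructor(private readonly failures: number) {}
  async start(): Promise<void> {
    this.calls++;
    if (this.calls <= this.failures) throw new Error("504 Gateway Timeout");
  }
  onclose(_callback: (error?: Error) => void): void {}
}

startWithRetry(new FlakyConnection(2), 5, 10).then((ok) => console.log("connected:", ok));
// connected: true
```

With a real connection you would call `reconnectOnClose(connection)` once and then `startWithRetry(connection)` for the initial start; capping the delay keeps a long outage from pushing retries minutes apart.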

@rezanid
Author

rezanid commented May 8, 2019

I will do the trace with multiple instances tomorrow to prove your theory.

Thanks for the suggestion. I thought I had read all the documentation, but obviously I missed at least this one.

@analogrelay
Contributor

Closing this as we haven't heard from you and generally close issues with no response after ~7 days. Please feel free to comment if you're able to get the information we're looking for and we can reopen the issue to investigate further!

@jagathprasanga

I'm also experiencing the same issue. My configuration is:
ASP.NET Core 2.2, load balanced between two instances (using Redis)
Hosted on Kestrel on CentOS behind HAProxy.
Please help me with this.
Thanks.

@analogrelay
Contributor

ASP.NET Core 2.2, load balanced between two instances (using Redis)
Hosted on Kestrel on CentOS behind HAProxy.

Do you have session persistence (also called session affinity or "sticky sessions") enabled to ensure that all requests for the same user always go to the same server? SignalR requires that all HTTP requests for a single connection go to the same physical server and if any request ends up getting routed to a different server, you'll get a 404.
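A toy simulation of why affinity matters, with two servers that each keep connection ids in their own memory. Under round-robin routing, the request after negotiate reaches the other server and gets a 404; under sticky routing every request follows the first one. `HubServer`, `Router`, `roundRobin` and `sticky` are hypothetical names, and the sticky router stands in for an affinity cookie (ARR Affinity, an HAProxy server cookie) rather than modelling any real load balancer.

```typescript
// Each server keeps connections in its own memory, as a single SignalR server does.
class HubServer {
  private readonly connections = new Set<string>();
  private nextId = 0;

  constructor(readonly name: string) {}

  // Handles "negotiate": allocates an id that only this server knows about.
  negotiate(): string {
    const id = `${this.name}-${this.nextId++}`;
    this.connections.add(id);
    return id;
  }

  // Handles a poll/send for an existing connection.
  // 404 mirrors the "No Connection with that ID" response.
  handle(connectionId: string): number {
    return this.connections.has(connectionId) ? 200 : 404;
  }
}

// A router picks the server for each incoming request from a given client.
type Router = (clientKey: string) => HubServer;

// Round robin: ignores who the client is.
function roundRobin(servers: HubServer[]): Router {
  let next = 0;
  return () => servers[next++ % servers.length];
}

// Sticky: the first request pins the client to a server; later requests follow.
function sticky(servers: HubServer[]): Router {
  const pinned = new Map<string, HubServer>();
  let next = 0;
  return (clientKey) => {
    let server = pinned.get(clientKey);
    if (server === undefined) {
      server = servers[next++ % servers.length];
      pinned.set(clientKey, server);
    }
    return server;
  };
}

// One browser: negotiate, then `requests` follow-up requests on that connection.
function simulate(route: Router, requests: number): number[] {
  const clientKey = "browser-1";
  const connectionId = route(clientKey).negotiate();
  const statuses: number[] = [];
  for (let i = 0; i < requests; i++) {
    statuses.push(route(clientKey).handle(connectionId));
  }
  return statuses;
}

const makeServers = () => [new HubServer("A"), new HubServer("B")];
console.log("round robin:", simulate(roundRobin(makeServers()), 4).join(", ")); // round robin: 404, 200, 404, 200
console.log("sticky:", simulate(sticky(makeServers()), 4).join(", ")); // sticky: 200, 200, 200, 200
```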

@davidfowl
Member

Let's improve the error message to include that information, both in the logs and in the response.

@analogrelay
Contributor

cough #5350 cough

@jagathprasanga

Yes, the 404 error disappears when sticky sessions are enabled in HAProxy. But why can't SignalR share connection information via Redis? Then what is the purpose of the Redis backplane as described here (https://docs.microsoft.com/en-us/aspnet/core/signalr/redis-backplane?view=aspnetcore-2.2)?

@davidfowl
Member

Yes, the 404 error disappears when sticky sessions are enabled in HAProxy. But why can't SignalR share connection information via Redis?

Because it's unreliable to expect a persistent connection to exist across multiple server instances.

Then what is the purpose of the Redis backplane as described here (https://docs.microsoft.com/en-us/aspnet/core/signalr/redis-backplane?view=aspnetcore-2.2)?

Message propagation across multiple servers (broadcast, group send, sending to another connection on another server).
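A toy model of that split of responsibilities, assuming an in-memory bus in place of Redis pub/sub (`Backplane` and `ScaledOutServer` are hypothetical names, not the Redis or SignalR API). Server B can send to a connection owned by server A because the message travels over the backplane, but the connection itself still lives only on A, so a request for it that reaches B still gets a 404.

```typescript
// Toy stand-in for a Redis pub/sub channel; not the Redis or SignalR API.
interface HubMessage {
  target: string;
  payload: string;
}

class Backplane {
  private readonly handlers: Array<(m: HubMessage) => void> = [];
  subscribe(handler: (m: HubMessage) => void): void {
    this.handlers.push(handler);
  }
  publish(message: HubMessage): void {
    for (const handler of this.handlers) handler(message);
  }
}

class ScaledOutServer {
  // Connections held by THIS server only: connection id -> delivered payloads.
  private readonly local = new Map<string, string[]>();

  constructor(private readonly backplane: Backplane) {
    backplane.subscribe((m) => this.deliverLocally(m));
  }

  connect(connectionId: string): void {
    this.local.set(connectionId, []);
  }

  // Any server can send to any connection: the message goes through the backplane
  // and the server that owns the connection delivers it.
  sendToConnection(target: string, payload: string): void {
    this.backplane.publish({ target, payload });
  }

  // Requests for a connection this server doesn't own still fail (the 404 case).
  handleRequest(connectionId: string): number {
    return this.local.has(connectionId) ? 200 : 404;
  }

  inbox(connectionId: string): string[] | undefined {
    return this.local.get(connectionId);
  }

  private deliverLocally(m: HubMessage): void {
    this.local.get(m.target)?.push(m.payload);
  }
}

const bus = new Backplane();
const serverA = new ScaledOutServer(bus);
const serverB = new ScaledOutServer(bus);
serverA.connect("conn-1");
serverB.sendToConnection("conn-1", "hello from B");
console.log(serverA.inbox("conn-1")); // [ 'hello from B' ]
console.log(serverB.handleRequest("conn-1")); // 404
```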

@jagathprasanga

OK, that answers my question. Thank you very much.

@scabana

scabana commented Jul 15, 2019

Yes, the 404 error disappears when sticky sessions are enabled in HAProxy. But why can't SignalR share connection information via Redis?

Because it's unreliable to expect a persistent connection to exist across multiple server instances.

@davidfowl

I've seen a few presentations on the subject, and I understand this reduces the complexity of the SignalR codebase. But with cloud workloads becoming more and more common, scaling out and load balancing (on App Service or Kubernetes) is increasingly preferred over scaling up a single instance, and round-robin load balancing is used more and more. Add geo-localized deployments behind Azure Traffic Manager and the problem shows up even more. This limitation (a SignalR connection must stay on a single server endpoint) increases the complexity of deploying and maintaining systems: for this specific usage we need a sticky session/server pair, whereas in a microservice architecture most, if not all, other services don't require sticky sessions. Move from App Service to Kubernetes and this becomes an even bigger problem, since containers are expected to be rebalanced over time. Is there a plan to break this link between server and session? If not, is there a workaround?

Thanks a lot!

@davidfowl
Member

There is no plan. This is why we built the Azure SignalR Service: to hide the complexity of scaling persistent connections. It's not about making the codebase cleaner; it's about making it easier to scale persistent connections. This isn't your typical stateless web tier. It's stateful, and you need to be aware of that when you use persistent connections. If for some reason you can't use the service, then split your SignalR traffic from your web tier and scale them separately (that's effectively what the service gives you for free).

@scabana

scabana commented Jul 15, 2019

Thank you for the quick reply.

@zeroregard

Hi, I'm getting this issue after my server has been running for a while. As far as I know, I have not scaled out my SignalR server to multiple instances; I assume that is something you have to configure explicitly in Azure (?). If it happens on a single instance, what could be the cause? The exact error I get is:

Request Finished Successfully, but the server sent an error. Status Code: 404-Not Found Message: No Connection with that ID

@davidfowl
Member

If it happens on a single server, then something else is wrong. What server is this, and does it happen randomly or can you reproduce it?

@zeroregard

It's an Azure Web App running ASP.NET Core with the newest version of SignalR. I increased the timeout intervals, but it didn't seem to make a difference. It happens randomly, although there seem to be periods where it happens several times in a row.

@analogrelay
Contributor

@mathiassiig I'd strongly advise you to create a new issue with your specific scenario. Since this is a closed issue, it doesn't come up in our regular tracking, so you're relying on David and me alone paying attention to our GitHub notifications (which we generally do, but can easily get behind on ;)).

@ghost ghost locked as resolved and limited conversation to collaborators Dec 3, 2019