Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retry on SocketExcetion in SendAsync #650

Merged
merged 1 commit into from
Nov 23, 2021

Conversation

patrikhellgren
Copy link
Contributor

Q A
Bug fix? no
New feature? yes
New sample? no
Related issues?

What's in this Pull Request?

Added retry for SocketExceptions in SendAsync to handle e.g. network errors in addition to specific response status codes.

@jansenbe jansenbe self-assigned this Nov 22, 2021
@jansenbe
Copy link
Contributor

@patrikhellgren : thanks for this PR! I've been investigating socket exceptions as well (due to some internal folks using PnP Core SDK reporting this). Can you provide more clarity on what socket exceptions you see? What's the scenario you're using this in (maybe via PnP Framework?) and finally what version of .NET are you using?

@jansenbe jansenbe added the area: framework ⚙ Changes in the SDK core framework code label Nov 22, 2021
@patrikhellgren
Copy link
Contributor Author

patrikhellgren commented Nov 23, 2021

@jansenbe: I would say the socket exceptions we see fall into three different categories. Due to the nature of these issues I guess they would occur with any .NET version but have been occurring with .NET 5, .NET Core 3.1 and .Net Framework 4.7.2:

  1. The obvious app running on a laptop with a bad WiFi connection so that it intermittently looses connection. When this happens we do want the app to do some retries.
  2. We have over time experienced some intermittent socket exceptions from both Graph and SharePoint Online. At the time it was with PnP.Framework but since it is basically the same code for the SendAsync method we would see it with PnP.Core also if it were to appear again. See [BUG] SSL connection cannot be established at random intervals pnpframework#292 for more details.
  3. This is the most critical one at the moment We have some customers using ADFS where the ADFS server as it seems under heavier load misses one or two authentication responses. This casts a SocketException in PnP.Core authentication providers and is easily handled using a retry mechanism on socket exceptions. This could of course happen if the ADFS server had other issues too and can easily be simulated by setting a false IP-address for the ADFS server in the hosts file and then running a console app that tries to authenticate, it will cast a socket exception and then quit but using the retry handler it would continue if you removed the false IP-address again within the retry period.

I know that number 3 is not fixed by this PR alone since RetryHandlerBase isn't used by the authentication providers but wanted to get this PR in first and keep it separate from the authentication.
I also have finished code for handling this in the authentication providers using .WithHttpClientFactory and a MsalHttpClientFactory that uses a handler inheriting from RetryHandlerBase that I was about to submit a PR for as a follow up to this one. Would you be interested in that or would you like it made in another way? I made it this way to work in the same way as the existing retry code but it could of course be done with a policy handler instead like in Polly.

How would you like to move forward?

@jansenbe jansenbe merged commit 1501583 into pnp:dev Nov 23, 2021
@jansenbe
Copy link
Contributor

@patrikhellgren : I've merged your PR, will ensure more stability on flaky networks + SharePoint seems to sometimes terminate connections without a clear reason. We've considered using Polly in the beginning, but would have to start digging in my archives to understand why we opted to not use it...if we do a major version bump that's something we can reconsider again.

Please create the PR to add similar capabilities to the authentication project.

Thanks 💪🥇🎉

@patrikhellgren patrikhellgren deleted the add-retry-on-exception branch November 23, 2021 23:48
@lrtoenjes
Copy link

@patrikhellgren Does this mean that you fixed the socket issue that we're seeing in [BUG] SSL connection cannot be established at random intervals in PnP.Framework, here in PnP.Core? If I migrate my project to PnP.Core instead of PnP.Framework, will I no longer be getting these frustrating errors?

@patrikhellgren
Copy link
Contributor Author

@lrtoenjes This exact fix was also implemented in PnP.Framework in pnp/pnpframework#528 which should be in PnP.Framework 1.8. There was also some retry handling for this implemented in pnp/pnpframework#301.

I haven't seen any issues with Socket exceptions in PnP.Framework since pnp/pnpframework#528 was added. So I would say that you shouldn't have to switch to PnP.Core to get the same exception handling.

If you are on version 1.8 of PnP.Framework and still seeing issues there might be some other problems. I have seen a couple of different issues with Socket exceptions, the first being the one in pnp/pnpframework#292 which I still believe (at least in part) was due to some issues on Microsofts SharePoint side that was later fixed. The retry handling that was implemented (pnp/pnpframework#301) would of course also handle that. The second issue I have seen was with an ADFS implementation that seemed to have some problems during peak load hours when it didn't respond to some authentication requests which made PnP.Framework (and PnP.Core) cast a Socket exception. This is now fixed in PnP.Core with this PR (#650) together with #656 and in PnP.Framework with pnp/pnpframework#528.

You should also try using HttpClientWebRequestExecutorFactory (pnp/pnpframework#261) so that you're not running into any port exhaustion issues and you will also gain some performance. I am using this with PnP.Framework 1.8 and have not seen any Socket exceptions with that configuration but did from time to time before that.

@lrtoenjes
Copy link

@patrikhellgren Ahh, I see. I missed that. I assumed that because pnp/pnpframework#292 was still open, that the issue hadn't been fixed there. Thanks for fixing it! My project is currently on an older version of PnP.Framework, so I will attempt an upgrade and hopefully that rectifies things. Thank you for the tips!

Sorry if this isn't the place to ask this, but I see that PnP.Core is an eventual replacement for PnP.Framework. Seeing as how my project does mostly simple Teams and SPO interactions and doesn't use the Provisioning Engine, is there a benefit to migrating it to use the new methods provided in PnP.Core?

@patrikhellgren
Copy link
Contributor Author

@lrtoenjes I would say that the benefits right now would be if you wanted a fluent API and dependency injection instead of the CSOM like approach in PnP.Framework. But since PnP.Core will be the way going forward, it could of course be an idea to start looking in to migrating. You could always use PnP.Framework interop if you find that some features are missing from PnP.Core.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: framework ⚙ Changes in the SDK core framework code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants