-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SocketsHttpHandler: do not use DualMode sockets in default connection logic #45614
Conversation
Tagging subscribers to this area: @dotnet/ncl Issue DetailsIn some environments IPV6 (thus dual-stack) sockets do not work despite I'm reverting the default branch of The only thing I changed in the old code is comments. @scalablecory @geoffkizer PTAL.
|
socket.NoDelay = true; | ||
return new NetworkStream(socket, ownsSocket: true); | ||
} | ||
catch (Exception error) when (!(error is OperationCanceledException)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit:
catch (Exception error) when (!(error is OperationCanceledException)) | |
catch (Exception error) when (error is not OperationCanceledException) |
Reads a bit nicer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer to not do refactors in this PR and keep ConnectHelper.ConnectAsync
as it was in it's original 628d99b state, there are way too many thinks we may want to fix.
I don't understand something. If the problem is that the OS doesn't allow dual-stack sockets, why isn't Socket.DualMode throwing an exception when we try to set it? |
return async ? ConnectAsync(host, port, cancellationToken) : new ValueTask<Stream>(Connect(host, port, cancellationToken)); | ||
} | ||
|
||
private static async ValueTask<Stream> ConnectAsync(string host, int port, CancellationToken cancellationToken) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is all concerning to me. The premise of the SocketsHttpHandler.ConnectCallback's design (and not exposing the default implementation of how a socket is created or the default connect logic) was that it's just a couple of lines of code to emulate the default behavior... now it's over a 100 lines?
cc: @geoffkizer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're going to introduce an easy mode static Socket.ConnectAsync
API, so it'll go back to a 5 liner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then shouldn't we add that and then use it here rather than doing it this way? I don't see why we're putting back this connect helper logic. If the concern is being able to more easily backport, that same static API can just be added as an internal SocketEx static in the backport or something like that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @stephentoub here
It seems like the OS as a whole does, but the NIC driver or some firewall setting is blocking it at a later point. |
So our message then is don't use DualMode because something somewhere may not support it even if the system says it does? |
From the conversion, it seems like the Azure VPN ignores the dual-mode socket and the packet is than routed to physical interface instead of private VPN and then rejected by firewall (or lost as it missed tunnel) |
@wfurt, that's what it sounded like to me, too, which is why I'm skeptical of the direction of this change. |
And AWS |
@stephentoub people are also hitting it in AWS: So far I'd say there is a new user confirming the issue every week. Do we want to tell each of them to use the workaround? For some it's not possible, because they are using |
I agree it seems like dual-mode should work fine here, and these environments are doing something wrong. |
If the environments are broken, we should work with the powers that be to fix the environments. If alternatively we're saying dual-mode should never be used because it can't be trusted, we should be explicit about that, find and fix all such usage in all of our code, deprecate the relevant ctors/properties, etc etc. |
@stephentoub I don't think we are saying that dual-mode should be never used. In this solution we are switching to a default connect implementation that provides the highest compatibility counting with flawed environments. Users normally do not need to make such compatibility efforts in their ConnectCallbacks. |
I don't have a strong opinion here, I think I raised more or less the same concern in my #44686 (comment) as @stephentoub here in the PR, but @scalablecory @geoffkizer and @karelz voted for the trivial workaround. Now it seems opinions are changing. There are basically 3 options:
How do we plan to decide? |
So when someone uses the Socket(SocketType, ProtocolType) ctor, directly or via some other library that does so, we do or do not expect that to work in their favorite cloud? I'm questioning the larger picture here. |
In my opinion the But I think we are mixing concerns in this discussions now:
|
To be clear, while I think this is an environment problem, I still feel we should make this change for compat purposes. I do not think we should recommend against dual-mode, but I don't mind us making something a little bit more compatible by default when customers using common cloud environments are affected. |
(That said, we also need to reach out to these platforms to understand why they do it this way, and ask them to fix it if there is no why and it's just a config problem) |
I'm pushing on this for a few reasons. This isn't just about HttpClient. Yes, it's a key library, but there are others, and if
Are all of these broken if deployed to Azure or AWS? That sounds really strange, but if it's true, that's really bad. And if it's not true, then our understanding of the cause seems to have some gaps that we should fill in before pushing for a solution. With regards to HttpClient, we've pushed the ConnectCallback as the thing that addresses everyone's woes. Need to bind to a particular interface? Use ConnectCallback. Need to configure TTL? Use ConnectCallback. Need to set Receive/SendTimeout? Use ConnectCallback. Etc. It's going to be used, and not infrequently. Which is a good thing, except that the solution we've put forth now apparently has impliciations we didn't previously understand. For the next year, developers that follow-suit with what we've done in .NET 5 are either going to a) have to write more than a hundred lines of complicated code to get the "right" behavior, or b) suffer the same "this no longer works in the preeminent clouds" behavior, assuming that's the actual impact. That's not good. Even if we introduce a new I want to make sure we have our understanding and story straight before we rush in something "for compatibility", because while that may end up being the right answer, it has consequences. At the end of the day, the right answer may be to merge a fix into release/5.0 that reverts to the more complicated scheme, but let's really understand what's going on, having spoken with the folks at Azure and AWS, before we do so, as it has ramifications for lots of other things. And if it turns out that libraries and apps using With regards to the actual fix, @scalablecory mentioned exposing a static runtime/src/libraries/System.Net.Http/src/System/Net/Http/SocketsHttpHandler/HttpConnectionPool.cs Lines 1301 to 1316 in dc0e11d
to something like this: // Otherwise, create and connect a socket using default settings.
if (async)
{
socket = await Socket.ConnectAsync(endPoint, cancellationToken).ConfigureAwait(false);
}
else
{
socket = new Socket(SocketType.Stream, ProtocolType.Tcp);
using (cancellationToken.UnsafeRegister(static s => ((Socket)s!).Dispose(), socket))
{
socket.Connect(endPoint);
}
}
socket.NoDelay = true;
return new NetworkStream(socket, ownsSocket: true); yes? So, is the thinking the "simpler" solution is churning all this code in master, porting that back to release/5.0, and then immediately churning master again to the real .NET 6 solution? |
@stephentoub the API is indeed missing, we need to propose it: #44686 (comment)
Just added you to the email thread we have with Azure about this. The problem is understood, solution seems to be on the way, but there is no ETA. I have no idea who and how could connect AWS on this matter, so it gets priority. |
@normj can you help us with AWS side of this? |
Context on AWS: #44686 (comment) |
Totally agree. Doesn't help with the issue at hand, but something to consider for the future. |
I think we are saying both. First, the environments are broken and we should push them to fix this. But second, since this is how they behave today, you should never use dual-mode in these environments (or in code that could run in these environments) because it can't be trusted today.
Yes -- given what we know about these environments, I don't think we should ever use dual mode sockets in our own code.
I think that's going a little far. If you know your target environment supports dual-mode sockets, then feel free to use dual-mode sockets. That said, as @antonfirsov pointed out above, the structure of the API makes it such that it's often not obvious that you are using dual-mode sockets. So perhaps we should do something to address that. |
I think the answer is: All of them are broken when deployed to Azure or AWS in certain deployment configurations. What I don't have a good sense of is how common these configurations are. It would be good to get more data here. But regardless, I think we have enough data to say that they are common enough to justify changing HttpClient to not use dual-mode sockets, and probably for other libraries as well, depending on how they are deployed. |
{ | ||
// For synchronous connections, we can just create a socket and make the connection. | ||
cancellationToken.ThrowIfCancellationRequested(); | ||
var socket = new Socket(SocketType.Stream, ProtocolType.Tcp); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this creating a dual-mode socket?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah ... I dont like this inconsistence either, just took the state at the commit I mentioned. (Note that there was no sync in 3.1)
The bad thing is that the ony alternatives I see is to either do sync over async or implement a sync version of DnsConnect within System.Net.Http
{ | ||
// If a ConnectCallback was supplied, use that to establish the connection. | ||
if (Settings._connectCallback != null) | ||
try |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The old try catch wrapped both the ConnectCallback case as well as the default case. This one only wraps the ConnectCallback case. Why the change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is an equivalent catch block in ConnectHelper.ConnectAsync
:
runtime/src/libraries/System.Net.Http/src/System/Net/Http/SocketsHttpHandler/ConnectHelper.cs
Lines 62 to 65 in 6cba345
catch (Exception error) when (!(error is OperationCanceledException)) | |
{ | |
throw CreateWrappedException(error, host, port, cancellationToken); | |
} |
runtime/src/libraries/System.Net.Http/src/System/Net/Http/SocketsHttpHandler/ConnectHelper.cs
Lines 239 to 244 in 6cba345
internal static Exception CreateWrappedException(Exception error, string host, int port, CancellationToken cancellationToken) | |
{ | |
return CancellationHelper.ShouldWrapInOperationCanceledException(error, cancellationToken) ? | |
CancellationHelper.CreateOperationCanceledException(error, cancellationToken) : | |
new HttpRequestException($"{error.Message} ({host}:{port})", error, RequestRetryType.RetryOnNextProxy); | |
} |
I agree with all of this, but I'm not sure what to do about it. I think we should be doing all of the following:
But unless we are willing to expose new API in 5.0, I think we are kinda stuck in a not ideal place here. Thoughts? |
Since it's our IPv6 support detection logic that is unreliable in these environments, I think we should also consider an environment variable to force-disable IPv6 as an alternative to this PR. Pros: Not specific to HttpClient, fixes the problem for all users of Cons: not automatic, .NET 5 migration is still a breaking change in such environments, that has to be actioned by the customers. |
Can we improve that logic? For example, right now we only try to create the socket. What happens if we try to bind it? What happens if we try to connect it (over loopback)? Etc. |
@stephentoub I would be happy to experiment with all of this ... if I had access to a repro environment, or if I knew how to create one. There were some emails about in the past 15 hours, we are working under time pressure now. |
If the default connect logic will be whole of this: runtime/src/libraries/System.Net.Http/src/System/Net/Http/SocketsHttpHandler/ConnectHelper.cs Lines 25 to 70 in 6cba345
How are we going to communicate to people how to implement their own I understand this needs to be solved for 5.0 and rather quickly, but can't we at least think of something more user-friendly before we ship this? Also, isn't the same problem happening for the sync code path? I really like the idea of solving this at the socket level. |
@antonfirsov Would it help if I gave you commit access to my repro repository? I think I'd need to change the deployment from Or, maybe I could give you publish access to the App Service so that you can push your own self-contained app directly? |
@ManickaP @jonsagara not decided yet, but we may ship a quickfix for the January update of .NET 5. If we do so we need to act ... very quickly. If we want to catch that train, we need to merge either this PR or #45893 as is. We can then continue investigations about solving this at socket level either by making IPv6 detection more robust or by shipping & backporting new static |
@jonsagara does pushing trigger deployment? |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
It does, but I don't think that will work for this case. App Service We can get around this by directly publishing your changes as a self-contained deployment. All I'd need to do is give you the Publish Profile from Azure Portal. Then, clone the repo, import the Publish Profile, and you should be able to publish and test your changes as a self-contained application. I just tested this using a currently unsupported build (.NET SDK 5.0.101), and the app started and still reproduces the bug. I can't share the Publish Profile publicly, so if you're interested, please let me know the best place to send it. Thanks! |
@jonsagara DM me on twitter: https://twitter.com/antonfrv |
@antonfirsov Will do. Will you please follow me back so that I can send you a message? https://twitter.com/jonsagara |
Closing the PR as we do not plan to fix the problem this way anymore -- see #44686 (comment) for details. |
In some environments IPV6 (thus dual-stack) sockets do not work despite
Socket.SupportsIPv6
returning true.I'm reverting the default branch of
HttpConnectionPool.ConnectToTcpHostAsync
to use the logic in 628d99b (before #39524) to fix #44686.The only thing I changed in the old code is comments.
@scalablecory @geoffkizer PTAL.