Assertion failure 0 <= fd && fd < sysconf(_SC_OPEN_MAX) in System.Net.Mail.Functional.Tests #72830
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
Judging by the stack trace and the job itself, it's mono-interp.
Tagging subscribers to this area: @BrzVlad
I pasted stacks here. It appears that the mail code, or the underlying networking code, is attempting to use a file descriptor of -1, which is invalid (it fails the 0 <= fd part of the assertion).
Tagging subscribers to this area: @dotnet/ncl
cc: @tmds, it seems like we do some closing magic on an invalid socket...
@rzikm, can you please check how often it happens? Thanks!
Very often: 98 hits in the last 14 days. Curiously, none of them are on main.
Aside from some authentication changes, #70046 would be the biggest suspect. It may not be the root cause, as the assert is in Sockets. I tried to reproduce it (main on Linux) but did not get a hit. We can probably look at some of the Linux/Windows core files to see which tests were running.
If it is happening that often, perhaps we should disable the test for now to avoid noise in CI... @rzikm, thoughts?
Status: After re-enabling the test, we got some hits on PRs. @rzikm has an actionable dump link.
When I dump the SafeHandle, it has a reasonable value...
Any idea how that becomes invalid, @jkotas?
Since this was last reported on macOS, it seems unlikely to be related to #73972.
@AaronRobinsonMSFT, could you please take a look? The …
I can't seem to reproduce this on an M1. I will try Linux-x64 next.
@AaronRobinsonMSFT, we spent days with @rzikm trying to reproduce it, without any luck. We really only have some dumps from CI.
runtime/src/libraries/System.Net.Sockets/src/System/Net/Sockets/Socket.Unix.cs Lines 139 to 140 in 960e4d7
I think this is the root cause of this issue, because it's the only place where the handle can temporarily be set to -1 (via the out argument passed to SocketPal.CreateSocket). Something like this should fix it:

    SafeSocketHandle oldHandle = _handle;
    SafeSocketHandle newHandle;
    // Create into a local first so that _handle is never observed half-initialized.
    SocketError errorCode = SocketPal.CreateSocket(_addressFamily, _socketType, _protocolType, out newHandle);
    _handle = newHandle;
I can't quite see how that fixes it. It's an out parameter, so how is what you wrote different from the existing code?
The problem is actually a race condition. At the beginning of the …
runtime/src/libraries/System.Net.Sockets/src/System/Net/Sockets/SocketPal.Unix.cs Line 59 in 960e4d7
This means we're replacing the current … In the same function, another line updates the file descriptor:
runtime/src/libraries/System.Net.Sockets/src/System/Net/Sockets/SocketPal.Unix.cs Line 91 in 960e4d7
Edit: I deleted the incorrect information.
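Why the local variable matters: a C# out argument is passed by reference, so every write the callee makes to its out parameter lands directly in the caller's variable. If that variable is the _handle field, another thread can read the field between the moment CreateSocket assigns a not-yet-valid handle and the moment it fills in the descriptor. A minimal, deterministic sketch of that window follows. It assumes, per the comment above, that CreateSocket first assigns an invalid handle and updates the descriptor later. FakeHandle, FakeSocket and CreateHandle are hypothetical stand-ins, not the real SafeSocketHandle/SocketPal APIs. The descriptor values 3 and 42 are arbitrary, and the observe callback simulates a racing reader at exactly that point.

```csharp
using System;

// Hypothetical stand-in for SafeSocketHandle: a mutable holder for a raw descriptor.
sealed class FakeHandle
{
    public long Fd;
    public FakeHandle(long fd) { Fd = fd; }
}

sealed class FakeSocket
{
    // Field that other threads (e.g. an Abort/Dispose path) may read.
    // The initial descriptor value 3 is arbitrary.
    public FakeHandle Handle = new FakeHandle(3);

    // Hypothetical analogue of SocketPal.CreateSocket (assumption: it first
    // assigns a not-yet-valid handle to its out parameter, then fills in the fd).
    // 'observe' simulates another thread reading the field at that moment.
    static void CreateHandle(out FakeHandle handle, Action observe)
    {
        handle = new FakeHandle(-1); // invalid until the native call succeeds
        observe();                   // a racing reader runs here
        handle.Fd = 42;              // pretend socket() returned descriptor 42
    }

    // Existing pattern: the out argument aliases the field itself, so the
    // half-initialized handle is visible to readers immediately.
    public long RecreateInPlace()
    {
        long seen = 0;
        CreateHandle(out Handle, () => seen = Handle.Fd);
        return seen;
    }

    // Proposed pattern: create into a local, publish only once complete.
    public long RecreateViaLocal()
    {
        long seen = 0;
        CreateHandle(out FakeHandle newHandle, () => seen = Handle.Fd);
        Handle = newHandle;
        return seen;
    }
}

static class Program
{
    static void Main()
    {
        Console.WriteLine(new FakeSocket().RecreateInPlace());  // prints -1
        Console.WriteLine(new FakeSocket().RecreateViaLocal()); // prints 3
    }
}
```

The callback replaces real thread timing so the interleaving is reproducible. The local-variable version only removes the window in which the field holds the invalid handle. A concurrent reader can still see the old handle until the final assignment, so this does not by itself synchronize readers with whatever later happens to the old handle.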
Ah, I didn't realize it's multithreaded.
Where is the code in …
Yesterday evening we discussed it with @antonfirsov as well, and afterwards we noticed that I had mistraced the path and …
I haven't looked at the code, but I think there probably is such a path. The fix in #70046 (previously mentioned by @wfurt) was about making SendAsync keep using an open connection, see #49340 (comment). So that may have triggered the issue. It's definitely possible we're seeing a race between connect replacing the handle and SmtpClient Abort observing this half-initialized handle.
We should double-check it then; thanks for correcting me!
The proposed fix is worth trying, then.
So far it has not been reported by external customers; the reports came only from our CI. It's not worth servicing 7.0.x until we get reports from customers.
@karelz @wfurt FYI this failure happened again today in 7.0. Based on the last comment, I won't reopen the issue, but I am pasting all the information here so this gets linked with the affected PR, and to preserve history.
Description
System.Net.Mail.Functional.Tests are failing with this assert in CI:
https://helixre107v0xdeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-pull-72664-merge-3f079befb6de4fac81/System.Net.Mail.Functional.Tests/1/console.08ced1c9.log?helixlogtype=result
Reproduction Steps
Example CI build: https://dev.azure.com/dnceng/public/_build/results?buildId=1897299&view=ms.vss-test-web.build-test-results-tab
Expected behavior
Test doesn't fail in CI
Actual behavior
Test does fail in CI, see description.
Regression?
Unknown
Known Workarounds
Unknown
Configuration
Linux Debug x64 Mono Interpreter
Other information
No response