
ArgumentOutOfRangeException at System.Net.Security.SslStream.ProcessBlob #62109

Closed
liiri opened this issue Nov 28, 2021 · 22 comments · Fixed by #63184
Labels
area-System.Net.Security os-linux Linux OS (any supported distro)
Milestone

Comments

@liiri

liiri commented Nov 28, 2021

Description

I'm getting recurring but inconsistent exceptions in a couple of my environments running the same code under the Docker image mcr.microsoft.com/dotnet/aspnet:6.0-bullseye-slim.

This only happens in environments where I have an SSL certificate provisioned by certbot, but I don't know if that's related. The certificate is signed and up to date.

Reproduction Steps

not consistent

Expected behavior

No errors on Kestrel level

Actual behavior

Microsoft.AspNetCore.Server.Kestrel: Unhandled exception while processing 0HMDG00D520FK.
System.ArgumentOutOfRangeException: Specified argument was out of the range of valid values.
   at System.Net.Security.SslStream.ProcessBlob(Int32 frameSize)
   at System.Net.Security.SslStream.ReceiveBlobAsync[TIOAdapter](TIOAdapter adapter)
   at System.Net.Security.SslStream.ForceAuthenticationAsync[TIOAdapter](TIOAdapter adapter, Boolean receiveFirst, Byte[] reAuthenticationData, Boolean isApm)
   at Microsoft.AspNetCore.Server.Kestrel.Https.Internal.HttpsConnectionMiddleware.OnConnectionAsync(ConnectionContext context)
   at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Infrastructure.KestrelConnection`1.ExecuteAsync()

Regression?

I did not observe this on .NET 5 or .NET Core 3.1, which I ran beforehand.

Known Workarounds

No response

Configuration

.NET 6.0.100
OS is Ubuntu 18.04 x86_64, running the Docker image mcr.microsoft.com/dotnet/aspnet:6.0-bullseye-slim

Other information

No response

@dotnet-issue-labeler dotnet-issue-labeler bot added area-System.Net.Security untriaged New issue has not been triaged by the area owner labels Nov 28, 2021
@ghost

ghost commented Nov 28, 2021

Tagging subscribers to this area: @dotnet/ncl, @vcsjones
See info in area-owners.md if you want to be subscribed.


@wfurt
Member

wfurt commented Nov 29, 2021

Can you get packet captures, @liiri, and step in with a debugger to get the value of frameSize? There were some changes in 6.0 to process more frames at once when possible, so there can be new bugs. However, there is not enough information yet to investigate. I cannot say for sure, but the certificate seems unlikely to be the cause unless it is somewhat unusual, such as being very big. This part of the code really deals with IO and TLS frames, so it is more a matter of size and framing.

@liiri
Author

liiri commented Nov 30, 2021

I will try to get more information; I would have already if it were simple. This has never occurred in an environment where a debugger is handy to attach.

Can you offer any workaround to catch this exception or retry the blob receive?

@ghost ghost added needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration and removed needs more info labels Nov 30, 2021
@wfurt
Member

wfurt commented Nov 30, 2021

This is an inbound connection, right? Any idea if this happens with a particular client?

One more thought:
Can you possibly try https://docs.microsoft.com/en-us/dotnet/api/system.appdomain.firstchanceexception?view=net-6.0
and then possibly call Environment.FailFast() if e is System.ArgumentOutOfRangeException? That would create a core dump if the system is properly configured (it needs coredump_filter=0x3f, see https://github.com/dotnet/diagnostics/blob/main/documentation/debugging-coredump.md).
There are other ways to get a dump, but if we can get one from when this happens, we can likely solve the mystery.
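
A minimal sketch of that approach, assuming the handler is registered at application startup (the class name and message text are illustrative, not part of any existing API):

using System;
using System.Runtime.ExceptionServices;

class CrashOnFirstChance
{
    static void Main(string[] args)
    {
        // Fail fast the first time an ArgumentOutOfRangeException is raised anywhere
        // in the process; with coredump_filter=0x3f the OS then writes a core dump.
        AppDomain.CurrentDomain.FirstChanceException += (sender, e) =>
        {
            if (e.Exception is ArgumentOutOfRangeException)
            {
                Environment.FailFast("ArgumentOutOfRangeException observed", e.Exception);
            }
        };

        // ... start the web host / application here ...
    }
}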

@karelz karelz added the os-linux Linux OS (any supported distro) label Nov 30, 2021
@liiri
Author

liiri commented Nov 30, 2021

This is an inbound connection, from a browser, most likely Chrome.

@karelz
Member

karelz commented Dec 7, 2021

@liiri did you get a chance to try it out?

@karelz karelz added needs more info and removed needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration labels Dec 7, 2021
@liiri
Author

liiri commented Dec 9, 2021

@liiri did you get a chance to try it out?

I've set up the dump to catch this exception and am waiting for it to reproduce.

@ghost ghost added needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration and removed needs more info labels Dec 9, 2021
@karelz karelz added needs more info and removed needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration labels Dec 9, 2021
@liiri
Author

liiri commented Dec 14, 2021

I managed to reproduce the issue and did set the dump configuration as advised, but no dump was generated. Reading around questions like https://stackoverflow.com/questions/1134048/generating-net-crash-dumps-automatically, it seems that it might not be as trivial to create a minidump when running in Docker, as there is no "other" process that can create the minidump for us.
Any ideas? Should I address this in a different issue?

@ghost ghost added needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration and removed needs more info labels Dec 14, 2021
@karelz
Member

karelz commented Dec 14, 2021

@liiri it seems to be specific to the Docker environment. We are not deeply familiar with it, and the problems don't seem to be specific to .NET. Perhaps you can find some help on Docker or general (Stack Overflow) forums?

Based on a quick search, these articles might be useful:

@liiri
Author

liiri commented Dec 14, 2021

Thanks, I'll look into it, but I don't think I'll be able to provide the minidump anytime soon.
Feel free to close the issue if it cannot be investigated further without more info.

@wfurt
Member

wfurt commented Dec 19, 2021

Can you try something like this @liiri ?

using System;
using System.Runtime.ExceptionServices;
using Microsoft.Diagnostics.NETCore.Client;

namespace dump
{
    class Program
    {
        public static void WriteDump(object source, FirstChanceExceptionEventArgs e)
        {
            if (e.Exception is ArgumentOutOfRangeException)
            {
                int pid = Environment.ProcessId;
                var client = new DiagnosticsClient(pid);
                //client.WriteDump(DumpType.Normal, "/tmp/minidump.dmp");
                client.WriteDump(DumpType.Full, $"/tmp/dump.dmp.{pid}");
            }
        }

        static void Main(string[] args)
        {
            AppDomain.CurrentDomain.FirstChanceException += WriteDump;
            Console.WriteLine("Hello, World!");
            try
            {
                throw new ArgumentOutOfRangeException("BOO");
            }
            catch { }

            Console.WriteLine("All done");
        }
    }
}

You will need to add a reference to the Microsoft.Diagnostics.NETCore.Client package, but this should give you the option to write a dump without any OS support. I did a quick test inside a container and it seems to work fine without needing the privileged option.

The dump will be large and it will contain your private keys (and perhaps other sensitive data). I would still probably start with a Full dump and fall back to Normal if that gives you grief:
https://docs.microsoft.com/en-us/dotnet/core/diagnostics/microsoft-diagnostics-netcore-client

You can either send me a private email with the location, or I can walk you through the dump to get some insight.

@liiri
Author

liiri commented Dec 23, 2021

Thanks, this seems to have worked; forwarding a link to the dumps by email.

@liiri
Author

liiri commented Dec 26, 2021

It seems the requests were made using Ssl2, which shouldn't be allowed. This may be related to https://docs.microsoft.com/en-us/dotnet/core/compatibility/aspnet-core/5.0/kestrel-default-supported-tls-protocol-versions-changed.
I will add the ConfigureHttpsDefaults snippet (see the sketch below) and check whether the issue still reproduces. The requests in question may be part of some random attacks, as this endpoint is exposed publicly.
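
For reference, a minimal sketch of such a configuration in a .NET 6 minimal-hosting app (assuming the Web SDK with implicit usings; this is not the exact snippet from the linked document):

using System.Security.Authentication;

var builder = WebApplication.CreateBuilder(args);

builder.WebHost.ConfigureKestrel(kestrel =>
{
    // Pin the allowed protocol versions instead of relying on the OS defaults.
    kestrel.ConfigureHttpsDefaults(https =>
    {
        https.SslProtocols = SslProtocols.Tls12 | SslProtocols.Tls13;
    });
});

var app = builder.Build();
app.Run();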

@wfurt
Member

wfurt commented Dec 29, 2021

That part will probably not matter. As far as I can tell, this happens before the platform code even executes.
Here is what I see from one of the dumps:

[screenshot from the dump showing the handshake buffer state, including the ActiveStart and AvailableStart fields and the raw bytes]

The byte sequence does not seem to be a valid TLS frame. It is either some fuzzer/garbage, or Kestrel somehow mangles the data (cc: @Tratcher in case there is some known issue).

Either way, it should not matter: we should not get an ArgumentOutOfRangeException.
Since we fail to recognize a valid frame header, _framing will be Unified and we hit:

if (_framing != Framing.SinceSSL3)
{
#pragma warning disable 0618
    _lastFrame.Header.Version = SslProtocols.Ssl2;
#pragma warning restore 0618
    _lastFrame.Header.Length = GetFrameSize(_handshakeBuffer.ActiveReadOnlySpan) - TlsFrameHelper.HeaderSize;
}

The given sequence of input bytes yields _lastFrame.Header.Length = 12797; adding the 5-byte header gives 12802, matching the ActiveStart above.

The interesting one is AvailableStart. That basically points to where the next bytes should go, suggesting that we got only 182 bytes so far. Looking at the actual buffer, the first 182 bytes are mostly nonzero, followed by all zeros.

We should read the rest of the frame or fail, but we don't seem to:

if (_handshakeBuffer.ActiveLength < frameSize)
{
    await FillHandshakeBufferAsync(adapter, frameSize).ConfigureAwait(false);
}

and then

private ProtocolToken ProcessBlob(int frameSize)
{
    int chunkSize = frameSize;
    ReadOnlySpan<byte> availableData = _handshakeBuffer.ActiveReadOnlySpan;
    // Discard() does not touch data, it just increases start index so next
    // ActiveSpan will exclude the "discarded" data.
    _handshakeBuffer.Discard(frameSize);

This updates _activeStart, which makes ActiveLength negative, and we then fail slicing the buffer:

return _context!.NextMessage(availableData.Slice(0, chunkSize));

For some reason FillHandshakeBufferAsync does not read the full frame. It may be inconsistent EOF handling between the sync and async paths, but that would not explain why 5.0 would work; the code is mostly identical.
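
To make the failure concrete, here is a small standalone sketch using the numbers from the dump; the span below merely stands in for _handshakeBuffer.ActiveReadOnlySpan and is not the real runtime type:

using System;

byte[] received = new byte[182];            // only 182 bytes actually arrived
int frameSize = 12797 + 5;                  // 12802, the length claimed by the bogus header

try
{
    ReadOnlySpan<byte> availableData = received;   // captured before Discard(frameSize)
    // Discard(frameSize) only advances the start index, so ActiveLength ends up as
    // 182 - 12802, i.e. negative; slicing the 182-byte span with chunkSize = 12802
    // is what actually throws.
    ReadOnlySpan<byte> chunk = availableData.Slice(0, frameSize);
}
catch (ArgumentOutOfRangeException e)
{
    Console.WriteLine(e.GetType());         // System.ArgumentOutOfRangeException
}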

Is there any functional impact besides the annoyance, @liiri?
I'll proceed with a fix to avoid the ArgumentOutOfRangeException, but I think the session will fail anyway with a different exception, since the input looks invalid.

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Dec 29, 2021
@liiri
Author

liiri commented Dec 29, 2021

There is no functional impact, but we do get an unhandled exception, which triggers alarms in our system. Is there any way to catch this exception in some middleware or other callback?

I would still note that these fuzzy requests were only recorded in environments where our ASP.NET server is exposed without any intermediate load balancer or gateway. In SaaS installations where we use a Kubernetes Ingress, we never encountered these errors.

@wfurt
Member

wfurt commented Dec 29, 2021

@Tratcher would probably be the best person to answer the question about the middleware.

@wfurt wfurt removed untriaged New issue has not been triaged by the area owner needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration labels Dec 29, 2021
@wfurt wfurt added this to the 7.0.0 milestone Dec 29, 2021
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Dec 31, 2021
@liiri
Author

liiri commented Jan 2, 2022

The issue of handling this (or the more accurate exception) is still relevant; would you rather I open a new issue?

@Tratcher
Member

Tratcher commented Jan 3, 2022

@liiri handling the exception would be an issue for the AspNetCore/Kestrel layer. You should be able to test the change in a few days using the build from https://github.com/dotnet/installer#installers-and-binaries. Once you see the new error you can ask over at https://github.com/dotnet/aspnetcore if you're still having trouble handling it.

@wfurt
Member

wfurt commented Jan 4, 2022

No need for a separate issue, @liiri. I was primarily asking with servicing in mind. I don't think this would qualify for a backport to 6.0, but @karelz may have a different perspective. In either case, it would be great if you could deploy a daily build with the fix and provide feedback.

@karelz
Member

karelz commented Jan 4, 2022

Do I understand it correctly that the only difference the fix makes is which exception is thrown (IOException vs. ArgumentOutOfRangeException)?
If that is the case, then unless more users are hit by this and the exception type makes a substantial difference for them, I agree that it does not meet the servicing bar.

@ghost ghost locked as resolved and limited conversation to collaborators Feb 3, 2022