Connection Activities in SocketsHttpHandler #101814

antonfirsov · 2024-05-02T18:48:29Z

Resolves #93832 by introducing an activity for every HTTP/1.1 and HTTP/2.2 connection, emitting ActivityEvent at DNS/TCP/TLS completion and linking the connection activity to the request activity when the request is assigned to a connection from the pool.

Marking as a draft for now, since there are several open questions:

Do we also need a DiagnosticListener like in DiagnosticsHandler or are we OK just publishing the activities?
Emitting an ActivityEvent doesn't publish notifications on ActivityListeners. However, when a request is completed, the linked connection activity should contain all the events by that time. Would this be "interactive enough" for Aspire?
The proposed implementation does not emit DNS & TCP events when ConnectCallback is set. To overcome this, we would need to grab Activity.Current in downlevel libraries, create sub-activities instead of events, or expose an API so users can emit these events manually in their callbacks. Do we expect Aspire users to use ConnectCallback frequently enough to make it worth the trouble?
The proposed solution doesn't emit events for HTTP/3. Is this limitation problematic? This would need to be solved in System.Net.Quic (again either by emitting events on Activity.Current or by creating sub-activities instead of events).
The lifetime of the activity doesn't match the span of the http.client.connection.duration metric, since the latter starts measuring time only after a successful TLS handshake. Do we need to harmonize the two?
What should be the naming convention for the events emitted? Are there any rules?

cc @samsp-msft @noahfalk @MihaZupan @lmolkova

samsp-msft · 2024-05-02T19:03:01Z

Do we also need a DiagnosticListener like in DiagnosticsHandler or are we OK just publishing the activities?

I don't think so - although related, we don't need to do a diagnostic source for this scenarios

Emitting an ActivityEvent doesn't publish notifications on ActivityListeners. However, when a request is completed, the linked connection activity should contain all the events by that time. Would this be "interactive enough" for Aspire?

Yes. The Aspire dashboard will be able to resolve links when we implement dotnet/aspire#2577. This will be a forcing function to make that happen.

The proposed implementation does not emit DNS & TCP events when ConnectCallback is set. To overcome this, we would need to grab Activity.Current in downlevel libraries, create sub-activities instead of events, or expose an API so users can emit these events manually in their callbacks. Do we expect Aspire users to use ConnectCallback frequently enough to make it worth the trouble?

According to @ReubenBond, we don't use it in Aspire

The proposed solution doesn't emit events for HTTP/3. Is this limitation problematic? This would need to be solved in System.Net.Quic (again either by emitting events on Activity.Current or by creating sub-activities instead of events).

Http/3 usage is still uncommon, and more complicated due to a QUIC connection being more virtual than physical. We should flush out the scenarios with 1.1/2 and then look at HTTP/3 later.

The lifetime of the activity doesn't match the span of the http.client.connection.duration metric, since the latter starts measuring time only after a successful TLS handshake. Do we need to harmonize the two?

It would be better if we can be consistent, but not if that causes a bunch of extra trouble for that metric.

What should be the naming convention for the events emitted? Are there any rules?
We should follow similar patterns to semantic conventions for name/value pairs and activity names.

@lmolkova probably has recommendations.

src/libraries/System.Net.Http/src/System/Net/Http/DiagnosticsHandlerLoggingStrings.cs

MihaZupan · 2024-05-02T19:16:23Z

src/libraries/System.Net.Http/src/System/Net/Http/SocketsHttpHandler/HttpConnectionBase.cs

@@ -104,6 +108,17 @@ public void MarkConnectionAsNotIdle()
            _connectionMetrics?.IdleStateChanged(idle: false);
        }

+        public void LinkRequestActivity()
+        {
+            Activity? requestActivity = Activity.Current;


Nit: can we check _activity is null before looking up Activity.Current to make it a bit cheaper when disabled?

MihaZupan · 2024-05-02T19:16:50Z

src/libraries/System.Net.Http/src/System/Net/Http/SocketsHttpHandler/HttpConnectionBase.cs

+                return;
+            }
+
+            requestActivity.AddLink(new ActivityLink(_activity.Context));


Is tooling/infra prepared to deal with there being millions of links to a given connection?

Millions of links to is not a problem, millions of links from could be!

samsp-msft · 2024-05-02T22:24:16Z

src/libraries/System.Net.Http/src/System/Net/Http/DiagnosticsHandlerLoggingStrings.cs

+        public const string RequestActivityStopName        = RequestActivityName + ".Stop";
+
+        public const string ConnectionNamespace            = "System.Net.Http.Connections";
+        public const string ConnectionActivityName = ConnectionNamespace + ".HttpConnectionOut";


Is it a convention to have an Out postfix on these names?

There is a .NET convention to do so, but not an OTel convention. In general we probably want to sync with @lmolkova on naming and conventions here. If we can determine an OTel convention that would be preferable.

AFAIK this one is OperationName and I think has no OTel equivalent.
For the DisplayName, we'd need to define a new span and span name.

Does this one last for the connection lifetime? If so I'd use something like

connection {server.address}:{server.port}

I am trying to figure out what is more clear in a trace:
System.Net.Http.Connections.HttpConnectionOut
or
System.Net.Http.Connections.HttpConnection ?

AFAIK this one is OperationName and I think has no OTel equivalent.
For the DisplayName, we'd need to define a new span and span name.

OperationName is the default value used for DisplayName unless it is overriden somewhere. I don't think we get any value picking one string for OperationName and a different string for DisplayName. So we probably want OperationName = DisplayName = connection {server.address}:{server.port}

I am trying to figure out what is more clear in a trace

What gets transmitted by OTLP and rendered in the UI should be DisplayName and @lmolkova is suggesting that be "connection {server.address}:{server.port}". I think that would be clearer than either of those namespaced names no?

OperationName could still be important for the DiagnosticSource (DS -> EventSource) as the event name and then it's nice to have a const string for it, but you all know much more about it than I do.

OperationName could still be important for the DiagnosticSource

This PR is not adding DiagnosticSource / DiagnosticListener, only ActivitySource + activities, because this is what #93832 is about in my understanding. Is there a reason for also adding a DiagnosticListener like we have it in DiagnosticHandler? I know little about how is this being consumed.

So we probably want OperationName = DisplayName = connection {server.address}:{server.port}

Given that OperationName has no OTel equivalent according to #101814 (comment), IMO it would be better if the OperationName followed the existing .NET convention for consistency. We can use a different DisplayName if we want. In that case wouldn't we also want to change the DisplayName for the request activity, or do we consider it a breaking change we better avoid?

Given that OperationName has no OTel equivalent according to #101814 (comment), IMO it would be better if the OperationName followed the existing .NET convention for consistency.

👍

In that case wouldn't we also want to change the DisplayName or the request activity, or do we consider it a breaking change we better avoid?

I would consider it part of onboarding onto HTTP semantic conventions which define what span name looks like.
My 2c on if changing DisplayName is breaking: existing otel http instrumentations set span name according to the conventions and it could only be breaking to something else like vanilla ActivityListener, but they'd still have the same OperationName as before

OperationName could still be important for the DiagnosticSource (DS -> EventSource) as the event name and then it's nice to have a const string for it, but you all know much more about it than I do.

You can filter on Activity name as part of DiagnosticSourceEventSource and ActivityListener.Start/Stop, but not as part of ActivityListener.Sample or OTel. Given that OTel is a primary scenario we want to support I'd want to ensure that ActivitySource alone is sufficiently fine-grained to do the filtering we'd like to do. It won't harm anything to pick some constant value for OperationName that differs from DisplayName, but ideally nothing should be relying on us doing that and we'd move away from showing OperationName in our tools as a distinct piece of data.

samsp-msft · 2024-05-02T22:32:16Z

@antonfirsov - would it be possible to show the output from a basic scenario. The easiest way to do that is to use the consoleExporter for Tracing.

Add a nuget ref for OpenTelemetry.Exporter.Console

And then use

        builder.Services.AddOpenTelemetry()
            .WithTracing(tracing =>
            {
              tracing.AddSource("System.Net.Http.Connections");
              tracing.AddSource("System.Net.Http");
              tracing.AddConsoleExporter();
             });

noahfalk · 2024-05-07T02:45:28Z

Emitting an ActivityEvent doesn't publish notifications on ActivityListeners. However, when a request is completed, the linked connection activity should contain all the events by that time. Would this be "interactive enough" for Aspire?

Yes. The Aspire dashboard will be able to resolve links when we implement dotnet/aspire#2577. This will be a forcing function to make that happen.

Beware the timing of when the Activity gets sent may make the UX awkward when trying to view the telemetry data live (or close to live). The OTel SDK and exporter do not transmit any information about an Activity until it has stopped as far as I am aware. If a request completes but the connection that serviced the request doesn't close until 10 minutes later that means there will be a 10 minute period where the event timings haven't been transmitted yet. If you want the timings to be available more quickly in the UI the options I can think of are:

Update the OTLP spec to allow sending information about incomplete Activities, then update OTel.NET exporter to do that. This is probably a lengthy path as it involves lots of discussion, consensus building, and coordination.
Instead of Activity events, use structured log messages.

.../System.Net.Http/src/System/Net/Http/SocketsHttpHandler/ConnectionPool/HttpConnectionPool.cs

lmolkova · 2024-05-07T03:34:27Z

src/libraries/System.Net.Http/src/System/Net/Http/SocketsHttpHandler/ConnectHelper.cs

@@ -74,10 +74,12 @@ public static async ValueTask<SslStream> EstablishSslConnectionAsync(SslClientAu
                        sslStream.AuthenticateAsClient(sslOptions);
                    }
                }
+                activity?.AddEvent(new ActivityEvent("Tls.Authenticated"));


I'd prefer to use spans and avoid using span events at all costs:

A child span for TLS handshake would have a duration and timestamp

It won't be verbose since connections are long-lived (much longer than requests)

It can live without HTTP instrumentation on

It can be turned off (if coming from a different source) while HTTP is on

Span events are likely to be deprecated in the future in favor of independent Event API similar to logging

We'll have standard means to record errors/duration.

I'd prefer to use spans and avoid using span events at all costs:

That seems fine to me. I mentioned in another comment that there was a timing issue where one long activity for the entire connection wouldn't transmit the events until the connection closed. I suggested changing the events to log messages as one potential solution, but changing them to spans also solves that problem.

It won't be verbose since connections are long-lived (much longer than requests)

I assume at the limit we might have one connection per request, but I hope such scenarios are uncommon in any high throughput service. Worst case if connection activity overhead was too high the app developer can always sample them or turn them off.

I like the advantages that come from implementing this with sub activities. Another big advantage is that it would also work if ConnectCallback is present, since we would implement them in downstream libraries. However, there are some challanges:

It increases the scope of the problem, since we would need to touch multiple System.Net.* libraries. We may need to pull in more reviewers from the networking team.

If we want to add attributes to those sub-activities, it increases the scope even further since we would need to make decisions on the characteristics of those attributes (eg. naming, possible values for attributes like error.type, etc). As I also noted in Connection Activities in SocketsHttpHandler #101814 (comment), my recommendation would be to track this in a separate issue.

If there is an Activity.Current to be used as a parent activity, and System.Net.Http.Connections.HttpConnectionOut is not enabled, the DNS/TCP/TLS activity would attach to the Http activity as parent which is somewhat unfortunate. We would need to document that users should avoid this.

Regardless, I agree sub-activities are better. If we have consensus, I will change this PR to add activities in downstream libs. @MihaZupan any concerns?

changing the events to log messages

Just to not exclude this option completely yet: @noahfalk do you mean a writing to DiagnosticListener by this or some other technique?

If there is an Activity.Current to be used as a parent activity, and System.Net.Http.Connections.HttpConnectionOut is not enabled, the DNS/TCP/TLS activity would attach to the Http activity as parent which is somewhat unfortunate. We would need to document that users should avoid this.

This is why when we originally talked I said not to use Activity.Current and be explicit about passing in the activity if you can.
If the DNS and/or TLS activities are invoked and the parent is not a connection (As the source isn't enabled) , then they can look at the activity and see if its a connection or not and therefore not parent themselves to it. You will still see them on the overall timeline.

The other hack would be to put all these activities in the same source, so they can only be enabled/disabled as a group. But as I think they would be useful for other scenarios beyond http connections, that really does seem like a hack.

I was thinking about something that would ultimately be ILogger but later recalled that I don't think ILogger is easy to generate from the level of this code in the stack. Technically there are ways around that but I wouldn't worry about pursuing it any further. I think sub-activities are both easier and better.

But as I think they would be useful for other scenarios beyond http connections

This is why I would prefer to create the sub-activities in downlevel libs. System.Net.Sockets|NameResolution|Security would all have their own ActivitySource-s.

passing in the activity if you can.

It's not possible to pass it to downlevel libraries without significant (and IMO bloated) API additions, which we likely don't want.

lmolkova · 2024-05-07T03:48:22Z

.../System.Net.Http/src/System/Net/Http/SocketsHttpHandler/ConnectionPool/HttpConnectionPool.cs

            switch (_kind)
            {
                case HttpConnectionKind.Http:
                case HttpConnectionKind.Https:
                case HttpConnectionKind.ProxyConnect:
-                    stream = await ConnectToTcpHostAsync(_originAuthority.IdnHost, _originAuthority.Port, request, async, cancellationToken).ConfigureAwait(false);
+                    stream = await ConnectToTcpHostAsync(_originAuthority.IdnHost, _originAuthority.Port, request, async, activity, cancellationToken).ConfigureAwait(false);


is there a chance that we'd need to instrument connections outside of HTTP client stack?
If so, would we regret not adding these ones to Socket and friends?

lmolkova · 2024-05-07T03:50:51Z

.../System.Net.Http/src/System/Net/Http/SocketsHttpHandler/ConnectionPool/HttpConnectionPool.cs

@@ -666,13 +680,19 @@ private async ValueTask<Stream> ConnectToTcpHostAsync(string host, int port, Htt
                    {
                        if (async)
                        {
-                            await socket.ConnectAsync(endPoint, cancellationToken).ConfigureAwait(false);
+                            IPAddress[] addresses = await Dns.GetHostAddressesAsync(host, cancellationToken).ConfigureAwait(false);
+                            activity?.AddEvent(new ActivityEvent("Dns.Resolved"));


same here on events. Spans would provide much more value and flexibility than events.

lmolkova · 2024-05-07T03:52:44Z

.../System.Net.Http/src/System/Net/Http/SocketsHttpHandler/ConnectionPool/HttpConnectionPool.cs

+                            IPAddress[] addresses = await Dns.GetHostAddressesAsync(host, cancellationToken).ConfigureAwait(false);
+                            activity?.AddEvent(new ActivityEvent("Dns.Resolved"));
+                            await socket.ConnectAsync(addresses, port, cancellationToken).ConfigureAwait(false);
+                            activity?.AddEvent(new ActivityEvent("Transport.Connected"));


The metric that shows how long it takes to establish a connection and a span that shows an establishment attempt is extremely valuable, while event only allow to visually see it on a trace.

Query that would calculate duration between parent activity start time to event timestamp would be not a great experience.

lmolkova · 2024-05-07T03:58:03Z

.../System.Net.Http/src/System/Net/Http/SocketsHttpHandler/ConnectionPool/HttpConnectionPool.cs

        {
            Stream? stream = null;
            IPEndPoint? remoteEndPoint = null;
+            Activity? activity = CreateActivity()?.Start();


let's make sure we don't forget to pick relevant attributes for this one and hopefully TLS, DNS and connect spans.

lmolkova · 2024-05-07T03:58:30Z

.../System.Net.Http/src/System/Net/Http/SocketsHttpHandler/ConnectionPool/HttpConnectionPool.cs

@@ -689,9 +709,10 @@ private async ValueTask<Stream> ConnectToTcpHostAsync(string host, int port, Htt
            }
            catch (Exception ex)
            {
+                activity?.Stop();


we'll need to record error.type

This PR doesn't add attributes and it would be weird to record only error.type. We are tracking the addition of attributes to the request activity in #93019. We may consider adding attributes to the connection span as well, but I would do it in a separate PR together with or after resolving #93019. @samsp-msft do you think think this would be valuable? If yes, my recommendation is to open a separate tracking issue.

sounds good!

…andlerLoggingStrings.cs Co-authored-by: Miha Zupan <mihazupan.zupan1@gmail.com>

Co-authored-by: Noah Falk <noahfalk@users.noreply.github.com>

dotnet-policy-service · 2024-06-16T16:46:09Z

Draft Pull Request was automatically closed for 30 days of inactivity. Please let us know if you'd like to reopen it.

antonfirsov added 4 commits April 22, 2024 15:28

implement Activity.AddLink

080ed80

.

311adb6

Merge branch 'main' into connection-activities-01

d4daf03

basic support for connection activities & events

7a40693

antonfirsov added the area-System.Net.Http label May 2, 2024

antonfirsov marked this pull request as draft May 2, 2024 18:48

dotnet-policy-service bot assigned antonfirsov May 2, 2024

Merge branch 'main' into connection-activities-01

6362c14

MihaZupan reviewed May 2, 2024

View reviewed changes

samsp-msft reviewed May 2, 2024

View reviewed changes

build-analysis bot mentioned this pull request May 3, 2024

[8.0][ios][arm64][mono] Build error: SecKeychainUnlock signing-certs.keychain-db: user name or passphrase incorrect #101830

Closed

noahfalk reviewed May 7, 2024

View reviewed changes

.../System.Net.Http/src/System/Net/Http/SocketsHttpHandler/ConnectionPool/HttpConnectionPool.cs Outdated Show resolved Hide resolved

lmolkova reviewed May 7, 2024

View reviewed changes

antonfirsov and others added 4 commits May 9, 2024 14:39

Merge branch 'main' into connection-activities-01

d9f41ac

Merge branch 'main' into connection-activities-01

6147532

Update src/libraries/System.Net.Http/src/System/Net/Http/DiagnosticsH…

927d3bf

…andlerLoggingStrings.cs Co-authored-by: Miha Zupan <mihazupan.zupan1@gmail.com>

apply suggestion

3c7167e

Co-authored-by: Noah Falk <noahfalk@users.noreply.github.com>

apply suggestions

8cb8dbb

build-analysis bot mentioned this pull request May 17, 2024

Test failure: Unknown chain building error in System.Net.Security.CertificateValidationRemoteServer.ConnectWithRevocation_WithCallback #101835

Closed

dotnet-policy-service bot closed this Jun 16, 2024

karelz added this to the 9.0.0 milestone Jun 24, 2024

antonfirsov mentioned this pull request Jun 24, 2024

Activities for Http Connections, Dns, Sockets and SslStream #103922

Merged

github-actions bot locked and limited conversation to collaborators Jul 25, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Connection Activities in SocketsHttpHandler #101814

Connection Activities in SocketsHttpHandler #101814

antonfirsov commented May 2, 2024

samsp-msft commented May 2, 2024 •

edited

Loading

MihaZupan May 2, 2024

MihaZupan May 2, 2024

samsp-msft May 2, 2024

samsp-msft May 2, 2024

noahfalk May 7, 2024 •

edited

Loading

lmolkova May 7, 2024

samsp-msft May 8, 2024

noahfalk May 9, 2024

lmolkova May 9, 2024 •

edited

Loading

antonfirsov May 9, 2024 •

edited

Loading

lmolkova May 9, 2024

noahfalk May 9, 2024

samsp-msft commented May 2, 2024

noahfalk commented May 7, 2024

lmolkova May 7, 2024

noahfalk May 8, 2024

antonfirsov May 9, 2024

samsp-msft May 9, 2024

noahfalk May 9, 2024

antonfirsov May 9, 2024

lmolkova May 7, 2024

lmolkova May 7, 2024

lmolkova May 7, 2024

lmolkova May 7, 2024

lmolkova May 7, 2024

antonfirsov May 9, 2024 •

edited

Loading

lmolkova May 9, 2024

dotnet-policy-service bot commented Jun 16, 2024

Connection Activities in SocketsHttpHandler #101814

Connection Activities in SocketsHttpHandler #101814

Conversation

antonfirsov commented May 2, 2024

samsp-msft commented May 2, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

noahfalk May 7, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lmolkova May 9, 2024 • edited Loading

Choose a reason for hiding this comment

antonfirsov May 9, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

samsp-msft commented May 2, 2024

noahfalk commented May 7, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

antonfirsov May 9, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dotnet-policy-service bot commented Jun 16, 2024

samsp-msft commented May 2, 2024 •

edited

Loading

noahfalk May 7, 2024 •

edited

Loading

lmolkova May 9, 2024 •

edited

Loading

antonfirsov May 9, 2024 •

edited

Loading

antonfirsov May 9, 2024 •

edited

Loading