Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Support For Multiplexing System.Diagnostics.Metrics for Dotnet Monitor and Dotnet Counters #86504

Merged
merged 32 commits into from
Jul 10, 2023

Conversation

kkeirstead
Copy link
Member

@kkeirstead kkeirstead commented May 19, 2023

This PR corresponds to dotnet/diagnostics#3889 from the diagnostics repo to support multiplexing for System.Diagnostics.Metrics. This change converts dotnet monitor and dotnet counters to use a shared session, assuming the MaxHistograms, MaxTimeSeries, and IntervalSeconds are all the same. The code is in a functional state for anyone who wants to interact with it locally; however, there may still be some bugs (I haven't done extensive testing yet).

I'll call attention to specific areas where I'm looking for feedback via comments.

@wiktork @davmason @noahfalk

@@ -45,6 +45,8 @@ internal sealed class MetricsEventSource : EventSource
{
public static readonly MetricsEventSource Log = new();

private const string SharedSessionId = "SHARED";
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned in the diagnostics repo, using the previously discussed "SHARED" value to signify these special sessions


string commandSessionId = GetSessionId(command);

if ((command.Command == EventCommand.Update
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A lot of the actual code here needs to be cleaned up so don't bother too much with that, but here's the main idea:

  • If we don't currently have a shared session, proceed as we would before
  • If we have an active shared session, see if we can fuse with it (by checking that the args are consistent) -> if not, we give a MultipleSessionsConfiguredIncorrectlyError. -> if so, we call Update on the _aggregationManager.
  • We do ref counting for enables/disables - once all SHARED sessions have been disabled, this frees the _aggregationManager, allowing a new session to start.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Due to EventSource limitations I think you will struggle to do refcounting. I believe the only thing you can know for sure is that when IsEnabled() returns false then there are no sessions active. I think @davmason remembers these details better than I though.

(I ignored looking at the code in more detail, as you suggested)

@@ -16,7 +16,7 @@ namespace System.Diagnostics.Metrics.Tests
public class MetricEventSourceTests
{
ITestOutputHelper _output;
const double IntervalSecs = 10;
const double IntervalSecs = 4;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would recommend mostly ignoring the tests file for this draft - I have added a lot of tests around this (all of which pass), but there's a lot of cleanup that needs to happen here since I was adding new tests any time I found a bug (which means there's probably a lot of redundancy).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests are now in a more cleaned-up state for anyone who wants to take a look (I still need to change this variable back and remove the commenting for the OuterLoop attribute)

private void IncrementRefCount(string uniqueIdentifier, EventCommandEventArgs command)
{
// Could be unsafe if UniqueIdentifier protocol isn't followed
if (command.Arguments!.TryGetValue("UniqueIdentifier", out string? uniqueIdentifierArg))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we just reject shared sessions that do not have UniqueIdentifiers?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that's possible...the way Shared sessions work, every consumer (e.g. dotnet-monitor) with a SHARED session id will listen to the events, unless the UniqueIdentifier (now renamed to ClientId) has a MultipleSessionsConfiguredIncorrectlyError. Without a ClientId, we have no way to communicate to that particular consumer that they shouldn't be allowed to listen to the SHARED session. In theory you could shut down the whole SHARED session, but that doesn't necessarily feel like a great solution either.

if (command.Command == EventCommand.Update
|| command.Command == EventCommand.Enable)
{
IncrementRefCount(commandSessionId, command);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@davmason - EventSource sends Enable events even when sessions are being disabled right, except for the last one? That makes it impossible to do this kind of ref-counting and why we had to resort to testing IsEnabled() instead for EventCounters as I recall.

Is behavior documented anywhere? I always struggle to remember it, nobody guesses it because its unintuitive and pages where I'd expect to find it don't mention it:
https://learn.microsoft.com/en-us/dotnet/api/system.diagnostics.tracing.eventcommand?view=net-7.0
https://learn.microsoft.com/en-us/dotnet/api/system.diagnostics.tracing.eventsource.oneventcommand?view=net-7.0#system-diagnostics-tracing-eventsource-oneventcommand(system-diagnostics-tracing-eventcommandeventargs)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm leaving this as unresolved for now (I'm not sure if @davmason had a chance to take a look), but assuming the ClientId protocol is followed by all consumers, is there a reason why this can't/shouldn't work? In my own testing the ref counting seems to work properly (I tested with DM metrics, DM live metrics, and Dotnet Counters simultaneously), but I may be lacking some depth in understanding around this issue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I didn't see Noah's comment. The underlying providers (ETW and EventPipe) send an enable command for every disabled session except the one that turns the provider off completely, but EventPipe will translate those to Enables so you don't have to worry about it.

The problem with ref counting is that we can't guarantee an even number of Enables and Disables with EventListeners. An EventListener can send as many Enables as they want and will send at most one Disable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For EventCounters we just check on every Disable command if the EventSource is active and shut everything down if it is not


string commandSessionId = GetSessionId(command);

if ((command.Command == EventCommand.Update
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Due to EventSource limitations I think you will struggle to do refcounting. I believe the only thing you can know for sure is that when IsEnabled() returns false then there are no sessions active. I think @davmason remembers these details better than I though.

(I ignored looking at the code in more detail, as you suggested)

@@ -21,17 +21,17 @@ internal sealed class AggregationManager
// these fields are modified after construction and accessed on multiple threads, use lock(this) to ensure the data
// is synchronized
private readonly List<Predicate<Instrument>> _instrumentConfigFuncs = new();
private TimeSpan _collectionPeriod;
public TimeSpan CollectionPeriod { get; private set; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: consider scoping the lock to this property, rather than 'this' + _aggregationManager

@kkeirstead kkeirstead marked this pull request as ready for review June 16, 2023 20:20
@@ -217,6 +225,16 @@ public void UpDownCounterRateValuePublished(string sessionId, string meterName,
WriteEvent(16, sessionId, meterName, meterVersion ?? "", instrumentName, unit ?? "", tags, rate, value);
}

[Event(17, Keywords = Keywords.TimeSeriesValues)]
#if !NET8_0_OR_GREATER
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm lacking some context here since this change happened midway through my change, but it appears that builds were failing due to this not being here - let me know if this isn't the correct resolution

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good, we made a change in EventSource to get rid of most of these suppression attributes, so after that change it would not be required except on below 8.

@kkeirstead
Copy link
Member Author

@davmason or @noahfalk let me know if you have any new feedback on this - just want to make sure this PR keeps moving forward

@@ -25,7 +25,638 @@ public MetricEventSourceTests(ITestOutputHelper output)
}

[ConditionalFact(typeof(PlatformDetection), nameof(PlatformDetection.IsNotBrowser))]
[OuterLoop("Slow and has lots of console spew")]
//[OuterLoop("Slow and has lots of console spew")]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For convenience I've been leaving these commenting as we iterate (and using a reduced IntervalSecs) - I'll revert all of this before merging.

@noahfalk
Copy link
Member

@davmason or @noahfalk let me know if you have any new feedback on this - just want to make sure this PR keeps moving forward

Thanks for the heads up and don't block on me :) I'm relying on @davmason from our side and David can ask me to look again if he thinks that will be useful.

Copy link
Member

@davmason davmason left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The shared session approach looks good to me. I didn't review it line by line. If you need me to look at a specific section of code let me know.

@davmason davmason merged commit b99a4dc into dotnet:main Jul 10, 2023
davmason pushed a commit to dotnet/diagnostics that referenced this pull request Aug 3, 2023
… Monitor` and `Dotnet Counters` (#3889)

This PR corresponds to dotnet/runtime#86504 from
the runtime repo to support multiplexing for
`System.Diagnostics.Metrics`. This change converts `dotnet monitor` and
`dotnet counters` to use a shared session, assuming the `MaxHistograms`,
`MaxTimeSeries`, and `IntervalSeconds` are all the same.
@ghost ghost locked as resolved and limited conversation to collaborators Aug 14, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Diagnostics.Metric community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants