
Add Runtime metrics test #900

Merged

Conversation

rajkumar-rangaraj
Contributor

@rajkumar-rangaraj rajkumar-rangaraj commented Jun 29, 2022

Why

Part of #665 .NET runtime metrics

Metrics are lost during application exit. In particular, a console app that runs and exits quickly loses its metrics. Adding a MeterProvider.ForceFlush call resolves the issue. Spent a lot of debugging hours isolating this.
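For illustration, a minimal sketch (not the instrumentation's actual shutdown code) of flushing a MeterProvider before a short-lived console app exits; the meter name and exporter choice are placeholders:

```csharp
using System.Diagnostics.Metrics;
using OpenTelemetry;
using OpenTelemetry.Metrics;

var meter = new Meter("TestApplication.Smoke");
var requests = meter.CreateCounter<long>("requests");

using var meterProvider = Sdk.CreateMeterProviderBuilder()
    .AddMeter("TestApplication.Smoke")
    .AddOtlpExporter()
    .Build();

requests.Add(1);

// Without this call, a process that exits before the periodic export
// interval elapses (one minute by default) can lose everything it recorded.
meterProvider.ForceFlush();
```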

On a side note: We might need a ForceFlush for TracerProvider too, will leave it for discussion and create an issue.

What

Runtime metrics emitted for .NET Framework and .NET Core are not the same, but there are a few common metric names; the test asserts on those to prove that runtime metrics work as expected.

Tests

  • Added SubmitMetrics test case to RunTimeMetricsTests
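
A hypothetical sketch of the shape of such a test. The helper names mirror things visible elsewhere in this PR (MockMetricsCollector, WaitForMetrics, metricsAgentPort); the exact metric name asserted on is a placeholder, not necessarily one of the common names used here:

```csharp
// Inside IntegrationTests.RunTimeMetricsTests (requires System.Linq and Xunit).
[Fact]
public void SubmitMetrics()
{
    using var collector = new MockMetricsCollector(Output);
    RunTestApplication(metricsAgentPort: collector.Port);

    var metricRequests = collector.WaitForMetrics(1);

    // Flatten every received request and assert on a metric name that both
    // .NET Framework and .NET Core are expected to emit (placeholder name).
    var metricNames = metricRequests
        .SelectMany(r => r.ResourceMetrics)
        .SelectMany(rm => rm.ScopeMetrics)
        .SelectMany(sm => sm.Metrics)
        .Select(m => m.Name)
        .ToList();

    Assert.Contains("process.runtime.dotnet.gc.heap.size", metricNames);
}
```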

Checklist

  • [ ] CHANGELOG.md is updated.
  • [ ] Documentation is updated.
  • [ ] New features are covered by tests.

@rajkumar-rangaraj rajkumar-rangaraj requested a review from a team June 29, 2022 01:55
@rajkumar-rangaraj
Contributor Author

ForceFlush causes IntegrationTests.Http.HttpTests.SubmitMetrics to fail because HttpTests now receives 2 metric requests. Will modify HttpTests for a fix.

@pellared
Member

pellared commented Jun 29, 2022

Metrics are lost during application exit. Especially, in a console app that run and exit metrics are lost. Adding a MeterProvider.ForceFlush resolves an issue. Spent lot of debugging hours in isolating this issue.

As far as I understand, MeterProvider.ForceFlush forces the readers to collect metrics. This causes the .NET Runtime metrics to be recorded and then emitted before MeterProvider.Dispose exits. I do not think we should force metrics collection when the application is closing. At least it is not shown in any of these examples:

We can consider adding a configuration setting like OTEL_DOTNET_AUTO_METRICS_COLLECT_ON_EXIT to control MeterProvider.ForceFlush usage, defaulting to false.
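
A hypothetical sketch of how such a setting could gate the flush during shutdown; the setting name comes from this proposal, while the class, method, and timeout below are illustrative only:

```csharp
using System;
using OpenTelemetry;
using OpenTelemetry.Metrics;

internal static class MetricsShutdown
{
    public static void Shutdown(MeterProvider meterProvider)
    {
        // Hypothetical setting, defaulting to false: only flush on exit when
        // the user explicitly opts in.
        var collectOnExit = string.Equals(
            Environment.GetEnvironmentVariable("OTEL_DOTNET_AUTO_METRICS_COLLECT_ON_EXIT"),
            "true",
            StringComparison.OrdinalIgnoreCase);

        if (collectOnExit)
        {
            // Force the readers to collect and export once more before disposal.
            meterProvider.ForceFlush(5000);
        }

        meterProvider.Dispose();
    }
}
```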

I have never tried the metrics API and I am exploring how the collection works when using a push-based exporter like OTLP. This is probably where the collection of .NET Runtime metrics happens.
I have found this. It looks like the default collection period for the OTLP exporter is 1 minute; see here.

The OTel spec defines OTEL_METRIC_EXPORT_INTERVAL, which should be used to configure the metrics export settings. I have created issue open-telemetry/opentelemetry-dotnet#3417; however, I think we should implement it on our own before it is available.
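
Until the SDK honors the variable natively, it could be read by hand, roughly like the sketch below. This assumes the AddOtlpExporter overload that exposes MetricReaderOptions; the parsing and defaults are illustrative, not the eventual implementation:

```csharp
using System;
using OpenTelemetry;
using OpenTelemetry.Metrics;

// The spec default and the OTLP metric reader default are both one minute.
var exportIntervalMs = int.TryParse(
    Environment.GetEnvironmentVariable("OTEL_METRIC_EXPORT_INTERVAL"), out var parsed)
    ? parsed
    : 60000;

using var meterProvider = Sdk.CreateMeterProviderBuilder()
    .AddMeter("TestApplication.Smoke")
    .AddOtlpExporter((exporterOptions, readerOptions) =>
    {
        readerOptions.PeriodicExportingMetricReaderOptions.ExportIntervalMilliseconds = exportIntervalMs;
    })
    .Build();
```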

On a side note: We might need a ForceFlush for TracerProvider too, will leave it for discussion and create an issue.

The current implementation gives the processor up to 5 seconds to shut down. I think it is OK.

To sum up, I propose:

Option 1 (OTel spec-compliant):

  • Implement OTEL_METRIC_EXPORT_INTERVAL support ourselves (open-telemetry/opentelemetry-dotnet#3417) so that short-running applications can shorten the export interval.

Option 2 (probably more pragmatic at the moment):

  • Implement something like OTEL_DOTNET_AUTO_METRICS_COLLECT_ON_EXIT to control MeterProvider.ForceFlush usage.

Being lazy, I would probably go for Option 2. And in the meantime, I can try implementing open-telemetry/opentelemetry-dotnet#3417.

@nrcventura
Member

I'm also leaning towards option 2.

Option 1 does not seem like the right solution to this problem either, even though it leverages a spec-compliant setting. The problem is exposed by short-running programs: if the program completes before the configured export interval, the runtime metrics would still not be collected and exported. If that duration is not predictable, and users want to prevent frequent metric exports when the application runs longer, the environment variable in option 2 may be a better option.

@rajkumar-rangaraj
Contributor Author

Will fix the issue once #910 gets merged.

@pellared
Member

pellared commented Jul 4, 2022

It looks like this test is flaky

Error: [xUnit.net 00:00:12.84]     IntegrationTests.RunTimeMetricsTests.SubmitMetrics [FAIL]
  22:20:12 [ERR] [xUnit.net 00:00:12.84]     IntegrationTests.RunTimeMetricsTests.SubmitMetrics [FAIL]
  22:20:12 [DBG]   Failed IntegrationTests.RunTimeMetricsTests.SubmitMetrics [8 s]
  22:20:12 [DBG]   Error Message:
  22:20:12 [DBG]    System.NullReferenceException : Object reference not set to an instance of an object.
  22:20:12 [DBG]   Stack Trace:
  22:20:12 [DBG]      at IntegrationTests.RunTimeMetricsTests.SubmitMetrics() in D:\a\opentelemetry-dotnet-instrumentation\opentelemetry-dotnet-instrumentation\test\IntegrationTests\RunTimeMetricsTests.cs:line 40
  22:20:12 [DBG]   Standard Output Messages:
  22:20:12 [DBG]  Platform: X64
  22:20:12 [DBG]  Configuration: Release
  22:20:12 [DBG]  TargetFramework: net462
  22:20:12 [DBG]  .NET Core: False
  22:20:12 [DBG]  Found profiler at D:\a\opentelemetry-dotnet-instrumentation\opentelemetry-dotnet-instrumentation\bin\tracer-home\win-x64\OpenTelemetry.AutoInstrumentation.Native.dll.
  22:20:12 [DBG]  Profiler DLL: D:\a\opentelemetry-dotnet-instrumentation\opentelemetry-dotnet-instrumentation\bin\tracer-home\win-x64\OpenTelemetry.AutoInstrumentation.Native.dll
  22:20:12 [DBG]  [TestHttpListener]: Listening on 'http://localhost:64990/'
  22:20:12 [DBG]  Starting Application: D:\a\opentelemetry-dotnet-instrumentation\opentelemetry-dotnet-instrumentation\test\test-applications\integrations\TestApplication.Smoke\bin\x64\Release\net462\TestApplication.Smoke.exe
  22:20:12 [DBG]  Found integrations at D:\a\opentelemetry-dotnet-instrumentation\opentelemetry-dotnet-instrumentation\bin\tracer-home\win-x64\OpenTelemetry.AutoInstrumentation.Native.dll.
  22:20:12 [DBG]  ProcessName: TestApplication.Smoke
  22:20:12 [DBG]  ProcessId: 2944
  22:20:12 [DBG]  Exit Code: 0
  22:20:12 [DBG]  [MockMetricsCollector]: Shutting down. Total metric requests received: '2'
  22:20:12 [DBG]  [TestHttpListener]: Listener is shutting down.

source: https://github.com/open-telemetry/opentelemetry-dotnet-instrumentation/runs/7157395320?check_suite_focus=true

I guess that var metrics = metricRequests.SelectMany(r => r.ResourceMetrics).Where(s => s.ScopeMetrics.Count > 0).FirstOrDefault(); evaluated to null.

After a quick look, I think that:

  1. WaitForMetrics returns the "first" request. Maybe we should return the "latest" metrics request instead (a defensive alternative is sketched after this list)? I assume that the OTLP exporter always sends all metrics.
  2. The application can close before the .NET Runtime metrics get collected. I think we still need OTEL_DOTNET_AUTO_METRICS_COLLECT_ON_EXIT. EDIT: you can also use LONG_RUNNING and an asynchronous assertion as implemented here: Add Prometheus exporter integration test #918
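
A sketch of one defensive alternative (not necessarily the fix taken in this PR): rather than calling FirstOrDefault() on a single request, aggregate the scope metrics across every received request and fail with a clear assertion. It assumes the mock collector exposes the raw OTLP ResourceMetrics/ScopeMetrics shapes quoted above:

```csharp
// Inside SubmitMetrics(), after metricRequests = collector.WaitForMetrics(...);
// requires System.Linq and Xunit.
var scopeMetrics = metricRequests
    .SelectMany(r => r.ResourceMetrics)
    .SelectMany(rm => rm.ScopeMetrics)
    .ToList();

// Produces a readable test failure instead of a NullReferenceException when
// no request carried scope metrics.
Assert.NotEmpty(scopeMetrics);

var metricNames = scopeMetrics
    .SelectMany(sm => sm.Metrics)
    .Select(m => m.Name)
    .ToList();
```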

@rajkumar-rangaraj
Contributor Author

LONG_RUNNING works better here, as it executes the same smoke tests.

@pellared pellared changed the title Add MeterProvider ForceFlush / Runtime Metrics Tests Add Runtime metrics test Jul 5, 2022
@pellared pellared merged commit 454070a into open-telemetry:main Jul 5, 2022
@@ -121,7 +121,7 @@ public async Task<Container> StartContainerAsync(TestSettings testSettings, int
 /// StartTestApplication starts the test application
 // and returns the Process instance for further interaction.
 /// </summary>
-public Process StartTestApplication(int traceAgentPort = 0, string arguments = null, string packageVersion = "", int aspNetCorePort = 0, string framework = "", bool enableStartupHook = true)
+public Process StartTestApplication(int traceAgentPort = 0, int metricsAgentPort = 0, string arguments = null, string packageVersion = "", int aspNetCorePort = 0, string framework = "", bool enableStartupHook = true)
Member


I think that our continued reliance on these optional parameters is going to make the tests harder to maintain. On the one hand, it's nice that the test only needs to call StartTestApplication with a single named parameter. However, based on my experience with these usages, it gets harder and harder to keep track of which parameters are being used over time. That is why I prefer the overloads that take settings/config objects, which make it a lot easier to track down which specific settings are being used/customized.
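
A sketch of the settings-object style this comment argues for. The TestSettings type appears in the diff context above; the specific members shown here, and the delegation to the existing optional-parameter overload, are illustrative assumptions:

```csharp
// Inside the existing test helper class (requires System.Diagnostics).
// Member names are guesses modeled on the current optional parameters,
// not the repository's actual TestSettings definition.
public class TestSettings
{
    public int TraceAgentPort { get; set; }
    public int MetricsAgentPort { get; set; }
    public string Arguments { get; set; }
    public string Framework { get; set; } = string.Empty;
    public bool EnableStartupHook { get; set; } = true;
}

public Process StartTestApplication(TestSettings settings) =>
    StartTestApplication(
        traceAgentPort: settings.TraceAgentPort,
        metricsAgentPort: settings.MetricsAgentPort,
        arguments: settings.Arguments,
        framework: settings.Framework,
        enableStartupHook: settings.EnableStartupHook);

// A test then customizes only what it cares about:
//   StartTestApplication(new TestSettings { MetricsAgentPort = collector.Port });
```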

