
Add Runtime metrics test #900

Merged

Conversation

rajkumar-rangaraj
Contributor

@rajkumar-rangaraj rajkumar-rangaraj commented Jun 29, 2022

Why

Part of #665 .NET runtime metrics

Metrics are lost during application exit. In particular, a console app that runs and exits quickly loses its metrics. Adding a MeterProvider.ForceFlush call resolves the issue. Spent a lot of debugging hours isolating this.
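For illustration, a minimal sketch (not the instrumentation's actual shutdown code) of flushing a MeterProvider before a short-lived console app exits; the meter name and exporter choice are placeholders:

```csharp
using System.Diagnostics.Metrics;
using OpenTelemetry;
using OpenTelemetry.Metrics;

var meter = new Meter("TestApplication.Smoke");
var requests = meter.CreateCounter<long>("requests");

using var meterProvider = Sdk.CreateMeterProviderBuilder()
    .AddMeter("TestApplication.Smoke")
    .AddOtlpExporter()
    .Build();

requests.Add(1);

// Without this call, a process that exits before the periodic export
// interval elapses (one minute by default) can lose everything it recorded.
meterProvider.ForceFlush();
```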

On a side note: We might need a ForceFlush for TracerProvider too, will leave it for discussion and create an issue.

What

Runtime metrics emitted for .NET Framework and .NET Core are not the same, but there are a few common metric names; the test asserts on those to prove that runtime metrics work as expected.

Tests

  • Added SubmitMetrics test case to RunTimeMetricsTests
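
A hypothetical sketch of the shape of such a test. The helper names mirror things visible elsewhere in this PR (MockMetricsCollector, WaitForMetrics, metricsAgentPort); the exact metric name asserted on is a placeholder, not necessarily one of the common names used here:

```csharp
// Inside IntegrationTests.RunTimeMetricsTests (requires System.Linq and Xunit).
[Fact]
public void SubmitMetrics()
{
    using var collector = new MockMetricsCollector(Output);
    RunTestApplication(metricsAgentPort: collector.Port);

    var metricRequests = collector.WaitForMetrics(1);

    // Flatten every received request and assert on a metric name that both
    // .NET Framework and .NET Core are expected to emit (placeholder name).
    var metricNames = metricRequests
        .SelectMany(r => r.ResourceMetrics)
        .SelectMany(rm => rm.ScopeMetrics)
        .SelectMany(sm => sm.Metrics)
        .Select(m => m.Name)
        .ToList();

    Assert.Contains("process.runtime.dotnet.gc.heap.size", metricNames);
}
```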

Checklist

  • [ ] CHANGELOG.md is updated.
  • [ ] Documentation is updated.
  • [ ] New features are covered by tests.

@rajkumar-rangaraj rajkumar-rangaraj requested a review from a team June 29, 2022 01:55
@rajkumar-rangaraj
Contributor Author

ForceFlush causes IntegrationTests.Http.HttpTests.SubmitMetrics to fail because HttpTests now receives 2 metric requests. Will modify HttpTests for a fix.

@pellared
Member

pellared commented Jun 29, 2022

Metrics are lost during application exit. Especially, in a console app that run and exit metrics are lost. Adding a MeterProvider.ForceFlush resolves an issue. Spent lot of debugging hours in isolating this issue.

As far as I understand, MeterProvider.ForceFlush forces the readers to collect metrics. This causes the .NET Runtime metrics to be recorded and then emitted before MeterProvider.Dispose exits. I do not think we should force metrics collection when the application is closing. At least it is not shown in any of these examples:

We can consider adding a configuration setting like OTEL_DOTNET_AUTO_METRICS_COLLECT_ON_EXIT to control MeterProvider.ForceFlush usage, defaulting to false.
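
A hypothetical sketch of how such a setting could gate the flush during shutdown; the setting name comes from this proposal, while the class, method, and timeout below are illustrative only:

```csharp
using System;
using OpenTelemetry;
using OpenTelemetry.Metrics;

internal static class MetricsShutdown
{
    public static void Shutdown(MeterProvider meterProvider)
    {
        // Hypothetical setting, defaulting to false: only flush on exit when
        // the user explicitly opts in.
        var collectOnExit = string.Equals(
            Environment.GetEnvironmentVariable("OTEL_DOTNET_AUTO_METRICS_COLLECT_ON_EXIT"),
            "true",
            StringComparison.OrdinalIgnoreCase);

        if (collectOnExit)
        {
            // Force the readers to collect and export once more before disposal.
            meterProvider.ForceFlush(5000);
        }

        meterProvider.Dispose();
    }
}
```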

I have never tried the metrics API and I am exploring how the collection works when using a push-based exporter like OTLP. This is probably where the collection of .NET Runtime metrics happens.
I have found this. It looks like the default collection period for the OTLP exporter is 1 minute; see here.

The OTel spec defines OTEL_METRIC_EXPORT_INTERVAL, which should be used to configure the metrics export settings. I have created issue open-telemetry/opentelemetry-dotnet#3417; however, I think we should implement it on our own before it is available.
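
Until the SDK honors the variable natively, it could be read by hand, roughly like the sketch below. This assumes the AddOtlpExporter overload that exposes MetricReaderOptions; the parsing and defaults are illustrative, not the eventual implementation:

```csharp
using System;
using OpenTelemetry;
using OpenTelemetry.Metrics;

// The spec default and the OTLP metric reader default are both one minute.
var exportIntervalMs = int.TryParse(
    Environment.GetEnvironmentVariable("OTEL_METRIC_EXPORT_INTERVAL"), out var parsed)
    ? parsed
    : 60000;

using var meterProvider = Sdk.CreateMeterProviderBuilder()
    .AddMeter("TestApplication.Smoke")
    .AddOtlpExporter((exporterOptions, readerOptions) =>
    {
        readerOptions.PeriodicExportingMetricReaderOptions.ExportIntervalMilliseconds = exportIntervalMs;
    })
    .Build();
```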

On a side note: We might need a ForceFlush for TracerProvider too, will leave it for discussion and create an issue.

The current implementation gives the processor up to 5 seconds to shut down. I think it is OK.

To sum up, I propose:

Option 1 (OTel spec-compliant):

  • Implement OTEL_METRIC_EXPORT_INTERVAL support ourselves (open-telemetry/opentelemetry-dotnet#3417) so that short-running applications can shorten the export interval.

Option 2 (probably more pragmatic at the moment):

  • Implement something like OTEL_DOTNET_AUTO_METRICS_COLLECT_ON_EXIT to control MeterProvider.ForceFlush usage.

Being lazy, I would probably go for Option 2. And in the meantime, I can try implementing open-telemetry/opentelemetry-dotnet#3417.

@nrcventura
Member

I'm also leaning towards option 2.

Option 1 does not seem like the right solution to this problem either, even though it leverages a spec-compliant setting. The problem is exposed by short-running programs: if the program completes before the configured export interval, the runtime metrics would still not be collected and exported. If that duration is not predictable, and users want to prevent frequent metric exports when the application runs longer, the environment variable in option 2 may be a better option.

@rajkumar-rangaraj
Contributor Author

Will fix the issue once #910 gets merged.

@pellared
Member

pellared commented Jul 4, 2022

It looks like this test is flaky

Error: [xUnit.net 00:00:12.84]     IntegrationTests.RunTimeMetricsTests.SubmitMetrics [FAIL]
  22:20:12 [ERR] [xUnit.net 00:00:12.84]     IntegrationTests.RunTimeMetricsTests.SubmitMetrics [FAIL]
  22:20:12 [DBG]   Failed IntegrationTests.RunTimeMetricsTests.SubmitMetrics [8 s]
  22:20:12 [DBG]   Error Message:
  22:20:12 [DBG]    System.NullReferenceException : Object reference not set to an instance of an object.
  22:20:12 [DBG]   Stack Trace:
  22:20:12 [DBG]      at IntegrationTests.RunTimeMetricsTests.SubmitMetrics() in D:\a\opentelemetry-dotnet-instrumentation\opentelemetry-dotnet-instrumentation\test\IntegrationTests\RunTimeMetricsTests.cs:line 40
  22:20:12 [DBG]   Standard Output Messages:
  22:20:12 [DBG]  Platform: X64
  22:20:12 [DBG]  Configuration: Release
  22:20:12 [DBG]  TargetFramework: net462
  22:20:12 [DBG]  .NET Core: False
  22:20:12 [DBG]  Found profiler at D:\a\opentelemetry-dotnet-instrumentation\opentelemetry-dotnet-instrumentation\bin\tracer-home\win-x64\OpenTelemetry.AutoInstrumentation.Native.dll.
  22:20:12 [DBG]  Profiler DLL: D:\a\opentelemetry-dotnet-instrumentation\opentelemetry-dotnet-instrumentation\bin\tracer-home\win-x64\OpenTelemetry.AutoInstrumentation.Native.dll
  22:20:12 [DBG]  [TestHttpListener]: Listening on 'http://localhost:64990/'
  22:20:12 [DBG]  Starting Application: D:\a\opentelemetry-dotnet-instrumentation\opentelemetry-dotnet-instrumentation\test\test-applications\integrations\TestApplication.Smoke\bin\x64\Release\net462\TestApplication.Smoke.exe
  22:20:12 [DBG]  Found integrations at D:\a\opentelemetry-dotnet-instrumentation\opentelemetry-dotnet-instrumentation\bin\tracer-home\win-x64\OpenTelemetry.AutoInstrumentation.Native.dll.
  22:20:12 [DBG]  ProcessName: TestApplication.Smoke
  22:20:12 [DBG]  ProcessId: 2944
  22:20:12 [DBG]  Exit Code: 0
  22:20:12 [DBG]  [MockMetricsCollector]: Shutting down. Total metric requests received: '2'
  22:20:12 [DBG]  [TestHttpListener]: Listener is shutting down.

source: https://github.com/open-telemetry/opentelemetry-dotnet-instrumentation/runs/7157395320?check_suite_focus=true

I guess that var metrics = metricRequests.SelectMany(r => r.ResourceMetrics).Where(s => s.ScopeMetrics.Count > 0).FirstOrDefault(); evaluated to null.

After a quick look, I think that:

  1. WaitForMetrics returns the "first" request. Maybe we should return the "latest" metrics request instead (a defensive alternative is sketched after this list)? I assume that the OTLP exporter always sends all metrics.
  2. The application can close before the .NET Runtime metrics get collected. I think we still need OTEL_DOTNET_AUTO_METRICS_COLLECT_ON_EXIT. EDIT: you can also use LONG_RUNNING and an asynchronous assertion as implemented here: Add Prometheus exporter integration test #918
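
A sketch of one defensive alternative (not necessarily the fix taken in this PR): rather than calling FirstOrDefault() on a single request, aggregate the scope metrics across every received request and fail with a clear assertion. It assumes the mock collector exposes the raw OTLP ResourceMetrics/ScopeMetrics shapes quoted above:

```csharp
// Inside SubmitMetrics(), after metricRequests = collector.WaitForMetrics(...);
// requires System.Linq and Xunit.
var scopeMetrics = metricRequests
    .SelectMany(r => r.ResourceMetrics)
    .SelectMany(rm => rm.ScopeMetrics)
    .ToList();

// Produces a readable test failure instead of a NullReferenceException when
// no request carried scope metrics.
Assert.NotEmpty(scopeMetrics);

var metricNames = scopeMetrics
    .SelectMany(sm => sm.Metrics)
    .Select(m => m.Name)
    .ToList();
```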

@rajkumar-rangaraj
Contributor Author

LONG_RUNNING works better here, as it executes the same smoke tests.

@pellared pellared changed the title Add MeterProvider ForceFlush / Runtime Metrics Tests Add Runtime metrics test Jul 5, 2022
@pellared pellared merged commit 454070a into open-telemetry:main Jul 5, 2022
@@ -121,7 +121,7 @@ public async Task<Container> StartContainerAsync(TestSettings testSettings, int
 /// StartTestApplication starts the test application
 // and returns the Process instance for further interaction.
 /// </summary>
-public Process StartTestApplication(int traceAgentPort = 0, string arguments = null, string packageVersion = "", int aspNetCorePort = 0, string framework = "", bool enableStartupHook = true)
+public Process StartTestApplication(int traceAgentPort = 0, int metricsAgentPort = 0, string arguments = null, string packageVersion = "", int aspNetCorePort = 0, string framework = "", bool enableStartupHook = true)
Member


I think that our continued reliance on these optional parameters is going to make the tests harder to maintain. On the one hand, it's nice that the test only needs to call StartTestApplication with a single named parameter. However, based on my experience with these usages, it gets harder and harder to keep track of which parameters are being used over time. That is why I prefer the overloads that take settings/config objects, which make it a lot easier to track down which specific settings are being used/customized.
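
A sketch of the settings-object style this comment argues for. The TestSettings type appears in the diff context above; the specific members shown here, and the delegation to the existing optional-parameter overload, are illustrative assumptions:

```csharp
// Inside the existing test helper class (requires System.Diagnostics).
// Member names are guesses modeled on the current optional parameters,
// not the repository's actual TestSettings definition.
public class TestSettings
{
    public int TraceAgentPort { get; set; }
    public int MetricsAgentPort { get; set; }
    public string Arguments { get; set; }
    public string Framework { get; set; } = string.Empty;
    public bool EnableStartupHook { get; set; } = true;
}

public Process StartTestApplication(TestSettings settings) =>
    StartTestApplication(
        traceAgentPort: settings.TraceAgentPort,
        metricsAgentPort: settings.MetricsAgentPort,
        arguments: settings.Arguments,
        framework: settings.Framework,
        enableStartupHook: settings.EnableStartupHook);

// A test then customizes only what it cares about:
//   StartTestApplication(new TestSettings { MetricsAgentPort = collector.Port });
```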

