-
Notifications
You must be signed in to change notification settings - Fork 488
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Global Log handler cleanup - Logs SDK #2184
Global Log handler cleanup - Logs SDK #2184
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2184 +/- ##
=====================================
Coverage 79.0% 79.0%
=====================================
Files 122 122
Lines 21059 21042 -17
=====================================
- Hits 16651 16643 -8
+ Misses 4408 4399 -9 ☔ View full report in Codecov by Sentry. |
@@ -146,7 +148,10 @@ impl Drop for LoggerProviderInner { | |||
fn drop(&mut self) { | |||
for processor in &mut self.processors { | |||
if let Err(err) = processor.shutdown() { | |||
global::handle_error(err); | |||
otel_warn!( | |||
name: "LoggerProvider.Drop.AlreadyShutdown", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure how we assume that the error is due to "AlreadyShutdown".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes good catch I have changed it to LoggerProvider.Drop.ShutdownError
. Have kept it as warning, as the error is during shutdown of LoggerProvider, and should be less destructive ( I think)
…litb/opentelemetry-rust into global-error-handler-cleanup-logs-sdk
…litb/opentelemetry-rust into global-error-handler-cleanup-logs-sdk
otel_warn!( | ||
name: "logger_provider_already_shutdown" | ||
name: "LoggerProvider.Shutdown.AlreadyShutdown", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks redundant, as returned Result already contains this information. If we need to keep this, then maybe keep it at debug/info level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if the error is returned as Result to the user, we decide not to log it even if this could be actionable ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not to log or log at a very low severity.
Do you see any reason why this should be logged, given Result already carries this information? "Internal logging" is more for things that are otherwise hard to figure out, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer logging at a lower severity (debug?) for internal debugging. When we request logs from the customer at the debug level, they should include everything necessary for troubleshooting, even if the customer disregards the shutdown result.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a good point! Agree with debug! here. It won't cause noise in normal operations, but when we ask users to report bug/issues, we can ask to collect all logs, including debug, so we'll have enough to investigate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed this to debug now.
otel_warn!( | ||
name: "logger_provider_shutdown_error", | ||
error = format!("{:?}", err) | ||
otel_error!( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as https://github.com/open-telemetry/opentelemetry-rust/pull/2184/files#r1793832380, this feels redundant, given Result will carry this info already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed this to debug now.
@@ -146,7 +146,10 @@ impl Drop for LoggerProviderInner { | |||
fn drop(&mut self) { | |||
for processor in &mut self.processors { | |||
if let Err(err) = processor.shutdown() { | |||
global::handle_error(err); | |||
otel_warn!( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think we should do more changes to this part - this is the automatically invoked shutdown upon drop right? I believe we should first check if user has already called shutdown, and if yes, return without any logs (maybe a debug level log?)
if user has not called shutdown, then we need to call shutdown ourselves. Even here - should we log for each processor? or just log once saying shutdownfailed. The detailed failure reason should be part of the individual processor's own log.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think we should do more changes to this part - this is the automatically invoked shutdown upon drop right? I believe we should first check if user has already called shutdown, and if yes, return without any logs (maybe a debug level log?)
Modified it accordingly, and add debug log for already shutdown.
if user has not called shutdown, then we need to call shutdown ourselves. Even here - should we log for each processor? or just log once saying shutdownfailed. The detailed failure reason should be part of the individual processor's own log.
The Processor::shutdown() returns the error, and we are now logging each to debug (instead of warn).
@@ -100,7 +99,8 @@ impl LogProcessor for SimpleLogProcessor { | |||
// noop after shutdown | |||
if self.is_shutdown.load(std::sync::atomic::Ordering::Relaxed) { | |||
otel_warn!( | |||
name: "simple_log_processor_emit_after_shutdown" | |||
name: "SimpleLogProcessor.Export.AfterShutdown", | |||
error = LogError::Other("Attempted to export a log after processor shutdown".into()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there any advantage of wrapping this into LogError?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find it useful - it adds convenience on top of std::error::Error, by declaring various error types as enum and mapping it to actual error string. We don't need to add the error string while returning the error.
@@ -115,10 +115,9 @@ impl LogProcessor for SimpleLogProcessor { | |||
}); | |||
if let Err(err) = result { | |||
otel_error!( | |||
name: "simple_log_processor_emit_error", | |||
error = format!("{:?}", err) | |||
name: "SimpleLogProcessor.Export.Error", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
its not easy to tell if this is useful or not.. The failure could be due to
- Mutex poisoning
- Exporter facing issues like network etc.
- Futures-Executor can fail itself too.
I don't have a good solution in mind, so probably okay to just replace current global error handler with otel_error! for now until we can find a better alternative.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have modified it to differentiate between MutexPoisoning and Exporter failure - I think we can further extend it in future.
@@ -172,10 +168,9 @@ impl<R: RuntimeChannel> LogProcessor for BatchLogProcessor<R> { | |||
|
|||
if let Err(err) = result { | |||
otel_error!( | |||
name: "batch_log_processor_emit_error", | |||
error = format!("{:?}", err) | |||
name: "BatchLogProcessor.Export.Error", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will this be triggered when channel is full? If yes, we need to rethink this, as this can spam the log output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this error only triggers when channel is full or closed. We need to add some throttling or logic to prevent flooding - have added the TODO for now, as we need common strategy for such flooding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree we need a common strategy, but lets remove the error log from here. It'll flood as-is when buffer is full.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we remove altogether, or make it otel_debug for now - with comment to change it to otel_error once throttling is ready.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
either of them are fine with me, though I slightly prefer removing altogether, as I don't know if we can ship a throttling solution for next release.
…litb/opentelemetry-rust into global-error-handler-cleanup-logs-sdk
for processor in &mut self.processors { | ||
if let Err(err) = processor.shutdown() { | ||
global::handle_error(err); | ||
if self |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with this change, but please make it own separate PR, so we can keep this PR strictly for replacing global error handler with internal logger.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was done in response to the comment - #2184 (comment). Should we keep it in this PR with separate changelog entry?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I always prefer short, very focused PRs!
One can easily review them and continuously make progress by merging short PRs quickly. Slow and steady!
@@ -50,11 +50,30 @@ macro_rules! otel_warn { | |||
{ | |||
tracing::warn!(name: $name, target: env!("CARGO_PKG_NAME"), ""); | |||
} | |||
#[cfg(not(feature = "internal-logs"))] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with this change, but for future - please make them into own separate PR to make it easier to review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Please keep this change separate with own changelog entry:
https://github.com/open-telemetry/opentelemetry-rust/pull/2184/files#r1796235052
Fixes #
Design discussion issue (if applicable) #
Changes
Please provide a brief description of the changes here.
Merge requirement checklist
CHANGELOG.md
files updated for non-trivial, user-facing changes