
Global Log handler cleanup - Logs SDK #2184

Merged

Conversation

@lalitb (Member) commented Oct 8, 2024

Fixes #
Design discussion issue (if applicable) #

Changes

Please provide a brief description of the changes here.

Merge requirement checklist

  • CONTRIBUTING guidelines followed
  • Unit tests added/updated (if applicable)
  • Appropriate CHANGELOG.md files updated for non-trivial, user-facing changes
  • Changes in public API reviewed (if applicable)

@lalitb lalitb requested a review from a team as a code owner October 8, 2024 19:25

codecov bot commented Oct 8, 2024

Codecov Report

Attention: Patch coverage is 33.33333% with 22 lines in your changes missing coverage. Please review.

Project coverage is 79.0%. Comparing base (3f5c230) to head (8d88392).
Report is 1 commit behind head on main.

Files with missing lines Patch % Lines
opentelemetry-sdk/src/logs/log_processor.rs 25.0% 15 Missing ⚠️
opentelemetry-sdk/src/logs/log_emitter.rs 46.1% 7 Missing ⚠️
Additional details and impacted files
@@          Coverage Diff          @@
##            main   #2184   +/-   ##
=====================================
  Coverage   79.0%   79.0%           
=====================================
  Files        122     122           
  Lines      21059   21042   -17     
=====================================
- Hits       16651   16643    -8     
+ Misses      4408    4399    -9     


@@ -146,7 +148,10 @@ impl Drop for LoggerProviderInner {
fn drop(&mut self) {
for processor in &mut self.processors {
if let Err(err) = processor.shutdown() {
global::handle_error(err);
otel_warn!(
name: "LoggerProvider.Drop.AlreadyShutdown",
Member:

not sure how we assume that the error is due to "AlreadyShutdown".

Member Author:

Yes, good catch. I have changed it to LoggerProvider.Drop.ShutdownError. I have kept it as a warning, since the error occurs during shutdown of the LoggerProvider and should be less critical (I think).
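For context, the pattern being discussed can be sketched standalone. This is a hedged illustration, not the SDK's actual code: `Processor`, `FailingProcessor`, `shutdown_all`, and the `eprintln!` stand-in for `otel_warn!` are all assumptions introduced here.

```rust
// Hedged sketch of warn-level logging for per-processor shutdown failures
// during Drop. The trait and types below are illustrative, not the SDK's.
trait Processor {
    fn shutdown(&mut self) -> Result<(), String>;
}

struct FailingProcessor;

impl Processor for FailingProcessor {
    fn shutdown(&mut self) -> Result<(), String> {
        Err("export channel closed".to_string())
    }
}

// Shut down every processor, logging any failure, and return the collected
// errors so the behavior is observable from the caller.
fn shutdown_all(processors: &mut Vec<Box<dyn Processor>>) -> Vec<String> {
    let mut errors = Vec::new();
    for processor in processors.iter_mut() {
        if let Err(err) = processor.shutdown() {
            // stands in for: otel_warn!(name: "LoggerProvider.Drop.ShutdownError", ...)
            eprintln!("warn name=LoggerProvider.Drop.ShutdownError error={:?}", err);
            errors.push(err);
        }
    }
    errors
}

fn main() {
    let mut processors: Vec<Box<dyn Processor>> = vec![Box::new(FailingProcessor)];
    let errors = shutdown_all(&mut processors);
    assert_eq!(errors.len(), 1);
}
```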

lalitb added 2 commits October 8, 2024 13:32
…litb/opentelemetry-rust into global-error-handler-cleanup-logs-sdk
@cijothomas cijothomas added the integration tests Run integration tests label Oct 8, 2024
otel_warn!(
name: "logger_provider_already_shutdown"
name: "LoggerProvider.Shutdown.AlreadyShutdown",
Member:

This looks redundant, as the returned Result already contains this information. If we need to keep it, then maybe keep it at debug/info level.

Member Author (@lalitb), Oct 9, 2024:

So if the error is returned as a Result to the user, we decide not to log it even if it could be actionable?

Member:

Not to log, or to log at a very low severity.

Do you see any reason why this should be logged, given Result already carries this information? "Internal logging" is more for things that are otherwise hard to figure out, right?

Member Author (@lalitb), Oct 9, 2024:

I prefer logging at a lower severity (debug?) for internal debugging. When we request logs from the customer at the debug level, they should include everything necessary for troubleshooting, even if the customer disregards the shutdown result.

Member:

That is a good point! Agree with debug here. It won't cause noise in normal operation, but when we ask users to report bugs/issues, we can ask them to collect all logs, including debug, so we'll have enough to investigate.

Member Author:

Changed this to debug now.
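The agreed pattern (a quiet debug-level log when shutdown is called twice) can be sketched as follows. This is a minimal illustration under assumptions: the `LoggerProvider` struct, its `is_shutdown` field, and the `eprintln!` stand-in for `otel_debug!` are simplifications, not the SDK's actual definitions.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Hedged sketch of the "already shutdown" guard discussed above.
struct LoggerProvider {
    is_shutdown: AtomicBool,
}

impl LoggerProvider {
    fn shutdown(&self) -> Result<(), &'static str> {
        // swap returns the previous value: `true` means shutdown already ran
        if self.is_shutdown.swap(true, Ordering::SeqCst) {
            // stands in for: otel_debug!(name: "LoggerProvider.Shutdown.AlreadyShutdown")
            eprintln!("debug name=LoggerProvider.Shutdown.AlreadyShutdown");
            return Err("already shutdown");
        }
        Ok(())
    }
}

fn main() {
    let provider = LoggerProvider { is_shutdown: AtomicBool::new(false) };
    assert!(provider.shutdown().is_ok());  // first call performs the shutdown
    assert!(provider.shutdown().is_err()); // repeat call only logs at debug
}
```

The Result still carries the error for the caller; the debug line exists only so that debug-level log collections are complete for troubleshooting.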

otel_warn!(
name: "logger_provider_shutdown_error",
error = format!("{:?}", err)
otel_error!(
Member:

Same as https://github.com/open-telemetry/opentelemetry-rust/pull/2184/files#r1793832380, this feels redundant, given Result will carry this info already.

Member Author:
Changed this to debug now.

@@ -146,7 +146,10 @@ impl Drop for LoggerProviderInner {
fn drop(&mut self) {
for processor in &mut self.processors {
if let Err(err) = processor.shutdown() {
global::handle_error(err);
otel_warn!(
Member:

I think we should make more changes to this part - this is the automatically invoked shutdown upon drop, right? I believe we should first check whether the user has already called shutdown, and if yes, return without any logs (maybe a debug-level log?)

If the user has not called shutdown, then we need to call shutdown ourselves. Even here, should we log for each processor, or just log once saying shutdown failed? The detailed failure reason should be part of the individual processor's own log.

Member Author:

> I think we should make more changes to this part - this is the automatically invoked shutdown upon drop, right? I believe we should first check whether the user has already called shutdown, and if yes, return without any logs (maybe a debug-level log?)

Modified it accordingly, and added a debug log for the already-shutdown case.

> If the user has not called shutdown, then we need to call shutdown ourselves. Even here, should we log for each processor, or just log once saying shutdown failed? The detailed failure reason should be part of the individual processor's own log.

Processor::shutdown() returns the error, and we now log each one at debug (instead of warn).
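The decision tree that this thread converged on can be summarized in a small standalone sketch. Everything here is a hedged simplification: `Inner`, `drop_log_label`, and the event-name strings are assumptions made for illustration only.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Hedged sketch of what Drop would log under the agreed behavior:
// nothing but a debug line if shutdown already ran, and per-processor
// debug lines if the drop-initiated shutdown fails.
struct Inner {
    is_shutdown: AtomicBool,
    processor_failures: Vec<bool>, // per-processor outcomes (true = failed)
}

fn drop_log_label(inner: &Inner) -> &'static str {
    if inner.is_shutdown.load(Ordering::Relaxed) {
        // user already called shutdown(): stay quiet apart from a debug line
        "debug:LoggerProvider.Drop.AlreadyShutdown"
    } else if inner.processor_failures.iter().any(|&failed| failed) {
        // drop-initiated shutdown; each failing processor logged at debug
        "debug:LoggerProvider.Drop.ShutdownError"
    } else {
        "ok"
    }
}

fn main() {
    let clean = Inner {
        is_shutdown: AtomicBool::new(true),
        processor_failures: vec![],
    };
    assert_eq!(drop_log_label(&clean), "debug:LoggerProvider.Drop.AlreadyShutdown");
}
```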

@@ -100,7 +99,8 @@ impl LogProcessor for SimpleLogProcessor {
// noop after shutdown
if self.is_shutdown.load(std::sync::atomic::Ordering::Relaxed) {
otel_warn!(
name: "simple_log_processor_emit_after_shutdown"
name: "SimpleLogProcessor.Export.AfterShutdown",
error = LogError::Other("Attempted to export a log after processor shutdown".into())
Member:

Is there any advantage to wrapping this in LogError?

Member Author:

I find it useful - it adds convenience on top of std::error::Error by declaring the various error types as enum variants and mapping each to its actual error string. We don't need to spell out the error string when returning the error.
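The convenience being described can be shown with a hedged, cut-down enum. This is not the SDK's actual `LogError` definition (the real one has more variants and uses the thiserror-style derive); the variant names and messages here are illustrative.

```rust
use std::fmt;

// Hedged simplification of the LogError convenience: variants carry a fixed
// message, so call sites don't repeat error strings.
#[derive(Debug)]
enum LogError {
    AlreadyShutdown,
    Other(String),
}

impl fmt::Display for LogError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            LogError::AlreadyShutdown => write!(f, "logger provider already shut down"),
            LogError::Other(msg) => write!(f, "{}", msg),
        }
    }
}

impl std::error::Error for LogError {}

fn main() {
    let err = LogError::Other("Attempted to export a log after processor shutdown".into());
    assert!(err.to_string().contains("after processor shutdown"));
}
```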

@@ -115,10 +115,9 @@ impl LogProcessor for SimpleLogProcessor {
});
if let Err(err) = result {
otel_error!(
name: "simple_log_processor_emit_error",
error = format!("{:?}", err)
name: "SimpleLogProcessor.Export.Error",
Member:

It's not easy to tell whether this is useful or not. The failure could be due to:

  1. Mutex poisoning
  2. The exporter facing issues like network errors
  3. The futures executor, which can fail itself too

I don't have a good solution in mind, so it is probably okay to just replace the current global error handler with otel_error! for now, until we can find a better alternative.

Member Author:

I have modified it to differentiate between mutex poisoning and exporter failure - I think we can extend this further in the future.
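Telling those two failure modes apart can be sketched with standard-library types only. This is a hedged illustration: the event-name strings and the modeling of the exporter as a `Mutex<Result<...>>` are assumptions, not the SDK's actual structure.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Hedged sketch: classify an emit failure as lock poisoning vs export error.
fn emit(exporter: &Mutex<Result<(), String>>) -> &'static str {
    match exporter.lock() {
        // lock() fails only if another thread panicked while holding the guard
        Err(_) => "SimpleLogProcessor.Emit.MutexPoisoned",
        Ok(result) => match &*result {
            Err(_) => "SimpleLogProcessor.Emit.ExportError",
            Ok(()) => "ok",
        },
    }
}

fn main() {
    assert_eq!(emit(&Mutex::new(Ok(()))), "ok");
    assert_eq!(
        emit(&Mutex::new(Err("network unreachable".to_string()))),
        "SimpleLogProcessor.Emit.ExportError"
    );

    // poison the mutex by panicking while the guard is held
    let poisoned = Arc::new(Mutex::new(Ok::<(), String>(())));
    let clone = Arc::clone(&poisoned);
    let _ = thread::spawn(move || {
        let _guard = clone.lock().unwrap();
        panic!("simulated panic while exporting");
    })
    .join();
    assert_eq!(emit(&poisoned), "SimpleLogProcessor.Emit.MutexPoisoned");
}
```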

@@ -172,10 +168,9 @@ impl<R: RuntimeChannel> LogProcessor for BatchLogProcessor<R> {

if let Err(err) = result {
otel_error!(
name: "batch_log_processor_emit_error",
error = format!("{:?}", err)
name: "BatchLogProcessor.Export.Error",
Member:

Will this be triggered when the channel is full? If yes, we need to rethink this, as it can spam the log output.

Member Author:

Yes, this error only triggers when the channel is full or closed. We need to add some throttling logic to prevent flooding - I have added a TODO for now, as we need a common strategy for such flooding.
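One possible throttling strategy (purely a sketch of what such a common strategy could look like, not anything the SDK ships) is to log only every Nth occurrence of a repeated error:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hedged sketch: log the first occurrence of a repeated error, then only
// every `every`-th repeat, to keep a full channel from flooding the output.
struct Throttle {
    count: AtomicUsize,
    every: usize,
}

impl Throttle {
    fn should_log(&self) -> bool {
        self.count.fetch_add(1, Ordering::Relaxed) % self.every == 0
    }
}

fn main() {
    let throttle = Throttle { count: AtomicUsize::new(0), every: 100 };
    // a burst of 250 channel-full errors would produce only 3 log lines
    let logged = (0..250).filter(|_| throttle.should_log()).count();
    assert_eq!(logged, 3); // occurrences 0, 100, and 200
}
```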

Member:

Agree we need a common strategy, but let's remove the error log from here. As-is, it will flood the output when the buffer is full.

Member Author:

Should we remove it altogether, or make it otel_debug for now - with a comment to change it to otel_error once throttling is ready?

Member:

Either is fine with me, though I slightly prefer removing it altogether, as I don't know if we can ship a throttling solution for the next release.

for processor in &mut self.processors {
if let Err(err) = processor.shutdown() {
global::handle_error(err);
if self
Member:

I agree with this change, but please make it its own separate PR, so we can keep this PR strictly about replacing the global error handler with the internal logger.

Member Author:

This was done in response to the comment - #2184 (comment). Should we keep it in this PR with separate changelog entry?

Member:

I always prefer short, very focused PRs!
One can easily review them and continuously make progress by merging short PRs quickly. Slow and steady!

@@ -50,11 +50,30 @@ macro_rules! otel_warn {
{
tracing::warn!(name: $name, target: env!("CARGO_PKG_NAME"), "");
}
#[cfg(not(feature = "internal-logs"))]
Member:

Agree with this change, but for the future - please make changes like this into their own separate PR to make review easier.
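The feature-gating pattern shown in the diff above can be sketched standalone. This is a hedged stand-in: the real macro forwards to `tracing::warn!`, while here a `Vec<String>` sink takes its place so the gate's effect is observable, and the `cfg(not(...))` arm consumes its arguments so callers get no unused-variable warnings.

```rust
// Hedged sketch of gating an internal log macro on the "internal-logs"
// feature, mirroring the shape of the SDK's otel_warn! but with a sink
// instead of tracing.
macro_rules! otel_warn_sketch {
    ($sink:expr, name: $name:expr) => {
        #[cfg(feature = "internal-logs")]
        {
            $sink.push(format!("warn name={}", $name));
        }
        #[cfg(not(feature = "internal-logs"))]
        {
            // compiled out: reference the args to avoid unused warnings
            let _ = (&$sink, $name);
        }
    };
}

fn emit_warn(sink: &mut Vec<String>, name: &str) {
    otel_warn_sketch!(sink, name: name);
}

fn main() {
    let mut sink: Vec<String> = Vec::new();
    emit_warn(&mut sink, "LoggerProvider.Shutdown.Error");
    // compiled standalone (feature off), nothing is recorded
    assert!(sink.is_empty());
}
```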

Member (@cijothomas) left a comment:

LGTM.
Please keep this change separate, with its own changelog entry:
https://github.com/open-telemetry/opentelemetry-rust/pull/2184/files#r1796235052

@lalitb (Member Author) commented Oct 14, 2024

The internal-log and shutdown-related changes are now merged separately in #2195 and #2196. Flagging this in case it needs another look; otherwise I will go ahead with the merge.

@lalitb (Member Author) commented Oct 14, 2024

Merging this to make the cleanup faster; please comment if any changes are required and I will incorporate them in a separate PR.

@lalitb lalitb merged commit 16c0e10 into open-telemetry:main Oct 14, 2024
24 of 25 checks passed
Labels
integration tests Run integration tests