Improve internal opentelemetry logging #2128
Conversation
Codecov Report
Attention: Patch coverage is

Additional details and impacted files:

@@ Coverage Diff @@
##             main   #2128     +/-  ##
=======================================
- Coverage    79.0%   79.0%   -0.1%
=======================================
  Files         121     121
  Lines       20859   20856      -3
=======================================
- Hits        16490   16482      -8
- Misses       4369    4374      +5

☔ View full report in Codecov by Sentry.
@@ -0,0 +1,200 @@
#![allow(unused_macros)]
Why expose this from the opentelemetry (API) package if it is seemingly only going to be used by the SDK or implementation-specific crates in the project? Won't changes to this stuff necessitate a possible semver change in the opentelemetry API crate itself, unnecessarily?
I kept it at the level of the custom error handler, which is within the API. I thought there could be cases where we may want to use it within the API, e.g., for debugging the Context implementation.
I am not sure if there are better ways to "hide" it. It should be a public API, so it follows the same semver rules, and it can be used by OTel itself plus any other components extending OTel. For example, if someone authors an exporter, they can also use this.
We can document it so users don't end up misusing it for any other logging!

"Won't changes to this stuff necessitate a possible semver change in the opentelemetry API crate itself, unnecessarily?"

We should treat this API as just any other API, so it cannot make breaking changes. As long as we keep it a minimal API (a super-thin wrapper around tracing), we should be able to stabilize it by 1.0 and keep it non-breaking forever (or until 2.0 ships).
@shaun-cox The current macros are lightweight and closely resemble the tracing macros. Since not all opentelemetry-* crates depend on opentelemetry-sdk, exposing these macros from the API package appears to be the most suitable approach.
Let me know if you see any concerns here.
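For context, a minimal sketch of what such a thin wrapper over tracing could look like; the macro name matches the PR discussion, but the expansion, fields, and target are assumptions rather than the merged implementation:

    // Hedged sketch: one of the otel_* family (otel_debug!/otel_info!/
    // otel_warn!/otel_error!), forwarding straight to tracing.
    #[macro_export]
    macro_rules! otel_info {
        // The name: event syntax requires tracing 0.1.39+ (see the version
        // discussion later in this thread).
        (name: $name:expr $(,)?) => {
            tracing::info!(name: $name, target: env!("CARGO_PKG_NAME"), "");
        };
        (name: $name:expr, $($key:ident = $value:expr),+ $(,)?) => {
            tracing::info!(
                name: $name,
                target: env!("CARGO_PKG_NAME"),
                $($key = $value),+,
                ""
            );
        };
    }

    // Hypothetical call site:
    // otel_info!(name: "logger_provider_built", processor_count = 2);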
The changes look good! Thank you!
I'd suggest moving the addition of logs throughout the code into its own separate PR. We should be doing it very carefully (irrespective of the level), so it's best to start by adding error-level ones for the failure scenarios (those can be used to remove the current global handler), and anything else on a case-by-case basis, a few at a time. (I am trying to be conservative here, as using tracing is still a very new thing, so let's start small.)
Ah, it does add some. I can remove it, but then the diagnostics example won't have much to show.
    Builder::default()
}

pub(crate) fn log_processors(&self) -> &[Box<dyn LogProcessor>] {
    otel_debug!(
It should be good enough to get the processor count information just once during LoggerProvider build. Is there any value in repeatedly logging it?
To unblock and merge this PR quickly, I think it's best to remove all internal logging that is not error/warning; we can selectively bring them back one by one in future PRs.
@@ -104,6 +128,9 @@ impl LoggerProvider {

    /// Shuts down this `LoggerProvider`
    pub fn shutdown(&self) -> LogResult<()> {
        otel_info!(
This should be logged only in case of a successful shutdown? Only in this case:

    if errs.is_empty() {
        Ok(())
    }
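A sketch of that suggestion, assuming errs is the vector of per-processor shutdown errors collected earlier in shutdown() (the error construction is illustrative):

    if errs.is_empty() {
        // Log only on a fully successful shutdown; failures are surfaced
        // through the returned error instead.
        otel_info!(name: "logger_provider_shutdown");
        Ok(())
    } else {
        Err(LogError::Other(format!("{errs:?}").into()))
    }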
@@ -221,6 +260,9 @@ impl Logger {
        instrumentation_lib: Arc<InstrumentationLibrary>,
        provider: LoggerProvider,
    ) -> Self {
        otel_info!(
            name: "logger_new"
We should probably add some identifying information for the logger, such as its name, version, etc.
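For example (the field name and accessor here are assumptions):

    otel_info!(
        name: "logger_new",
        logger_name = instrumentation_lib.name.as_ref()
    );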
All the debug/info has been removed now.
@@ -98,16 +99,12 @@ impl LogProcessor for SimpleLogProcessor {
    fn emit(&self, record: &mut LogRecord, instrumentation: &InstrumentationLibrary) {
        // noop after shutdown
        if self.is_shutdown.load(std::sync::atomic::Ordering::Relaxed) {
            otel_warn!(
This could continue to be the hot path even after shutdown, so it might not be a great idea to keep logging warnings for this. Let's add a TODO to revisit this and decide if we should remove the logging here or make it less severe.
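One possible shape for that TODO, sketched under the assumption that an extra atomic flag is acceptable (the warned field is hypothetical, not part of the current SimpleLogProcessor):

    use std::sync::atomic::{AtomicBool, Ordering};

    struct ShutdownGuard {
        is_shutdown: AtomicBool,
        // Hypothetical flag: set by the first emit() observed after shutdown.
        warned: AtomicBool,
    }

    impl ShutdownGuard {
        // Returns true if processing should proceed; warns at most once
        // after shutdown so the hot path stays quiet afterwards.
        fn check(&self) -> bool {
            if self.is_shutdown.load(Ordering::Relaxed) {
                // swap returns the previous value, so only the first caller
                // after shutdown logs (otel_warn! in the PR; eprintln! here
                // to keep the sketch self-contained).
                if !self.warned.swap(true, Ordering::Relaxed) {
                    eprintln!("SimpleLogProcessor.emit called after shutdown");
                }
                return false;
            }
            true
        }
    }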
@@ -234,8 +246,7 @@ mod tests {
        .with_simple_exporter(exporter.clone())
        .build();

    let layer = layer::OpenTelemetryTracingBridge::new(&logger_provider);
    let subscriber = tracing_subscriber::registry().with(layer);
    let subscriber = create_tracing_subscriber(exporter.clone(), &logger_provider);
Why is the exporter required to be passed?
Forgot to remove it as part of updating the method. Will clean up in a subsequent PR.
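For reference, the cleaned-up test helper might look like this (a sketch only; the helper's name comes from the diff, everything else is assumed):

    use opentelemetry_appender_tracing::layer;
    use opentelemetry_sdk::logs::LoggerProvider;
    use tracing_subscriber::layer::SubscriberExt;

    fn create_tracing_subscriber(logger_provider: &LoggerProvider) -> impl tracing::Subscriber {
        // The bridge layer forwards tracing events into the provider's
        // processors, so the exporter no longer needs to be passed separately.
        let layer = layer::OpenTelemetryTracingBridge::new(logger_provider);
        tracing_subscriber::registry().with(layer)
    }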
@@ -513,7 +521,7 @@ mod tests {
        let span_id = cx.span().span_context().span_id();

        // logging is done inside span context.
        log::error!("log from log crate");
        log::error!(target: "my-system", "log from log crate");
Do we need these changes?
Yes, we can remove the target; will follow up in subsequent cleanups.
@@ -147,6 +147,9 @@ impl<AU: AtomicallyUpdate<T>, T: Number, O: Operation> ValueMap<AU, T, O> {
        O::update_tracker(&new_tracker, measurement, index);
        trackers.insert(STREAM_OVERFLOW_ATTRIBUTES.clone(), Arc::new(new_tracker));
        global::handle_error(MetricsError::Other("Warning: Maximum data points for metric stream exceeded. Entry added to overflow. Subsequent overflows to same metric until next collect will not be logged.".into()));
        otel_warn!(name: "ValueMap.measure",
These are user-actionable logs, so they should use an end-user-friendly name like "MetricCardinalityLimitHit" or similar.
Good to follow up separately on all logs.
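For instance, the renamed event might look like this (shape is illustrative, message abbreviated):

    otel_warn!(
        name: "MetricCardinalityLimitHit",
        message = "Maximum data points for metric stream exceeded; entry added to overflow."
    );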
I observed obscure build issues with opentelemetry_sdk 0.27.0 and traced them down to this code, which requires tracing 0.1.39 while the workspace specifies 0.1.0 as the minimum version. While that is unfortunate, at least the dependency should reflect it.
Can you provide more details? Where do you see the 0.1.39 dependency added?
Sure. The code is using the name syntax of the tracing macros, which was added in tokio-rs/tracing#2699 and released in 0.1.39. Thus, when you build against the workspace's declared minimum of tracing 0.1.0, the build fails.
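A manifest-level fix could simply declare the real minimum (illustrative; the actual resolution is tracked in the issue below):

    # Cargo.toml: reflect the true minimum version so dependency resolution
    # picks a tracing release that has the name: macro syntax.
    tracing = { version = "0.1.39" }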
#2373: Opening a new issue to track and address this.
Fixes #1146

Changes
Created the draft PR to discuss the approach as mentioned here.
Have added comments on some of the open issues.

Sample output of the appender-tracing example:
Scenario 1: only output internal error
Scenario 2: output info and above
Scenario 3: output debug and above
Merge requirement checklist
CHANGELOG.md files updated for non-trivial, user-facing changes