
Improve internal opentelemetry logging #2128

Merged
38 commits merged into open-telemetry:main on Oct 4, 2024

Conversation

lalitb
Member

@lalitb lalitb commented Sep 19, 2024

Fixes #1146

Changes

Created this draft PR to discuss the approach, as mentioned here.

Have added comments on some of the open issues.

Sample output from the appender-tracing example (subscriber setup shown in the sketch below):

Scenario 1: only output internal errors (screenshot omitted)

Scenario 2: output info and above (screenshot omitted)

Scenario 3: output debug and above (screenshot omitted)
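For reference, here is a minimal sketch (not part of this PR) of how a host application could switch between these scenarios, assuming the internal logs are emitted via tracing and filtered by the emitting crate's target. The filter directives, target names, and the use of tracing-subscriber's env-filter feature are illustrative assumptions.

```rust
use tracing_subscriber::EnvFilter;

fn main() {
    // Scenario 1: only internal errors  -> "opentelemetry_sdk=error"
    // Scenario 2: info and above        -> "opentelemetry_sdk=info"
    // Scenario 3: debug and above       -> "opentelemetry_sdk=debug"
    // (Exact target names depend on which crates emit the logs.)
    let filter = EnvFilter::new("opentelemetry_sdk=debug");

    // Requires tracing-subscriber with the "env-filter" feature enabled.
    tracing_subscriber::fmt()
        .with_env_filter(filter)
        .init();

    // ... set up the LoggerProvider and the appender-tracing bridge as usual ...
}
```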


Merge requirement checklist

  • CONTRIBUTING guidelines followed
  • Unit tests added/updated (if applicable)
  • Appropriate CHANGELOG.md files updated for non-trivial, user-facing changes
  • Changes in public API reviewed (if applicable)

@lalitb lalitb requested a review from a team as a code owner September 19, 2024 07:15
@lalitb lalitb marked this pull request as draft September 19, 2024 07:16

codecov bot commented Sep 19, 2024

Codecov Report

Attention: Patch coverage is 54.16667% with 22 lines in your changes missing coverage. Please review.

Project coverage is 79.0%. Comparing base (86dd486) to head (3c4d20c).
Report is 1 commit behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| opentelemetry-sdk/src/logs/log_processor.rs | 19.0% | 17 Missing ⚠️ |
| opentelemetry-sdk/src/logs/log_emitter.rs | 60.0% | 2 Missing ⚠️ |
| opentelemetry-sdk/src/metrics/periodic_reader.rs | 0.0% | 2 Missing ⚠️ |
| opentelemetry-otlp/src/metric.rs | 0.0% | 1 Missing ⚠️ |
Additional details and impacted files
@@           Coverage Diff           @@
##            main   #2128     +/-   ##
=======================================
- Coverage   79.0%   79.0%   -0.1%     
=======================================
  Files        121     121             
  Lines      20859   20856      -3     
=======================================
- Hits       16490   16482      -8     
- Misses      4369    4374      +5     


opentelemetry-sdk/src/logs/log_emitter.rs (outdated review thread, resolved)
@@ -0,0 +1,200 @@
#![allow(unused_macros)]
Contributor

Why expose this from the opentelemetry (API) package if it is seemingly only going to be used by SDK or implementation specific crates in the project? Won't changes to this stuff necessitate a possible semver change in opentelemetry API crate itself, unnecessarily?

Member Author

I kept it at the level of the custom error handler, which is within the API. I thought there could be cases where we may want to use it within the API, e.g., for debugging the Context implementation.

Member
@cijothomas cijothomas Sep 23, 2024

I am not sure if there are better ways to "hide" it. It should be public API, so it follows the same semver rules, and it can be used by OTel itself plus any other components extending OTel. For example, if someone authors an exporter, they can also use this.
We can document it so users don't end up misusing it for any other logging!

Won't changes to this stuff necessitate a possible semver change in opentelemetry API crate itself, unnecessarily?

We should treat this API just like any other API, so it cannot make breaking changes. As long as we keep it a minimal API (a super thin wrapper around tracing), we should be able to stabilize it by 1.0 and keep it non-breaking forever (or until 2.0 ships).

Member Author

@shaun-cox The current macros are lightweight and closely resemble the tracing macros. Since not all opentelemetry-* crates depend on opentelemetry-sdk, exposing these macros from the API package appears to be the most suitable approach.

Let me know if you see any concerns here.
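To make the discussion concrete, here is a minimal sketch, assuming tracing 0.1.39+ is available, of what such a thin wrapper macro could look like. The macro name, fields, and use of the crate name as the target are illustrative assumptions, not the exact code merged in this PR (which may also gate the macros behind a feature flag).

```rust
// Sketch of an API-level internal-logging macro: a thin wrapper over
// `tracing::warn!` that stamps the emitting crate name as the target so
// subscribers can filter OpenTelemetry-internal logs separately.
#[macro_export]
macro_rules! otel_warn {
    (name: $name:expr $(, $key:ident = $value:expr)* $(,)?) => {
        tracing::warn!(
            name: $name,
            target: env!("CARGO_PKG_NAME"),
            $($key = $value,)*
            ""
        )
    };
}

// Example use inside an SDK or exporter component (names are illustrative):
fn report_dropped_record() {
    otel_warn!(name: "log_record_dropped", reason = "channel_full");
}
```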

Member
@cijothomas cijothomas left a comment

The changes look good! Thank you!

I'd suggest moving the addition of logs throughout the codebase into its own separate PR. We should be doing it very carefully (irrespective of the level), so it's best to start with adding error-level ones for the failure scenarios (which can be used to remove the current global handler), and anything else on a case-by-case basis, a few at a time. (I am trying to be conservative here; using tracing is still very new, so let's start small.)

@lalitb
Member Author

lalitb commented Oct 3, 2024

I'd suggest moving the addition of logs throughout the codebase into its own separate PR.

Yes, this PR doesn't add any new logs; it just updates the logs that were added in the previous PR directly using tracing.

Ah, it does add some. I can remove them, but then the diagnostics example won't have much to show.

    Builder::default()
}

pub(crate) fn log_processors(&self) -> &[Box<dyn LogProcessor>] {
    otel_debug!(
Contributor

It should be good enough to get the processor count information just once during LoggerProvider build. Is there any value in repeatedly logging it?

Member

To unblock and merge this PR quickly, I think it's best to remove all internal logging that is not error/warning; we can selectively bring the rest back one by one in future PRs.
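For illustration, a sketch of the earlier suggestion to record the processor count once at build time; the types and field names below are stand-ins, not the SDK's real Builder or LogProcessor.

```rust
struct LoggerProviderBuilder {
    // Stand-in for Vec<Box<dyn LogProcessor>>; the real type lives in opentelemetry-sdk.
    processors: Vec<String>,
}

impl LoggerProviderBuilder {
    fn build(self) -> Vec<String> {
        // Log the count once, at build time, instead of on every log_processors() call.
        tracing::debug!(
            name: "logger_provider_build",
            processor_count = self.processors.len(),
            "LoggerProvider built"
        );
        self.processors
    }
}
```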

@@ -104,6 +128,9 @@ impl LoggerProvider {

    /// Shuts down this `LoggerProvider`
    pub fn shutdown(&self) -> LogResult<()> {
        otel_info!(
Contributor

This should be logged only in case of a successful shutdown? Only in this case:

if errs.is_empty() {
    Ok(())
}
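A small sketch of the suggested placement, using hypothetical types rather than the crate's real shutdown signature: the informational log is emitted only when every processor shut down cleanly.

```rust
fn shutdown(errs: Vec<String>) -> Result<(), Vec<String>> {
    if errs.is_empty() {
        // Log success only; failures are already surfaced through the returned errors.
        tracing::info!(name: "logger_provider_shutdown", "shutdown completed successfully");
        Ok(())
    } else {
        Err(errs)
    }
}
```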

@@ -221,6 +260,9 @@ impl Logger {
    instrumentation_lib: Arc<InstrumentationLibrary>,
    provider: LoggerProvider,
) -> Self {
    otel_info!(
        name: "logger_new"
Contributor

We should probably add some identifying information for the logger, such as name, version, etc.

@lalitb
Member Author

lalitb commented Oct 3, 2024

All the debug/info logs have been removed now.

@@ -98,16 +99,12 @@ impl LogProcessor for SimpleLogProcessor {
    fn emit(&self, record: &mut LogRecord, instrumentation: &InstrumentationLibrary) {
        // noop after shutdown
        if self.is_shutdown.load(std::sync::atomic::Ordering::Relaxed) {
            otel_warn!(
Contributor
@utpilla utpilla Oct 3, 2024

This could continue to be the hot path even after shutdown, so it might not be a great idea to keep logging warnings for this. Let's add a TODO to revisit this and decide if we should remove the logging here or make it less severe.
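One possible shape for that TODO, sketched with illustrative names rather than the processor's real fields: warn only on the first post-shutdown emit, so a hot path that keeps calling emit() after shutdown does not flood the output with identical warnings.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

static EMIT_AFTER_SHUTDOWN_WARNED: AtomicBool = AtomicBool::new(false);

fn warn_emit_after_shutdown_once() {
    // `swap` returns the previous value, so only the first caller logs the warning.
    if !EMIT_AFTER_SHUTDOWN_WARNED.swap(true, Ordering::Relaxed) {
        tracing::warn!(
            name: "simple_log_processor_emit_after_shutdown",
            "LogRecord dropped: SimpleLogProcessor is already shut down"
        );
    }
}
```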

Co-authored-by: Utkarsh Umesan Pillai <66651184+utpilla@users.noreply.github.com>
@@ -234,8 +246,7 @@ mod tests {
.with_simple_exporter(exporter.clone())
.build();

let layer = layer::OpenTelemetryTracingBridge::new(&logger_provider);
let subscriber = tracing_subscriber::registry().with(layer);
let subscriber = create_tracing_subscriber(exporter.clone(), &logger_provider);
Member

Why is the exporter required to be passed?

Member Author

Forgot to remove it as part of updating the method. Will clean up in a subsequent PR.

@@ -513,7 +521,7 @@ mod tests {
let span_id = cx.span().span_context().span_id();

// logging is done inside span context.
log::error!("log from log crate");
log::error!(target: "my-system", "log from log crate");
Member

do we need these changes?

Member Author

Yes, we can remove the target; will follow up in subsequent cleanups.

@@ -147,6 +147,9 @@ impl<AU: AtomicallyUpdate<T>, T: Number, O: Operation> ValueMap<AU, T, O> {
    O::update_tracker(&new_tracker, measurement, index);
    trackers.insert(STREAM_OVERFLOW_ATTRIBUTES.clone(), Arc::new(new_tracker));
    global::handle_error(MetricsError::Other("Warning: Maximum data points for metric stream exceeded. Entry added to overflow. Subsequent overflows to same metric until next collect will not be logged.".into()));
    otel_warn!( name: "ValueMap.measure",
Member

These are user-actionable logs, so they should use an end-user-friendly name like "MetricCardinalityLimitHit" or similar.
Good to follow up separately on all logs.

@cijothomas cijothomas merged commit 2ff7ec0 into open-telemetry:main Oct 4, 2024
24 of 25 checks passed
cijothomas pushed a commit to cijothomas/opentelemetry-rust that referenced this pull request Oct 4, 2024
Co-authored-by: Zhongyang Wu <zhongyang.wu@outlook.com>
Co-authored-by: Utkarsh Umesan Pillai <66651184+utpilla@users.noreply.github.com>
@cschramm
Contributor

I observed obscure build issues with opentelemetry_sdk 0.27.0 and traced them down to this code, which requires tracing 0.1.39 while the workspace specifies 0.1.0 as the minimum version. While that is unfortunate, at least the dependency should reflect it.

@lalitb
Member Author

lalitb commented Nov 15, 2024

I observed obscure build issues with opentelemetry_sdk 0.27.0 and traced them down to this code, which requires tracing 0.1.39 while the workspace specifies 0.1.0 as the minimum version. While that is unfortunate, at least the dependency should reflect it.

Can you provide more details? Where do you see the 0.1.39 dependency added?

@cschramm
Contributor

Sure. The code uses the name syntax of the tracing macros, which was added in tokio-rs/tracing#2699 and released in 0.1.39. Thus, when you run cargo update -p tracing --precise 0.1.38 (or 0.1.37 if you like, as 0.1.38 got yanked), cargo check -p opentelemetry_sdk fails.
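A minimal illustration (the event name is arbitrary) of the name: override that raises the effective minimum tracing version:

```rust
fn emit_named_event() {
    // The `name:` override below only parses with tracing 0.1.39 or newer,
    // which is why a declared minimum of 0.1.0 is too low for this code.
    tracing::warn!(name: "example_named_event", "this syntax requires tracing >= 0.1.39");
}
```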

@cijothomas
Member

Opening a new issue to track and address this: #2373.

Successfully merging this pull request may close these issues:

Proposal to improve internal error logging