Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OTel Tracing implementation #24133

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

sureshbabu-areekara
Copy link

@sureshbabu-areekara sureshbabu-areekara commented Nov 25, 2024

Description

PR to enhance tracing with Open Telemetry.
RFC :- prestodb/rfcs#33

Motivation and Context

Enhancing the tracing with Open Telemetry satisfies the need for observability in Presto so that customer can do Performance monitoring, debugging, Collection and export of telemetry data such as traces to observability dashboards.

Impact

Test Plan

Contributor checklist

  • Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

General Changes
* ... :pr:`12345`
* ... :pr:`12345`

Hive Connector Changes
* ... :pr:`12345`
* ... :pr:`12345`

If release note is NOT required, use:

== NO RELEASE NOTE ==

Co-Authors

@bentonyjoe191
@siddhuoo7

Copy link

linux-foundation-easycla bot commented Nov 25, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

@prestodb-ci prestodb-ci requested review from a team, pdabre12 and czentgr and removed request for a team November 25, 2024 05:17
@sureshbabu-areekara sureshbabu-areekara marked this pull request as draft November 25, 2024 05:20
@sureshbabu-areekara sureshbabu-areekara force-pushed the otel-impl-oss branch 2 times, most recently from 93d1192 to 3eaccc6 Compare January 6, 2025 18:59
@sureshbabu-areekara
Copy link
Author

Hi @agrawalreetika As of now refactored the code to remove the direct open telemetry sdk dependency from presto-main. But still we added the presto-open-telemetry module in presto-main/pom.xml. Will try to make it as pluggable so that code won't fail if we remove the dependency.

cc @ethanyzhang @bentonyjoe191

@sureshbabu-areekara
Copy link
Author

sureshbabu-areekara commented Jan 10, 2025

Hi @agrawalreetika We are facing issues proceeding with the pluggable approach. As of now we propagate TracingSpan and ScopedSpan (Wrappers of SDK Span) throughout the presto-main code, the module dependency on presto-open-telemetry in presto-main cannot be removed. It would be great if you could advise an alternate solution for this issue. Thanks!!

Or else we could keep the open telemetry sdk in presto-main and go for plugin approach.

So the possible solutions would be.

  1. Remove all open telemetry sdk from main and add presto-open-telemetry module as dependency.
  2. Add open telemetry sdk in presto-main and go for plugin approach.

cc @bentonyjoe191 @ethanyzhang

@sureshbabu-areekara
Copy link
Author

Hi @agrawalreetika Pushed the changes to implement Plugin approach.
cc @ethanyzhang @bentonyjoe191

@ethanyzhang
Copy link
Contributor

@sureshbabu-areekara can you address the conflict with master?

@sureshbabu-areekara sureshbabu-areekara force-pushed the otel-impl-oss branch 3 times, most recently from bcd7aeb to 406c6c1 Compare January 27, 2025 03:39
@sureshbabu-areekara
Copy link
Author

@sureshbabu-areekara can you address the conflict with master?

Hi @ethanyzhang addressed the conflicts. Thanks!!

@ethanyzhang
Copy link
Contributor

Hi @sureshbabu-areekara, please check the tests... they all failed.

@sureshbabu-areekara
Copy link
Author

Hi @ethanyzhang @agrawalreetika All the tests except 1 passed now. The failed one seems not cos of OTel changes and which has passed before. @agrawalreetika could you please help me to rerun the failed test as I don't have options/access.

@sureshbabu-areekara
Copy link
Author

Hi @agrawalreetika All passed now. Can you start reviewing it?
@ethanyzhang

Copy link
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sureshbabu-areekara
Copy link
Author

In order for this new page of doc to appear on https://prestodb.io/docs/current/plugin.html, please update https://github.com/prestodb/presto/blob/master/presto-docs/src/main/sphinx/plugin.rst.

Hi @steveburnett Added.. Thanks!!

Copy link
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the doc! Some nits of punctuation and a couple of minor suggestions of phrasing.

Something I want to suggest that you think about - in the Configuration Properties table, consider adding a third column Default for default values. A property with no default values, such as tracing-backend-url, would have a blank cell in its row of that column. But several of these seem to me to have default values that a reader might find helpful to have them shown here. Just an idea to think about.

@sureshbabu-areekara sureshbabu-areekara force-pushed the otel-impl-oss branch 2 times, most recently from 5b70260 to e32d0c5 Compare February 7, 2025 05:07
@sureshbabu-areekara
Copy link
Author

Thanks for the doc! Some nits of punctuation and a couple of minor suggestions of phrasing.

Something I want to suggest that you think about - in the Configuration Properties table, consider adding a third column Default for default values. A property with no default values, such as tracing-backend-url, would have a blank cell in its row of that column. But several of these seem to me to have default values that a reader might find helpful to have them shown here. Just an idea to think about.

Hi @steveburnett Thanks correction and suggestion. Added default values.. Thanks!!

@sureshbabu-areekara sureshbabu-areekara marked this pull request as ready for review February 7, 2025 05:10
steveburnett
steveburnett previously approved these changes Feb 7, 2025
Copy link
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! (docs)

Pull updated branch, new local doc build, looks great. Thanks!

<groupId>org.jetbrains.kotlin</groupId>
<artifactId>kotlin-stdlib</artifactId>
</exclusion>
</exclusions>
Copy link
Member

@agrawalreetika agrawalreetika Feb 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is causing test failure about related class not found, please check - https://github.com/prestodb/presto/actions/runs/13199217100/job/36866051461?pr=24133

java.lang.NoClassDefFoundError: kotlin/jvm/internal/Intrinsics

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is causing test failure about related class not found, please check - https://github.com/prestodb/presto/actions/runs/13199217100/job/36866051461?pr=24133

java.lang.NoClassDefFoundError: kotlin/jvm/internal/Intrinsics

Hi @agrawalreetika Resolved. https://github.com/prestodb/presto/actions/runs/13238120590/job/36947725249?pr=24133

@@ -21,7 +21,6 @@ public final class PrestoHeaders
public static final String PRESTO_SCHEMA = "X-Presto-Schema";
public static final String PRESTO_TIME_ZONE = "X-Presto-Time-Zone";
public static final String PRESTO_LANGUAGE = "X-Presto-Language";
public static final String PRESTO_TRACE_TOKEN = "X-Presto-Trace-Token";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we removing the trace token client option?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Older OTel implementation cannot extend to support the features implemented in new version. So we removed the older implementation completely and gone for a new one. The X-Presto-Trace-Token was part of older one and which is not used now.

private final int length;
private int pos;
private int beg;
private int end;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use descriptive names for variables in this class, these are not clear.
Also we can take out extra comments in different places.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


// attribute type chars
pos++;
for (; pos < length && chars[pos] != '=' && chars[pos] != ' '; pos++) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I think readability wise using while is better than this.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

end = pos;

if (chars[pos] == ' ') {
for (; pos < length && chars[pos] != '=' && chars[pos] == ' '; pos++) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I think readability wise using while is better than this?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


pos++; //skip '=' char

for (; pos < length && chars[pos] == ' '; pos++) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I think readability wise using while is better than this.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -530,13 +530,14 @@ public String toString()
{
return toStringHelper(this)
.add("queryId", queryId)
.add("querySpan", TracingManager.spanString(querySpan).orElse(null))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use static import

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -562,21 +563,21 @@ public static SessionBuilder builder(Session session)
public static class SessionBuilder
{
private QueryId queryId;
private BaseSpan querySpan = TracingManager.getInvalidSpan(); //do not initialize with null
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use static import

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -562,21 +563,21 @@ public static SessionBuilder builder(Session session)
public static class SessionBuilder
{
private QueryId queryId;
private BaseSpan querySpan = TracingManager.getInvalidSpan(); //do not initialize with null
private BaseSpan rootSpan = TracingManager.getInvalidSpan(); //do not initialize with null
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use static import

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

private Integer scheduleDelay;
private Double samplingRatio;
private Boolean tracingEnabled = false;
private Boolean spanSampling = false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Explicit initialization is not needed

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved.

@@ -246,7 +245,6 @@ public final class SystemSessionProperties
public static final String RESOURCE_AWARE_SCHEDULING_STRATEGY = "resource_aware_scheduling_strategy";
public static final String HEAP_DUMP_ON_EXCEEDED_MEMORY_LIMIT_ENABLED = "heap_dump_on_exceeded_memory_limit_enabled";
public static final String EXCEEDED_MEMORY_LIMIT_HEAP_DUMP_FILE_DIRECTORY = "exceeded_memory_limit_heap_dump_file_directory";
public static final String DISTRIBUTED_TRACING_MODE = "distributed_tracing_mode";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we removing existing tracing mode?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Older OTel implementation cannot extend to support the features implemented in new version. So we removed the older implementation completely and gone for a new one. The distributed_tracing_mode was part of older one and which is not used now.

@ethanyzhang ethanyzhang requested a review from aaneja February 17, 2025 03:31
@sureshbabu-areekara sureshbabu-areekara force-pushed the otel-impl-oss branch 3 times, most recently from 78ae2d9 to 893aab8 Compare February 17, 2025 15:29
Copy link
Contributor

@aaneja aaneja left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did a very quick first pass. I want to get a better handle on the SPI changes, before doing a full review


public interface TracerHandle
/**
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Javadocs are not adding any real value. They simply are spelling out the function names. Please either add a valid description, or remove

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will be updated.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

import java.util.Optional;

/**
* The interface Telemetry tracing.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These Javadocs are not adding any value IMO. Please add more detailed descriptions

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will be updated.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

@@ -25,7 +25,6 @@ public class QueryMetadata
{
private final String queryId;
private final Optional<String> transactionId;
private final Optional<String> tracingId;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should remove support for this mechanism of tracing - this is unrelated to OTel ?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is related to old OTel implementation. The old implementation was removed in the PR and redesigned as the old one is Map based and cannot be extended to support new features like inner spans. Also previously there were only less number of spans(<10) for particular query. Now there are 150+ spans which cannot be handled in a Map based implementation.

import io.opentelemetry.api.trace.Span;

/**
* The type Tracing span.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is a TracingSpan and how does it differ from a ScopedSpan conceptually ?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TracingSpan and ScopedSpan is just a wrapper class for the actual SDK span. TracingSpan we can use it where we know the scope and can manually start and end. ScopedSpan implements AutoCloseable and can be used within a try-with-resources block.

/**
* Load configured open telemetry.
*/
void loadConfiguredOpenTelemetry();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename to loadTelemetryPlugin ?

Copy link
Author

@sureshbabu-areekara sureshbabu-areekara Feb 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TelemetryPlugin is actually loaded by PluginManager before this method call. loadConfiguredOpenTelemetry() is for creating the opentelemetry and tracer instance using the loaded plugin. So I think the name can be change to something like setOpenTelemetry() or createOpenTelemetry().

OSS Presto OTel Tracing changes
Review comments.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
from:IBM PR from IBM
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants