HPCC-32248 Add tracing to rowservice #19314

Open
wants to merge 1 commit into
base: candidate-9.8.x

Conversation

jpmcmu
Contributor

@jpmcmu jpmcmu commented Nov 25, 2024

  • Added opentelemetry tracing to rowservice

Signed-off-by: James McMullan <James.McMullan@lexisnexis.com>

Type of change:

  • This change is a bug fix (non-breaking change which fixes an issue).
  • This change is a new feature (non-breaking change which adds functionality).
  • This change improves the code (refactor or other change that does not change the functionality)
  • This change fixes warnings (the fix does not alter the functionality or the generated code)
  • This change is a breaking change (fix or feature that will cause existing behavior to change).
  • This change alters the query API (existing queries will have to be recompiled)

Checklist:

  • My code follows the code style of this project.
    • My code does not create any new warnings from compiler, build system, or lint.
  • The commit message is properly formatted and free of typos.
    • The commit message title makes sense in a changelog, by itself.
    • The commit is signed.
  • My change requires a change to the documentation.
    • I have updated the documentation accordingly, or...
    • I have created a JIRA ticket to update the documentation.
    • Any new interfaces or exported functions are appropriately commented.
  • I have read the CONTRIBUTORS document.
  • The change has been fully tested:
    • I have added tests to cover my changes.
    • All new and existing tests passed.
    • I have checked that this change does not introduce memory leaks.
    • I have used Valgrind or similar tools to check for potential issues.
  • I have given due consideration to all of the following potential concerns:
    • Scalability
    • Performance
    • Security
    • Thread-safety
    • Cloud-compatibility
    • Premature optimization
    • Existing deployed queries will not be broken
    • This change fixes the problem, not just the symptom
    • The target branch of this pull request is appropriate for such a change.
  • There are no similar instances of the same problem that should be addressed
    • I have addressed them here
    • I have raised JIRA issues to address them separately
  • This is a user interface / front-end modification
    • I have tested my changes in multiple modern browsers
    • The component(s) render as expected

Smoketest:

  • Send notifications about my Pull Request position in Smoketest queue.
  • Test my draft Pull Request.

Testing:

Jira Issue: https://hpccsystems.atlassian.net//browse/HPCC-32248

Jirabot Action Result:
Workflow Transition To: Merge Pending
Updated PR

@@ -162,6 +162,69 @@ static ISecureSocket *createSecureSocket(ISocket *sock, bool disableClientCertVe
}
#endif

//------------------------------------------------------------------------------

@jpmcmu jpmcmu Nov 25, 2024

ActiveSpanScope is very similar to the ThreadedSpanScope described by Gavin here: https://hpccsystems.atlassian.net/jira/software/c/projects/HPCC/issues/HPCC-32982. I prefer the name ActiveSpanScope because I believe the class has utility outside of multithreaded contexts, for example time slicing. Would it be worthwhile to move this out of dafilesrv and into jtrace?
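
For readers unfamiliar with the pattern, here is a minimal sketch of what a scope guard of this kind might look like. It is not the class from this PR: `Span`, `activeSpan`, and `nameOfActiveSpan` are hypothetical stand-ins, and the real class presumably works with the jtrace span interfaces instead. The only idea it illustrates is "make a span active on construction, restore the previous one on destruction".

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a tracing span; not the jtrace interface.
struct Span
{
    std::string name;
};

// Hypothetical per-thread slot holding the currently active span.
static thread_local Span * activeSpan = nullptr;

// RAII guard: activates a span for the current thread on construction
// and restores whatever span was active before on destruction.
class ActiveSpanScope
{
public:
    explicit ActiveSpanScope(Span * span) : prev(activeSpan)
    {
        assert(span != nullptr); // assumption: a scope always wraps a real span
        activeSpan = span;
    }
    ~ActiveSpanScope()
    {
        activeSpan = prev;
    }
    ActiveSpanScope(const ActiveSpanScope &) = delete;
    ActiveSpanScope & operator=(const ActiveSpanScope &) = delete;
private:
    Span * prev; // span that was active when this scope was entered
};

// Returns the name of the active span, or an empty string if none is active.
std::string nameOfActiveSpan()
{
    return activeSpan ? activeSpan->name : std::string();
}

// Simulates one time slice of work done on behalf of a request span.
// The same thread can call this for different requests in turn.
std::string runSlice(Span & requestSpan)
{
    ActiveSpanScope scope(&requestSpan);
    return nameOfActiveSpan();
}
```

In this sketch a single thread can alternate `runSlice` calls between several request spans, which is the time-slicing case mentioned above: no extra threads are involved, yet each slice still runs with the right span active.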

@@ -366,13 +366,31 @@ version: 1.0
detail: 100
)!!";

IPropertyTree * loadConfigurationWithGlobalDefault(const char * defaultYaml, Owned<IPropertyTree>& globalConfig, const char * * argv, const char * componentTag, const char * envPrefix)

This function is similar to work Jake has done in HPCC-32991. Might it be worthwhile to retarget to master and call the overloaded doLoadConfiguration instead?

std::string traceParent = fullTraceContext ? fullTraceContext : "";
traceParent = traceParent.substr(0,traceParent.find_last_of("-"));

if (!traceParent.empty() && requestTraceParent != traceParent)

Note: I check whether the traceParent has changed every time process is called here, because the client side may use multiple spans during the lifetime of a single CRemoteRequest. See the screenshots below for an example.
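
The check can be sketched in isolation as follows. This assumes the incoming value follows the W3C `traceparent` layout (`version-traceid-parentid-flags`), so cutting at the last `-` drops only the flags field. `stripTraceFlags` and `RequestTraceState` are hypothetical names for illustration and are not part of this PR; the real logic lives in the request's process path.

```cpp
#include <cassert>
#include <string>

// Drops the trailing "-<flags>" field from a traceparent-style value.
// A null or separator-free input is returned unchanged (empty for null),
// matching what substr(0, npos) does in the snippet above.
std::string stripTraceFlags(const char * fullTraceContext)
{
    std::string traceParent = fullTraceContext ? fullTraceContext : "";
    std::string::size_type pos = traceParent.find_last_of('-');
    if (pos != std::string::npos)
        traceParent.erase(pos);
    return traceParent;
}

// Hypothetical holder for per-request tracing state.
struct RequestTraceState
{
    std::string requestTraceParent; // trace parent seen on the previous call
    unsigned spansStarted = 0;      // how many times the parent changed

    // Called once per process() call. Returns true when the client has
    // moved to a new span, so a new server-side span should be started.
    bool onProcess(const char * fullTraceContext)
    {
        std::string traceParent = stripTraceFlags(fullTraceContext);
        if (!traceParent.empty() && requestTraceParent != traceParent)
        {
            requestTraceParent = traceParent;
            ++spansStarted;
            return true;
        }
        return false;
    }
};
```

Under these assumptions, repeated calls carrying the same trace and span IDs do not start a new span even if only the flags differ, while a changed span ID does. A missing trace context leaves the current span in place.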

jpmcmu commented Nov 25, 2024

Goal:
The goal of this PR is to add initial tracing support to the row service in dafilesrv. This will improve debuggability for downstream row service clients and reduce the time the platform team spends debugging issues.

Current Tracing Limitations:
There is only limited support for intercepting errors and adding them to the tracing spans, and for adding annotations and/or statistics to spans. There are also no internal spans tracking work within the row service. These limitations are intentional, to keep the initial PR as simple as possible, and will be addressed in future PRs.

Exported Tracing example:
Note that during the read the client side creates more than one span over the lifetime of its connection to the row service. The row service tracing supports this and correctly handles the batching the client side is doing.
[Image: Screenshot 2024-11-25 at 1 39 44 PM]
