Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Addresses opencensus-shim trace issues under otel javaagent #4900

Merged
merged 7 commits into from
Mar 9, 2023

Conversation

dmarkwat
Copy link
Contributor

Follows suggestion from open-telemetry/opentelemetry-java-instrumentation#6875 (comment) and related issue to address OC shim's breakage under otel javaagent utilization.

This approach pushes the concern of delegating calls to the underlying Span upstream rather than instrument the hidden (package-private classes/non-public api) classes. Based on my analysis of the problem and understanding of the shim, the breakage is rooted in the "incomplete" (lack of a better term) proxying of calls to the io.opentelemetry.api.trace.Span api by the OpentelemetrySpanImpl class. In short, the javaagent requires the instance it minted, AgentSpan, be used in all context-related interactions so the javaagent can bridge the context and changes across the javaagent/application boundary, and the current state of opencensus-shim doesn't do that universally and breaks down.

The integration test here is just to prove the case: revert the change on the OpentelemetrySpanImpl class and run it and the logger shows the breakage. (The test is also copy-paste from the original PR in the javaagent). I'm not especially sure what's the correct way to handle this: upstream (opentelemetry-java) or instrumentation (opentelemetry-java-instrumentation), as I mention in the source issue.

Anyway, appreciate the attention and patience to this issue. Just wanted to put some art to illustrate the preferred alternative to my original PR.

Builds based on main weren't working due to jcenter issues (I think?), so I based on the latest release.

@dmarkwat dmarkwat requested a review from a team October 31, 2022 04:23
@dmarkwat dmarkwat changed the title Improves ocshim trace proxy Addresses opencensus-shim trace issues under otel javaagent Oct 31, 2022
@jkwatson
Copy link
Contributor

jkwatson commented Nov 2, 2022

please don't include commits from other streams of work. And, it looks like the build is failing because the code has not been run through the formatter.

@codecov
Copy link

codecov bot commented Nov 3, 2022

Codecov Report

Patch coverage: 95.00% and project coverage change: +0.01 🎉

Comparison is base (f797537) 90.91% compared to head (b4290da) 90.93%.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #4900      +/-   ##
============================================
+ Coverage     90.91%   90.93%   +0.01%     
- Complexity     4884     4902      +18     
============================================
  Files           550      551       +1     
  Lines         14476    14495      +19     
  Branches       1370     1370              
============================================
+ Hits          13161    13181      +20     
+ Misses          917      915       -2     
- Partials        398      399       +1     
Impacted Files Coverage Δ
...o/opentelemetry/opencensusshim/DelegatingSpan.java 92.85% <92.85%> (ø)
...elemetry/opencensusshim/OpenTelemetrySpanImpl.java 96.36% <100.00%> (+7.30%) ⬆️
...ry/opencensusshim/OpenTelemetryContextManager.java 77.77% <0.00%> (-11.12%) ⬇️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@dmarkwat
Copy link
Contributor Author

dmarkwat commented Nov 3, 2022

Sorry about that -- I was based on a tag bc my local gradle was struggling. Should be better now?

@jkwatson
Copy link
Contributor

jkwatson commented Nov 5, 2022

@jsuereth as our local census guru, can you take a look?

@github-actions
Copy link
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Nov 19, 2022
@jkwatson
Copy link
Contributor

@trask I wasn't involved in the initial discussion about this, and I don't have any OC expertise. Not sure how we should proceed on this.

@github-actions github-actions bot removed the Stale label Nov 25, 2022
@github-actions
Copy link
Contributor

github-actions bot commented Dec 9, 2022

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Dec 9, 2022
@dmarkwat
Copy link
Contributor Author

Bump. I've moved back onto work that could really use this. It's but one of the alternatives as I've pointed out, but it's really dealer's choice: a) make this a central change (this PR) allowing it to coexist up front with the behavior of otel javaagent and its requirement for the specific type of Span or b) instrument it such as with open-telemetry/opentelemetry-java-instrumentation#6875. a) cleaner but less clear without inspection (+ no out of box tests without a "no op" integration that simply tests it); b) arguably messier (and just more code) but explicit and has the benefit of tests by virtue of being an instrumentation to tell us it no longer works.

@github-actions github-actions bot removed the Stale label Dec 19, 2022
@jkwatson
Copy link
Contributor

I'm having trouble getting people who are knowledgeable about this issue to review it. Still pinging @trask about it. :)

Copy link
Member

@trask trask left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this solves the opencensus-javaagent bridging issue I have no objection to it.

if you want to add tests on the javaagent side (but no actual instrumentation), there's precedent for that, e.g. see https://github.com/open-telemetry/opentelemetry-java-instrumentation/tree/main/instrumentation/cdi-testing

* SPDX-License-Identifier: Apache-2.0
*/

package io.opentelemetry.api.trace;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this the right package for this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. No, this package is not correct as it is already used in the API itself.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I move it over to the opencensusshim package or somewhere else?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, please

*/
// todo make this unnecessary
@SuppressWarnings("UngroupedOverloads")
public interface ProxyingSpan extends Span {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this might be clearer as an abstract class

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, but in this case it can't be as written: the opencensus shim code extends io.opencensus.trace.Span which is itself an abstract class. I suppose there's other ways of working that out, but as written it's not as easy as just changing it out.

Want me to rework this into an abstract class all the same? It may involve changes upstream though just given how it gets used -- hard to say without digging in.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would this work?

abstract class ProxyingSpan extends io.opencensus.trace.Span implements io.opentelemetry.api.trace.Span

also, does ProxyingSpan need to be public?

Copy link
Contributor Author

@dmarkwat dmarkwat Jan 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point: in this case, no I don't think it needs to be public. (See original thinking below)

That should work, yes. I'll make the change.

My original thinking on this was that ProxyingSpan could be used by the opentracing setup as well -- necessitating keeping it as decoupled from anything in opencensus -- if only to ensure all shims were correctly proxying the calls to the various otel Span, etc. methods to their agent-minted ApplicationSpan objects. But, stepping back that seems 1) ahead of myself and 2) acting without evidence that opentracing has a similar issue (I suspect it might, but it can also be fixed later).

So at this point I'm happy with either approach. But do lmk if you concur with my original thinking and think this should be pulled out and made usable for any other shims.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I spoke too soon: there's a conflict between the end() methods of the OC and otel Span types: OC's is marked final giving us no way to intercept that call for the otel side without plugging in the final proxying method, end() into the implementation... Bit of a betrayal of the intent I had in mind, I fear, and muddies the implementation details by blurring the responsibility.

Do you still think the abstract class approach is worth it? I'm not so sure anymore and would personally want to keep the types decoupled, but I'm a newcomer to this project and am happy to defer.

@dmarkwat
Copy link
Contributor Author

dmarkwat commented Dec 30, 2022

OK that push with a ton of extra commits was my bad. Did not pay attention. Cleaning this up now.

dmarkwat added a commit to dmarkwat/opentelemetry-java-instrumentation that referenced this pull request Dec 30, 2022
@dmarkwat
Copy link
Contributor Author

@trask thanks for the suggestion 👍 ; is open-telemetry/opentelemetry-java-instrumentation#7488 like what you had in mind with adding test-only to the otel-javaagent repo?

import io.opentelemetry.context.Scope;
import org.junit.jupiter.api.Test;

// todo remove if/once the underlying issue is agreed upon
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this todo intended to be merged? If so, then we need to have a GH issue created to fix it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the reminder; no, it is not intended to be merged. With open-telemetry/opentelemetry-java-instrumentation#7488 in play, this class is solved elsewhere and I can remove it. I'm going to spend a little time adding to any existing tests in both PRs to ensure better overall coverage of cases I've either uncovered or can conceive.

Copy link
Contributor Author

@dmarkwat dmarkwat Jan 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is however, this todo, which exists as a result of comments I made in the original issue. Summarized as a question: "can and should the javaagent be changed to not require the ApplicationSpan type internally and thereby relieve shim/others implementations of needing to proxy calls to the specific javaagent-minted Spans?"

Not a small body of work, to be sure. But if the policy is one-issue-per-todo, wanted to either a) ensure adequate coverage or b) decide it's not worth tracking and remove it. WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this test doesn't seem to have any assertions in it, could we beef it up to assert what is expected, at least? I'm really not familiar with this issue much at all... I just want to make sure we don't leave un-tracked work left to be done.

Copy link
Contributor Author

@dmarkwat dmarkwat Jan 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfectly understandable. This test was just a stand-in to verify some assumptions before we introduced the dedicated integration tests PR. I've now removed it in favor of a unit test which ensures the methods are all appropriately proxied.

@jkwatson
Copy link
Contributor

jkwatson commented Jan 8, 2023

Before we would want to merge this, we would need to make sure that we have test coverage on uncovered lines.

@dmarkwat
Copy link
Contributor Author

Looks like that last round of tests filled out the coverage, and I've rebased against main.

dmarkwat added a commit to dmarkwat/opentelemetry-java-instrumentation that referenced this pull request Jan 16, 2023
Copy link
Contributor

@jkwatson jkwatson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems ok to me; I'm not all that familiar with OC, though.

@github-actions
Copy link
Contributor

github-actions bot commented Feb 7, 2023

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Feb 7, 2023
@dmarkwat
Copy link
Contributor Author

dmarkwat commented Feb 8, 2023

Bump

@github-actions github-actions bot removed the Stale label Feb 8, 2023
@github-actions
Copy link
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added Stale and removed Stale labels Feb 22, 2023
@dmarkwat dmarkwat force-pushed the improved-ocshim-trace-proxy branch from 326f9d2 to b4290da Compare March 8, 2023 00:56
@dmarkwat
Copy link
Contributor Author

dmarkwat commented Mar 8, 2023

Rebased on main now that it looks like I was able to get to all the comments.

@jack-berg jack-berg merged commit 4532648 into open-telemetry:main Mar 9, 2023
dmarkwat added a commit to dmarkwat/opentelemetry-java-instrumentation that referenced this pull request Mar 11, 2023
dmarkwat added a commit to dmarkwat/opentelemetry-java-instrumentation that referenced this pull request Mar 13, 2023
dmarkwat added a commit to dmarkwat/opentelemetry-java-instrumentation that referenced this pull request Mar 15, 2023
dmarkwat added a commit to dmarkwat/opentelemetry-java-instrumentation that referenced this pull request Apr 13, 2023
dmarkwat added a commit to dmarkwat/opentelemetry-java-instrumentation that referenced this pull request Apr 29, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants