Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JUL log correlation #2724

Merged
merged 32 commits into from
Oct 5, 2022
Merged

Conversation

SylvainJuge
Copy link
Member

@SylvainJuge SylvainJuge commented Jul 25, 2022

What does this PR do?

Fixes #2492
Depends on elastic/ecs-logging-java#190 and elastic/ecs-logging-java#193

Checklist

  • merge dependency PR on ecs-logging-java : Simplify + Allow custom MDC lookup ecs-logging-java#193
  • release ecs-logging-java + use release version in this PR (build will fail until this is done)
  • decide if the native MDC implementation is enough for now or we have to invest more efforts on this (which will likely require another update in ecs-logging-java if we change this): for now we avoid having any MDC impl and delegate to actual tracer state through CorrelationIdMapAdapter.
  • Decide if we want/need to explicitly exclude instrumentation of internal plugins classes (currently breaks javalin tests): delegated to Exclude agent plugins from instrumentation #2757
  • Test with simple app that relies on JUL (with/without agent + ECS lib)
  • This is an enhancement of existing features, or a new feature in existing plugins
    • I have updated CHANGELOG.asciidoc
    • I have added tests that prove my fix is effective or that my feature works
    • Added an API method or config option? Document in which version this will be introduced
    • I have made corresponding changes to the documentation

@apmmachine
Copy link
Contributor

apmmachine commented Jul 25, 2022

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2022-10-05T09:39:23.767+0000

  • Duration: 59 min 13 sec

Test stats 🧪

Test Results
Failed 0
Passed 3163
Skipped 35
Total 3198

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • run benchmark tests : Run the benchmark tests.

  • run jdk compatibility tests : Run the JDK Compatibility tests.

  • run integration tests : Run the Agent Integration tests.

  • run end-to-end tests : Run the APM-ITs.

  • run windows tests : Build & tests on windows.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

Copy link
Contributor

@eyalkoren eyalkoren left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the log correlation instrumentation is independent of the ECS-reformatting one, we need to distinguish between two use cases:

  1. ECS-logging-JUL is used as an application dependency, in which case we need to update the JulMdc that is loaded by the agent CL.
  2. ECS-reformatting - where we need to add to the JulMdc that is loaded by the agent plugin CL.

I think the proper way to achieve that is by having two instrumentations (within two plugins):

  1. one that is only applied if the application CL can load JulMdc. Its advice adds to the JulMdc that is loaded by the application class loader. In order to achieve that, you may need to create a separate plugin (so that the plugin class loader does not contain the ECS logging classes bundled with the agent) AND you may need to update the list that says what's loaded from the agent class loader and what's not.
  2. the second is only applied if the application CL cannot load JulMdc, in which case the advice would add to the agent-bundled JulMdc. This one needs to have the JulMdc loaded by the plugin CL, which is the case with the current logging plugin, so your JulEcsLogCorrelationInstrumentation should be fit for this purpose.

Sorry if I confused or mislead you, it is all based on static code analysis and better be verified with tests of both scenarios.

@SylvainJuge
Copy link
Member Author

Since the log correlation instrumentation is independent of the ECS-reformatting one, we need to distinguish between two use cases:

  1. ECS-logging-JUL is used as an application dependency, in which case we need to update the JulMdc that is loaded by the agent CL.
  2. ECS-reformatting - where we need to add to the JulMdc that is loaded by the agent plugin CL.

I think the proper way to achieve that is by having two instrumentations (within two plugins):

  1. one that is only applied if the application CL can load JulMdc. Its advice adds to the JulMdc that is loaded by the application class loader. In order to achieve that, you may need to create a separate plugin (so that the plugin class loader does not contain the ECS logging classes bundled with the agent) AND you may need to update the list that says what's loaded from the agent class loader and what's not.
  2. the second is only applied if the application CL cannot load JulMdc, in which case the advice would add to the agent-bundled JulMdc. This one needs to have the JulMdc loaded by the plugin CL, which is the case with the current logging plugin, so your JulEcsLogCorrelationInstrumentation should be fit for this purpose.

Sorry if I confused or mislead you, it is all based on static code analysis and better be verified with tests of both scenarios.

The way I understand it, there are three cases:

  • jul-ecs-formatter is used by the application, without agent, the JulMdc is not read and is always empty
  • jul-ecs-formatter is used by the application with the agent with log reformatting, in this case we should update the JulMdc that is used by the application and ignore the one internal to the agent.
  • jul-ecs-formatter is not used by the application with the agent + log reformatting, in this case we just rely on agent JulMdc and agent EcsFormatter.

So, in a sense, we just need to alter which instance of the JulMdc we update depending on the scenario.

@eyalkoren
Copy link
Contributor

So, in a sense, we just need to alter which instance of the JulMdc we update depending on the scenario.

Exactly, but this may be trickier than it seems.
What I wrote above is what it would take to do in the "proper" way, where your code can directly deal with both JulMdc classes as types.
Another alternative is to extend your current JulEcsLogCorrelationInstrumentation to add the ability to update the one loaded by the app class loader by loading it explicitly through the application class loader based on class name and get MethodHandles for the put and remove methods. Then you can always update the one loaded by the agent and in addition update the one loaded by the app CL if there is such.

@SylvainJuge
Copy link
Member Author

If we manage to update the JulMdc of the application, it means that we can make ECS logging output ((of the application) contain the trace.id error.id and transaction.id fields that are populated by the agent.

However, if we limit ourselves to the MDC attributes, the other ECS logging attributes service.name, service.version, ... will remain with the values configured at the application level.

In the case of overriding logs, I think it would be simpler to just replace the EcsFormatter of the application with the internal one of the agent, in other words it could be as simple as allowing EcsFormatter for Jul to allow delegating to another instance that would be set by the agent through instrumentation.

@eyalkoren
Copy link
Contributor

eyalkoren commented Aug 2, 2022

However, if we limit ourselves to the MDC attributes, the other ECS logging attributes service.name, service.version, ... will remain with the values configured at the application level.

Why is that a problem? User configured values are always the best, we only decorate the logs with the tracing IDs if the user chooses that. Logging correlation is an independent feature from ECS-reformatting, even if they can cooperate.
BTW, there is also a apm-ecs-logging-plugin that automatically sets the discovered service name, though not implemented to JUL.

In the case of overriding logs, I think it would be simpler to just replace the EcsFormatter of the application with the internal one of the agent, in other words it could be as simple as allowing EcsFormatter for Jul to allow delegating to another instance that would be set by the agent through instrumentation.

But the EcsFormatter that the application sees is a different class than the one loaded by the agent, as they are loaded by different class loaders.

@SylvainJuge
Copy link
Member Author

PR updated with a different approach:

  • agent has its own MDC : AgentMDC, which currently duplicates the one in ECS Logging
  • calls to JulMdc within ECS logging are re-directed to AgentMDC through instrumentation.
  • log-correlation (within the agent) only uses AgentMDC for reading & writing, thus there is no confusion from having two copies of JulMdc in two distinct classloaders.

As a result, when using ECS log formatter in the application, the agent can now automatically inject the log correlation attributes into the logs.

@eyalkoren
Copy link
Contributor

eyalkoren commented Aug 14, 2022

Forgot to add to the review comment - what about trace correlation when applying ECS-reformatting?
Do we rely on instrumenting our embedded ECS-logging classes?
I think it would be safer to explicitly exclude ECS logging classes from instrumentation and instead create a custom EcsFormatter in ECS-reformatting that implements getMdcEntries().

Copy link
Contributor

@eyalkoren eyalkoren left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more thing - please add JUL support for correlation and error capturing to the supported technologies page.

@SylvainJuge SylvainJuge marked this pull request as ready for review September 19, 2022 09:50
@github-actions
Copy link

/test

@@ -77,7 +77,7 @@
<dependency>
<groupId>co.elastic.logging</groupId>
<artifactId>log4j2-ecs-layout</artifactId>
<version>${version.log4j2-ecs-layout}</version>
<version>${version.ecs.logging}</version>
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[for reviewer] we now have a single version to target ecs-logging and its sub-modules.

@Override
public ElementMatcher.Junction<ClassLoader> getClassLoaderMatcher() {
// ECS formatter that is loaded within the agent should not be instrumented
return not(CustomElementMatchers.isInternalPluginClassLoader());
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[for reviewer] this filtering means that the agent copy of ecs-logging won't be instrumented, hence we have to explicitly override co.elastic.logging.jul.EcsFormatter#getMdcEntries in co.elastic.apm.agent.jul.reformatting.JulEcsReformattingHelper#createEcsFormatter

@SylvainJuge SylvainJuge enabled auto-merge (squash) October 5, 2022 10:00
@SylvainJuge SylvainJuge merged commit 06f171f into elastic:main Oct 5, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add log correlation to JUL
4 participants