Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add safeguard for J9-specific jmethodid unloading behaviour (version-dependent) #164

Merged
merged 2 commits into from
Jan 8, 2025

Conversation

MattAlp
Copy link
Contributor

@MattAlp MattAlp commented Jan 7, 2025

What does this PR do?:
Adds a safeguard to prevent crashes on certain versions of J9 for which we don't have certain API compliance guarantees assumed to be present by async-profiler / java-profiler.

Motivation:
Incident 32141

Additional Notes:
See incident channel and private Slack for further details, (private) customer info and correspondence informed this change.

How to test the change?:

For Datadog employees:

  • If this PR touches code that signs or publishes builds or packages, or handles
    credentials of any kind, I've requested a review from @DataDog/security-design-and-guidance.
  • This PR doesn't touch any of that.
  • JIRA: PROF-11094

Unsure? Have a question? Request a review!

@MattAlp MattAlp requested review from jbachorik and r1viollet January 7, 2025 20:00
Copy link

github-actions bot commented Jan 7, 2025

🔧 Report generated by pr-comment-scanbuild

Scan-Build Report

User:runner@fv-az1334-408
Working Directory:/home/runner/work/java-profiler/java-profiler/ddprof-lib/src/test/make
Command Line:make -j4 clean all
Clang Version:Ubuntu clang version 18.1.3 (1ubuntu1)
Date:Wed Jan 8 14:07:12 2025

Bug Summary

Bug TypeQuantityDisplay?
All Bugs5
Logic error
Dereference of null pointer3
Suspicious operation
Bitwise shift1
Unused code
Dead nested assignment1

Reports

Bug Group Bug Type ▾ File Function/Method Line Path Length
Suspicious operationBitwise shiftvmStructs.cppfind87216
Unused codeDead nested assignmentvmStructs.cppcheckNativeBinding9771
Logic errorDereference of null pointersafeAccess.hload3318
Logic errorDereference of null pointersymbols_linux.hElfParser12928
Logic errorDereference of null pointerflightRecorder.cppflush15118

Copy link

github-actions bot commented Jan 7, 2025

🔧 Report generated by pr-comment-cppcheck

CppCheck Report

Warnings (7)

Style Violations (403)

@r1viollet
Copy link
Collaborator

Thanks for looking into this.
Is it possible to add a test for this behaviour ?

@jbachorik
Copy link
Collaborator

Is it possible to add a test for this behaviour ?

This would probably be almost impossible. We are getting the wrong method ID from a JVMTI call. Hence, we would need to stub the JNI call with some complex patching just for the duration of the test case (patching via LD_PRELOAD, if at all possible, would be too wide and we might fail way before we actually get to the code we want to test).

One thing that is fishy is that jvmti->GetMethodDeclaringClass(method, &method_class) is returning -1 in method_class but not returning an error code. By the spec if this function is not able to properly handle the provided methodid it should return [JVMTI_ERROR_INVALID_METHODID](https://docs.oracle.com/javase/7/docs/platform/jvmti/jvmti.html#JVMTI_ERROR_INVALID_METHODID)

@MattAlp
Copy link
Contributor Author

MattAlp commented Jan 8, 2025

My understanding from the internally shared docs is that this error return code is guaranteed only for more recent patches of J9, and can return OK with a reference to -1. This lines up with the incident reporting that the JVMTI calls succeed but then on the crash stack, we see

Registers

rip 0x7fb9e9766ef6 <jvmtiGetClassSignature+198>
rbp 0xffffffffffffffff (-1)

Ops

0x00007fb9e9766e3c <+12>: mov %rsi,%rbp <-- save the second parameter jclazz to rbp
0x00007fb9e9766ed5 <+165>: test %rbp,%rbp <-- check if jclazz is null
0x00007fb9e9766ede <+174>: mov 0x0(%rbp),%rbp <-- dereference jclazz and save to rbp
0x00007fb9e9766ee2 <+178>: test %rbp,%rbp <-- check if rbp is null
=> 0x00007fb9e9766ef6 <+198>: mov 0x4(%rbp,%rax,1),%r10 <-- crash here because rbp is not valid.

@MattAlp
Copy link
Contributor Author

MattAlp commented Jan 8, 2025

This would probably be almost impossible. We are getting the wrong method ID from a JVMTI call. Hence, we would need to stub the JNI call with some complex patching just for the duration of the test case (patching via LD_PRELOAD, if at all possible, would be too wide and we might fail way before we actually get to the code we want to test).

Agreed. The surface area of stubbing and testing something like this would quickly require finicky setup and/or stubbing out a lot of JVMTI and/or core JVM functionality.

Copy link
Collaborator

@jbachorik jbachorik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one thing - could you add a comment about the J9 using -1 for unloaded methods so it is visible that this is actually a bug in J9 JVMTI implementation where they fail to return error code for unloaded method but rather segfault?

@MattAlp MattAlp merged commit 69ca3b5 into main Jan 8, 2025
38 checks passed
@MattAlp MattAlp deleted the mattalp/add-j9-crash-safeguard branch January 8, 2025 14:28
@github-actions github-actions bot added this to the 1.18.0 milestone Jan 8, 2025
// when a classloader is unloaded, the jmethodIDs are not freed, but instead marked as -1.
// The nested check below is to mitigate these crashes.
// In more recent versions, the condition above will short-circuit safely.
((!VM::isOpenJ9() || method_class != reinterpret_cast<jclass>(-1)) && jvmti->GetClassSignature(method_class, &class_name, NULL) == 0) &&
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor / style: we could have declared the -1 value as a constexpr value (k_invalid_j9_method_class ?)
Is checking the -1 value not enough (to avoid the isOpenJ9 call) ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, now when rereading the comment, this does not make much sense.
The info from J9 engineers is about invalid jmethodid becoming -1 - but that's not what we were seeing, right? It was the jclass returned from GetMethodDeclaringClass on such invalid jmethodid that was -1.
I mean, the workaround would work but the root cause seems to be J9 not handling the invalid jmethodids consistently in the JVMTI calls itself (eg. GetMethodDeclaringClass when called on an invalid jmethodid should return an error code instead of putting -1 to a jclass variable 🤷

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the root cause seems to be J9 not handling the invalid jmethodids consistently in the JVMTI calls itself

I agree with this. I believe that this fixed in later versions of J9, but is considered acceptable for others (hence version-dependent in the PR title). It would be more accurate to do a specific J9 version check here rather than simply !VM::isOpenJ9(), but that opens up the possibility of us missing some versions in our known-to-be-buggy list.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@r1viollet

  1. I'll constexpr, it doesn't hurt
  2. I checked isOpenJ9() earlier - it returns a boolean based on a member that is set once at profiler startup, so I think the call is negligible. I added the check prior to adding the comments for the sake of self-documenting code, but also to remove the comparison for non-J9 VMs. WDYT - remove it?

static bool isOpenJ9() { return _openj9; }

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. As a general guidance we would want to avoid reinterpret casts when possible. The constexpr part is just a good habit to take (though sometimes hard to apply if we use old C++ versions)
  2. The j9 call is more a question of taste and obviously very minor.

@r1viollet
Copy link
Collaborator

This would probably be almost impossible. We are getting the wrong method ID from a JVMTI call. Hence, we would need to stub the JNI call with some complex patching just for the duration of the test case (patching via LD_PRELOAD, if at all possible, would be too wide and we might fail way before we actually get to the code we want to test).

Agreed. The surface area of stubbing and testing something like this would quickly require finicky setup and/or stubbing out a lot of JVMTI and/or core JVM functionality.

Capturing JVM invariants per distributions seems like something very interesting.
I was not particularly precise on what type of test I was referring to. Do you think a unit test mocking the interactions would be difficult to write ?

@jbachorik
Copy link
Collaborator

jbachorik commented Jan 9, 2025

Do you think a unit test mocking the interactions would be difficult to write ?

The problem is that it is the JVMTI function itself which behaves not according to the spec. I mean, theoretically we could create tests for spec conformance of the JVMTI implementation per JDK distribution but that would be no small task (not even considering the cost of running and maintaining the suite)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants