hs_err [libjvm.so+0x7ec967] SIGSEGV #59

Open
cgw2023 opened this issue Apr 16, 2024 · 7 comments

Comments

@cgw2023

cgw2023 commented Apr 16, 2024

Describe the bug

I am running a Minecraft server on Amazon Corretto 21. It suddenly crashed and produced an hs_err_pid file. I never saw such a report when I was using OpenJDK. I use Amazon Corretto because the server community recommends it.

To Reproduce

Run the Minecraft server and wait.

Expected behavior

The server runs normally.

Screenshots

No screenshot

Platform information

OS: Ubuntu 22.04.3 LTS
Version: OpenJDK Runtime Environment Corretto-21.0.2.14.1 (build 21.0.2+14-LTS)

Additional context

hs_err_pid:
link: https://paste.denizenscript.com/View/122028
file: hs_err_pid27351.log

The server is started with:

    java -Xms24576M -Xmx24576M --add-modules=jdk.incubator.vector -XX:+UseG1GC -XX:+ParallelRefProcEnabled -XX:MaxGCPauseMillis=200 -XX:+UnlockExperimentalVMOptions -XX:+DisableExplicitGC -XX:+AlwaysPreTouch -XX:G1HeapWastePercent=5 -XX:G1MixedGCCountTarget=4 -XX:InitiatingHeapOccupancyPercent=15 -XX:G1MixedGCLiveThresholdPercent=90 -XX:G1RSetUpdatingPauseTimePercent=5 -XX:SurvivorRatio=32 -XX:+PerfDisableSharedMem -XX:MaxTenuringThreshold=1 -Dusing.aikars.flags=https://mcflags.emc.gs -Daikars.new.flags=true -XX:G1NewSizePercent=40 -XX:G1MaxNewSizePercent=50 -XX:G1HeapRegionSize=16M -XX:G1ReservePercent=15 -jar luminol-1.20.4-bundler.jar --nogui
@olivergillespie
Contributor

Sorry about the issue. How often has it happened?

Looks very similar to https://www.minecraftforum.net/forums/support/java-edition-support/3179109-minecraft-java-running-on-linux-crashes-with-exit which happened on a Microsoft build of OpenJDK 17, so it may not be specific to Corretto or 21.

@cgw2023
Author

cgw2023 commented Apr 17, 2024

It happens randomly, maybe once a day or 1-2 times a week.
The link you posted is about the Minecraft player client rather than the server, so it may be unrelated.

@olivergillespie
Contributor

Thanks. If you post other hs_err files it might be helpful in case we can see something obvious, but without a reliable reproducer we can run ourselves we're unlikely to investigate.
I think it might be similar to the client crash I linked because of the stack trace and other details of the crash itself, not just because it's Minecraft, so I think the server/client difference might be okay.
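
When comparing several hs_err files, it can help to pull out the same few fields from each one. The following is a minimal sketch, assuming the log lines have the shape of the `siginfo:` line and `V  [libjvm.so+0x...]` frame lines quoted later in this thread; `summarize_hs_err` and the regular expressions are hypothetical helpers, not part of any JDK tooling.

```python
import re

# Matches a line such as:
#   siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x...
SIGINFO_RE = re.compile(
    r"si_signo:\s*(\d+)\s*\((\w+)\),\s*"
    r"si_code:\s*(\d+)\s*\((\w+)\),\s*"
    r"si_addr:\s*(0x[0-9a-fA-F]+)"
)

# Matches a native frame line such as:
#   V  [libjvm.so+0x823129]  Some::function(args)+0x209
FRAME_RE = re.compile(
    r"^[A-Za-z]\s+\[(\S+?)\+(0x[0-9a-fA-F]+)\]\s+(.*)$", re.MULTILINE
)


def summarize_hs_err(text):
    """Extract the signal, fault address, and first native frame from log text."""
    sig = SIGINFO_RE.search(text)
    frame = FRAME_RE.search(text)
    return {
        "signal": sig.group(2) if sig else None,
        "code": sig.group(4) if sig else None,
        "si_addr": int(sig.group(5), 16) if sig else None,
        "library": frame.group(1) if frame else None,
        "top_frame": frame.group(3).strip() if frame else None,
    }


# Sample text copied from the excerpt quoted in this thread.
SAMPLE = """\
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x823129]  G1ParScanThreadState::trim_queue_to_threshold(unsigned int)+0x209
V  [libjvm.so+0x839fd3]  G1ScanHRForRegionClosure::scan_heap_roots(HeapRegion*)+0x903

siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x0000000000400000
"""

summary = summarize_hs_err(SAMPLE)
print(summary)
```

Running the same function over each attached log would show whether the crashes share a top frame and whether the fault addresses follow a pattern.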

@cgw2023
Author

cgw2023 commented Apr 17, 2024

Another hs_err report was created; a crash occurred just now.
hs_err_pid96170.log

@shipilev
Member

shipilev commented Apr 17, 2024

This thing:

Stack: [0x0000733a11cfe000,0x0000733a11dfe000],  sp=0x0000733a11dfc990,  free space=1018k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x823129]  G1ParScanThreadState::trim_queue_to_threshold(unsigned int)+0x209
V  [libjvm.so+0x839fd3]  G1ScanHRForRegionClosure::scan_heap_roots(HeapRegion*)+0x903
V  [libjvm.so+0x837188]  G1RemSet::scan_heap_roots(G1ParScanThreadState*, unsigned int, G1GCPhaseTimes::GCParPhases, G1GCPhaseTimes::GCParPhases, bool)+0x1f8
V  [libjvm.so+0x85498b]  G1EvacuateRegionsTask::scan_roots(G1ParScanThreadState*, unsigned int)+0x4b
V  [libjvm.so+0x854bbe]  G1EvacuateRegionsBaseTask::work(unsigned int)+0x7e
V  [libjvm.so+0x103e3e3]  WorkerThread::run()+0x83
V  [libjvm.so+0xf8417f]  Thread::call_run()+0x9f
V  [libjvm.so+0xcee76a]  thread_native_entry(Thread*)+0xda

Registers:
...
R12=0x0000000000400000, 
...

siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x0000000000400000

Looks like a GC crash when it tried to walk memory at an address that is remarkably similar to null (all zeroes), except for one bit. This superficially looks like a bitflip. The R12 register has the same value. Disassembly shows we are indeed crashing while accessing memory through that register:

        movzbl  1(%rsi), %r10d                  # encoding: [0x44,0x0f,0xb6,0x56,0x01]
        testb   %r10b, %r10b                    # encoding: [0x45,0x84,0xd2]
        js      767                             # encoding: [0x0f,0x88,A,A,A,A]
                                        #   fixup A - offset: 2, value: 767-4, kind: FK_PCRel_4
        movq    (%r12), %r14                    # encoding: [0x4d,0x8b,0x34,0x24]  <---- crash here
        movq    %r14, %r8                       # encoding: [0x4d,0x89,0xf0]
        andl    $3, %r8d                        # encoding: [0x41,0x83,0xe0,0x03]
        cmpq    $3, %r8                         # encoding: [0x49,0x83,0xf8,0x03]
        je      3722                            # encoding: [0x0f,0x84,A,A,A,A]

...and disassembly suggests it is from here:

    } else if (task.is_oop_ptr()) {
      do_oop_evac(task.to_oop_ptr());

    bool is_oop_ptr() const {
      return (raw_value() & (NarrowOopTag | PartialArrayTag)) == 0;
    }

Given we have just yanked this from the task queue, it means this 0x0000000000400000 was read from memory.
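
To illustrate why a value like 0x0000000000400000 would pass the `is_oop_ptr()` check quoted above, here is a rough sketch of the same mask test. The tag values (`NarrowOopTag = 1`, `PartialArrayTag = 2`, stored in the low two bits) are an assumption about HotSpot's task encoding and are not stated in this thread; `classify` is a hypothetical helper.

```python
# Assumed tag values (not confirmed in this thread): the low two bits of the
# raw task value say what kind of task it is.
NARROW_OOP_TAG = 0b01      # assumption
PARTIAL_ARRAY_TAG = 0b10   # assumption


def is_oop_ptr(raw_value):
    """Mirror of the quoted check: no tag bits set means a plain oop pointer."""
    return (raw_value & (NARROW_OOP_TAG | PARTIAL_ARRAY_TAG)) == 0


def classify(raw_value):
    """Name the task kind implied by the assumed tag bits."""
    if is_oop_ptr(raw_value):
        return "oop pointer"
    if raw_value & NARROW_OOP_TAG:
        return "narrow oop pointer"
    return "partial array"


crash_value = 0x0000000000400000
print(hex(crash_value), "->", classify(crash_value))
```

Under these assumptions, the low two bits of the crashing value are clear, so it would be treated as an ordinary oop pointer and dereferenced, which is consistent with the faulting `movq (%r12), %r14`.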

So, sanity question: have you run memtest on the affected machine recently?
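
The "all zeroes except one bit" observation can be checked mechanically. This is a small sketch that lists which bits differ between an observed value and an expected one; comparing against null (0) is the assumption taken from the comment above, and `flipped_bits` is a hypothetical helper. A single differing bit is only suggestive of a bitflip and does not by itself prove a hardware fault.

```python
def flipped_bits(observed, expected=0):
    """Return the positions of bits that differ between observed and expected."""
    diff = observed ^ expected
    return [i for i in range(diff.bit_length()) if (diff >> i) & 1]


si_addr = 0x0000000000400000
bits = flipped_bits(si_addr)
print(bits)  # [22]
```

Here the fault address differs from null only in bit 22, which is what makes a memory test a reasonable first step.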

@cgw2023
Author

cgw2023 commented Apr 18, 2024

I ran memtest this Tuesday

20240415_174109

@cgw2023
Author

cgw2023 commented Apr 23, 2024

More new logs:
hs_err_pid700955.log
hs_err_pid1079403.log
