hs_err [libjvm.so+0x7ec967] SIGSEGV #59

cgw2023 · 2024-04-16T13:24:37Z

Describe the bug

I am running a Minecraft server on Amazon Corretto 21. It suddenly crashed and produced an hs_err_pid file. I never saw such a report when I was using OpenJDK. I use Amazon Corretto because the server community recommends it.

To Reproduce

Run the Minecraft server and wait.

Expected behavior

The server runs normally.

Screenshots

No screenshot

Platform information

OS: Ubuntu 22.04.3 LTS
Version: OpenJDK Runtime Environment Corretto-21.0.2.14.1 (build 21.0.2+14-LTS)

Additional context

hs_err_pid:
link: https://paste.denizenscript.com/View/122028
file: hs_err_pid27351.log

The server is started with:

    java -Xms24576M -Xmx24576M --add-modules=jdk.incubator.vector -XX:+UseG1GC -XX:+ParallelRefProcEnabled -XX:MaxGCPauseMillis=200 -XX:+UnlockExperimentalVMOptions -XX:+DisableExplicitGC -XX:+AlwaysPreTouch -XX:G1HeapWastePercent=5 -XX:G1MixedGCCountTarget=4 -XX:InitiatingHeapOccupancyPercent=15 -XX:G1MixedGCLiveThresholdPercent=90 -XX:G1RSetUpdatingPauseTimePercent=5 -XX:SurvivorRatio=32 -XX:+PerfDisableSharedMem -XX:MaxTenuringThreshold=1 -Dusing.aikars.flags=https://mcflags.emc.gs -Daikars.new.flags=true -XX:G1NewSizePercent=40 -XX:G1MaxNewSizePercent=50 -XX:G1HeapRegionSize=16M -XX:G1ReservePercent=15 -jar luminol-1.20.4-bundler.jar --nogui

olivergillespie · 2024-04-16T13:36:25Z

Sorry about the issue. How often has it happened?

Looks very similar to https://www.minecraftforum.net/forums/support/java-edition-support/3179109-minecraft-java-running-on-linux-crashes-with-exit which happened on a Microsoft build of OpenJDK 17, so it may not be specific to Corretto or 21.

cgw2023 · 2024-04-17T04:34:13Z

It happens randomly, maybe once a day or 1-2 times a week.
The link you posted is about the Minecraft player client rather than the server, so it may be unrelated.

olivergillespie · 2024-04-17T09:07:40Z

Thanks. If you post other hs_err files it might be helpful in case we can see something obvious, but without a reliable reproducer we can run ourselves we're unlikely to investigate.
I think it might be similar to the client crash I linked because of the stack trace and other details of the crash itself, not just because it's Minecraft, so I think the server/client difference might be okay.

cgw2023 · 2024-04-17T15:12:41Z

Another hs_err report was created; a crash occurred just now.
hs_err_pid96170.log

shipilev · 2024-04-17T17:28:31Z

This thing:

Stack: [0x0000733a11cfe000,0x0000733a11dfe000],  sp=0x0000733a11dfc990,  free space=1018k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x823129]  G1ParScanThreadState::trim_queue_to_threshold(unsigned int)+0x209
V  [libjvm.so+0x839fd3]  G1ScanHRForRegionClosure::scan_heap_roots(HeapRegion*)+0x903
V  [libjvm.so+0x837188]  G1RemSet::scan_heap_roots(G1ParScanThreadState*, unsigned int, G1GCPhaseTimes::GCParPhases, G1GCPhaseTimes::GCParPhases, bool)+0x1f8
V  [libjvm.so+0x85498b]  G1EvacuateRegionsTask::scan_roots(G1ParScanThreadState*, unsigned int)+0x4b
V  [libjvm.so+0x854bbe]  G1EvacuateRegionsBaseTask::work(unsigned int)+0x7e
V  [libjvm.so+0x103e3e3]  WorkerThread::run()+0x83
V  [libjvm.so+0xf8417f]  Thread::call_run()+0x9f
V  [libjvm.so+0xcee76a]  thread_native_entry(Thread*)+0xda

Registers:
...
R12=0x0000000000400000, 
...

siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x0000000000400000

This looks like a GC crash while following a pointer that is remarkably similar to null (all zeroes) except for one bit. Superficially, that looks like a bit flip. The R12 register holds the same value, and the disassembly shows that the crash is indeed an access through that register:

        movzbl  1(%rsi), %r10d                  # encoding: [0x44,0x0f,0xb6,0x56,0x01]
        testb   %r10b, %r10b                    # encoding: [0x45,0x84,0xd2]
        js      767                             # encoding: [0x0f,0x88,A,A,A,A]
                                        #   fixup A - offset: 2, value: 767-4, kind: FK_PCRel_4
        movq    (%r12), %r14                    # encoding: [0x4d,0x8b,0x34,0x24]  <---- crash here
        movq    %r14, %r8                       # encoding: [0x4d,0x89,0xf0]
        andl    $3, %r8d                        # encoding: [0x41,0x83,0xe0,0x03]
        cmpq    $3, %r8                         # encoding: [0x49,0x83,0xf8,0x03]
        je      3722                            # encoding: [0x0f,0x84,A,A,A,A]

...and disassembly suggests it is from here:

corretto-21/src/hotspot/share/gc/g1/g1ParScanThreadState.cpp

Lines 291 to 292 in 7a8c187

    } else if (task.is_oop_ptr()) {
      do_oop_evac(task.to_oop_ptr());

corretto-21/src/hotspot/share/gc/shared/taskqueue.hpp

Lines 634 to 636 in 7a8c187

    bool is_oop_ptr() const {
      return (raw_value() & (NarrowOopTag | PartialArrayTag)) == 0;
    }

Given we have just yanked this from the task queue, it means this 0x0000000000400000 was read from memory.

So, sanity question: have you run memtest on the affected machine recently?
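
To make the reasoning about the faulting value concrete, here is a minimal Python sketch of the quoted `is_oop_ptr()` check applied to 0x0000000000400000. The tag constants (`NARROW_OOP_TAG = 1`, `PARTIAL_ARRAY_TAG = 2`) are an assumption about what `taskqueue.hpp` defines and are not stated in this thread; `set_bits` is a hypothetical helper added only for illustration.

```python
# Assumed tag values (not confirmed in this thread); they stand in for
# NarrowOopTag and PartialArrayTag from taskqueue.hpp.
NARROW_OOP_TAG = 1
PARTIAL_ARRAY_TAG = 2


def is_oop_ptr(raw_value):
    """Python rendering of the quoted is_oop_ptr(): both tag bits must be clear."""
    return (raw_value & (NARROW_OOP_TAG | PARTIAL_ARRAY_TAG)) == 0


def set_bits(value):
    """Hypothetical helper: positions of all set bits, lowest first."""
    return [i for i in range(value.bit_length()) if (value >> i) & 1]


fault_addr = 0x0000000000400000  # si_addr and R12 from the hs_err log

print(is_oop_ptr(fault_addr))  # the value passes the tag check, so it is treated as an oop pointer
print(set_bits(fault_addr))    # exactly one bit set, at position 22
```

Under those assumptions, the value has its low tag bits clear, so it would be handed to `do_oop_evac` as a plain pointer, and it differs from null in a single bit.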

cgw2023 · 2024-04-18T06:31:09Z

I ran memtest this Tuesday.

cgw2023 · 2024-04-23T01:27:37Z

More new logs:
hs_err_pid700955.log
hs_err_pid1079403.log
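
One possible way to compare the crashes across these logs is to pull the `si_addr` out of each `siginfo` line and check whether it is a single set bit, as in the first report. The sketch below is only an illustration: `fault_address` and `is_single_bit` are hypothetical helper names, and the sample line is the one quoted earlier in this thread, since the contents of the newer logs are not shown here.

```python
import re

# Matches the fault address in an hs_err "siginfo:" line.
SI_ADDR_RE = re.compile(r"si_addr:\s*(0x[0-9a-fA-F]+)")


def fault_address(log_text):
    """Return the first si_addr found in the log text as an int, or None."""
    match = SI_ADDR_RE.search(log_text)
    return int(match.group(1), 16) if match else None


def is_single_bit(value):
    """True if exactly one bit is set (a nonzero power of two)."""
    return value is not None and value != 0 and (value & (value - 1)) == 0


# Sample line copied from the report quoted earlier in this thread.
sample = (
    "siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), "
    "si_addr: 0x0000000000400000"
)

addr = fault_address(sample)
print(hex(addr), is_single_bit(addr))  # 0x400000 True
```

To run it on a real file, read the log with `open(path).read()` and pass the text to `fault_address`.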
