Using SuperPMI with crossgen2 (aot). SIGSEGV. #38430

Closed
0xfk0 opened this issue Jun 26, 2020 · 11 comments
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI

@0xfk0
Contributor

0xfk0 commented Jun 26, 2020

How can we use the SuperPMI tool with crossgen2 (AOT)?

As I understand it, we should not use the environment variables COMPlus_AltJitName and COMPlus_AltJitNgen, because these variables are read only by the zapper class (crossgen1); crossgen2 instead provides the --codegenopt command-line option. There is also no AltJitName option in the JIT interface; instead, we should use the --jitpath command-line option, like this:

--jitpath=/home/sysop/bin/clr/libsuperpmi-shim-simple.so   --codegenopt "AltJitNgen=*"
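With --codegenopt, options such as "AltJitNgen=*" are passed as Name=Value pairs rather than read from COMPlus_* variables. A minimal sketch of how such pairs could be turned into a lookup table that a JIT host consults, assuming split-at-the-first-'=' semantics; ParseCodegenOpts is a hypothetical name and this is not crossgen2's actual (managed) implementation.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical sketch: turn "--codegenopt Name=Value" arguments into a table that a
// JIT host could consult when the JIT asks for a configuration value by name.
// Assumptions: split at the first '=', skip entries with no '=' or no name,
// and let a later duplicate overwrite an earlier one.
std::map<std::string, std::string> ParseCodegenOpts(const std::vector<std::string>& opts)
{
    std::map<std::string, std::string> config;
    for (const std::string& opt : opts)
    {
        const std::string::size_type eq = opt.find('=');
        if (eq == std::string::npos || eq == 0)
            continue; // malformed, e.g. "AltJitNgen" or "=*"
        config[opt.substr(0, eq)] = opt.substr(eq + 1);
    }
    return config;
}
```

For example, ParseCodegenOpts({"AltJitNgen=*"}) yields a single entry mapping AltJitNgen to *.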

We also need to set the following environment variables (they are read by the libsuperpmi shim library):

env SuperPMIShimLogPath=/tmp/2 SuperPMIShimPath=/home/sysop/clr/libclrjit.so
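The shim reads its settings from the environment at startup; the backtrace further down shows jitStartup calling SetDefaultPaths, which looks up HOME through GetEnvironmentVariableWithDefaultW with a default of ".". A minimal sketch of that "variable with a default" pattern, written against the C runtime rather than the PAL; GetEnvWithDefault is a hypothetical name that only mirrors the role of the spmiutil.cpp helper.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical helper with the same role as GetEnvironmentVariableWithDefaultW in
// spmiutil.cpp: return the variable's value, or defaultValue when it is unset.
// It uses std::getenv from the C runtime, so unlike the PAL-based helper it does
// not depend on a PAL copy having been initialized in this module.
std::string GetEnvWithDefault(const char* name, const std::string& defaultValue)
{
    assert(name != nullptr);
    const char* value = std::getenv(name);
    return value != nullptr ? std::string(value) : defaultValue;
}
```

A shim's startup code could call GetEnvWithDefault("HOME", ".") in the same way SetDefaultPaths does in the backtrace.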

Unfortunately, this doesn't work. Crossgen2 crashes with SIGSEGV.

Full command line:

env SuperPMIShimLogPath=/tmp/2 SuperPMIShimPath=/home/sysop/bin/clr/libclrjit.so COMPlus_AltJitNgen="*" gdb --args ~/bin/clr/corerun /home/sysop/dotnet-runtime/artifacts/bin/coreclr/Linux.x64.Checked/crossgen2/crossgen2.dll --jitpath=/home/sysop/bin/clr/libsuperpmi-shim-simple.so --codegenopt "AltJitNgen=*" -r:/home/sysop/bin/clr/System.*.dll -r:/home/sysop/bin/clr/Microsoft.*.dll -r:/home/sysop/bin/clr/mscorlib.dll -r:/home/sysop/bin/clr/netstandard.dll -O --map --dmgllog dgml --fulllog -o bug10.ni.dll bug10.dll

Additional diagnostics:

#0  sigsegv_handler (code=11, siginfo=0x7ffff7fd34b0, context=0x7ffff7fd3380)
    at /home/sysop/dotnet-runtime/src/coreclr/src/pal/src/exception/signal.cpp:511
#1  <signal handler called>
#2  0x00007fff4548397c in FindEnvVarValue (name=<optimized out>)
    at /home/sysop/dotnet-runtime/src/coreclr/src/pal/src/misc/environ.cpp:920
#3  EnvironGetenv (name=0x555555cc5e00 "HOME", copyValue=0)
    at /home/sysop/dotnet-runtime/src/coreclr/src/pal/src/misc/environ.cpp:974
#4  0x00007fff454838a3 in GetEnvironmentVariableA (lpName=0x555555cc5e00 "HOME", lpBuffer=0x0, nSize=0)
    at /home/sysop/dotnet-runtime/src/coreclr/src/pal/src/misc/environ.cpp:121
#5  0x00007fff45483d14 in GetEnvironmentVariableW (lpName=0x7fff454db77e u"HOME", lpBuffer=0x0, nSize=0)
    at /home/sysop/dotnet-runtime/src/coreclr/src/pal/src/misc/environ.cpp:214
#6  0x00007fff4542332a in GetEnvironmentVariableWithDefaultW (envVarName=0x7fff454db77e u"HOME",
    defaultValue=0x7fff454f1890 u".")
    at /home/sysop/dotnet-runtime/src/coreclr/src/ToolBox/superpmi/superpmi-shared/spmiutil.cpp:70
#7  0x00007fff453baa35 in SetDefaultPaths ()
    at /home/sysop/dotnet-runtime/src/coreclr/src/ToolBox/superpmi/superpmi-shim-simple/superpmi-shim-simple.cpp:29
#8  0x00007fff453baef2 in jitStartup (host=0x555555cc8190)
    at /home/sysop/dotnet-runtime/src/coreclr/src/ToolBox/superpmi/superpmi-shim-simple/superpmi-shim-simple.cpp:100
#9  0x00007fff7dd07b8a in ?? ()
#10 0xffffffffffffffff in ?? ()
#11 0x9abcdef012345678 in ?? ()
#12 0x00007ffff6bebc58 in vtable for InlinedCallFrame () from /home/sysop/bin/clr/libcoreclr.so
#13 0x00007fffffffb7a8 in ?? ()
#14 0x00007fff7dff0900 in ?? ()
...


(gdb) frame 2
#2  0x00007fff4548397c in FindEnvVarValue (name=<optimized out>)
    at /home/sysop/dotnet-runtime/src/coreclr/src/pal/src/misc/environ.cpp:920
920         for (int i = 0; palEnvironment[i] != nullptr; ++i)

(gdb) disassemble $pc
Dump of assembler code for function EnvironGetenv(char const*, BOOL):
  ...
   0x00007fff45483975 <+69>:    mov    0x2f4f4c(%rip),%rsi        # 0x7fff457788c8 <palEnvironment>
=> 0x00007fff4548397c <+76>:    mov    (%rsi),%rbx

(gdb) info registers
...
rsi            0x0      0

(gdb) p siginfo._sifields._sigfault.si_addr
$36 = (void *) 0x0

(gdb) x/llx 0x7fff457788c8
0x7fff457788c8 <palEnvironment>:        0x0000000000000000

(gdb) p palEnvironment
$37 = (char **) 0x55555578abc0

(gdb) info address palEnvironment
Symbol "palEnvironment" is static storage at address 0x7ffff6c6ee88.

(gdb) x/gx 0x7ffff6c6ee88
0x7ffff6c6ee88 <palEnvironment>:        0x000055555578abc0

(gdb) p *(char**)0x000055555578abc0@8
$40 = {
  0x55555578ae30 "LS_COLORS=no=00:fi=00:di=04:ln=36:pi=40;33:so=35:do=35:bd=40;33:cd=40;33:or=40;31:ex=32:*.c=33:*.C=33:*.cc=33:*.c++=33:*.cpp=33:*.CPP=33:*.h=33:*.H=33:*.hpp=33:*.HPP=33:*.s=33:*.S=33:*.ASM=33:*.asm=33"...,
  0x555555775930 "MC_SID=3447", 0x5555557764f0 "SSH_CONNECTION=106.210.109.248 25744 106.109.128.209 22",
  0x55555578b0f0 "LESSCLOSE=/usr/bin/lesspipe %s %s", 0x555555775240 "_=/usr/bin/env",
  0x555555778c90 "LANG=ru_RU.UTF-8", 0x555555775b50 "HISTCONTROL=ignoreboth",
  0x555555779b40 "SuperPMIShimPath=/home/sysop/bin/clr/libclrjit.so"}


(gdb) info files
...
0x00007ffff6c30580 - 0x00007ffff6c81920 is .bss in /home/sysop/bin/clr/libcoreclr.so
0x00007fff45760050 - 0x00007fff45783a00 is .bss in /home/sysop/bin/clr/libsuperpmi-shim-simple.so

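As you can see, there are two instances of the palEnvironment variable, each belonging to a different library: one at 0x7ffff6c6ee88, inside the .bss of libcoreclr.so, and one at 0x7fff457788c8, inside the .bss of libsuperpmi-shim-simple.so. The first is initialized; the second, which the faulting instruction reads, is still null.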
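The effect can be modeled in a single program. A minimal sketch, assuming (as the gdb session suggests) that each shared object linking the PAL gets its own copy of palEnvironment; ModuleState is a hypothetical type, and its lookup only mimics the loop shape of FindEnvVarValue at environ.cpp:920, with an added null check.

```cpp
#include <cstring>

// Illustrative model, not PAL code: each shared object that statically links the
// PAL owns a private copy of module-level state such as palEnvironment.
// Each ModuleState instance stands for one such copy.
struct ModuleState
{
    char** palEnvironment = nullptr; // stays null until this copy is initialized

    void Initialize(char** env) { palEnvironment = env; }

    // Same loop shape as FindEnvVarValue, plus a null check. Without the check, an
    // uninitialized copy dereferences a null pointer, matching the fault at
    // si_addr 0x0 in the gdb session.
    const char* FindEnvVarValue(const char* name) const
    {
        if (palEnvironment == nullptr)
            return nullptr;
        const std::size_t len = std::strlen(name);
        for (int i = 0; palEnvironment[i] != nullptr; ++i)
        {
            const char* entry = palEnvironment[i];
            if (std::strncmp(entry, name, len) == 0 && entry[len] == '=')
                return entry + len + 1; // value part after "NAME="
        }
        return nullptr;
    }
};
```

Initializing one instance leaves the other untouched, which is the situation shown by the two palEnvironment addresses.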

I suspect an issue with dynamic library loading. SuperPMI works fine with crossgen1 and with CoreCLR itself; the issue occurs only with crossgen2.

The issue reproduces on both x64 and ARM.

category:eng-sys
theme:super-pmi
skill-level:intermediate
cost:medium

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added area-crossgen2-coreclr untriaged New issue has not been triaged by the area owner labels Jun 26, 2020
@0xfk0
Contributor Author

0xfk0 commented Jun 26, 2020

@jkotas, @alpencolt.

@viewizard, I'd like to ask for your help: how can two copies of the PAL work within one process?

@jkotas
Member

jkotas commented Jun 26, 2020

@dotnet/jit-contrib Does superpmi work on Linux?

@CarolEidt
Contributor

Does superpmi work on Linux?

Yes, it does. I've used it on both x64 and ARM64 Linux.

@BruceForstall
Member

I'll add that SuperPMI runs on any platform supported by .NET Core, and is built and unit tested in Pri-0 (innerloop) CI testing.

SuperPMI collection is designed so you can set a few environment variables and then run any set of managed apps. If crossgen2 breaks that model and requires altering crossgen2 command lines to allow collection, that is a problem: SuperPMI shouldn't need to know anything about what things will be run and shouldn't have to alter their invocations.

Minimally, it seems that src\coreclr\src\ToolBox\superpmi\readme.md should be updated with directions appropriate for crossgen2, and possibly src\coreclr\scripts\superpmi.py should be taught new things.

@dotnet/crossgen-contrib

@MichalStrehovsky
Member

SuperPMI collection is designed so you can set a few environment variables and then run any set of managed apps. If crossgen2 breaks that model and requires altering crossgen2 command lines to allow collection, that is a problem: SuperPMI shouldn't need to know anything about what things will be run and shouldn't have to alter their invocations

Could this be clashing because we have two instances of the JIT in the process? Crossgen2 is a managed process so there's a JIT that is compiling crossgen2 itself just-in-time and there's another JIT that crossgen2 is using as a code generator to compile stuff.

@BruceForstall
Member

Looks like there's some issue initializing the PAL when loading libsuperpmi-shim-simple.so. Did DllMain() get called (with DLL_PROCESS_ATTACH)?

BTW, @viewizard, I don't think we ever test this; would you like to use libsuperpmi-shim-collector.so instead to do a collection?
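Why a missing DLL_PROCESS_ATTACH notification would leave the shim's PAL copy empty can be modeled in a few lines. A sketch under stated assumptions: the shim initializes its own PAL copy from its DllMain on process attach, a PAL-style loader delivers that notification, and a direct, dlopen-style load does not (later in this thread, crossgen2's NativeLibrary.Load is identified as such a path). All names here are hypothetical.

```cpp
// Illustrative model only: DllMain-style notification reasons.
enum class Reason { ProcessAttach, ProcessDetach };

struct LoadedModule
{
    bool (*dllMain)(LoadedModule& self, Reason reason); // module entry point
    bool palInitialized = false;                        // this module's private PAL copy
};

// Hypothetical stand-in for the shim's DllMain: on process attach it initializes
// the module's own PAL copy (modeled here as a flag).
bool ShimDllMain(LoadedModule& self, Reason reason)
{
    if (reason == Reason::ProcessAttach)
        self.palInitialized = true;
    return true;
}

// PAL-style load: map the library, then notify it with "process attach".
bool LoadWithAttach(LoadedModule& m) { return m.dllMain(m, Reason::ProcessAttach); }

// Direct load: the library is mapped, but its DllMain-style entry is never called.
bool LoadDirect(LoadedModule&) { return true; }

// One conceivable remedy (an assumption, not necessarily what any PR here does):
// after a direct load, invoke the entry point explicitly.
bool LoadDirectThenAttach(LoadedModule& m)
{
    return LoadDirect(m) && m.dllMain(m, Reason::ProcessAttach);
}
```

Only the direct-load path ends with palInitialized still false, mirroring the null palEnvironment in the shim.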

@BruceForstall
Member

Could this be clashing because we have two instances of the JIT in the process?

Depending on how the JIT (which in the SuperPMI case is a shim) and the AltJit and SuperPMI variables are handled, perhaps.

@jkotas
Member

jkotas commented Jun 26, 2020

The SIGSEGV crash that you are seeing may be caused by a regression that was fixed by #38254. Do you still see this SIGSEGV after this fix?

@0xfk0
Contributor Author

0xfk0 commented Jul 3, 2020

No, #38254 doesn't change anything.

The main problem is that the DllMain function isn't called when crossgen2 loads the JIT library via NativeLibrary.Load. I have prepared a PR that fixes it: #38746.

Unfortunately, this doesn't help much:

  1. libsuperpmi-shim-simple.so always crashes with the following backtrace:
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007ffff6cc2801 in __GI_abort () at abort.c:79
#2  0x00007ffff76b5957 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3  0x00007ffff76bbab6 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00007ffff76bbaf1 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x00007ffff76bbd79 in __cxa_rethrow () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00007fffeccf40be in jitNativeCode (methodHnd=0x420000, classPtr=0x420008, compHnd=0x7fff2fffdf68,
    methodInfo=0x7fff2fffe1e8, methodCodePtr=0x7fff2fffdef8, methodCodeSize=0x7fff2fffe1c8,
    compileFlags=0x7fff2fffdf10, inlineInfoPtr=0x0) at /home/sysop/dotnet-runtime/src/coreclr/src/jit/compiler.cpp:6795
#7  0x00007fffeccfd656 in CILJit::compileMethod (this=<optimized out>, compHnd=0x7fff2fffdf68,
    methodInfo=0x7fff2fffe1e8, flags=<optimized out>, entryAddress=0x7fff2fffe1d0, nativeSizeOfCode=0x7fff2fffe1c8)
    at /home/sysop/dotnet-runtime/src/coreclr/src/jit/ee_il_dll.cpp:274
#8  0x00007fff4528fe05 in interceptor_ICJC::compileMethod (this=<optimized out>, comp=<optimized out>, info=0x0,
    flags=4140568215, nativeEntry=0x0, nativeSizeOfCode=0x7fff2fffd600)
    at /home/sysop/dotnet-runtime/src/coreclr/src/ToolBox/superpmi/superpmi-shim-simple/icorjitcompiler.cpp:21
#9  0x00007fff45661836 in JitCompileMethod (ppException=0x7fff2fffe1d8, pJit=0x7fff2800c920,
    thisHandle=0x7fff2fffe1e0, callbacks=0x7fff2800dfe0, methodInfo=0x7fff2fffe1e8, flags=4294967295,
    entryAddress=0x7fff2fffe1d0, nativeSizeOfCode=0x7fff2fffe1c8)
    at /home/sysop/dotnet-runtime/src/coreclr/src/tools/aot/jitinterface/jitwrapper.cpp:86
#10 0x00007fff7e12f794 in ?? ()
#11 0x00007fff2fffe1d0 in ?? ()
#12 0x00007fff2fffe1c8 in ?? ()
#13 0x9abcdef012345678 in ?? ()
#14 0x00007ffff6c0b1b8 in vtable for InlinedCallFrame () from /home/sysop/bin/clr/libcoreclr.so
#15 0x00007fff2fffec98 in ?? ()
#16 0x00007fff7dfd6d80 in ?? ()
  2. libsuperpmi-shim-collector.so does not crash, but it cannot compile any method; it fails with the following error:
Warning: Method `[bug10]bug10.Program.Main(string[])` was not compiled because: [TEMPORARY EXCEPTION MESSAGE] ClassLoadGeneral: System.RuntimeArgumentHandle, System.Private.CoreLib, Version=5.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e

When crossgen2 is used without the --jitpath option (so it loads libclrjit.so directly rather than libsuperpmi-shim-collector.so), there are no error messages and all methods compile normally.

Can anybody help me understand why the error above happens, or how I can find out its cause?

@mangod9 mangod9 added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI and removed area-crossgen2-coreclr labels Sep 2, 2020
@mangod9 mangod9 added this to the 6.0.0 milestone Sep 2, 2020
@mangod9 mangod9 removed the untriaged New issue has not been triaged by the area owner label Sep 2, 2020
@mangod9
Member

mangod9 commented Sep 2, 2020

Moving to codegen per the referenced issue: #41639

@BruceForstall BruceForstall added the JitUntriaged CLR JIT issues needing additional triage label Oct 28, 2020
@BruceForstall BruceForstall removed the JitUntriaged CLR JIT issues needing additional triage label Nov 7, 2020
@sandreenko
Contributor

It no longer fails with SIGSEGV; usability is tracked by #41639.

@ghost ghost locked as resolved and limited conversation to collaborators Dec 22, 2020

9 participants