Compiler seg faults while building Linux kernel #42

Open
connorkuehl opened this issue Mar 11, 2019 · 13 comments
Labels
bug Something isn't working

Comments

@connorkuehl

Kees found this in testing.

I pulled latest changes from LLVM/Clang and applied the ASM goto series. Also applied Randstruct patches.

With Randstruct disabled the kernel builds fine.

With Randstruct enabled the compiler will segfault.

I find this part of the stack trace particularly interesting:

 #4 0x0000000003a4df10 clang::TagType::getDecl() const (/home/kuehlcon/src/git/llvm-project/build/bin/clang-9+0x3a4df10)
 #5 0x0000000001fdf305 (anonymous namespace)::ConstStructBuilder::Finalize(clang::QualType) (/home/kuehlcon/src/git/llvm-project/build/bin/clang-9+0x1fdf305)

But when I look at that method, nothing looks obviously wrong. I think there's got to be some corruption in the DeclContext as a result of our rearranging.

Stack dump:
0.	Program arguments: /home/kuehlcon/src/git/llvm-project/build/bin/clang-9 -cc1 -triple x86_64-unknown-linux-gnu -S -disable-free -disable-llvm-verifier -discard-value-names -main-file-name init_task.c -mrelocation-model static -mthread-model posix -fno-delete-null-pointer-checks -mllvm -warn-stack-size=2048 -relaxed-aliasing -fmath-errno -masm-verbose -no-integrated-as -mconstructor-aliases -fuse-init-array -mcode-model kernel -target-cpu x86-64 -target-feature +retpoline-indirect-calls -target-feature +retpoline-indirect-branches -target-feature -sse -target-feature -mmx -target-feature -sse2 -target-feature -3dnow -target-feature -avx -target-feature -x87 -target-feature +retpoline-external-thunk -disable-red-zone -dwarf-column-info -debugger-tuning=gdb -momit-leaf-frame-pointer -coverage-notes-file /home/kuehlcon/src/git/linux/init/init_task.gcno -nostdsysteminc -nobuiltininc -resource-dir /home/kuehlcon/src/git/llvm-project/build/lib/clang/9.0.0 -dependency-file init/.init_task.o.d -MT init/init_task.o -sys-header-deps -isystem /home/kuehlcon/src/git/llvm-project/build/lib/clang/9.0.0/include -include ./include/linux/kconfig.h -include ./include/linux/compiler_types.h -I ./arch/x86/include -I ./arch/x86/include/generated -I ./include -I ./arch/x86/include/uapi -I ./arch/x86/include/generated/uapi -I ./include/uapi -I ./include/generated/uapi -D __KERNEL__ -D CONFIG_AS_CFI=1 -D CONFIG_AS_CFI_SIGNAL_FRAME=1 -D CONFIG_AS_CFI_SECTIONS=1 -D CONFIG_AS_FXSAVEQ=1 -D CONFIG_AS_SSSE3=1 -D CONFIG_AS_AVX=1 -D CONFIG_AS_AVX2=1 -D CONFIG_AS_AVX512=1 -D CONFIG_AS_SHA1_NI=1 -D CONFIG_AS_SHA256_NI=1 -D KBUILD_BASENAME="init_task" -D KBUILD_MODNAME="init_task" -O2 -Wall -Wundef -Werror=strict-prototypes -Wno-trigraphs -Werror-implicit-function-declaration -Werror=implicit-int -Wno-format-security -Wno-sign-compare -Wno-format-invalid-specifier -Wno-gnu -Wno-address-of-packed-member -Wno-tautological-compare -Wno-unused-const-variable -Wdeclaration-after-statement -Wvla 
-Wno-pointer-sign -Werror=date-time -Werror=incompatible-pointer-types -Wno-initializer-overrides -Wno-unused-value -Wno-format -Wno-sign-compare -Wno-format-zero-length -Wno-uninitialized -std=gnu89 -fno-dwarf-directory-asm -fdebug-compilation-dir /home/kuehlcon/src/git/linux -ferror-limit 19 -fmessage-length 0 -fwrapv -stack-protector 2 -mstack-alignment=8 -fno-builtin-bcmp -fwchar-type=short -fno-signed-wchar -fobjc-runtime=gcc -fno-common -fdiagnostics-show-option -vectorize-loops -vectorize-slp -o /tmp/init_task-bbbe6b.s -x c init/init_task.c 
1.	<eof> parser at end of file
2.	Per-file LLVM IR generation
3.	init/init_task.c:17:29: Generating code for declaration 'init_signals'
/tmp/vdso32-setup-a99b3b.s: Assembler messages:
/tmp/vdso32-setup-a99b3b.s:146: Error: invalid operands (.data..read_mostly and *ABS* sections) for `&'
clang-9: error: assembler command failed with exit code 1 (use -v to see invocation)
make[3]: *** [arch/x86/entry/vdso/vdso32-setup.o] Error 1
make[3]: *** Waiting for unfinished jobs....
 #0 0x0000000001e2ee94 PrintStackTraceSignalHandler(void*) (/home/kuehlcon/src/git/llvm-project/build/bin/clang-9+0x1e2ee94)
 #1 0x0000000001e2cd1e llvm::sys::RunSignalHandlers() (/home/kuehlcon/src/git/llvm-project/build/bin/clang-9+0x1e2cd1e)
 #2 0x0000000001e2f278 SignalHandler(int) (/home/kuehlcon/src/git/llvm-project/build/bin/clang-9+0x1e2f278)
 #3 0x00007f8c39297890 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x12890)
 #4 0x0000000003a4df10 clang::TagType::getDecl() const (/home/kuehlcon/src/git/llvm-project/build/bin/clang-9+0x3a4df10)
 #5 0x0000000001fdf305 (anonymous namespace)::ConstStructBuilder::Finalize(clang::QualType) (/home/kuehlcon/src/git/llvm-project/build/bin/clang-9+0x1fdf305)
 #6 0x0000000001fdc9d4 clang::StmtVisitorBase<std::add_pointer, (anonymous namespace)::ConstExprEmitter, llvm::Constant*, clang::QualType>::Visit(clang::Stmt*, clang::QualType) (/home/kuehlcon/src/git/llvm-project/build/bin/clang-9+0x1fdc9d4)
 #7 0x0000000001fdaeda clang::CodeGen::ConstantEmitter::tryEmitPrivate(clang::Expr const*, clang::QualType) (/home/kuehlcon/src/git/llvm-project/build/bin/clang-9+0x1fdaeda)
 #8 0x0000000001fdbff8 clang::CodeGen::ConstantEmitter::tryEmitPrivateForMemory(clang::Expr const*, clang::QualType) (/home/kuehlcon/src/git/llvm-project/build/bin/clang-9+0x1fdbff8)
 #9 0x0000000001fe3440 (anonymous namespace)::ConstStructBuilder::Build(clang::InitListExpr*) (/home/kuehlcon/src/git/llvm-project/build/bin/clang-9+0x1fe3440)
#10 0x0000000001fdc9c3 clang::StmtVisitorBase<std::add_pointer, (anonymous namespace)::ConstExprEmitter, llvm::Constant*, clang::QualType>::Visit(clang::Stmt*, clang::QualType) (/home/kuehlcon/src/git/llvm-project/build/bin/clang-9+0x1fdc9c3)
#11 0x0000000001fdadc5 clang::CodeGen::ConstantEmitter::tryEmitPrivateForVarInit(clang::VarDecl const&) (/home/kuehlcon/src/git/llvm-project/build/bin/clang-9+0x1fdadc5)
#12 0x0000000001fdbf10 clang::CodeGen::ConstantEmitter::tryEmitForInitializer(clang::VarDecl const&) (/home/kuehlcon/src/git/llvm-project/build/bin/clang-9+0x1fdbf10)
#13 0x000000000204b84f clang::CodeGen::CodeGenModule::EmitGlobalVarDefinition(clang::VarDecl const*, bool) (/home/kuehlcon/src/git/llvm-project/build/bin/clang-9+0x204b84f)
#14 0x000000000204533b clang::CodeGen::CodeGenModule::EmitGlobalDefinition(clang::GlobalDecl, llvm::GlobalValue*) (/home/kuehlcon/src/git/llvm-project/build/bin/clang-9+0x204533b)
#15 0x000000000203d38f clang::CodeGen::CodeGenModule::EmitDeferred() (/home/kuehlcon/src/git/llvm-project/build/bin/clang-9+0x203d38f)
#16 0x000000000203c687 clang::CodeGen::CodeGenModule::Release() (/home/kuehlcon/src/git/llvm-project/build/bin/clang-9+0x203c687)
#17 0x00000000028988b4 (anonymous namespace)::CodeGeneratorImpl::HandleTranslationUnit(clang::ASTContext&) (/home/kuehlcon/src/git/llvm-project/build/bin/clang-9+0x28988b4)
#18 0x0000000002896157 clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) (/home/kuehlcon/src/git/llvm-project/build/bin/clang-9+0x2896157)
#19 0x00000000030278e3 clang::ParseAST(clang::Sema&, bool, bool) (/home/kuehlcon/src/git/llvm-project/build/bin/clang-9+0x30278e3)
#20 0x000000000241ade7 clang::FrontendAction::Execute() (/home/kuehlcon/src/git/llvm-project/build/bin/clang-9+0x241ade7)
#21 0x00000000023c20a8 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/home/kuehlcon/src/git/llvm-project/build/bin/clang-9+0x23c20a8)
#22 0x00000000024ae245 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/home/kuehlcon/src/git/llvm-project/build/bin/clang-9+0x24ae245)
#23 0x0000000000920c94 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (/home/kuehlcon/src/git/llvm-project/build/bin/clang-9+0x920c94)
#24 0x000000000091ef68 main (/home/kuehlcon/src/git/llvm-project/build/bin/clang-9+0x91ef68)
#25 0x00007f8c37f40b97 __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21b97)
#26 0x000000000091c42a _start (/home/kuehlcon/src/git/llvm-project/build/bin/clang-9+0x91c42a)
clang-9: error: unable to execute command: Segmentation fault (core dumped)
clang-9: error: clang frontend command failed due to signal (use -v to see invocation)
clang version 9.0.0 (git@github.com:clang-randstruct/llvm-project.git deb4b7d7f012ad7bf83c988c1979eb746ac5dc6d)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /home/kuehlcon/src/git/llvm-project/build/bin
clang-9: note: diagnostic msg: PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace, preprocessed source, and associated run script.
clang-9: note: diagnostic msg: 
********************

PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
Preprocessed source(s) and associated run script(s) are located at:
clang-9: note: diagnostic msg: /tmp/init_task-728893.c
clang-9: note: diagnostic msg: /tmp/init_task-728893.sh
clang-9: note: diagnostic msg: 
@connorkuehl connorkuehl added the bug Something isn't working label Mar 11, 2019
@connorkuehl
Author

connorkuehl commented Mar 11, 2019

Here's a setup procedure:

  1. Switch to our rfcv2 branch
  2. Merge the asm-goto branch into it (git merge --no-ff asm-goto; you might need to check this branch out locally or prepend origin/ to the branch name) OR rebase rfcv2 onto asm-goto.
  3. If this is your first time building a release build, clear out your build folder: rm -rf /home/your-user/path/to/llvm-project/build/*
  4. Go into your clean build folder: cd /home/your/path/to/llvm-project/build
  5. Generate the Makefile: cmake -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_PROJECTS='clang;lld' ../llvm
  6. Build Clang: make -j $(nproc)
  7. Make a directory somewhere unrelated to our LLVM repository (mkdir /home/user/kernels)
  8. cd /home/user/kernels
  9. Clone Kees's Linux tree (git clone --single-branch --branch kssp/clang/randstruct https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git)
  10. cd /where/you/cloned/the/linux/repo/linux
  11. Install the packages required to build the kernel -- On Ubuntu you can get by with build-essential, libssl-dev, bison, flex, bc
  12. Make your kernel config (make defconfig should suffice for our purposes)
  13. make -j $(nproc) CC=/path/to/our/llvm/repo/llvm-project/build/bin/clang

@connorkuehl
Author

connorkuehl commented Mar 12, 2019

Ty->getAs<RecordType>() is returning a nullptr when randstruct is active but I'm not sure why.

I confirmed this by printing out a message when that method returns a nullptr.
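To make the failure mode concrete, here is a minimal sketch in plain C++. It does not use Clang's real classes: `Type`, `RecordType`, `PointerType`, `getAsRecord`, `castToRecord` and `recordNameOrEmpty` are all hypothetical stand-ins, and `dynamic_cast` is swapped in for Clang's own type-kind check. It only shows why a null-returning lookup followed by an unguarded dereference crashes, while a checked cast asserts instead.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for a type hierarchy; not Clang's real classes.
struct Type {
  virtual ~Type() = default;
};
struct RecordType : Type {
  std::string declName = "ptrs";
};
struct PointerType : Type {};

// Stand-in for Ty->getAs<RecordType>(): yields nullptr when ty is not a record.
const RecordType *getAsRecord(const Type *ty) {
  return dynamic_cast<const RecordType *>(ty);
}

// Stand-in for a checked cast: aborts with a message in assert-enabled builds.
const RecordType &castToRecord(const Type *ty) {
  const RecordType *rt = getAsRecord(ty);
  assert(rt && "argument of incompatible type");
  return *rt;
}

// Guarded access: returns an empty name instead of dereferencing nullptr.
std::string recordNameOrEmpty(const Type *ty) {
  const RecordType *rt = getAsRecord(ty);
  return rt ? rt->declName : std::string();
}
```

Calling `recordNameOrEmpty` on a non-record type returns an empty string, whereas dereferencing the result of `getAsRecord` directly would be the segfault pattern described above.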

Commenting out the call to reorganize results in a functioning compiler but introduces a compiler error in one of the kernel modules.

Commenting out the call to commit allows the kernel to compile successfully, but that's less than ideal since our new order will not be committed to the DeclContext.

I suspect we are destroying pieces of the DeclContext by the way we update it. I sent an e-mail to the cfe-dev mailing list asking for advice on how we can safely manipulate the DeclContext's order of a RecordDecl's fields, but no response yet.

I've also tried updating our commit method to remove the fields from the RecordDecl's context with removeDecl and then add them back in the new order with addDecl, but we end up with the same segmentation fault anyway.

@connorkuehl
Author

connorkuehl commented Mar 17, 2019

I built the kernel with make -k, using a Clang built with debug symbols, so that the build would get through as much of the kernel as possible. That lets us see which other structures cause the compiler to abort and, hopefully, what they all have in common. The names of the structures suggest that the compiler fails to emit code for structures that are used primarily as dispatch tables.

I sampled 5 of them at random. Each one declares an instance of a struct like so:

static const struct <name of struct> <name of instance> = { ...snipped inits... }

Code generation for these structures is causing the Clang compiler to abort.

The compiler aborts due to failing this assertion:

clang-9: /home/connor/src/llvm-project/llvm/include/llvm/Support/Casting.h:254: typename cast_retty<X, Y *>::ret_type llvm::cast(Y *) [X = llvm::PointerType, Y = llvm::Type]: Assertion `isa<X>(Val) && "cast<Ty>() argument of incompatible type!"' failed.

I'm unable to reproduce this on our simple testing C program with something like this:

struct ptrs {
        int (* hi)();
        int (* bye)();
        int (* why)();
} __attribute__((randomize_layout));

static const struct ptrs global = {
        .hi = 0,
        .bye = 0,
        .why = 0
};

Here's the list of structures (actually, instances of the structures):

  • abi_root_table2
  • init_uts_ns
  • init_signals
  • severities_coverage_fops
  • microcode_fops
  • mtrr_fops
  • irq_affinity_proc_fops
  • pm_qos_power_fops
  • snapshot_fops
  • init_sync_kiocb
  • autofs_root_operations
  • debugfs_noop_file_operations
  • pty_root_table
  • _dev_ioctl_fops
  • efivarfs_file_operations
  • msr_fops
  • cpuid_fops
  • itmt_root_table
  • init_ipc_ns
  • posix_clock_file_operations
  • tk_debug_sleep_time_fops
  • mqueue_file_operations
  • hugetlbfs_file_operations
  • key_type_dead
  • key_type_keyring
  • tracing_stat_fops
  • key_type_request_key_auth
  • ftrace_formats_fops
  • key_type_user
  • key_sysctls
  • sel_load_ops
  • uprobe_events_ops
  • sysctl_base_table
  • lockd_end_grace_operations
  • init_user_ns
  • nfs_dir_operations
  • init_cred
  • user_table
  • bsg_fops
  • futex_q_init
  • nfs_cb_sysctl_root
  • acpi_system_wakeup_device_fops
  • nfs4_file_operations
  • snd_hwdep_f_ops
  • nfs4_cb_sysctl_root
  • inotify_table
  • proc_pid_cmdline_ops
  • proc_fd_operations
  • proc_cpuinfo_operations
  • proc_stat_operations
  • cdrom_root_table
  • proc_ns_dir_operations
  • sysctl_mount_point
  • rbtree_fops
  • proc_net_seq_fops
  • regmap_name_fops
  • proc_kcore_operations
  • component_devices_fops
  • proc_kmsg_operations
  • proc_kpagecount_operations
  • sys_table
  • rps_sock_flow_sysctl
  • ramfs_file_operations
  • tracefs_file_operations
  • generic_ro_fops
  • misc_fops
  • hpet_fops
  • nvram_fops
  • bad_file_ops
  • init_sync_kiocb
  • rt_cache_seq_fops
  • dma_buf_fops
  • ns_file_operations
  • proc_mounts_operations
  • epoll_table
  • signalfd_fops
  • debugfs_ei_ops
  • timerfd_fops
  • eventfd_fops
  • aio_ring_fops
  • misc_format
  • script_format
  • elf_format
  • compat_elf_format
  • udplite_prot
  • i915_forcewake_fops
  • proc_tcp_available_congestion_control
  • ntp_servers_seq_fops
  • xfrm4_policy_table
  • nf_ct_frag6_sysctl_table
  • udplitev6_prot
  • ip6_frags_ctl_table
  • pingv6_prot
  • ipv6_rotable
  • xfrm6_policy_table
  • hidraw_ops
  • generic_sysctl_table
  • tcp_sysctl_table
  • udp_sysctl_table
  • drm_debugfs_fops
  • drm_crtc_crc_control_fops
  • packet_proto
  • sunrpc_table
  • machine_cred
  • evdev_fops
  • unix_table
  • mac_hid_root_dir
  • _ctl_fops
  • rpc_proc_fops
  • socket_file_ops
  • xfrm_table
  • proc_bus_pci_operations
  • pps_cdev_fops
  • trace_fops
  • rtc_dev_fops
  • usblp_fops
  • scsi_root_table
  • proc_scsi_fops
  • vcs_fops
  • usb_fops
  • usbfs_devices_fops
  • sg_fops
  • mon_fops_stat
  • mon_fops_text_t
  • mon_fops_binary
  • xhci_ring_fops

@donhinton

I'm unable to reproduce this on our simple testing C program with something like this:

struct ptrs {
        int (* hi)();
        int (* bye)();
        int (* why)();
} __attribute__((randomize_layout));

static const struct ptrs global = {
        .hi = 0,
        .bye = 0,
        .why = 0
};

I looked at a few examples from the list you provided. I didn't look at all of them, but they seem to include nested structures as members. Have you tried to reproduce the problem using a nested structure test? Perhaps even cut-n-paste an actual instance that fails?

@connorkuehl
Author

connorkuehl commented Mar 18, 2019

Have you tried to reproduce the problem using a nested structure test?

Yes, it compiles cleanly and runs.

Perhaps even cut-n-paste an actual instance that fails?

Not yet, but I'll start looking for a structure that fails but that isn't encumbered by a lot of "scaffolding".

@donhinton

donhinton commented Mar 20, 2019

Although I haven't quite figured it out, I believe your problem is in bool ConstStructBuilder::Build(InitListExpr *ILE). See ElementNo in the loop...

The initializer list is created during parsing and uses the pre-randomized order. Perhaps a call to getASTRecordLayout() before creating the initializer list would solve the problem.

I need to recompile with your code in order to test this, but if this is the case, you should be able to print out an initialized structure with incorrect values without the need to make it assert or crash.
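The suspected mismatch can be illustrated with a toy model in plain C++. It is only a sketch under assumptions: `FieldInit`, `pairByPosition` and `countMismatches` are hypothetical names, strings stand in for fields and initializer expressions, and nothing here reflects how Clang actually stores them. The point is that pairing the i-th field of a reordered layout with the i-th initializer (written in declaration order) silently assigns values to the wrong fields.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model: one initializer per field, stored in declaration order.
struct FieldInit {
  std::string field;  // field the initializer was written for
  std::string value;  // initializer expression, as text
};

// Pair the i-th field of the (possibly reordered) layout with the i-th
// initializer, which is what purely positional indexing does.
std::vector<FieldInit> pairByPosition(const std::vector<std::string> &layoutOrder,
                                      const std::vector<FieldInit> &inits) {
  assert(layoutOrder.size() == inits.size());
  std::vector<FieldInit> out;
  for (std::size_t i = 0; i < layoutOrder.size(); ++i)
    out.push_back({layoutOrder[i], inits[i].value});
  return out;
}

// Count fields that would receive an initializer written for another field.
std::size_t countMismatches(const std::vector<std::string> &layoutOrder,
                            const std::vector<FieldInit> &inits) {
  assert(layoutOrder.size() == inits.size());
  std::size_t n = 0;
  for (std::size_t i = 0; i < layoutOrder.size(); ++i)
    if (layoutOrder[i] != inits[i].field)
      ++n;
  return n;
}
```

With the declaration order `i, j` and a shuffled layout `j, i`, positional pairing hands the initializer written for `i` to `j` and vice versa, so both fields are mismatched; with an unshuffled layout the count is zero.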

@donhinton

Verified that initialization is busted when reordering fields. Here's a little program that demonstrates the problem. Try it with/without the randomize_layout attribute to see the difference.

I'll dig a little deeper, but at this point, you should probably go back to the list and ask for advice on how to deal with designator lists while reordering fields. Richard Smith is probably the right guy to ask.

#include <iostream>

struct foo {
  double i;
  int j;
} __attribute__((randomize_layout));

int main(int argc, char *argv[]) {
  foo f = { .i=1.1, .j=2};
  std::cout << "1.1 = " << f.i << "\n2 = " << f.j << std::endl;
  return 0;
}

@donhinton

donhinton commented Mar 20, 2019

I don't think messing around with the initializer list creation is a good idea -- probably couldn't permute the field order there anyway. But, if you could attach a map (old_index->new_index) to the RecordDecl when you reorder it, codegen could use the map to match them back up.

The key here is that while designator lists can be sparse, the holes get filled in via InitListChecker::FillInEmptyInitializations(), so you can safely iterate over the fields and grab the correct initializer from the map.
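As a rough sketch of the map idea, again in plain C++ rather than Clang's data structures: `remapInitializers` and `oldToNew` are hypothetical names, and the sketch assumes one initializer per field (the holes already filled in) and that the map is a permutation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// oldToNew[i] is the layout slot assigned to the field declared at index i.
// Returns the initializers rearranged into layout order.
std::vector<std::string> remapInitializers(
    const std::vector<std::string> &initsInDeclOrder,
    const std::vector<std::size_t> &oldToNew) {
  assert(initsInDeclOrder.size() == oldToNew.size());
  std::vector<std::string> out(initsInDeclOrder.size());
  for (std::size_t oldIndex = 0; oldIndex < initsInDeclOrder.size(); ++oldIndex) {
    // at() throws std::out_of_range if the map holds an invalid slot.
    out.at(oldToNew[oldIndex]) = initsInDeclOrder[oldIndex];
  }
  return out;
}
```

For example, with initializers `{"1.1", "2", "0"}` in declaration order and `oldToNew = {2, 0, 1}`, the result in layout order is `{"2", "0", "1.1"}`. Storing the inverse map (new index to old index) would work equally well; which direction suits codegen better is a design choice this sketch does not settle.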

@connorkuehl
Author

Wow, thank you so much for all your help! We'll circle back to implement this. Sincerely appreciate your help diagnosing this. You rock.

@donhinton

You're more than welcome. Looking forward to this patch landing...

@donhinton

Btw, I'm not an authority on which approach should or should not be taken, so before spending a lot of time implementing something, I'd encourage you to seek advice from the list and/or D59254.

Again, looking forward to this landing...

@connorkuehl
Author

Btw, I'm not an authority on which approach should or should not be taken

Don't worry, same here.

I'd encourage you to seek advice from the list and/or D59254.

Absolutely. :-)

@da-x

da-x commented Jun 4, 2019

Calling Context.getASTRecordLayout(Record) at the end of Sema::ActOnFields is one way to fix this. I'm not sure it's complete, but it should get most of the kernel compilation units going. It may be better, though, to take only the relevant randomization bits from getASTRecordLayout rather than the whole thing. And if we intend to support C++, we need to remember the declaration order somehow so that initializer lists function properly.
