Rust to C is required #179
Comments
This is a concern, yes. One approach I've been thinking about is using DWARF debuginfo or something to verify that structure layout and alignment and function ABIs are the same between the C components and the Rust components. That has the significant advantage of letting us use the real Rust compiler - mrustc doesn't have a borrow checker, it trusts that you've done a borrow check run with real rustc, so I'm a little hesitant in terms of robustness to rely on it for real use. The other simple approach, of course, is to say that this is only supported when compiling the kernel with clang, and it's merely best-effort otherwise. Since bindgen uses libclang and rustc itself uses LLVM just like Clang does, it should interpret C code the exact same way. If you make sure you're using the libclang and LLVM that compiled your kernel, you should have bug-for-bug compatibility.
FWIW, my understanding is this is no longer true, the problems happened primarily around the gcc 2.95 era - see #20 (comment) (Broadly, yes, the concern is compiler-version-specific bugs.)
Oh, I'm really curious about this - having bindgen be able to automatically tag things as "this is protected with this lock" or "this must only be dereferenced inside an RCU critical section" would be very useful.
There is really no dependable way to verify the structure layout of the Linux kernel. As a security requirement, everything you could use to attempt to verify the kernel's structures is classed as internal and can be removed or changed without notice.
This existing limitation of mrustc does not have to remain. That is why I said the first step is to port mrustc to Rust, so that it can convert itself from Rust to C. This is stage 1 of the process. In my eyes, Rust to C should be as standard an output as Rust to binary; at times, such as when building Linux kernel modules, C output is simply more useful than a binary.
Even this plan is a long-term path to trouble. Will rustc support C structure randomisation plugins in its LLVM backend? If not, it is not suitable for building Linux kernel modules going forward. Rust to C is not really an optional feature, particularly when you look at what the kernel security people want to do. With Rust to C, whatever the security people do to the C compiler, such as having a plugin randomise structures, is no longer your problem. With rustc using LLVM to produce a binary, structure randomisation and security padding become your problem.
The Linux kernel mailing list has a long list of different gcc bugs that have caused memory alignment issues, mostly breaking the Nvidia binary blob. It did not stop at the 2.95 era. Because there are so few binary Linux kernel modules, the number of things affected has been quite small.
Really, I would like a standard format for putting sparse tags in C such that, when sparse runs over the C, the Rust borrow checker is effectively run to find faults in the C code. This is also why we need Rust to C in the short term. Sparse in the Linux kernel is designed to check C files; it cannot process Rust. https://sparse.wiki.kernel.org/index.php/Main_Page Porting sparse to Rust would be a hell of a lot of work. Rust to C, at least as a translation stage for the Linux kernel, is the best path. It lets the security people keep working on the C compiler plugins that alter how the kernel appears at run time without constantly breaking what you are doing. For Rust to become a language for writing Linux kernel modules, it has to build dependably against the upstream kernel, including whatever they are likely to change. Rust to C gets you a lot closer than rustc to binary does.
Do you have a source for this? My understanding is that debuginfo for use with things like perf/ftrace/eBPF/gdb/etc. needs to have a complete understanding of all the structures in the kernel (or at least all the structures used as APIs to modules, which is all we care about), otherwise they'd pick up incorrect data. It is true that the kernel API and ABI isn't guaranteed across versions of the kernel. But we're not doing that, we're building for a specific kernel.
My understanding of randstruct is this is not true, as long as you have the same seed you're fine. From the LWN article you linked: "...to build third party or out-of-tree kernel modules against a kernel with randomized structures, the randomization seed is required. Therefore, those distributing kernel binaries (such as Linux distributions) will need a way to expose the randomization seed to users that install the kernel headers or other kernel development package...." The seed is stored in a file exposed to modules (scripts/gcc-plugins/randomize_layout_seed.h), and from reading the randstruct source (scripts/gcc-plugins/randomize_layout_plugin.c, see the
Yes, see http://lists.llvm.org/pipermail/cfe-dev/2019-March/061607.html . Notably, the GCC randstruct plugin is open source, there's nothing to fundamentally prevent the LLVM implementation from randomizing structures in the same way. It's not like we don't know how GCC does it. And GCC has to do it in a reproducible way, otherwise you couldn't build C kernel modules for a randstruct kernel because they'd pick a different order. (It is true that we don't support compiling against randstruct kernels right now, because support hasn't landed in LLVM. However, that is in fact one reason I've been curious about debuginfo, because debuginfo will have the randomized layout for a particular structure. So we don't even need a compatible reimplementation of randstruct, we just need to ask the existing kernel what order it picked.)
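To illustrate why sharing the seed is enough, here is a minimal sketch of a seeded field shuffle. It is not the randstruct algorithm: the xorshift generator, the function names, and the 16-field limit are assumptions made up for this example. The only point is that a deterministic generator plus a shared seed reproduces the same field order in any tool that implements the same procedure.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical PRNG for illustration only; randstruct has its own generator. */
static uint64_t xorshift64(uint64_t *state)
{
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    return x;
}

/* Fisher-Yates shuffle of field indices 0..n-1, driven only by the seed. */
static void shuffle_fields(unsigned *order, unsigned n, uint64_t seed)
{
    uint64_t state = seed ? seed : 1; /* xorshift state must be non-zero */

    for (unsigned i = 0; i < n; i++)
        order[i] = i;
    if (n < 2)
        return;
    for (unsigned i = n - 1; i > 0; i--) {
        unsigned j = (unsigned)(xorshift64(&state) % (i + 1));
        unsigned tmp = order[i];
        order[i] = order[j];
        order[j] = tmp;
    }
}

/* Returns 1 when two builds using these seeds agree on the field order. */
static int same_layout(uint64_t seed_a, uint64_t seed_b, unsigned n)
{
    unsigned a[16], b[16];

    if (n > 16)
        return 0;
    shuffle_fields(a, n, seed_a);
    shuffle_fields(b, n, seed_b);
    return memcmp(a, b, n * sizeof(a[0])) == 0;
}
```

Two builds that call `shuffle_fields` with the same seed always get the same order, which is the property a module build needs; a build without the seed has no way to reproduce it.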
This seems to not be about a compiler bug, this is about buggy C code, if I'm reading it right. That is, it's not even that two compilers compiled it differently - any correct C compiler should have made the same optimization, given that input code, because it's an easy and obvious optimization to make. Do you have references to code that compiled incorrectly? (I do think that it's not-great that it's so easy to write incorrect code in C and not realize it and risk that one compiler will optimize it in the future in unexpected ways, but, uh, that's why we're here. :) )
Don't get me wrong, I agree we have been lucky so far in that everything has worked. And we absolutely need to document / justify why we think what we're doing is sound. But I think that matching another compiler's layout is a fundamentally solvable problem. There's no proprietary code involved. We know that compiling modules in C works, therefore all the information exists somewhere. It's just a matter of getting it. (I asked this question on Twitter recently and the consensus of the folks who replied was that there's no fundamental reason this shouldn't work, everyone just has anecdotal stories of compiler bugs. I am very much concerned about this problem, it's just that everything I've learned from digging into it over the past year+ is that it's a solvable problem.)
Also, something occurred to me the other day: we already, in the wild, have a very common case of people compiling kernel modules with a different compiler than their kernel - DKMS. Since DKMS modules are built on the user's system, they're built with whatever compiler the user has, which is quite likely at a different patch release from whatever was on the build system of the original kernel. No Linux distro makes any attempt to provide / pin the compiler version of their kernels so users can use them for building modules. And Linux distros absolutely do fix (and, perhaps, introduce...) compiler bugs in compiler updates during stable releases. If our ABI compatibility story is "It works as well as existing third-party modules," I don't think I'll be unhappy. :)
perf/ftrace/eBPF/gdb/etc. can in fact use ORC or DWARF at this stage, with different patches. Because debuginfo is classed as an internal structure, another developer could decide to change it again if someone comes up with a better-performing format for storing the information. This risks very quickly turning into Sylvester trying to catch Speedy Gonzales. You avoid this chase simply by being able to use the same compiler, with the same configuration, as everything else.
You have no clue how deep this rabbit hole is.
Not all structure randomisation plugins are the same. Remember that grsecurity has its own version, which is closed source these days and undocumented. Please note that "plugins" was not a typo, and I said LLVM because I was referring directly to something on the LLVM side, not the gcc side. I gave the gcc randstruct plugin link as an overview, so that you could see what was done. The LLVM structure randomisation plugins I am referring to are basically many closed source black boxes. There are multiple vendors who provide open and closed source structure randomisation plugins. I asked about the LLVM ones for a reason. A lot of work is going into getting the Linux kernel building with clang, and some of it is so that more companies can provide closed source structure randomisation plugins. Building third-party modules without the compiler used to build the kernel is going to get much harder.
The C standard does not state how optimisations work, so the same C compiler with different flags is allowed to do the same optimisation in totally different ways. At times this results in totally different structure padding and memory alignment. A correct compiler is not required to do the same optimisation at all. This is where the pit of hell opens up under you and basically eats your idea.
You have the wrong idea. Matching another compiler is a fundamentally solvable problem. But the Linux kernel allows those building kernels to make their compiler unique or closed source if they wish, and in those cases your process for third-party modules is screwed if it is not using the same compiler. When building third-party modules you are meant to use the same compiler the kernel was built with, and that information is recorded in the build process. In some cases the Linux kernel can be built by a closed source compiler, not just a modified clang or gcc, so "no proprietary code involved" is in fact wrong. Yes, the information exists, but the layout information may be a compiler trade secret that you would have to reverse engineer.
Asking on Twitter is not asking people who have dealt with the problem. https://github.com/torvalds/linux/blob/master/Documentation/process/stable-api-nonsense.st The "why not" answer is right there in the Linux kernel source itself. In this old document Greg Kroah-Hartman gives an overview of the fundamental problem. It is not just compiler bugs: historically, security-hardened distributions also custom patched their compilers, modifying the binary output. This altered the padding and alignment of things and is not against the rules. The Linux kernel does not have a stable ABI for kernel modules. This means the rules around the ABI can be altered however the person building the kernel wants; as long as they document and provide the compiler and the configuration used, you can build third-party modules. People attempting to customise Android kernels often find themselves between a rock and a hard place because they are missing the compiler that was used to build the kernel.
That idea you had about DKMS is completely wrong. In reality, most DKMS modules look up the compiler that was used and match it. DKMS modules that cannot find the compiler used to build the kernel display an error, because there is a high risk of unstable results, or simply refuse to build at all. It is rare for people to compile kernel modules with a different compiler from the one their kernel was built with. The Linux kernel mailing list will not fix any fault caused by mixing compilers. As those attempting to get clang to build the Linux kernel found out, the kernel source uses a lot of gcc extensions to the C standard for memory alignment and other things.
https://www.kernel.org/doc/Documentation/kbuild/modules.txt Yes, the compiler version and the package name of the compiler are in /proc/version of the currently running kernel. There are ways of looking this up for non-running kernels as well.
People who build kernels cut to the bone to match their hardware exactly will have BPF and debugging support turned off completely to increase performance. So debuginfo is optional in the Linux kernel. Sorry, but your plan to get around the instability of the Linux kernel ABI does not exactly work. You will end up having to require that particular kernel configuration options be enabled, and many people will want to turn those options off. Rust to C would allow covering all build options, even ones where you cannot work out the Linux kernel ABI.
Well, yes, but it's still much less work than the solutions you've proposed! Debuginfo changes rarely, and handling those changes is a significantly easier task than getting mrustc to parity with rustc. Also, debuginfo isn't actually an internal structure, it's an interoperability format. You can't usefully change how you emit debuginfo without sending a patch to gdb. And, worst case, we can always write a gdb script to dump the structure layout of the structures we bind, which means we don't have to actually parse the latest fancy debuginfo format ourselves. Or, you know, we can write a kernel module in C that does some compile-time assertions that the structure layout matches what we expect. Then no debuginfo is required, just the ability to compile (and not even load) C code for the target kernel.
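A minimal sketch of the compile-time assertion idea, assuming a made-up structure: `struct example_hdr` and the `EXPECTED_*` numbers are hypothetical stand-ins for a real kernel type and for whatever layout the Rust bindings were generated with. Standard C11 `_Static_assert` is used here so the snippet builds anywhere; in-tree kernel code would more likely use `BUILD_BUG_ON` or `static_assert`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical structure standing in for a kernel type bound from Rust. */
struct example_hdr {
    uint8_t  kind;
    uint32_t flags;
    uint16_t len;
};

/* Layout the Rust side believes in (assumed values for this example). */
#define EXPECTED_SIZE         12
#define EXPECTED_ALIGN         4
#define EXPECTED_FLAGS_OFFSET  4
#define EXPECTED_LEN_OFFSET    8

/* If the C compiler disagrees, the build fails; nothing is loaded or run. */
_Static_assert(sizeof(struct example_hdr) == EXPECTED_SIZE,
               "size of struct example_hdr differs from Rust bindings");
_Static_assert(_Alignof(struct example_hdr) == EXPECTED_ALIGN,
               "alignment of struct example_hdr differs from Rust bindings");
_Static_assert(offsetof(struct example_hdr, flags) == EXPECTED_FLAGS_OFFSET,
               "offset of flags differs from Rust bindings");
_Static_assert(offsetof(struct example_hdr, len) == EXPECTED_LEN_OFFSET,
               "offset of len differs from Rust bindings");
```

Compiling this file with the target kernel's compiler and flags is the whole check: a successful build means that compiler lays the structure out the way the bindings assume.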
We do not support loading modules into kernels built with proprietary or partially-proprietary compilers. That's outside the scope of this project. If it works for you, great; if it doesn't, we can't debug it, even if we emitted C. I agree that doing a Rust-to-C compilation with mrustc would be very interesting (I had already mentioned this in #112) but there's value in supporting other platforms in the meantime. BTW, an interesting project for you might be writing kernel modules in Nim, which compiles to C.
It is true that the C standard does not say this, but platforms have an ABI standard, which does specify these things, precisely. If you have an example of two versions of either gcc or clang (or one of each) for the same platform compiling the same, correct C code with different padding or alignment due to adjusted optimizations or other internals (and not due to specifying a command-line argument to intentionally change the ABI), please provide it.
I've maintained large fleets of machines using DKMS, but okay, let's pretend I've never used it. The link you provided demonstrates that a) DKMS only checks the major and minor version of the compiler, b) says it's only "recommended," not required, and c) there's an option to disable the check that many people do in practice. These hypothetical changes to compiler behavior (which you have yet to show a single example of) can happen easily within a patch release to a compiler, and distributions absolutely do ship patch releases. If there were problems, we would know about it already. Also, as it turns out, that code isn't in DKMS at all. That code is in the Nvidia driver:
(And I would bet money that that code was written before the GCC versioning scheme change in 5.x, i.e., the intention was to check only for what we call today the major version. An upgrade like 7.3.0 to 7.4.0 is the same scale of upgrade as 4.8.3 to 4.8.4 was. So it wouldn't have tripped the check anyway.)
I'm perfectly happy to not support these kernels either. This project does not have a goal of supporting every possible Linux kernel in existence. We're targeting the specific cases of relatively recent kernels built with gcc or clang, and at the moment we're only targeting x86_64, anyway, and only targeting gcc without plugins. I'm happy to restrict it even further to just kernels built with clang, if necessary. At that point we can check that we're using the same compiler version as the target kernel, and these problems simply don't exist. (And, in fact, we would support clang with proprietary randomization patches - just make sure you're passing the same patched libclang to bindgen, and it will do the same thing.) But I will need to see concrete evidence that gcc (without plugins) and clang can produce incompatible objects before doing so.
No, that is out of date. You used to have to send a patch to gdb when you changed the debuginfo format; these days you can handle the change with a Python script for gdb, the same way you use the kernel debugging helper Python scripts with gdb. Because the Python kernel debugging helpers now live with the Linux kernel, a developer can change the debugging format and put the changes in the kernel source without needing an upstream gdb change. I know I have pulled the next quote out of order
There is some information missing from debuginfo that will be hard to find even with a test module written in C, like that SSE4 case where items had to be placed on particular memory alignments. BPF gets away with using debuginfo because it is not creating any new structures; it is modifying existing ones. When debugging you are not creating things either. The compiler has information, from knowing how it will optimise things, that affects how structures are created. Wrong padding values around what you create will cause failures. Basically, debuginfo gives you the structure information, but you also need the structure alignment padding that the compiler chose on its own, when not explicitly directed, to suit the optimisation flags it was using. The data structure padding that Greg Kroah-Hartman mentions is your biggest problem. You do not get this out of debugging data simply or dependably. Extracting it with a C test module is not going to work either. Basically, the options are to disassemble and check all the uses of the structures, to use the same compiler, or to limit functionality severely.
If the C output carried annotations that made it reversible back to the Rust source, the next thing becomes possible: compilers such as icc provide a debug interface compatible with the gdb standard, so you could see which line of C a problem in a kernel module occurred at, and with the reversible mapping you could say it is line X of the Rust code.
I am not just talking about Rust to C in one direction. We need a mirror in both directions; mrustc is really only the start.
That needs a lot of work. Nim is garbage collected, and in kernel space you really want to avoid any extra allocation system. The fact that Rust 1.0 avoids a garbage collector is what makes it a very suitable language for kernel work. Finding a language that exports to C and has features like Rust's, without the stupidity of a garbage collector, is quite hard. Nim also has the problem that it has been designed for conversion in one direction only.
https://github.com/torvalds/linux/blob/master/Documentation/process/stable-api-nonsense.st
The same platform: this is where you have gone wrong. The Linux kernel does not mandate that the gcc used to build it uses the same platform settings as userspace. Those attempting to modify SoC vendor-provided kernels find this out. Some SoC vendors modify the gcc specs file when building the kernel, so gcc producing two different alignments does not have to come from a command-line argument. Everything you can do with a gcc command-line argument you can do in the specs file. Part of the problem here is that on some SoCs the kernel-space ABI is quite intentionally changed to be different from user space. Such platforms normally provide you with 2 compilers: one for building user-space applications and one for building kernel-space/ring 0 code. So you really do need, if possible, to find the right compiler. With gcc that means taking the settings from the kernel build and the gcc specs file that built the kernel, to make sure you have all the settings the compiler was working with, and that is without custom optimisation patches.
The distributions put the compiler version and package in the /proc/version info: (gcc version 8.3.0 (Debian 8.3.0-21)). The 8.3.0-21 on Debian tracks to the exact package. This exact information is not there for no reason. There have been versions of gcc packages where a patch either messed up the specs or did not work out exactly right. This does not necessarily stop those gcc versions from successfully building a working Linux kernel, even in cases where that gcc cannot build any userspace applications because the ABI is off. These issues do exist. They are not common, but a person could lose massive amounts of time debugging something only because the X kernel they are using was built with a slightly screwball Y compiler and your solution has not allowed for it. Ignoring the Nvidia warning, as you choose to, means any bugs are then at your own risk, not Nvidia's.
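For reference, pulling the compiler identity out of that string could be sketched as below. This is only an illustration under assumptions: `struct compiler_id` and `parse_gcc_id` are hypothetical names, and the parser handles only the "gcc version X.Y.Z (PACKAGE)" form quoted in this comment. Other kernels and compilers may format /proc/version differently.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record of what /proc/version says about the compiler. */
struct compiler_id {
    int major, minor, patch;
    char package[64]; /* distribution package string, e.g. "Debian 8.3.0-21" */
};

/*
 * Parses the "gcc version X.Y.Z (PACKAGE)" form quoted in the thread.
 * Returns 0 on success, -1 if the text does not contain that form.
 * The package field is left empty when no "(PACKAGE)" part follows.
 */
static int parse_gcc_id(const char *version_line, struct compiler_id *out)
{
    const char *p = strstr(version_line, "gcc version ");
    int n;

    if (!p)
        return -1;
    out->package[0] = '\0';
    n = sscanf(p, "gcc version %d.%d.%d (%63[^)])",
               &out->major, &out->minor, &out->patch, out->package);
    return n >= 3 ? 0 : -1;
}
```

A build script could read the first line of /proc/version with `fgets`, pass it to `parse_gcc_id`, and compare both the version triple and the package string against the compiler it is about to invoke.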
I do lots of work outside x86_64 or on hardened x86_64. Yes, on normal x86_64 it is very common to build the kernel with a compiler whose ABI is set the same as userspace, but there are many cases where it is not. The compiler version, right down to the package, is not there for no reason. You get away with ignoring it at the moment. In a presentation I saw, it said you were supporting a lot more than x86_64. On the platforms I work with, not having C output is going to be a hell of a fight because of what different SoC vendors do.
Can you give an example of this happening (on non-x86_64 if needed)?
@oiaohm @geofft Language for Rust -> C C -> Rust Tooling/Compiler |
Sparse: int __attribute__((address_space(1))) x; Yes, using the C preprocessor to keep the code tidy is an option. Generating C for the Linux kernel would mean emitting sparse's __attribute__ tags so that sparse is happy at minimum, and most likely the #define wrappers around __attribute__ to keep it readable.
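As a sketch of the #define wrapping mentioned above: sparse defines `__CHECKER__` while it runs, so the annotation can expand to the attribute for sparse and to nothing for an ordinary compiler. The `__user` definition follows the pattern older kernel headers used; `bounded_len` is a hypothetical helper invented for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Sparse defines __CHECKER__; a normal C compiler sees an empty macro. */
#ifdef __CHECKER__
# define __user __attribute__((noderef, address_space(1)))
#else
# define __user
#endif

/*
 * Hypothetical helper: clamps a user-supplied length to a buffer size.
 * The pointer is tagged __user, so under sparse any direct dereference
 * of ubuf in this function would be reported.
 */
static size_t bounded_len(const char __user *ubuf, size_t len, size_t max)
{
    (void)ubuf; /* deliberately never dereferenced here */
    return len < max ? len : max;
}
```

Generated C that carried such tags would compile unchanged with gcc or clang and could still be run past sparse.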
__attribute__((rust_something(value))) on the C side should be usable, with rust at the start of the name so that a C compiler will not create its own attribute with the same name. Basically, once there is a standard for what information the attributes must carry to convert C to Rust perfectly, the same can be used when exporting to C. So a perfect mirror.
Linux kernel sparse is an explicit-tagging static analyzer for C. Of course, there is no reason why sparse, processing C files with rust attribute values, could not in the long term check them against Rust's usage rules. It is also possible to add plugins to LLVM and gcc to have them process extra attribute values. If there is a standard for how Rust information is written in C files, bindgen could also pick it up and possibly make bindings smaller by knowing where extra protections are not required. This is another reason for Rust to C. Linux kernel developers add rules to sparse to check for things that have caused security problems. Rust might be type safe, but is it address space safe, as in kernel memory not being exported to userspace without special processing, which is what the example I gave was about? Other rules like this can appear as well. Rust to C would allow Rust-generated code to be run past the same static checks that all the kernel code has had. Going Rust only is very much a case of the old saying: don't throw the baby out with the bath water. The functionality that the C static analyzer sparse provides for the Linux kernel needs to be applied to modules written in Rust, or the security faults sparse prevents will turn up in Rust modules.
Could you briefly elaborate on how to move forward on this?
Is there tooling like tab-completion for sparse-code insertion, so it does not cost (a lot) extra to the developer?
Macro lines should have a special symbol in a comment/rule to indicate Sparse rules.
This looks ugly and forcing another language into the kernel will probably not work.
I believe that getting upstream would require this to a large degree. The problem is that you first need to enhance all C with the strong typing of Rust to make this work in Sparse for the things you use in modules.
Rust has a lot more safety rules, which are built upon the strong typing system (still missing in C). Only if a 1:1 map looks feasible could one apply some of them to C (as an optional check).
Again, forcing another language probably will not work.
Sorry, I do not quite understand.
From the little documentation there is, I understand that the semantic checks are only type checks plus, I believe, a semantically enriched parse tree for use by other tools. The functionality of these other tools is basically reimplemented in Rust. TL;DR: I like your enthusiasm, but I feel the road would look something like this for Linux kernel-level upstream code.
I think this issue isn't actionable for this project - we're not going to drop everything and go write a bunch of code for mrustc. I've tried to spin out the actionable items into two other issues: #203 covers asserting that Rust sees the ABI the same way as C and #204 covers trying to make use of sparse annotations as an enhancement. @oiaohm, if you want to expand the facilities of mrustc (or the LLVM C backend), I think that would be an extremely cool project but I think it's better done in a separate project. If you can find a specific case of us mis-compiling for some particular kernel built with an open-source compiler, then that's a bug in this project of course and should still be reported (with instructions on how to reproduce). But I don't think it makes sense to keep this open because there might be bugs.
Basic overview.
You have been lucky so far, but the history of the Linux kernel has many examples where gcc has had a bug and used a different memory alignment for something, with results ranging from a kernel module not working to the kernel crashing completely.
The Linux kernel core and the Linux kernel modules used to be two different tar.gz files, buildable by different versions of gcc. Not any more: it caused too many problems.
As for the number of compilers that can build Linux kernel binaries and have everything work, allowing for minor compiler bugs: the answer is 1.
You are currently using 2 compilers, so you are being lucky. Remember that research institutions have at different times paid for high-performance C compilers to build the Linux kernel. These are compilers you are unlikely to be able to test against.
The solution in my eyes is the following.
https://www.kernel.org/doc/html/latest/dev-tools/sparse.html
Sparse does have a method for adding extra typing information that a normal C compiler will ignore. C++ and Objective-C both use #ifdef to hide things from the normal C build: the "__cplusplus" tag for C++ and __OBJC__ for Objective-C. So an #ifdef rust could go into C files containing Rust-specific information. So I don't see any particular reason why you cannot have rustified C that will convert to Rust, or Rust that will convert to rustified C.
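A sketch of what such a guard could look like in a shared header. Everything Rust-related here is hypothetical: no compiler or checker defines `__RUST_CHECKER__`, and the `rust_borrowed` attribute name and `struct buffer` are invented purely to show the shape of the idea.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical: a Rust-aware checker would define __RUST_CHECKER__ and
 * understand the rust_* attribute. Ordinary C compilers see empty macros,
 * the same way __cplusplus guards hide C++-only declarations from C.
 */
#ifdef __RUST_CHECKER__
# define __rust_borrowed __attribute__((rust_borrowed))
#else
# define __rust_borrowed
#endif

/* Hypothetical structure shared between the C and Rust sides. */
struct buffer {
    unsigned char *data;
    size_t len;
};

/*
 * The annotation records that the callee only borrows buf and does not
 * keep the pointer after returning, which is the kind of ownership
 * metadata a converter or checker would need.
 */
static size_t buffer_len(const struct buffer __rust_borrowed *buf)
{
    return buf->len;
}
```

With the macro empty, this is plain C to gcc or clang; the metadata only matters to a tool that opts in by defining the guard.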
Remember that the mistakes Rust stops programmers from making are the same ones sparse is attempting to prevent C developers from making. So why can we not make Rust's logic applicable to C source, as sparse does?
Finally, it will not be wise to convert all the C in the Linux kernel to Rust. At times you need evidence that feature X was provided by company X, so that company X cannot assert its patents. Modifying those C files, yes; converting them, no.
Basically we need a bastard child of C and Rust. That file format looks like C to C compilers, but carries all the metadata needed to perform Rust's sanity validation and can be converted automatically to and from Rust without screwing anything up.
Yes, this bastard format would allow adding Rust-specific information to the Linux kernel .h files, so that you have fewer items to keep in sync.
Generating a .h file from Rust for C to use is asking for sync issues, where the Rust has been updated but the generated .h has not. With the Linux kernel you have the reverse: someone patches the .h file and your generated Rust interfaces are out of sync. We need the interface between Rust and C to be 1 file only, not the current 2. Rust to C together with C to Rust would allow a single interface file. Since C is the older one, the interface file has to be C.
This is why I would want to make the Rust metadata part of Linux kernel sparse as well, so that if someone updates a .h and misses the metadata, sparse gets upset. This way the upstream Linux kernel would be maintaining Rust compatibility and gaining Rust's ability to validate how things are used in C code.
Let's try to make this process win-win: those who keep developing in C for the Linux kernel get their code audited for faults better, and in the process make it dependable for those wanting to use Rust.
The big problem here is that we have duplication of compilers and of interface files. Both of these need to be solved.