memcmp may be miscompiled by GCC #823
Comments
AFAIU (and I'm not 100% sure) this affects comparisons with fixed byte arrays which include zero bytes. The check may then early terminate at the first zero byte.
In the tests we have many many |
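To make the pattern concrete, here is a minimal sketch of the kind of comparison being described, as far as I understand the report. The tag bytes and both names (`EXAMPLE_TAG`, `example_compare_with_tag`) are hypothetical and exist only for illustration; nothing here is taken from the secp256k1 sources.

```c
#include <string.h>

/* Hypothetical fixed byte array with zero bytes in the middle
 * (illustration only, not a real tag from the codebase). */
static const unsigned char EXAMPLE_TAG[4] = {0x01, 0x00, 0x00, 0x02};

/* Compares 4 input bytes against the fixed array and hands back the raw
 * memcmp result (negative, zero, or positive) instead of only testing
 * it against 0. */
static int example_compare_with_tag(const unsigned char *input) {
    return memcmp(input, EXAMPLE_TAG, sizeof(EXAMPLE_TAG));
}
```

An input of `{0x01, 0x00, 0x00, 0x03}` differs from the array only after the first zero byte, so a correct memcmp reports a difference, whereas a comparison that stops at the first zero byte would report equality.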
I don't think it's explicit anywhere that the tag even needs to be a string, someone could use the timestamp of their product launch datetime, or their birthday as the tag, and that will contain zeros. (I have 0 idea if this is realistic or not, I do not know of products that use tagged hashes so don't know if they used string/int/bytes as tag) |
@elichai |
Is it sufficient to add a (I have verified that my tests in #822 pass when |
We could even include a self-test but I'm not sure about this. And I'm still not sure if shipping our function is just better, even though it's "extremely dumb" as to your adequate summary. Pragmatically, the implementation is trivial and will just work everywhere without any compiler detection magic etc. |
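For reference, a hand-written replacement along these lines could look roughly like the sketch below. The name `example_memcmp_var` is hypothetical, and this is only an assumption about what such a function might look like, not the code from any PR. It runs in variable time, so it would only be suitable for non-secret data.

```c
#include <stddef.h>

/* Hypothetical byte-wise comparison with memcmp-like return values.
 * Variable time: returns at the first differing byte. */
static int example_memcmp_var(const void *s1, const void *s2, size_t n) {
    const unsigned char *p1 = (const unsigned char *)s1;
    const unsigned char *p2 = (const unsigned char *)s2;
    size_t i;
    for (i = 0; i < n; i++) {
        /* Both bytes are promoted to int, so the difference cannot overflow. */
        int diff = p1[i] - p2[i];
        if (diff != 0) {
            return diff;
        }
    }
    return 0;
}
```

Because the loop never treats a zero byte specially, inputs that differ only after a zero byte still compare as different.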
If we do that will we be able to detect indirect calls to memcmp (I'm thinking via other libc calls)? In that sense |
how does a |
Well we're not going to be able to re-build libc. If there is an indirect bug due to libc being compiled with gcc and its memcmp calls working incorrectly, that would be a bug in the resulting libc library, and nothing we can do about that (except minimizing how much we rely on libc). The compiler can in some cases "emit" memcmp calls automatically, though. I don't know if that's the case in our codebase, and I don't know if that's strictly for situations where a builtin wouldn't/can't be used - but if that is somehow subject to the same bug, having a custom memcmp function may not be enough. |
Parallel Bitcoin Core issue: bitcoin/bitcoin#20005 |
Oh sorry I didn't mean fixing libc. I mean that the compiler hypothetically inlining some call to libc that in turn calls memcmp, that then gets optimized. I'm not familiar enough with C to have anything in particular in mind, so maybe it just isn't a thing to worry about. TBH I was really thinking of something analogous to |
The compiler can't inline calls to libc, as it doesn't know what is in the called functions. What is possible is that some functionality is implemented in libc headers, through inline functions or other builtins, that directly call memcmp/__builtin_memcmp. I can't find any instances of this in my system C headers (except the definition of memcmp itself), though there are a few in STL C++ headers. |
That is good to hear. |
I don't believe this is true. A compiler target also encodes the exact libc variant that is used, and the compiler can use that knowledge to its advantage (see #/776 for example). There are a few examples of how, even without headers, the compiler can remove calls to well-known libc functions and replace them with equivalent instructions while assuming those calls don't have side effects. EDIT:
I still believe this could happen, just like gcc turns this code into a "return 0":
#include <stddef.h>
void *calloc(size_t nmemb, size_t size);
void free(void *ptr); /* declared by hand, like calloc, so no libc header is needed */
int zeroed_alloc(int num) {
    int* p = calloc(num, sizeof(int));
    int ret = *p;
    free(p);
    return ret;
} |
Given that we have some evidence that this does not happen when the GCC knows that the return value is compared with Of course #825 is a simple fix but it's somewhat arbitrary then. See also bitcoin/bitcoin#20005 (comment), which does not show any potential issues in secp256k1. |
@elichai Sure, the compiler may know things about how C standard library functions behave (because they're specified by the standard, or because it knows additional promises the specifically used C standard library used makes). But (in general) it cannot actually look at the compiled library object code and inline it (in theory LTO could change that, but there is no LTO done for glibc IIRC). So just the fact that a particular function inside glibc is written using memcmp isn't relevant - except to the extent that it may be miscompiled inside glibc itself - and there is nothing we can do about that. |
@real-or-random I don't know - even with evidence that the current codebase is unaffected, it's still scary - evidenced by the fact that we hit it randomly in PR #822 (thankfully in test-only code, but it could have been elsewhere). |
If people feel we should do #825, then I'm not against this. It certainly won't hurt. |
Can you also try this on libsecp with all the features on + tests? |
We can make |
Still nothing prevents reintroduction of |
Let's keep this open to discuss how an accidental memcmp can be prevented. CI could do a simple grep for the word |
For reference real-or-random posted a clang-query command at #825 (comment). |
Hm, do we really want to restrict ourselves to a small list of standard library functions? I don't think the standard library is bad per se. Compiler bugs can happen everywhere, not only in calls to the standard library. Moreover, new calls are easily spotted in code review. I think memcmp is simply different because one needs to remember that memcmp is special. If you want to give it a try: |
If we are going to whitelist standard library functions, and I'm not arguing here that we should or shouldn't, one possible solution is to write our own header of standard library prototypes from our whitelist and disallow all system include files. That said I don't know how to enforce that system includes are disallowed. |
See also #833 |
This adds a simple static checker based on clang-query, which is a tool that could be described as a "clever grep" for abstract syntax trees (ASTs). As an initial proof of usefulness, this commit adds these checks:
- No uses of floating point types.
- No use of certain reserved identifiers (e.g., "mem(...)", bitcoin-core#829).
- No use of memcmp (bitcoin-core#823).
The checks are easily extensible. The main purpose is to run the checker on CI, and this commit adds the checker to the Travis CI script. This currently requires clang-query version at least 10. (However, it's not required to compile with clang version 10, or with clang at all. Just the compiler flags must be compatible with clang.) Clang-query simply uses the clang compiler as a backend for generating the AST. In order to determine the context in which the code is supposed to be compiled (e.g., compiler flags such as -D defines and include paths), it reads a compilation database in JSON format. There are multiple ways to generate this database. The easiest way to obtain such a database is to use a tool that intercepts the make process and builds the database. On Travis CI, we currently use "bear" for this purpose. It's a natural choice because there is an Ubuntu package for it. If you want to run this locally, bear is a good choice but other tools such as compiledb (Python) are available.
As fanquake noted in bitcoin/bitcoin#20005 (comment), this is fixed in GCC 10.3 and above. |
I just lost an afternoon trying to debug a valgrind false positive. https://github.com/bitcoin-core/secp256k1/pull/1140/files#diff-3fe8f8fa0b765ad49f70d6c32f6a865be48faeb3c8d6dd5f8c274ca546ef5b61R1111 (click 'Load diff' on tests_impl.h). The line
resulted in this CI failure: https://github.com/bitcoin-core/secp256k1/actions/runs/11285462388/job/31388294841
I could not get valgrind to point me to the right line, so bisecting the line took a very long time. I don't think anything is actually wrong with the code there. In the end changing This happened on macOS x86 for both gcc and clang. The valgrind check on Linux worked fine. Consider adding a CI check that forbids the use of edit: above you mention
This turned out to be a wrong assumption - in my case, no one pointed it out to me in review, and it was very hard for me to figure this one out. If the CI check had been added as discussed above, I would not have lost so much time on this. |
@benma I see that a CI check could also be helpful to avoid a Valgrind false positive, but in these cases, you should probably report to https://github.com/LouisBrunner/valgrind-macos. Valgrind ships with a bunch of standard suppressions, but these need to be updated from time to time with new macOS versions. And we're probably one of the main users of this Valgrind macOS fork, so last time @hebasto updated the suppressions, see LouisBrunner/valgrind-macos#114. @hebasto Do you think this one should also be submitted to upstream, even though we strictly speaking won't need it due to |
By the way, here's a godbolt link to check which GCC versions are affected by the original bug: It would be nice to get rid of |
What about the other memcmp's we have in actual production code? (bip340 tag, tweak add check, scratch impl, sha256 selftest) does this bug affect those too?
Originally posted by @elichai in #822 (comment)
context: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95189