[C++] Undefined behaviour in generated code #506
The schema, as per the SBE specification, can use the |
Currently our unit tests do not pass because of this issue, and we had to disable the alignment check in UBSan. By using the memcpy trick we can inform the compiler that the load is unaligned and make sure it will not use instructions that require alignment. On amd64 the resulting binary should be the same, except that the code is guaranteed not to break if some new optimization pass is introduced that would generate SSE or AVX instructions for this code. |
Have you used the |
I don't control the specification, I'm using what is provided by the exchange etc. But that is not the problem. The problem is that the generated code is relying on undefined behaviour. The compiler will generate correct instructions on amd64 most of the time. By using |
Would love to see a PR that does this for gets and sets. But it would need to be optional so that the original intent (of using I'll put it on my list to get to. But may be a while. Lots of stuff ahead of it. |
I think in this case it could be
BTW, what is the reason to use void for functions in C++ code? It's old C style, and for a modern compiler it's completely redundant. |
Haven't exterminated all the |
I think @myrgy's code will generate optimal code on amd64 for any alignment. On platforms without unaligned loads it would probably generate correct but less efficient code. If loads are guaranteed to be aligned, the current code is better. What about adding a template like this to the sbe.h support header:
#include <cassert>
#include <cstdint>
#include <cstring>
template <typename T>
// Add enable_if to only work with integer types
// Maybe add force inline attributes for supported compilers
inline T load(const char *buffer) {
#ifdef SBE_ASSUME_ALIGNED
    // Convert the pointer to an integer before masking, and parenthesize the mask: == binds tighter than &
    assert((reinterpret_cast<std::uintptr_t>(buffer) & (alignof(T) - 1)) == 0 && "loads need to be aligned");
    return *reinterpret_cast<const T*>(buffer);
#else
    T val;
    std::memcpy(&val, buffer, sizeof(val));
    return val;
#endif
} |
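The comment in the proposed template suggests restricting it to integer types with enable_if, and an earlier comment asks for both gets and sets. A minimal sketch of how that might look follows. The names `load`, `store`, and `roundTripExample` are illustrative assumptions, not part of SBE's actual support header, and the `SBE_ASSUME_ALIGNED` branch is left out for brevity.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical helpers: names and signatures are assumptions for illustration.

// Enabled only for integer types; reads sizeof(T) bytes with no alignment assumption.
template <typename T>
inline typename std::enable_if<std::is_integral<T>::value, T>::type
load(const char* buffer)
{
    T val;
    std::memcpy(&val, buffer, sizeof(val));
    return val;
}

// Matching setter: writes sizeof(T) bytes with no alignment assumption.
template <typename T>
inline typename std::enable_if<std::is_integral<T>::value, void>::type
store(char* buffer, T val)
{
    std::memcpy(buffer, &val, sizeof(val));
}

// Usage: write and read back a value at an odd (misaligned) address.
inline void roundTripExample()
{
    char buffer[16] = {};
    store<std::uint32_t>(buffer + 1, 0xDEADBEEFu);
    assert(load<std::uint32_t>(buffer + 1) == 0xDEADBEEFu);
}
```

With this constraint, a call such as `load<float>(buffer)` fails to compile instead of silently reinterpreting bytes.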
Or maybe we can let people on these esoteric platforms deal with this problem? AMD64 and ARM have efficient loads for unaligned integers and the memcpy version should generate the correct instructions. |
memcpy will have to at least check for alignment so it will always be slower to some extent. It is a difficult choice of how far to go with support with the costs it will incur in development and at runtime. |
@mjpt777 , please take a look at compilation results: gcc and clang generate almost equal code for memcpy and reinterpret_cast |
There is zero cost to using this correct implementation. It also has the benefit of working on all CPUs without crashing. Given that the overhead is zero, I don't see why we should use an implementation that allows the compiler to generate code that will crash my system. Minimal example:
#include <stdlib.h>
#include <string.h> // declares memcpy
typedef unsigned int u32;
// Cast-based read: undefined behaviour if ptr is not suitably aligned for u32
u32 reada32(const void* ptr) { return *(const u32*) ptr; }
// memcpy-based read: makes no alignment assumption
u32 readu32(const void* ptr)
{
    u32 result;
    memcpy(&result, ptr, sizeof(result));
    return result;
}
On amd64 (-std=c++14 -O3):
On arm64 (-std=c++14 -O3):
On arm (-std=c++14 -O3 -mcpu=cortex-m7 -march=armv6):
|
Thanks this is interesting to know. BTW how does memcpy know the address in the buffer is aligned? |
I can see from your example that it is easy for the compiler to determine that what it is copying is aligned. Have you tried it with the address coming from an opaque buffer, as is likely to happen in a real application? |
I do agree that this will mostly be optimized and thus have no cost. However, implying this would always be optimized on all systems, all CPUs, and with all compilers is a strong statement, and one I am pretty sure is false. Adding in that the buffer being used may or may not be aligned, it is guaranteed not to always be performant. SBE is a performance-oriented tool. That means some requirements get pushed back onto the user to make the correct choice. I am not convinced that Although, I think a |
Yes, it can work, but if the compiler decides to vectorize operations on such data, it will crash the application, because vector registers don't support unaligned access. memcpy informs the compiler that the access might be unaligned, which prevents such an unsafe operation in that case. Here is an example of a crashing app: https://godbolt.org/g/TbVKee
|
@mjpt777 The whole point of introducing memcpy is to tell the compiler that this load may be unaligned. Per the C standard, memcpy has no alignment requirement, so the compiler needs to ensure the load is done in a way that assumes no alignment. memcpy specifically doesn't know about alignment at compile time; it will use alignment at runtime to copy large blocks using vector instructions, but that is irrelevant here.
@tmontgomery I'm saying that using memcpy will cause correct code (sans compiler bugs) to be generated on all platforms. Currently a new optimization pass can break existing code, since SBE is relying on undefined behavior. By default we should not rely on undefined behavior (i.e. we should use memcpy). If someone wants to use this on some old SPARC platform etc. and take advantage of a properly aligned schema, I would say it is up to them to implement it. Currently the code is causing issues for the 99.9% of users not using old esoteric platforms. Also, the version without memcpy should have asserts verifying the alignment requirements when loading and storing values.
@myrgy It actually generated unaligned vector loads in this case. |
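The request above for asserts that verify alignment on an aligned-only path could be sketched roughly as follows. `isAligned` and `loadAligned` are hypothetical names, not SBE functions. The sketch keeps memcpy for the load itself and only passes the alignment promise to the compiler through `__builtin_assume_aligned`, a GCC and Clang builtin; on other compilers it falls back to plain memcpy.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: true if ptr satisfies the alignment requirement of T.
template <typename T>
inline bool isAligned(const void* ptr)
{
    return (reinterpret_cast<std::uintptr_t>(ptr) & (alignof(T) - 1)) == 0;
}

// Hypothetical load for schemas known to be aligned.
// The precondition is checked in debug builds; calling this with a
// misaligned pointer in a release build is still undefined behaviour.
template <typename T>
inline T loadAligned(const char* buffer)
{
    assert(isAligned<T>(buffer) && "loads need to be aligned");
    const void* src = buffer;
#if defined(__GNUC__) || defined(__clang__)
    // Tell the optimizer about the alignment without dereferencing a cast pointer.
    src = __builtin_assume_aligned(buffer, alignof(T));
#endif
    T val;
    std::memcpy(&val, src, sizeof(val));
    return val;
}
```

Keeping the assert next to the alignment promise means a misaligned schema fails loudly in testing rather than depending on what the optimizer happens to emit.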
@rigtorp , yep, my fault - I saw xmm registers usage and didn't check if load instructions support unaligned access. |
Processors with AVX support allow unaligned access to XMM registers having relaxed the older rule. http://www.agner.org/optimize/optimizing_assembly.pdf However this discussion is interesting. We'll look at how we can generate the memcpy option. |
Discussing it, memcpy should be the default. We use bounds checking on as the default and this goes along with that "safe by default" mentality. But we do want to allow the current method to be generated if desired. However, when it is generated, all alignment is up to the app. |
…ts so that alignment isn't an issue. For #530 used a union to do byte swap on float and double accesses for single values and arrays.
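For illustration only, a byte-swapped float read can also be written with memcpy instead of a union. This is a different technique from the union-based change referenced above, and `byteSwap32` and `loadSwappedFloat` are hypothetical names rather than generated SBE code. The sketch assumes a 32-bit `float`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reverse the byte order of a 32-bit value.
inline std::uint32_t byteSwap32(std::uint32_t v)
{
    return ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8) |
           ((v & 0x00FF0000u) >> 8)  | ((v & 0xFF000000u) >> 24);
}

// Hypothetical accessor: read a float stored in the opposite byte order,
// from an address with no alignment guarantee.
inline float loadSwappedFloat(const char* buffer)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits");
    std::uint32_t bits;
    std::memcpy(&bits, buffer, sizeof(bits));   // unaligned-safe load of the raw bytes
    bits = byteSwap32(bits);                    // swap on the integer representation
    float value;
    std::memcpy(&value, &bits, sizeof(value));  // reinterpret the swapped bits as a float
    return value;
}

// Usage: the swap itself is easy to sanity-check on a known pattern.
inline void swapExample()
{
    assert(byteSwap32(0x11223344u) == 0x44332211u);
}
```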
Moved to memcpy on access of all fields for safety. Decided not to make this optional as it was just too much of a nightmare. |
The C++ code generator typically generates code like this for accessing integer fields:
The compiler assumes that a pointer to an int32_t has the correct alignment. With this generated code the alignment requirement might not be fulfilled. On amd64 this is fine as long as the compiler doesn't try to use SSE or AVX instructions, but it's not safe in general to assume it will work.
The solution is to use memcpy like this (https://chromium.googlesource.com/chromium/src.git/+/master/base/bit_cast.h):
This should optimize to a single load on amd64, guaranteed not to use instructions requiring alignment.
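The generated code itself is not reproduced here, so the following is only a sketch of the memcpy approach applied to a field accessor. The class `Int32FieldView` and its members `m_buffer` and `m_offset` are assumptions made for illustration and do not come from the SBE generator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical flyweight over a message buffer; not actual SBE output.
class Int32FieldView
{
public:
    Int32FieldView(char* buffer, std::size_t offset)
        : m_buffer(buffer), m_offset(offset) {}

    // Getter: copies the bytes out, so no alignment is assumed.
    std::int32_t value() const
    {
        std::int32_t val;
        std::memcpy(&val, m_buffer + m_offset, sizeof(val));
        return val;
    }

    // Setter: copies the bytes in, again without assuming alignment.
    Int32FieldView& value(std::int32_t val)
    {
        std::memcpy(m_buffer + m_offset, &val, sizeof(val));
        return *this;
    }

private:
    char* m_buffer;
    std::size_t m_offset;
};

// Usage: a field placed at byte offset 1 is misaligned for int32_t.
inline void example()
{
    char buffer[16] = {};
    Int32FieldView field(buffer, 1);
    field.value(-123456);
    assert(field.value() == -123456);
}
```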