Segfault using Pkg on ARM #8314
Are you able to shell out at all? It looks like there is some memory corruption happening in the GC.
Yes, I am able to do basic things, including a
Cc: @dmbates
Link to some valgrind issues that make
The freelist pointer is consistently corrupted to the same address (
Which is something I already know: the pool is already corrupted at that point. The crash still occurs if
It seems very odd that the pointer is always set to the same (incorrect) value; allocations don't usually feel that deterministic. I must be missing some simple explanation for that.
I linked with Electric Fence and it crashes during startup.
It seems that
Here's a
How are you linking EF? If it is statically linked into Julia, then maybe malloc and free are unhooked in some invocations, so EF throws the error when it sees a free without a corresponding malloc, even though malloc was actually called. (In other words, I think this might be a red herring.)
I believe the recommended way to do this is to
The problem is that this makes everything glacially slow (I've been running
cc @Keno
I just added the libefence.a to
It is surprising that you were able to start Julia with efence loaded. That does not happen for me, perhaps for the reasons you mentioned?
efence is supposed to crash on the first untracked
The crash starts here:
where libuv is trying to
(I can't start Julia either; I was prematurely testing with
See explanation here: https://bugzilla.mozilla.org/show_bug.cgi?id=760227#c1
You can get past that efence stop by dynamically linking libuv, and then LLVM. But after that, it seems that there is a semaphore block between efence and LLVM.
OK, so the search continues then.
cc: @vtjnash
I believe I have isolated this to
That tends to indicate that you've messed up your compiler flags between libuv and julia, resulting in the wrong size (stack-allocated)
Never mind, I was looking at the size of the kernel stat structs, which obviously don't match libuv's stat struct.
So, is this a libuv issue or a Julia compiler flags issue?
In
I don't think the allocator uses a freelist pool with MEMDEBUG, so the corruption might not show up in the same place, if at all. The size-72 freelist is consistently corrupted when using
@vtjnash as far as I can tell, the struct size is consistently 104 (as it should be since we compile with
It gave me some comfort that the system image is building OK, and that the issue is localized to the allocator, rather than some other corruption manifesting here.
I stepped through with valgrind's gdb server and got some useful info (vgdb forces sequential execution so it is slow but at least doesn't hang). The corruption happens here (see this gist for more context)
where r0 holds the address we just got from
Note that the corrupted memory is the next (currently free) block at $r0 + 72
If I am reading the assembly and the ARM instruction docs correctly, this:
does a 64-bit store at r0+68. That shouldn't happen because the StatStruct that is represented at that address should only be 68 bytes total. Here is the IR for that function:
which seems correct. So if the above analysis is accurate, there may be something wrong with how LLVM is calculating the offsets for
The one thing I'm not sure about is what the
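To make the analysis above concrete, here is a minimal sketch of how an 8-byte store at offset 68 of a 72-byte pool block lands partly in the next free block, which is where a freelist link would live. This is not Julia's allocator: the pool layout, `pool_init`, `pool_alloc`, `spill_bytes`, and `overrun_reaches_next_block` are all hypothetical names made up for illustration, and the sketch assumes the free list is threaded through the first bytes of each free block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical pool: four 72-byte blocks carved out of one array. */
enum { BLOCK_SIZE = 72, NUM_BLOCKS = 4 };

static unsigned char pool[BLOCK_SIZE * NUM_BLOCKS];
static unsigned char *freelist;

/* Thread the free list through the first bytes of each block. */
void pool_init(void) {
    for (size_t i = 0; i < (size_t)NUM_BLOCKS; i++) {
        unsigned char *next =
            (i + 1 < (size_t)NUM_BLOCKS) ? pool + (i + 1) * BLOCK_SIZE : NULL;
        memcpy(pool + i * BLOCK_SIZE, &next, sizeof next);
    }
    freelist = pool;
}

/* Pop the head of the free list; the new head is the link stored in it. */
void *pool_alloc(void) {
    unsigned char *block = freelist;
    if (block != NULL)
        memcpy(&freelist, block, sizeof freelist);
    return block;
}

/* Number of bytes a store of `width` bytes at `offset` writes past the block. */
size_t spill_bytes(size_t block_size, size_t offset, size_t width) {
    size_t end = offset + width;
    return end > block_size ? end - block_size : 0;
}

/* Returns 1 if a 64-bit store at offset 68 overwrote the start of the next
   free block (the bytes that hold its freelist link). */
int overrun_reaches_next_block(void) {
    pool_init();
    unsigned char *block = pool_alloc();
    assert(block != NULL && freelist == block + BLOCK_SIZE);

    /* The store covers bytes [68, 76): 4 inside the block, 4 past its end. */
    uint64_t value = UINT64_C(0xA1A2A3A4A5A6A7A8);
    memcpy(block + 68, &value, sizeof value);

    /* The first 4 bytes of the next block now hold the tail of `value`. */
    return memcmp(freelist, (unsigned char *)&value + 4, 4) == 0;
}
```

Calling `spill_bytes(72, 68, 8)` gives the 4 bytes that cross into the block at `$r0 + 72`. In this sketch the clobbered link depends only on the data being stored, not on allocation history, which would be one possible reason the corrupted pointer looks so deterministic.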
Should we try with llvm-svn?
Already using it; I pulled master two days ago.
@ihnorton what is the code_native for that function? The comment 0x44 == 68, so it's just giving the hex representation of a number earlier on the line (in a comment).
It looks like it may have double-counted the
@vtjnash, thanks for that observation, that is extremely helpful. I can't access the system right now, but I'll post the full disassembly later and also look at what Clang thinks the struct alignment should be on ARM (based on your hint and some very brief reading on ARM alignment rules, I think we might be under-allocating).
Alignment should be 4, if I read the doc for the instruction right, which matches Julia's assumption for alignment. Regardless,
Because ... rules.
@JeffBezanson is there an assumption in allocobj that

    type Z
        data::Float64
    end

we would probably want:

    struct Z {
        int32_t type;
        int32_t pad;
        double data;
    }

or instead, so that the data offset is constant:

    struct Z {
        union {
            void *type;
            MAX_ALIGN_TYPE pad;
        };
        double data;
    }

not the following, which is what I assume it does now:

    struct Z {
        int32_t *type;
        double data;
    }

Of course, in some respects, this is just a question of whether it is valid for the compiler to choose aligned SIMD instructions (vmovsd, etc) on x64, which would expect a 16-byte aligned data field.
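The three layouts above can be compared with `offsetof`. The following is a rough sketch rather than what Julia's object layout actually does. The struct names are hypothetical, `int32_t` stands in for a 4-byte type tag so that the 32-bit case can be modelled on a 64-bit host, and C11 `max_align_t` is assumed as a substitute for the `MAX_ALIGN_TYPE` placeholder.

```c
#include <stddef.h>  /* offsetof, max_align_t (C11) */
#include <stdint.h>

/* Explicit pad word after a 4-byte type tag. */
struct z_explicit_pad {
    int32_t type;
    int32_t pad;
    double  data;
};

/* Header padded out to the platform's maximum alignment, so the data
   offset does not depend on the payload type. */
struct z_max_align {
    union {
        void       *type;
        max_align_t pad;
    };
    double data;
};

/* No explicit padding: the C compiler inserts whatever the ABI requires. */
struct z_implicit {
    int32_t type;
    double  data;
};

/* Offset a runtime would use if it assumed "tag, then payload immediately". */
enum { NAIVE_DATA_OFFSET = sizeof(int32_t) };

/* With the explicit pad word, data always starts at byte 8. */
_Static_assert(offsetof(struct z_explicit_pad, data) == 8,
               "explicit pad puts data at offset 8");

size_t explicit_pad_offset(void) { return offsetof(struct z_explicit_pad, data); }
size_t max_align_offset(void)    { return offsetof(struct z_max_align, data); }
size_t implicit_offset(void)     { return offsetof(struct z_implicit, data); }

/* Returns 1 when the C ABI on this target places data somewhere other than
   directly after the 4-byte tag, i.e. the naive layout would disagree. */
int naive_layout_disagrees_with_abi(void) {
    return implicit_offset() != (size_t)NAIVE_DATA_OFFSET;
}
```

On targets where `double` is 8-byte aligned inside structs, `implicit_offset()` comes out as 8 rather than 4, so code that assumes the payload begins right after the tag and code that follows the C ABI would disagree about where `data` lives. Whether that is what happens in this crash is still an open question in the thread.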
Doing Pkg operations crashes julia on ARM. Perhaps this is a sign of something else that needs to be addressed as part of the port.