Fix relocation overflows by implementing preallocation in the memory manager #1009
Conversation
Copied verbatim from llvm/llvm-project@f28c006a5895, files:

```
llvm/include/llvm/ExecutionEngine/SectionMemoryManager.h
llvm/lib/ExecutionEngine/SectionMemoryManager.cpp
```
This makes them compliant with our C++ style check.
Notes on the changes:

- The memory manager is added to the build system.
- The `LlvmliteMemoryManager` class is exported as a public interface.
- When creating an execution engine, we set it to use our memory manager.
Force-pushed cd0d357 to a7ae8c4.
I hit an assertion in the Numba test suite on an M2 system. Looking into which test caused this now.
To reproduce:
Looks like we somehow don't quite reserve enough space for code mem.
ffi/memorymanager.cpp (outdated)

```cpp
// Use the same calculation as allocateSection because we need to be able to
// satisfy it.
uintptr_t RequiredSize =
    Alignment * ((Size + Alignment - 1) / Alignment + 1);
```
This uses the same oversizing that's used in allocateSection (which is done to let it align the address as well), but it can only do it on the full reservation. I was hoping the caller would ensure their calculation had sufficient buffer for multiple calls to allocateDataSection/allocateCodeSection, but it looks like there may be circumstances where that's not true.
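As a sketch of that oversizing (Python, for illustration): the formula rounds the size up to a multiple of the alignment and then adds one extra alignment unit, so the block's start address can itself be aligned within it.

```python
def required_size(size: int, alignment: int) -> int:
    # Mirror of the C++ expression quoted above:
    #   Alignment * ((Size + Alignment - 1) / Alignment + 1)
    # Round up to a multiple of the alignment, then add one extra
    # alignment unit so the start address can be aligned inside the block.
    return alignment * ((size + alignment - 1) // alignment + 1)

# The case discussed later in this thread: a 16379-byte request with
# alignment 8 grows to 16392 bytes.
print(required_size(16379, 8))  # → 16392
```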
Ah, thanks for looking at this PR! Tracing through things a bit I just noticed https://github.com/numba/llvmlite/pull/1009/files#r1394577484
I think the reason for the discrepancy is that the reservation is made with the code section's alignment (4), but the actual allocation is made with the max of the code section's alignment (4) and the stub alignment (which is 8 on AArch64 on macOS, and maybe other platforms). So I think the code segment preallocation needs to be aligned to max(code section alignment, stub alignment) too.
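A minimal sketch of that proposed fix (hypothetical helper name; the stub alignment value is platform-dependent, 8 on AArch64 macOS per the comment above):

```python
def reservation_alignment(code_section_align: int, stub_align: int) -> int:
    # Sketch of the fix described above: the preallocation must use the
    # same alignment that the later allocation will use, i.e. the max of
    # the code section's alignment and the stub alignment.
    return max(code_section_align, stub_align)

# With a code section alignment of 4 and a stub alignment of 8 (AArch64
# on macOS), the reservation should be made with alignment 8.
print(reservation_alignment(4, 8))  # → 8
```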
```cpp
       "Alignment must be a power of two.");

uintptr_t RequiredSize =
    Alignment * ((Size + Alignment - 1) / Alignment + 1);
```
This computation is pushing a request for 16379 bytes with alignment 8 up to a size of 16392 bytes, which is larger than the 16384 bytes reserved, in the case where the assertion is being hit in the tests on M2.
A couple of other things I just noticed:

- For `reserveAllocationSpace()`, the requested code size is 16380 bytes with an align of 4
- When the `allocateDataSection()` call is made for code, the request is for 16379 bytes with an align of 8

This discrepancy leads to the request being just slightly over what was reserved for code - I think the next step is to look into why the alignment and requested sizes differ.
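The arithmetic behind this can be checked directly (Python sketch; `required_size` mirrors the C++ formula quoted above):

```python
def required_size(size, alignment):
    # Same over-allocation formula as allocateSection.
    return alignment * ((size + alignment - 1) // alignment + 1)

PAGE = 16384  # 16K pages on Apple Silicon

# Reservation: 16380 bytes requested with alignment 4 -> exactly one page.
reserved = required_size(16380, 4)
assert reserved == PAGE

# Allocation: 16379 bytes requested with alignment 8 -> 16392 bytes,
# which no longer fits in the 16384-byte reservation.
needed = required_size(16379, 8)
assert needed == 16392 and needed > reserved
```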
Output from the extra debugging prints I just pushed:

```
Code size / align: 0x3FFC / 4
ROData size / align: 0x0 / 1
RWData size / align: 0x2360 / 16
Reserving 0xC000 bytes
Code mem starts at 0x0000000132DDC000, size 0x4000
Rwdata mem starts at 0x0000000132DE0000, size 0x4000
Requested size / alignment: 0x3FFB / 8
Allocating 0x4008 bytes for CodeMem at
Assertion failed: (false && "All memory must be pre-allocated"), function allocateSection, file memorymanager.cpp, line 109.
```
61ae2b0 appears to resolve this issue and allow the test to run to completion.
```cpp
//
// This file implements the section-based memory manager used by the MCJIT
// execution engine and RuntimeDyld
//
```
Nit: maybe some more rationale here why we are switching to this.
Yes, agreed.
```cpp
StringRef SectionName) override;

/// Allocates a memory block of (at least) the given size suitable for
/// executable code.
```
Nit: executable code -> data.
(Looks like a copy-paste typo from the previous method.)
```cpp
// allocated due to page alignment, but if we have insufficient free memory
// for the request this can lead to allocating disparate memory that can
// violate the ARM ABI. Clear free memory so only the new allocations are
// used, but do not release allocated memory as it may still be in-use.
```
I had to read this a couple of times, but I think I am getting it now. Let me check my understanding/logic here; perhaps it can be used to make the problem description/solution a bit more crisp.

The objective is to allocate memory blocks that are "near" to each other. Keeping blocks near makes it less likely that the distance between different memory addresses becomes too large and, e.g., violates ARM ABI relocation restrictions. If a code/rodata/rwdata memory space has been allocated but not all of it is used (e.g. excess blocks that were allocated due to page alignment), then we mark all of that memory as being used by clearing the "free memory" in that space. This has the effect that the next allocation request is not going to try to scavenge free blocks from somewhere else, thus avoiding memory that is potentially "far away".
And further to my previous comment, freeing the free space is the crux of the solution. So I think we need to spend a little bit more time on discussing/documenting the alternatives and pros/cons. The alternative mentioned in the LLVM Discourse thread talks about potentially doing this in the `finalizeMemory()` method, but doing it here has the benefit of being less intrusive, at the cost of wasting some memory.
It wouldn't be too difficult to quantify the waste, I guess. In an experiment we could iterate over the free blocks and sum the sizes and print that for the numba test suite. Don't know if we are going to learn anything, but just an idea.
But given the simplicity of the approach, this definitely looks like the most appealing option. I am going to look a bit further into this though, to see what you mean by the "pending prefix indices" that you mentioned in the description and how that fits into the picture here.
Status update - with the commit 61ae2b0 I can get through the whole test suite (with the usually-skipped tests not skipped):

```diff
diff --git a/numba/tests/test_array_constants.py b/numba/tests/test_array_constants.py
index a33dacd49..386c1856b 100644
--- a/numba/tests/test_array_constants.py
+++ b/numba/tests/test_array_constants.py
@@ -141,7 +141,6 @@ class TestConstantArray(unittest.TestCase):
         out = cres.entry_point()
         self.assertEqual(out, 86)
 
-    @skip_m1_llvm_rtdyld_failure
     def test_too_big_to_freeze(self):
         """
         Test issue https://github.com/numba/numba/issues/2188 where freezing
diff --git a/numba/tests/test_stencils.py b/numba/tests/test_stencils.py
index 2a65c0370..1e2f8dc77 100644
--- a/numba/tests/test_stencils.py
+++ b/numba/tests/test_stencils.py
@@ -80,7 +80,6 @@ if not _32bit:  # prevent compilation on unsupported 32bit targets
         return a + 1
 
-@skip_m1_llvm_rtdyld_failure  # skip all stencil tests on m1
 class TestStencilBase(unittest.TestCase):
     _numba_parallel_test_ = False
```

resulting in:
I believe the failures are innocuous:
Still an issue on Linux AArch64, although this is maybe a latent bug in cleanup in Numba:
```cpp
// Look in the list of free memory regions and use a block there if one
// is available.
for (FreeMemBlock &FreeMB : MemGroup.FreeMem) {
```
About:

> I don't understand how the memory allocation / mapping really works, in particular what pending memory is and pending prefix indices are

My understanding is that pending memory is memory that has been allocated but not yet "finalised".
I am not sure how important this prefix index is. It looks like a bit of bookkeeping to keep an index of the next free block.
Also my impression is that this whole loop is skipped because of the clearance on lines 137-139.
It turns out that this issue is unrelated to this PR - I need to raise a Numba issue shortly.
With the offending ctypes tests skipped, on Linux AArch64 the test results are quite similar to those on macOS:
So as far as I can tell, there are no outstanding issues with the implementation in this PR in its present form.
As a follow-up on the cause of those failures: they are all rooted in the warning produced about the llvmlite version not being recognized - not an actual issue.
The next thing on my to-do list is to align the changes in this branch / PR more closely with @MikaelSmith's changes in llvm/llvm-project#71968 and re-test - I think that the change for stub alignment to be taken into account might still be needed, but it would be best to check first.
The implementation of `reserveAllocationSpace()` now more closely follows that in llvm/llvm-project#71968, following some changes made there. The changes here include:

- Improved readability of debugging output
- Using a default alignment of 8 in `allocateSection()` to match the default alignment provided by the stub alignment during preallocation
- Replacing the "bespoke" `requiredPageSize()` function with computations using the LLVM `alignTo()` function
- Returning early from preallocation when no space is requested
- Reusing existing preallocations if there is enough space left over from the previous preallocation for all the required segments - this can happen quite frequently because allocations for each segment get rounded up to page sizes, which are usually either 4K or 16K, and many Numba-jitted functions require a lot less than this
- Removal of setting the near hints for memory blocks - this doesn't really have any use when all memory is preallocated and forced to be "near" to other memory
- Addition of extra asserts to validate alignment of allocated sections
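For reference, `llvm::alignTo(Value, Align)` rounds a value up to the next multiple of the alignment; a Python equivalent (for power-of-two alignments) looks like this:

```python
def align_to(value: int, align: int) -> int:
    # Equivalent of llvm::alignTo for power-of-two alignments:
    # round value up to the next multiple of align.
    assert align & (align - 1) == 0, "align must be a power of two"
    return (value + align - 1) & ~(align - 1)

# e.g. rounding section sizes up to a 16K page:
print(align_to(16379, 16384))  # → 16384
print(align_to(16385, 16384))  # → 32768
```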
Force-pushed 616a057 to 75b103c.
The default is to enable it on 64-bit ARM systems, since it solves the problem they encounter, and disable it elsewhere, to minimise the risk of an unintended side effect on platforms that don't need it. This can be overridden by manually specifying the value of `use_lmm` when creating the MCJIT compiler.
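The default-selection logic can be sketched as follows (illustrative only; `use_lmm` is the parameter name mentioned above, but this helper and its machine-name checks are assumptions):

```python
import platform

def default_use_lmm() -> bool:
    # Enable the llvmlite memory manager by default only on 64-bit ARM,
    # where it fixes the relocation overflows; leave it off elsewhere to
    # minimise the risk of side effects on platforms that don't need it.
    return platform.machine().lower() in ("arm64", "aarch64")

# A caller can still override this default by passing use_lmm explicitly
# when creating the MCJIT compiler.
```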
Force-pushed 3ee5574 to b673be6.
With my limited background knowledge on what's going on with the memory manager, I reviewed the C++ code based on whether I can understand it. The code is clear and well commented.
The license addition looks good.
Buildfarm has never been happier all thanks to this patch.
numba#9337 further unskipped more tests related to M1 RuntimeDyld issues, and I have run it on the farm. All M1 tests passed.
The PPC64 linker issue

The only outstanding problem is a compile failure on PPC64LE. On the Power machine in the buildfarm, both anaconda-distro and conda-forge packages are failing to link the memorymanager.o file with the error:

```
ld: /opt/conda/envs/cf/lib/libLLVMSupport.a(Error.cpp.o):(.data.rel.ro._ZTVN4llvm13ErrorInfoBaseE[_ZTVN4llvm13ErrorInfoBaseE]+0x40): undefined reference to `llvm::ErrorInfoBase::isA(void const*) const'
```
After some investigation and following the suggestion in https://support.xilinx.com/s/article/20068?language=en_US, I found that adding -mlongcall when compiling memorymanager.cpp fixes the problem. However, this "fix" may introduce some performance issues since it forces all jumps to be long jumps. Since GCC 9.5, the documentation on -mlongcall has this description:

> On PowerPC64 ELFv2 and 32-bit PowerPC systems with newer GNU linkers, GCC can generate long calls using an inline PLT call sequence (see -mpltseq). PowerPC with -mbss-plt and PowerPC64 ELFv1 (big-endian) do not support inline PLT calls.

This might be a case of the system linker being too old, or of newer GCC (>=9) being able to generate an alternative longcall sequence that avoids the issue.
We can fix this PPC problem in a separate PR, so it's not a blocker for this one. Adding -mlongcall is probably the easiest fix for now.
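As a sketch of that workaround (hypothetical build-script fragment, not llvmlite's actual build code), the flag would be added only for PowerPC targets:

```python
def extra_compile_args_for(machine: str) -> list:
    # Hypothetical: add -mlongcall only when compiling for PowerPC,
    # forcing long jumps so the out-of-range branch relocations that
    # break linking memorymanager.o do not occur. This may cost some
    # performance, since every call becomes a long call.
    if machine in ("ppc64", "ppc64le"):
        return ["-mlongcall"]
    return []

print(extra_compile_args_for("ppc64le"))  # → ['-mlongcall']
```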
Buildfarm has passed with the latest commit. This is ready for merge!!! Numba's buildfarm has never been happier as this PR will stop random failures on our arm64/aarch64 machines.
Thank you @gmarkall and everyone who reviewed this PR.
Nightly jobs have been [failing](https://github.com/rapidsai/cudf/actions/runs/7382855293/job/20083184931) with a numba segfault. This appears to be a longstanding issue with numba on aarch64, fixed by numba/llvmlite#1009. Technically, the issue exists already in our tests, but it appears that changes from numba 0.58 make the conditions for the issue much more likely to occur, hence the failures after removing the numba 0.58 version constraint recently. The issue should be fixed in numba 0.59. For now, however, we should skip things so that nightlies can be fixed.

Authors:
- https://github.com/brandon-b-miller

Approvers:
- Bradley Dice (https://github.com/bdice)

URL: #14702
This implements a memory manager based on the MCJIT `SectionMemoryManager`, with a preallocation strategy that ensures all segments of an object are placed within a single block of mapped memory. This is intended to resolve the relocation overflow issues on AArch64 (numba/numba#8567, numba/numba#9001), which occur when the GOT segment is far from the code segment.

The changes are based on those by @MikaelSmith in llvm/llvm-project#71968 and his code in https://github.com/MikaelSmith/impala/blob/ac8561b6b69530f9fa2ff2ae65ec7415aa4395c6/be/src/codegen/mcjit-mem-mgr.cc - there is additional discussion / background in the LLVM Discourse thread and on the aforementioned Numba issues.
I believe this is now ready for some review - notes to reviewers: the `SectionMemoryManager` code was copied "as-standard" into llvmlite.

cc @sjoerdmeijer for review.