C++ exception when reshaping after transposing on GPU #104

Closed
rounak opened this issue Jun 24, 2024 · 20 comments · Fixed by ml-explore/mlx#1253

Comments

@rounak
Contributor

rounak commented Jun 24, 2024

The following code (or this sample project):

var arr = MLXArray([0,1,2,3,4,5,6,7,8,9,10,11]).reshaped([1, 2, 2, 3])
print(arr)

arr = arr.transposed(0, 2, 1, 3)
print(arr)
print(arr.reshaped([1, 4, 3]))

results in a C++ vector index out of bounds exception on the last reshape operation, with the following backtrace:

* thread #5, stop reason = signal SIGABRT
    frame #0: 0x0000000186ff15e0 libsystem_kernel.dylib`__pthread_kill + 8
    frame #1: 0x0000000103e3bfa8 libsystem_pthread.dylib`pthread_kill + 288
    frame #2: 0x0000000186f36908 libsystem_c.dylib`abort + 128
  * frame #3: 0x0000000100075cfc MLXPlayground`std::__1::vector<unsigned long, std::__1::allocator<unsigned long>>::operator[][abi:de180100](this=0x00006000024fb3d8 size=3, __n=3) const at vector:1400:3
    frame #4: 0x0000000100558778 MLXPlayground`std::__1::tuple<std::__1::vector<int, std::__1::allocator<int>>, std::__1::vector<std::__1::vector<unsigned long, std::__1::allocator<unsigned long>>, std::__1::allocator<std::__1::vector<unsigned long, std::__1::allocator<unsigned long>>>>> mlx::core::collapse_contiguous_dims<unsigned long>(shape=size=4, strides=size=2) at utils.h:76:32
    frame #5: 0x0000000100b38ec0 MLXPlayground`void mlx::core::copy_gpu_inplace<unsigned long>(in=0x00006000028a8400, out=0x00006000028b4140, data_shape=size=4, strides_in_pre=size=4, strides_out_pre=size=3, inp_offset=0, out_offset=0, ctype=General, s=0x00006000009a76d0) at copy.cpp:59:27
    frame #6: 0x0000000100b38da0 MLXPlayground`mlx::core::copy_gpu_inplace(in=0x00006000028a8400, out=0x00006000028b4140, ctype=General, s=0x00006000009a76d0) at copy.cpp:147:10
    frame #7: 0x0000000100b38ccc MLXPlayground`mlx::core::copy_gpu(in=0x00006000028a8400, out=0x00006000028b4140, ctype=General, s=0x00006000009a76d0) at copy.cpp:40:3
    frame #8: 0x0000000100b38dfc MLXPlayground`mlx::core::copy_gpu(in=0x00006000028a8400, out=0x00006000028b4140, ctype=General) at copy.cpp:44:3
    frame #9: 0x0000000100ba81ac MLXPlayground`mlx::core::Reshape::eval_gpu(this=0x00006000009a76c8, inputs=size=1, out=0x00006000028b4140) at primitives.cpp:823:5
    frame #10: 0x00000001004293a4 MLXPlayground`mlx::core::UnaryPrimitive::eval_gpu(this=0x00006000009a76c8, inputs=size=1, outputs=size=1) at primitives.h:145:5
    frame #11: 0x0000000100b88448 MLXPlayground`mlx::core::metal::make_task(mlx::core::array, bool)::$_0::operator()(this=0x0000600002ac4608) at metal.cpp:81:23
    frame #12: 0x0000000100b881a4 MLXPlayground`decltype(std::declval<mlx::core::metal::make_task(mlx::core::array, bool)::$_0&>()()) std::__1::__invoke[abi:de180100]<mlx::core::metal::make_task(mlx::core::array, bool)::$_0&>(__f=0x0000600002ac4608) at invoke.h:344:25
    frame #13: 0x0000000100b8815c MLXPlayground`void std::__1::__invoke_void_return_wrapper<void, true>::__call[abi:de180100]<mlx::core::metal::make_task(mlx::core::array, bool)::$_0&>(__args=0x0000600002ac4608) at invoke.h:419:5
    frame #14: 0x0000000100b88138 MLXPlayground`std::__1::__function::__alloc_func<mlx::core::metal::make_task(mlx::core::array, bool)::$_0, std::__1::allocator<mlx::core::metal::make_task(mlx::core::array, bool)::$_0>, void ()>::operator()[abi:de180100](this=0x0000600002ac4608) at function.h:169:12
    frame #15: 0x0000000100b86f80 MLXPlayground`std::__1::__function::__func<mlx::core::metal::make_task(mlx::core::array, bool)::$_0, std::__1::allocator<mlx::core::metal::make_task(mlx::core::array, bool)::$_0>, void ()>::operator()(this=0x0000600002ac4600) at function.h:311:10
    frame #16: 0x000000010046b6f0 MLXPlayground`std::__1::__function::__value_func<void ()>::operator()[abi:de180100](this=0x000000017002aee8) const at function.h:428:12
    frame #17: 0x000000010046af90 MLXPlayground`std::__1::function<void ()>::operator()(this=0x000000017002aee8) const at function.h:981:10
    frame #18: 0x0000000100d30828 MLXPlayground`mlx::core::scheduler::StreamThread::thread_fn(this=0x0000600001fa0000) at scheduler.h:54:7
    frame #19: 0x0000000100d30f80 MLXPlayground`decltype(*std::declval<mlx::core::scheduler::StreamThread*>().*std::declval<void (mlx::core::scheduler::StreamThread::*)()>()()) std::__1::__invoke[abi:de180100]<void (mlx::core::scheduler::StreamThread::*)(), mlx::core::scheduler::StreamThread*, void>(__f=0x0000600002ab7da8, __a0=0x0000600002ab7db8) at invoke.h:312:25
    frame #20: 0x0000000100d30ef0 MLXPlayground`void std::__1::__thread_execute[abi:de180100]<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, void (mlx::core::scheduler::StreamThread::*)(), mlx::core::scheduler::StreamThread*, 2ul>(__t=size=3, (null)=__tuple_indices<2UL> @ 0x000000017002af7f) at thread.h:199:3
    frame #21: 0x0000000100d30b9c MLXPlayground`void* std::__1::__thread_proxy[abi:de180100]<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, void (mlx::core::scheduler::StreamThread::*)(), mlx::core::scheduler::StreamThread*>>(__vp=0x0000600002ab7da0) at thread.h:208:3
    frame #22: 0x0000000103e3a9ac libsystem_pthread.dylib`_pthread_start + 136

Similar code in mlx python doesn't crash. When I do the same operation on the CPU, it doesn't crash.

I'm running this on Xcode 16 b1 with macOS Sequoia b1.
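One detail in the backtrace stands out: frame #5 reports data_shape with 4 entries but strides_out_pre with only 3. Below is a minimal sketch of a rank check that would reject such input before any indexing happens. The helpers require_matching_ranks and ranks_match are hypothetical and are not part of MLX; the stride values in the usage note are illustrative.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical precondition check (not MLX code): every stride vector must
// have exactly one entry per dimension of the shape.
void require_matching_ranks(const std::vector<int>& shape,
                            const std::vector<std::vector<std::size_t>>& strides) {
    for (std::size_t i = 0; i < strides.size(); ++i) {
        if (strides[i].size() != shape.size()) {
            throw std::invalid_argument(
                "stride vector " + std::to_string(i) + " has rank " +
                std::to_string(strides[i].size()) + ", expected " +
                std::to_string(shape.size()));
        }
    }
}

// Convenience wrapper that reports the result as a bool instead of throwing.
bool ranks_match(const std::vector<int>& shape,
                 const std::vector<std::vector<std::size_t>>& strides) {
    try {
        require_matching_ranks(shape, strides);
        return true;
    } catch (const std::invalid_argument&) {
        return false;
    }
}
```

With a 4-dimensional shape such as {1, 2, 2, 3}, passing one 4-entry and one 3-entry stride vector makes ranks_match return false, which mirrors the size mismatch visible in the backtrace.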

@rounak changed the title from "C++ exception when reshaping after transposing" to "C++ exception when reshaping after transposing on GPU" on Jun 24, 2024
@davidkoski
Collaborator

It doesn't reproduce for me on macOS Sonoma 14.4:

array([[[0, 1, 2],
        [6, 7, 8],
        [3, 4, 5],
        [9, 10, 11]]], dtype=int32)

I will have to try it on Sequoia
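The expected result above can be checked by hand with a stride walk. The sketch below is plain C++ and does not use MLX. It assumes row-major element strides, models the transpose as a permutation of shape and strides, and then reads the elements in logical order, which is what a reshape of a non-contiguous array has to do when it copies. All function names are illustrative.

```cpp
#include <cstddef>
#include <vector>

// Row-major (C-contiguous) strides, in elements, for a given shape.
std::vector<std::size_t> row_major_strides(const std::vector<int>& shape) {
    std::vector<std::size_t> strides(shape.size());
    std::size_t step = 1;
    for (std::size_t i = shape.size(); i-- > 0;) {
        strides[i] = step;
        step *= static_cast<std::size_t>(shape[i]);
    }
    return strides;
}

// Reorder a vector by `axes`; a transpose does this to both shape and strides.
template <typename T>
std::vector<T> permute(const std::vector<T>& v, const std::vector<int>& axes) {
    std::vector<T> out;
    out.reserve(axes.size());
    for (int a : axes) out.push_back(v.at(static_cast<std::size_t>(a)));
    return out;
}

// Copy the elements of a strided view out in logical (row-major) order.
std::vector<int> gather(const std::vector<int>& data,
                        const std::vector<int>& shape,
                        const std::vector<std::size_t>& strides) {
    std::size_t total = 1;
    for (int d : shape) total *= static_cast<std::size_t>(d);

    std::vector<int> out;
    out.reserve(total);
    std::vector<int> index(shape.size(), 0);
    for (std::size_t n = 0; n < total; ++n) {
        std::size_t offset = 0;
        for (std::size_t i = 0; i < shape.size(); ++i)
            offset += static_cast<std::size_t>(index[i]) * strides[i];
        out.push_back(data.at(offset));
        // Advance the multi-index, last axis fastest.
        for (std::size_t i = shape.size(); i-- > 0;) {
            if (++index[i] < shape[i]) break;
            index[i] = 0;
        }
    }
    return out;
}

// The example from this issue: shape [1, 2, 2, 3], axes (0, 2, 1, 3).
std::vector<int> transposed_then_flattened() {
    const std::vector<int> data{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11};
    const std::vector<int> shape{1, 2, 2, 3};
    const std::vector<int> axes{0, 2, 1, 3};
    const auto strides = row_major_strides(shape);   // {12, 6, 3, 1}
    const auto t_shape = permute(shape, axes);       // {1, 2, 2, 3}
    const auto t_strides = permute(strides, axes);   // {12, 3, 6, 1}
    return gather(data, t_shape, t_strides);
}
```

Calling transposed_then_flattened() yields 0, 1, 2, 6, 7, 8, 3, 4, 5, 9, 10, 11, which is the [1, 4, 3] array printed above read row by row.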

@awni
Member

awni commented Jun 24, 2024

This seems related to ml-explore/mlx-c#30. I think we should wait until MLX Swift and MLX C are updated to the latest MLX, then try this again. MLX Core had a few updates to get it working with OS 15 (ml-explore/mlx#1208).

@davidkoski
Collaborator

#101 puts mlx-swift on the latest mlx -- can you give this a try again?

@DePasqualeOrg
Contributor

DePasqualeOrg commented Jul 2, 2024

I'm still getting the same crash in Xcode 16, with the error "vector[] index out of bounds", when calling generate using the latest version of mlx-swift. Before the crash, this warning is shown multiple times:

Warning: Compilation succeeded with: 

program_source:261:31: warning: unused variable 'MAX_REDUCE_SPECIALIZED_DIMS' [-Wunused-const-variable]
static constant constexpr int MAX_REDUCE_SPECIALIZED_DIMS = 4;
                              ^
program_source:262:31: warning: unused variable 'REDUCE_N_READS' [-Wunused-const-variable]
static constant constexpr int REDUCE_N_READS = 16;
                              ^
program_source:263:31: warning: unused variable 'SOFTMAX_N_READS' [-Wunused-const-variable]
static constant constexpr int SOFTMAX_N_READS = 4;
                              ^
program_source:264:31: warning: unused variable 'RMS_N_READS' [-Wunused-const-variable]
static constant constexpr int RMS_N_READS = 4;
                              ^
program_source:265:31: warning: unused variable 'RMS_LOOPED_LIMIT' [-Wunused-const-variable]
static constant constexpr int RMS_LOOPED_LIMIT = 4096;
                              ^

Xcode 15 doesn't crash, but now shows the warnings, which wasn't the case before.

@davidkoski
Collaborator

Those warnings come from the JIT compilation and are not related. OK, so it is still failing this particular test on Sequoia (macOS 15).

@LiYanan2004

Yeah. It still throws the error after updating to the latest version, 0.15.2.

@rounak
Contributor Author

rounak commented Jul 3, 2024

I tried this on main and on the latest tag of mlx-swift, and I'm still getting the same crash.

@awni
Member

awni commented Jul 3, 2024

This might be related to ml-explore/mlx-examples#642

@davidkoski
Collaborator

davidkoski commented Jul 3, 2024

OK, I can reproduce this on macOS 15 with Xcode 16. I find that it reproduces with Debug builds but not Release.

The problem is in:

collapse_contiguous_dims(

      out_strides[j].push_back(st[to_collapse[i - 1]]);
(lldb) p to_collapse[i - 1]
(std::vector<int>::value_type) 3

(lldb) p st
(const std::vector<unsigned long> &) size=3: {
  [0] = 12
  [1] = 3
  [2] = 1
}

The code executes the same way in Release, but evaluating st[3] silently reads past the end of the vector instead of asserting.

It doesn't crash in Release because this macro in vector is empty:

  _LIBCPP_ASSERT_VALID_ELEMENT_ACCESS(__n < size(), "vector[] index out of bounds");
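To make the difference concrete: operator[] only checks the index when the library's assertions are enabled, while at() checks it in every build mode and throws std::out_of_range. The sketch below uses the values from the lldb session above (a three-entry stride vector and index 3). The helper stride_at is hypothetical and is not part of MLX.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not MLX code): look up a stride with a bounds check
// that is active in both Debug and Release builds.
std::size_t stride_at(const std::vector<std::size_t>& st, int dim) {
    // at() throws std::out_of_range for an invalid index in every build mode.
    return st.at(static_cast<std::size_t>(dim));
}

// Reproduces the failing access from the debugger session: size 3, index 3.
bool index_is_rejected() {
    const std::vector<std::size_t> st{12, 3, 1};
    try {
        stride_at(st, 3);
    } catch (const std::out_of_range&) {
        return true;  // the out-of-bounds index was caught
    }
    return false;
}
```

Here index_is_rejected() returns true regardless of optimization level. Whether a checked lookup is acceptable on a hot path is a separate design question; the sketch only shows that the access in question is out of range.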

@davidkoski
Collaborator

This check is turned on in Debug builds and appears to be new in macOS 15:

// Debug hardening mode checks.

#  elif _LIBCPP_HARDENING_MODE == _LIBCPP_HARDENING_MODE_DEBUG

Per https://libcxx.llvm.org/Hardening.html#notes-for-users
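A quick way to see which hardening mode a translation unit is compiled with is to test the libc++ configuration macros. This is a sketch that assumes a libc++ version that defines _LIBCPP_HARDENING_MODE and the four mode constants described in the libc++ hardening documentation; on other standard libraries it falls back to a descriptive string. The function name hardening_mode_name is illustrative.

```cpp
#include <cstddef>  // any standard header pulls in the library's config macros
#include <string>

// Report the libc++ hardening mode this file was compiled with.
// Assumes the macro names from the libc++ hardening documentation.
std::string hardening_mode_name() {
#if !defined(_LIBCPP_VERSION)
    return "not libc++";
#elif !defined(_LIBCPP_HARDENING_MODE)
    return "libc++ without hardening modes";
#elif _LIBCPP_HARDENING_MODE == _LIBCPP_HARDENING_MODE_NONE
    return "none";
#elif _LIBCPP_HARDENING_MODE == _LIBCPP_HARDENING_MODE_FAST
    return "fast";
#elif _LIBCPP_HARDENING_MODE == _LIBCPP_HARDENING_MODE_EXTENSIVE
    return "extensive";
#elif _LIBCPP_HARDENING_MODE == _LIBCPP_HARDENING_MODE_DEBUG
    return "debug";
#else
    return "unknown";
#endif
}
```

Printing the result from a Debug and a Release build of the same target should show whether the two configurations differ in the way described above.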

@derekelewis

More info here in the Apple Xcode 16 C++ language support docs:

https://developer.apple.com/xcode/cpp/#library-hardening

@Paramstr

Paramstr commented Jul 5, 2024

Getting same error when calling:

...
                let result = await MLXLLM.generate(
                    promptTokens: promptTokens, parameters: GenerateParameters(), model: model,
                    tokenizer: tokenizer, extraEOSTokens: modelConfiguration.extraEOSTokens
                ) { tokens in
                    let text = tokenizer.decode(tokens: tokens)
                
                    modelOutputTokens = tokens.count
                    
                    // update the output -- this will make the view show the text as it generates
                    if tokens.count % displayEveryNTokens == 0 {
                
                           
                        await MainActor.run {
                            self.output = text
                        }
                        
                        ..

_LIBCPP_ASSERT_VALID_ELEMENT_ACCESS(__n < size(), "vector[] index out of bounds");

Mac Version 15.0 Beta (24A5279h)
Version 16.0 beta 2 (16A5171r)

@davidkoski
Collaborator

Right, there is no fix yet, but we have a better handle on what is going on and why it shows up in macOS 15

@davidkoski
Collaborator

Note: the fix is merged on the mlx core side but has not been picked up in mlx-swift yet.

@davidkoski davidkoski reopened this Jul 8, 2024
@Paramstr

Any idea when this will be in mlx-swift?

@davidkoski
Collaborator

They just cut a release of the mlx core so I would need to integrate that. Hopefully next week.

You can avoid the assertion by building Release, though that doesn't actually avoid the underlying bug that the new assertions picked up.

@davidkoski
Collaborator

This should be fixed once #115 merges

@davidkoski
Collaborator

Merged #115, please try this out

@DePasqualeOrg
Contributor

This fixes the crash for me.

@rounak
Contributor Author

rounak commented Jul 16, 2024

It fixes it for me too, thanks!
