Cuda error when running Llama3 and Llama3.1 #651

Closed
joshpopelka20 opened this issue Jul 30, 2024 · 13 comments
Labels
bug (Something isn't working)

Comments

@joshpopelka20
Contributor

Describe the bug

When running this command CUDA_NVCC_FLAGS="-fPIC" cargo run --release --features "cuda flash-attn cudnn" -- --token-source "literal:---" -n "0:8;1:8;2:8;3:8" -i plain -m meta-llama/Meta-Llama-3.1-8B-Instruct -a llama, I'm getting this error:

thread '<unnamed>' panicked at /home/ec2-user/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cudarc-0.11.6/src/driver/safe/core.rs:252:76:
called `Result::unwrap()` on an `Err` value: DriverError(CUDA_ERROR_ILLEGAL_ADDRESS, "an illegal memory access was encountered")
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread '<unnamed>' panicked at /home/ec2-user/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cudarc-0.11.6/src/driver/safe/core.rs:252:76:
called `Result::unwrap()` on an `Err` value: DriverError(CUDA_ERROR_ILLEGAL_ADDRESS, "an illegal memory access was encountered")
stack backtrace:
   0:     0x56239b9102d5 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h3692694645b1bb6a
   1:     0x56239b93e8eb - core::fmt::write::h5131d80b4c69b88d
   2:     0x56239b90bc6f - std::io::Write::write_fmt::h1fb327a7d8b0eb36
   3:     0x56239b9100ae - std::sys_common::backtrace::print::h998d75b840f75a73
   4:     0x56239b9115f9 - std::panicking::default_hook::{{closure}}::h18ec7fe6a38b9da0
   5:     0x56239b91139a - std::panicking::default_hook::hfb3f22c2e4075a6a
   6:     0x56239b911a93 - std::panicking::rust_panic_with_hook::h51af00bcb4660c4e
   7:     0x56239b911974 - std::panicking::begin_panic_handler::{{closure}}::h39f76aa863fbe8ce
   8:     0x56239b910799 - std::sys_common::backtrace::__rust_end_short_backtrace::h4d10fc2251b89840
   9:     0x56239b9116a7 - rust_begin_unwind
  10:     0x56239a5e8de3 - core::panicking::panic_fmt::h319840fcbcd912ef
  11:     0x56239a5e92d6 - core::result::unwrap_failed::haccb9aaa604e1e21
  12:     0x56239a95dca4 - <cudarc::driver::safe::core::CudaSlice<T> as core::ops::drop::Drop>::drop::hd684dcc6e2bc9210
  13:     0x56239aa97dfd - alloc::sync::Arc<T,A>::drop_slow::h025961952ec363c7
  14:     0x56239aa9b6d0 - alloc::sync::Arc<T,A>::drop_slow::hd44c7568258a4933
  15:     0x56239aaa69b6 - mistralrs_core::models::llama::Llama::forward::h8c73e5649cf95cbb
  16:     0x56239aaa7aa9 - <mistralrs_core::models::llama::Llama as mistralrs_core::pipeline::NormalModel>::forward::h6ff0cde06b1817e6
  17:     0x56239a7e69c0 - <mistralrs_core::pipeline::normal::NormalPipeline as mistralrs_core::pipeline::Pipeline>::forward_inputs::h7c8eac0dd705ae60
  18:     0x56239a7e8e35 - mistralrs_core::pipeline::Pipeline::step::{{closure}}::h17e8da9387ab1472
  19:     0x56239a805578 - mistralrs_core::engine::Engine::run::{{closure}}::hf8ac117b8c902b00
  20:     0x56239a802f51 - tokio::runtime::park::CachedParkThread::block_on::h500f0459029570e4
  21:     0x56239aafd883 - tokio::runtime::context::runtime::enter_runtime::h663b1b593a494e73
  22:     0x56239a7ecbc8 - std::sys_common::backtrace::__rust_begin_short_backtrace::h428212e440861a67
  23:     0x56239a7ef43d - core::ops::function::FnOnce::call_once{{vtable.shim}}::hc9ec55c69944d520
  24:     0x56239b91722b - std::sys::pal::unix::thread::Thread::new::thread_start::h3b8e81128811868f
  25:     0x7fc0adb0344b - start_thread
  26:     0x7fc0ad4fe52f - __clone
  27:                0x0 - <unknown>
thread '<unnamed>' panicked at library/core/src/panicking.rs:227:5:
panic in a destructor during cleanup
thread caused non-unwinding panic. aborting.
Aborted

I ran it for both llama3 and 3.1. Got the same error for both.
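
A note on reading the trace above: the first panic comes from `Result::unwrap()` inside the `Drop` impl of `CudaSlice`, and the closing lines (`panic in a destructor during cleanup`) suggest a further panic was raised while the stack was already unwinding, which Rust turns into an abort. The sketch below illustrates the general alternative of recording a release failure in `Drop` instead of unwrapping it. `Buffer`, `FreeError`, and `drop_two_buffers` are hypothetical names for illustration only; this is not how cudarc is implemented.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical stand-in for a driver error; not a cudarc type.
#[derive(Debug, Clone, PartialEq)]
struct FreeError(&'static str);

/// Hypothetical device buffer whose release can fail.
struct Buffer {
    id: u32,
    // Simulates a device context that is already in an error state.
    poisoned: bool,
    // Collects errors seen during drop so callers can inspect them later.
    errors: Rc<RefCell<Vec<(u32, FreeError)>>>,
}

impl Buffer {
    fn release(&self) -> Result<(), FreeError> {
        if self.poisoned {
            Err(FreeError("an illegal memory access was encountered"))
        } else {
            Ok(())
        }
    }
}

impl Drop for Buffer {
    fn drop(&mut self) {
        // Record the failure instead of unwrapping it: a panic raised here
        // while another panic is unwinding would abort the process.
        if let Err(e) = self.release() {
            self.errors.borrow_mut().push((self.id, e));
        }
    }
}

/// Create and drop two buffers, returning the errors recorded during drop.
fn drop_two_buffers(poisoned: bool) -> Vec<(u32, FreeError)> {
    let errors = Rc::new(RefCell::new(Vec::new()));
    {
        let _a = Buffer { id: 0, poisoned, errors: Rc::clone(&errors) };
        let _b = Buffer { id: 1, poisoned, errors: Rc::clone(&errors) };
    } // locals drop in reverse declaration order: _b first, then _a
    let recorded = errors.borrow().clone();
    recorded
}
```

With this shape, a failing release leaves an entry in the error list and the program keeps running, so the original fault can be reported through a normal `Result` path later.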

Latest commit or version

77d6bf9

joshpopelka20 added the bug label on Jul 30, 2024

@EricLBuehler
Owner

Hi @joshpopelka20, can you please run with CUDA_LAUNCH_BLOCKING=1 too?

@joshpopelka20
Contributor Author

Adding that didn't give me any additional output, but RUST_BACKTRACE=1 did:

thread '<unnamed>' panicked at /home/ec2-user/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cudarc-0.11.6/src/driver/safe/core.rs:252:76:
called `Result::unwrap()` on an `Err` value: DriverError(CUDA_ERROR_ILLEGAL_ADDRESS, "an illegal memory access was encountered")
stack backtrace:
   0: rust_begin_unwind
   1: core::panicking::panic_fmt
   2: core::result::unwrap_failed
   3: <cudarc::driver::safe::core::CudaSlice<T> as core::ops::drop::Drop>::drop
   4: alloc::sync::Arc<T,A>::drop_slow
   5: alloc::sync::Arc<T,A>::drop_slow
   6: <mistralrs_core::models::llama::Mlp as mistralrs_core::amoe::MlpLayer>::forward
   7: mistralrs_core::models::llama::Llama::forward
   8: <mistralrs_core::models::llama::Llama as mistralrs_core::pipeline::NormalModel>::forward
   9: <mistralrs_core::pipeline::normal::NormalPipeline as mistralrs_core::pipeline::Pipeline>::forward_inputs
  10: mistralrs_core::pipeline::Pipeline::step::{{closure}}
  11: mistralrs_core::engine::Engine::run::{{closure}}
  12: tokio::runtime::park::CachedParkThread::block_on
  13: tokio::runtime::context::runtime::enter_runtime
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
thread '<unnamed>' panicked at /home/ec2-user/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cudarc-0.11.6/src/driver/safe/core.rs:252:76:
called `Result::unwrap()` on an `Err` value: DriverError(CUDA_ERROR_ILLEGAL_ADDRESS, "an illegal memory access was encountered")
stack backtrace:
   0:     0x55c6dd9102d5 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h3692694645b1bb6a
   1:     0x55c6dd93e8eb - core::fmt::write::h5131d80b4c69b88d
   2:     0x55c6dd90bc6f - std::io::Write::write_fmt::h1fb327a7d8b0eb36
   3:     0x55c6dd9100ae - std::sys_common::backtrace::print::h998d75b840f75a73
   4:     0x55c6dd9115f9 - std::panicking::default_hook::{{closure}}::h18ec7fe6a38b9da0
   5:     0x55c6dd91139a - std::panicking::default_hook::hfb3f22c2e4075a6a
   6:     0x55c6dd911a93 - std::panicking::rust_panic_with_hook::h51af00bcb4660c4e
   7:     0x55c6dd911974 - std::panicking::begin_panic_handler::{{closure}}::h39f76aa863fbe8ce
   8:     0x55c6dd910799 - std::sys_common::backtrace::__rust_end_short_backtrace::h4d10fc2251b89840
   9:     0x55c6dd9116a7 - rust_begin_unwind
  10:     0x55c6dc5e8de3 - core::panicking::panic_fmt::h319840fcbcd912ef
  11:     0x55c6dc5e92d6 - core::result::unwrap_failed::haccb9aaa604e1e21
  12:     0x55c6dc95dca4 - <cudarc::driver::safe::core::CudaSlice<T> as core::ops::drop::Drop>::drop::hd684dcc6e2bc9210
  13:     0x55c6dca97dfd - alloc::sync::Arc<T,A>::drop_slow::h025961952ec363c7
  14:     0x55c6dca9b6d0 - alloc::sync::Arc<T,A>::drop_slow::hd44c7568258a4933
  15:     0x55c6dcaa69b6 - mistralrs_core::models::llama::Llama::forward::h8c73e5649cf95cbb
  16:     0x55c6dcaa7aa9 - <mistralrs_core::models::llama::Llama as mistralrs_core::pipeline::NormalModel>::forward::h6ff0cde06b1817e6
  17:     0x55c6dc7e69c0 - <mistralrs_core::pipeline::normal::NormalPipeline as mistralrs_core::pipeline::Pipeline>::forward_inputs::h7c8eac0dd705ae60
  18:     0x55c6dc7e8e35 - mistralrs_core::pipeline::Pipeline::step::{{closure}}::h17e8da9387ab1472
  19:     0x55c6dc805578 - mistralrs_core::engine::Engine::run::{{closure}}::hf8ac117b8c902b00
  20:     0x55c6dc802f51 - tokio::runtime::park::CachedParkThread::block_on::h500f0459029570e4
  21:     0x55c6dcafd883 - tokio::runtime::context::runtime::enter_runtime::h663b1b593a494e73
  22:     0x55c6dc7ecbc8 - std::sys_common::backtrace::__rust_begin_short_backtrace::h428212e440861a67
  23:     0x55c6dc7ef43d - core::ops::function::FnOnce::call_once{{vtable.shim}}::hc9ec55c69944d520
  24:     0x55c6dd91722b - std::sys::pal::unix::thread::Thread::new::thread_start::h3b8e81128811868f
  25:     0x7fa6eacbc44b - start_thread
  26:     0x7fa6ea6b752f - __clone
  27:                0x0 - <unknown>
thread '<unnamed>' panicked at library/core/src/panicking.rs:227:5:
panic in a destructor during cleanup
thread caused non-unwinding panic. aborting.
Aborted

Also, this error occurs during inference.

@EricLBuehler
Owner

Ah ok, can you run with RUST_BACKTRACE=1 and CUDA_LAUNCH_BLOCKING=1? This will help me see where the issue occurs.
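
Some background on why this setting helps: CUDA kernel launches are normally asynchronous, so a fault inside a kernel is often reported by a later API call (for example a free in a destructor) rather than by the launch that caused it. `CUDA_LAUNCH_BLOCKING=1` makes launches synchronous, which tends to move the reported failure closer to the offending kernel. The toy model below sketches that behaviour in plain Rust. `ToyStream` and `first_failing_call` are invented for illustration and do not correspond to any CUDA or cudarc API.

```rust
/// Toy model of a device stream; none of these names come from CUDA or cudarc.
struct ToyStream {
    // When true, every launch waits for the work to finish (blocking mode).
    blocking: bool,
    // An error raised by queued work that has not been reported yet.
    pending: Option<String>,
}

impl ToyStream {
    fn new(blocking: bool) -> Self {
        Self { blocking, pending: None }
    }

    /// Queue a kernel. In blocking mode a failure is reported by this call;
    /// otherwise it is stored and surfaces at the next synchronizing call.
    fn launch(&mut self, name: &str, faulty: bool) -> Result<(), String> {
        if faulty && self.pending.is_none() {
            self.pending = Some(format!("illegal access in {name}"));
        }
        if self.blocking {
            self.sync()
        } else {
            Ok(())
        }
    }

    /// A synchronizing call (such as freeing memory) reports the stored error.
    /// Simplification: the error is cleared here, whereas a real device
    /// context may stay in the error state.
    fn sync(&mut self) -> Result<(), String> {
        match self.pending.take() {
            Some(e) => Err(e),
            None => Ok(()),
        }
    }
}

/// Run a faulty kernel, a healthy kernel, and a free; return the label of
/// the first call that reports an error.
fn first_failing_call(blocking: bool) -> Option<&'static str> {
    let mut stream = ToyStream::new(blocking);
    if stream.launch("attention", true).is_err() {
        return Some("launch attention");
    }
    if stream.launch("mlp", false).is_err() {
        return Some("launch mlp");
    }
    if stream.sync().is_err() {
        return Some("free");
    }
    None
}
```

In this model the non-blocking run blames the final free, while the blocking run blames the faulty launch itself.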

@joshpopelka20
Contributor Author

thread '<unnamed>' panicked at /home/ec2-user/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cudarc-0.11.6/src/driver/safe/core.rs:252:76:
called `Result::unwrap()` on an `Err` value: DriverError(CUDA_ERROR_ILLEGAL_ADDRESS, "an illegal memory access was encountered")
stack backtrace:
   0: rust_begin_unwind
   1: core::panicking::panic_fmt
   2: core::result::unwrap_failed
   3: <cudarc::driver::safe::core::CudaSlice<T> as core::ops::drop::Drop>::drop
   4: alloc::sync::Arc<T,A>::drop_slow
   5: alloc::sync::Arc<T,A>::drop_slow
   6: mistralrs_core::paged_attention::layers::paged_attention::PagedAttention::forward
   7: mistralrs_core::models::llama::Llama::forward
   8: <mistralrs_core::models::llama::Llama as mistralrs_core::pipeline::NormalModel>::forward
   9: <mistralrs_core::pipeline::normal::NormalPipeline as mistralrs_core::pipeline::Pipeline>::forward_inputs
  10: mistralrs_core::pipeline::Pipeline::step::{{closure}}
  11: mistralrs_core::engine::Engine::run::{{closure}}
  12: tokio::runtime::park::CachedParkThread::block_on
  13: tokio::runtime::context::runtime::enter_runtime
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
thread '<unnamed>' panicked at /home/ec2-user/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cudarc-0.11.6/src/driver/safe/core.rs:252:76:
called `Result::unwrap()` on an `Err` value: DriverError(CUDA_ERROR_ILLEGAL_ADDRESS, "an illegal memory access was encountered")
stack backtrace:
   0:     0x55a20a5102d5 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h3692694645b1bb6a
   1:     0x55a20a53e8eb - core::fmt::write::h5131d80b4c69b88d
   2:     0x55a20a50bc6f - std::io::Write::write_fmt::h1fb327a7d8b0eb36
   3:     0x55a20a5100ae - std::sys_common::backtrace::print::h998d75b840f75a73
   4:     0x55a20a5115f9 - std::panicking::default_hook::{{closure}}::h18ec7fe6a38b9da0
   5:     0x55a20a51139a - std::panicking::default_hook::hfb3f22c2e4075a6a
   6:     0x55a20a511a93 - std::panicking::rust_panic_with_hook::h51af00bcb4660c4e
   7:     0x55a20a511974 - std::panicking::begin_panic_handler::{{closure}}::h39f76aa863fbe8ce
   8:     0x55a20a510799 - std::sys_common::backtrace::__rust_end_short_backtrace::h4d10fc2251b89840
   9:     0x55a20a5116a7 - rust_begin_unwind
  10:     0x55a2091e8de3 - core::panicking::panic_fmt::h319840fcbcd912ef
  11:     0x55a2091e92d6 - core::result::unwrap_failed::haccb9aaa604e1e21
  12:     0x55a20955dca4 - <cudarc::driver::safe::core::CudaSlice<T> as core::ops::drop::Drop>::drop::hd684dcc6e2bc9210
  13:     0x55a209697dfd - alloc::sync::Arc<T,A>::drop_slow::h025961952ec363c7
  14:     0x55a20969b6d0 - alloc::sync::Arc<T,A>::drop_slow::hd44c7568258a4933
  15:     0x55a2098829b6 - mistralrs_core::paged_attention::layers::paged_attention::PagedAttention::forward::h85b89ee8dd38a409
  16:     0x55a2096a4ee6 - mistralrs_core::models::llama::Llama::forward::h8c73e5649cf95cbb
  17:     0x55a2096a7aa9 - <mistralrs_core::models::llama::Llama as mistralrs_core::pipeline::NormalModel>::forward::h6ff0cde06b1817e6
  18:     0x55a2093e69c0 - <mistralrs_core::pipeline::normal::NormalPipeline as mistralrs_core::pipeline::Pipeline>::forward_inputs::h7c8eac0dd705ae60
  19:     0x55a2093e8e35 - mistralrs_core::pipeline::Pipeline::step::{{closure}}::h17e8da9387ab1472
  20:     0x55a209405578 - mistralrs_core::engine::Engine::run::{{closure}}::hf8ac117b8c902b00
  21:     0x55a209402f51 - tokio::runtime::park::CachedParkThread::block_on::h500f0459029570e4
  22:     0x55a2096fd883 - tokio::runtime::context::runtime::enter_runtime::h663b1b593a494e73
  23:     0x55a2093ecbc8 - std::sys_common::backtrace::__rust_begin_short_backtrace::h428212e440861a67
  24:     0x55a2093ef43d - core::ops::function::FnOnce::call_once{{vtable.shim}}::hc9ec55c69944d520
  25:     0x55a20a51722b - std::sys::pal::unix::thread::Thread::new::thread_start::h3b8e81128811868f
  26:     0x7fe0f2aac44b - start_thread
  27:     0x7fe0f24a752f - __clone
  28:                0x0 - <unknown>
thread '<unnamed>' panicked at library/core/src/panicking.rs:227:5:
panic in a destructor during cleanup
thread caused non-unwinding panic. aborting.
Aborted

@EricLBuehler
Owner

Ok thanks!

So, you are just starting a chat interaction when this occurs?

@joshpopelka20
Contributor Author

Exactly. It's my first prompt to test it.

@joshpopelka20
Contributor Author

joshpopelka20 commented Jul 30, 2024

I'm still trying to narrow down the exact line of code where it's failing, but one notable finding is that it fails on the 9th block (block_idx 8).

In mistralrs-core/src/models/llama.rs:

for (block_idx, block) in self.blocks.iter().enumerate() {
    x = self.mapper.map(x, block_idx)?;

@joshpopelka20
Contributor Author

That 9th block is the first layer on the 2nd GPU. It happens when I put fewer layers on the 1st GPU. So, for whatever reason, the paged attention on the 2nd GPU is causing a problem.

After fixing the NVCC flag bug last week, the code worked fine. It must be a recent change that is causing this issue.

Also, it's something in the paged_attention because when I set --no-paged-attn, I don't get the error.
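
The link between block_idx 8 and the 2nd GPU can be checked against the `-n "0:8;1:8;2:8;3:8"` argument from the original command. The sketch below assumes each entry has the form `device_ordinal:layer_count` and that devices are filled in the order listed; that reading of the flag is an assumption, not something confirmed in this thread. `parse_layer_spec` and `device_for_block` are hypothetical helpers, not mistral.rs functions.

```rust
/// Parse a spec such as "0:8;1:8" into (device_ordinal, layer_count) pairs.
/// Assumes every entry has the form `ordinal:count`.
fn parse_layer_spec(spec: &str) -> Result<Vec<(usize, usize)>, String> {
    spec.split(';')
        .filter(|entry| !entry.is_empty())
        .map(|entry| -> Result<(usize, usize), String> {
            let (ord, count) = entry
                .split_once(':')
                .ok_or_else(|| format!("missing ':' in entry {entry:?}"))?;
            let ord = ord
                .trim()
                .parse::<usize>()
                .map_err(|e| format!("bad ordinal {ord:?}: {e}"))?;
            let count = count
                .trim()
                .parse::<usize>()
                .map_err(|e| format!("bad count {count:?}: {e}"))?;
            Ok((ord, count))
        })
        .collect()
}

/// Return the device that holds `block_idx`, filling devices in spec order.
/// Returns None when the index is past the last mapped layer.
fn device_for_block(spec: &[(usize, usize)], block_idx: usize) -> Option<usize> {
    let mut first = 0; // index of the first layer on the current device
    for &(ord, count) in spec {
        if block_idx < first + count {
            return Some(ord);
        }
        first += count;
    }
    None
}
```

Under those assumptions, blocks 0 through 7 land on device 0 and block 8 is the first one on device 1, which matches the observation above.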

@EricLBuehler
Owner

@joshpopelka20 this should be fixed in #656!

@joshpopelka20
Contributor Author

Still erroring, though the error is different this time:

2024-07-31T18:10:43.924163Z  INFO mistralrs_core::utils::normal: Detected minimum CUDA compute capability 8.6
2024-07-31T18:10:43.965820Z  INFO mistralrs_core::utils::normal: DType selected is BF16.
2024-07-31T18:10:54.188274Z  INFO mistralrs_core::pipeline::chat_template: bos_toks = "<|begin_of_text|>", eos_toks = "<|eot_id|>", "<|end_of_text|>", "<|eom_id|>", unk_tok = `None`
2024-07-31T18:10:54.219946Z  INFO mistralrs_server: Model loaded.
thread 'main' panicked at mistralrs-server/src/main.rs:433:18:
called `Option::unwrap()` on a `None` value
stack backtrace:
   0:     0x5612d0e01d18 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h3692694645b1bb6a
   1:     0x5612d0382cfb - core::fmt::write::h5131d80b4c69b88d
   2:     0x5612d0dca35e - std::io::Write::write_fmt::h1fb327a7d8b0eb36
   3:     0x5612d0e03599 - std::sys_common::backtrace::print::h998d75b840f75a73
   4:     0x5612d0e02ea9 - std::panicking::default_hook::{{closure}}::h18ec7fe6a38b9da0
   5:     0x5612d0e0416a - std::panicking::rust_panic_with_hook::h51af00bcb4660c4e
   6:     0x5612d0e038ea - std::panicking::begin_panic_handler::{{closure}}::h39f76aa863fbe8ce
   7:     0x5612d0e03879 - std::sys_common::backtrace::__rust_end_short_backtrace::h4d10fc2251b89840
   8:     0x5612d0e03866 - rust_begin_unwind
   9:     0x5612cfd2b232 - core::panicking::panic_fmt::h319840fcbcd912ef
  10:     0x5612cfd2b3c4 - core::panicking::panic::h19def44c80243eda
  11:     0x5612cfd2b7a8 - core::option::unwrap_failed::h9b45086d3ec3e03c
  12:     0x5612cfeda420 - mistralrs_server::main::{{closure}}::h6adfb1fb78520af2
  13:     0x5612cfed178b - mistralrs_server::main::h03b1eb3eb84ed66e
  14:     0x5612cfdea0b3 - std::sys_common::backtrace::__rust_begin_short_backtrace::h07d80d93e80867eb
  15:     0x5612cfdeaa1d - std::rt::lang_start::{{closure}}::h07cff93bf168bbae
  16:     0x5612d0dca120 - std::rt::lang_start_internal::h63a185b0ddd212e9
  17:     0x5612cfedd41a - main
  18:     0x7f492186013a - __libc_start_main
  19:     0x5612cfd98a9a - _start
  20:                0x0 - <unknown>

@EricLBuehler
Owner

Oops, I violated an invariant! Should be fixed now in #658.

@joshpopelka20
Contributor Author

It's no problem. It works now. I'll close the issue. Thanks!

@ShelbyJenkins

Same

thread '<unnamed>' panicked at /root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cudarc-0.12.1/src/driver/safe/core.rs:252:76:
called `Result::unwrap()` on an `Err` value: DriverError(CUDA_ERROR_ILLEGAL_ADDRESS, "an illegal memory access was encountered")
stack backtrace:
   0: rust_begin_unwind
             at /rustc/9d5cdf75aa42faaf0b58ba21a510117e8d0051a3/library/std/src/panicking.rs:645:5
   1: core::panicking::panic_fmt
             at /rustc/9d5cdf75aa42faaf0b58ba21a510117e8d0051a3/library/core/src/panicking.rs:72:14
   2: core::result::unwrap_failed
             at /rustc/9d5cdf75aa42faaf0b58ba21a510117e8d0051a3/library/core/src/result.rs:1654:5
   3: core::result::Result<T,E>::unwrap
             at /rustc/9d5cdf75aa42faaf0b58ba21a510117e8d0051a3/library/core/src/result.rs:1077:23
   4: <cudarc::driver::safe::core::CudaSlice<T> as core::ops::drop::Drop>::drop
             at /root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cudarc-0.12.1/src/driver/safe/core.rs:252:17
   5: core::ptr::drop_in_place<cudarc::driver::safe::core::CudaSlice<f32>>
             at /rustc/9d5cdf75aa42faaf0b58ba21a510117e8d0051a3/library/core/src/ptr/mod.rs:514:1
   6: core::ptr::drop_in_place<candle_core::cuda_backend::CudaStorageSlice>
             at /rustc/9d5cdf75aa42faaf0b58ba21a510117e8d0051a3/library/core/src/ptr/mod.rs:514:1
   7: core::ptr::drop_in_place<candle_core::cuda_backend::CudaStorage>
             at /rustc/9d5cdf75aa42faaf0b58ba21a510117e8d0051a3/library/core/src/ptr/mod.rs:514:1
   8: core::ptr::drop_in_place<candle_core::storage::Storage>
             at /rustc/9d5cdf75aa42faaf0b58ba21a510117e8d0051a3/library/core/src/ptr/mod.rs:514:1
   9: core::ptr::drop_in_place<core::cell::UnsafeCell<candle_core::storage::Storage>>
             at /rustc/9d5cdf75aa42faaf0b58ba21a510117e8d0051a3/library/core/src/ptr/mod.rs:514:1
  10: core::ptr::drop_in_place<std::sync::rwlock::RwLock<candle_core::storage::Storage>>
             at /rustc/9d5cdf75aa42faaf0b58ba21a510117e8d0051a3/library/core/src/ptr/mod.rs:514:1
  11: alloc::sync::Arc<T,A>::drop_slow
             at /rustc/9d5cdf75aa42faaf0b58ba21a510117e8d0051a3/library/alloc/src/sync.rs:1804:18
  12: <alloc::sync::Arc<T,A> as core::ops::drop::Drop>::drop
             at /rustc/9d5cdf75aa42faaf0b58ba21a510117e8d0051a3/library/alloc/src/sync.rs:2462:13
  13: core::ptr::drop_in_place<alloc::sync::Arc<std::sync::rwlock::RwLock<candle_core::storage::Storage>>>
             at /rustc/9d5cdf75aa42faaf0b58ba21a510117e8d0051a3/library/core/src/ptr/mod.rs:514:1
  14: core::ptr::drop_in_place<candle_core::tensor::Tensor_>
             at /rustc/9d5cdf75aa42faaf0b58ba21a510117e8d0051a3/library/core/src/ptr/mod.rs:514:1
  15: alloc::sync::Arc<T,A>::drop_slow
             at /rustc/9d5cdf75aa42faaf0b58ba21a510117e8d0051a3/library/alloc/src/sync.rs:1804:18
  16: <alloc::sync::Arc<T,A> as core::ops::drop::Drop>::drop
             at /rustc/9d5cdf75aa42faaf0b58ba21a510117e8d0051a3/library/alloc/src/sync.rs:2462:13
  17: core::ptr::drop_in_place<alloc::sync::Arc<candle_core::tensor::Tensor_>>
             at /rustc/9d5cdf75aa42faaf0b58ba21a510117e8d0051a3/library/core/src/ptr/mod.rs:514:1
  18: core::ptr::drop_in_place<candle_core::tensor::Tensor>
             at /rustc/9d5cdf75aa42faaf0b58ba21a510117e8d0051a3/library/core/src/ptr/mod.rs:514:1
  19: mistralrs_core::models::quantized_llama::LayerWeights::forward_attn
             at /root/.cargo/git/checkouts/mistral.rs-0a2607fe9768eac5/a702c6d/mistralrs-core/src/models/quantized_llama.rs:213:5
  20: mistralrs_core::models::quantized_llama::ModelWeights::forward
             at /root/.cargo/git/checkouts/mistral.rs-0a2607fe9768eac5/a702c6d/mistralrs-core/src/models/quantized_llama.rs:657:24
  21: <mistralrs_core::pipeline::gguf::GGUFPipeline as mistralrs_core::pipeline::Pipeline>::forward_inputs
             at /root/.cargo/git/checkouts/mistral.rs-0a2607fe9768eac5/a702c6d/mistralrs-core/src/pipeline/gguf.rs:664:40
  22: mistralrs_core::pipeline::Pipeline::step::{{closure}}
             at /root/.cargo/git/checkouts/mistral.rs-0a2607fe9768eac5/a702c6d/mistralrs-core/src/pipeline/mod.rs:327:38
  23: <core::pin::Pin<P> as core::future::future::Future>::poll
             at /rustc/9d5cdf75aa42faaf0b58ba21a510117e8d0051a3/library/core/src/future/future.rs:123:9
  24: mistralrs_core::engine::Engine::run::{{closure}}
             at /root/.cargo/git/checkouts/mistral.rs-0a2607fe9768eac5/a702c6d/mistralrs-core/src/engine/mod.rs:234:34
  25: mistralrs_core::MistralRs::new::{{closure}}::{{closure}}
             at /root/.cargo/git/checkouts/mistral.rs-0a2607fe9768eac5/a702c6d/mistralrs-core/src/lib.rs:332:30
  26: <core::pin::Pin<P> as core::future::future::Future>::poll
             at /rustc/9d5cdf75aa42faaf0b58ba21a510117e8d0051a3/library/core/src/future/future.rs:123:9
  27: tokio::runtime::park::CachedParkThread::block_on::{{closure}}
             at /root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/park.rs:281:63
  28: tokio::runtime::coop::with_budget
             at /root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/coop.rs:107:5
  29: tokio::runtime::coop::budget
             at /root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/coop.rs:73:5
  30: tokio::runtime::park::CachedParkThread::block_on
             at /root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/park.rs:281:31
  31: tokio::runtime::context::blocking::BlockingRegionGuard::block_on
             at /root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/context/blocking.rs:66:9
  32: tokio::runtime::scheduler::multi_thread::MultiThread::block_on::{{closure}}
             at /root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/scheduler/multi_thread/mod.rs:87:13
  33: tokio::runtime::context::runtime::enter_runtime
             at /root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/context/runtime.rs:65:16
  34: tokio::runtime::scheduler::multi_thread::MultiThread::block_on
             at /root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/scheduler/multi_thread/mod.rs:86:9
  35: tokio::runtime::runtime::Runtime::block_on_inner
             at /root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/runtime.rs:363:45
  36: tokio::runtime::runtime::Runtime::block_on
             at /root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/runtime.rs:333:13
  37: mistralrs_core::MistralRs::new::{{closure}}
             at /root/.cargo/git/checkouts/mistral.rs-0a2607fe9768eac5/a702c6d/mistralrs-core/src/lib.rs:320:13
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
thread '<unnamed>' panicked at /root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cudarc-0.12.1/src/driver/safe/core.rs:252:76:
called `Result::unwrap()` on an `Err` value: DriverError(CUDA_ERROR_ILLEGAL_ADDRESS, "an illegal memory access was encountered")
stack backtrace:
   0:     0x61ce90cceff5 - std::backtrace_rs::backtrace::libunwind::trace::hc79cced6f418596d
                               at /rustc/9d5cdf75aa42faaf0b58ba21a510117e8d0051a3/library/std/src/../../backtrace/src/backtrace/libunwind.rs:105:5
   1:     0x61ce90cceff5 - std::backtrace_rs::backtrace::trace_unsynchronized::h06f3eef6c8a22cf0
                               at /rustc/9d5cdf75aa42faaf0b58ba21a510117e8d0051a3/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2:     0x61ce90cceff5 - std::sys_common::backtrace::_print_fmt::hba273d0c77fc3421
                               at /rustc/9d5cdf75aa42faaf0b58ba21a510117e8d0051a3/library/std/src/sys_common/backtrace.rs:68:5
   3:     0x61ce90cceff5 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h409f1e3c1e32650e
                               at /rustc/9d5cdf75aa42faaf0b58ba21a510117e8d0051a3/library/std/src/sys_common/backtrace.rs:44:22
   4:     0x61ce90cfdecb - core::fmt::rt::Argument::fmt::h8811fe3c91cda7b3
                               at /rustc/9d5cdf75aa42faaf0b58ba21a510117e8d0051a3/library/core/src/fmt/rt.rs:142:9
   5:     0x61ce90cfdecb - core::fmt::write::h7a8f70a9b146d9ee
                               at /rustc/9d5cdf75aa42faaf0b58ba21a510117e8d0051a3/library/core/src/fmt/mod.rs:1153:17
   6:     0x61ce90ccaebf - std::io::Write::write_fmt::hc57d86a7c88c29ef
                               at /rustc/9d5cdf75aa42faaf0b58ba21a510117e8d0051a3/library/std/src/io/mod.rs:1843:15
   7:     0x61ce90ccedce - std::sys_common::backtrace::_print::h0dc0bbf9b429a58b
                               at /rustc/9d5cdf75aa42faaf0b58ba21a510117e8d0051a3/library/std/src/sys_common/backtrace.rs:47:5
   8:     0x61ce90ccedce - std::sys_common::backtrace::print::hf60182bd4aee207d
                               at /rustc/9d5cdf75aa42faaf0b58ba21a510117e8d0051a3/library/std/src/sys_common/backtrace.rs:34:9
   9:     0x61ce90cd08d9 - std::panicking::default_hook::{{closure}}::hd90db44a41f772dc
  10:     0x61ce90cd0579 - std::panicking::default_hook::hd86be16b87521210
                               at /rustc/9d5cdf75aa42faaf0b58ba21a510117e8d0051a3/library/std/src/panicking.rs:288:9
  11:     0x61ce8ddbd74a - <alloc::boxed::Box<F,A> as core::ops::function::Fn<Args>>::call::h0f4e2b1213798605
                               at /rustc/9d5cdf75aa42faaf0b58ba21a510117e8d0051a3/library/alloc/src/boxed.rs:2032:9
  12:     0x61ce8ddbd74a - test::test_main::{{closure}}::hec81cefc5baa15e2
                               at /rustc/9d5cdf75aa42faaf0b58ba21a510117e8d0051a3/library/test/src/lib.rs:138:21
  13:     0x61ce90cd0eac - <alloc::boxed::Box<F,A> as core::ops::function::Fn<Args>>::call::h2fe2a6e53d9884ad
                               at /rustc/9d5cdf75aa42faaf0b58ba21a510117e8d0051a3/library/alloc/src/boxed.rs:2032:9
  14:     0x61ce90cd0eac - std::panicking::rust_panic_with_hook::ha4f8caa112a16574
                               at /rustc/9d5cdf75aa42faaf0b58ba21a510117e8d0051a3/library/std/src/panicking.rs:792:13
  15:     0x61ce90cd0c56 - std::panicking::begin_panic_handler::{{closure}}::hc879855deab44ed0
                               at /rustc/9d5cdf75aa42faaf0b58ba21a510117e8d0051a3/library/std/src/panicking.rs:657:13
  16:     0x61ce90ccf4b9 - std::sys_common::backtrace::__rust_end_short_backtrace::h85e59f289fdfff6c
                               at /rustc/9d5cdf75aa42faaf0b58ba21a510117e8d0051a3/library/std/src/sys_common/backtrace.rs:171:18
  17:     0x61ce90cd0987 - rust_begin_unwind
                               at /rustc/9d5cdf75aa42faaf0b58ba21a510117e8d0051a3/library/std/src/panicking.rs:645:5
  18:     0x61ce8dc6f766 - core::panicking::panic_fmt::h0baef2c59e253f8d
                               at /rustc/9d5cdf75aa42faaf0b58ba21a510117e8d0051a3/library/core/src/panicking.rs:72:14
  19:     0x61ce8dc6fcf6 - core::result::unwrap_failed::ha3431373f2eea71f
                               at /rustc/9d5cdf75aa42faaf0b58ba21a510117e8d0051a3/library/core/src/result.rs:1654:5
  20:     0x61ce8edbe3fa - core::result::Result<T,E>::unwrap::h0fec05548d92e9c5
                               at /rustc/9d5cdf75aa42faaf0b58ba21a510117e8d0051a3/library/core/src/result.rs:1077:23
  21:     0x61ce8edbe3fa - <cudarc::driver::safe::core::CudaSlice<T> as core::ops::drop::Drop>::drop::he17e9948e4e2d725
                               at /root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cudarc-0.12.1/src/driver/safe/core.rs:252:17
  22:     0x61ce8ebf6fe7 - core::ptr::drop_in_place<cudarc::driver::safe::core::CudaSlice<f32>>::h266239d2bab626e3
                               at /rustc/9d5cdf75aa42faaf0b58ba21a510117e8d0051a3/library/core/src/ptr/mod.rs:514:1
  23:     0x61ce8ebf63de - core::ptr::drop_in_place<candle_core::cuda_backend::CudaStorageSlice>::hca12f5525f1761be
                               at /rustc/9d5cdf75aa42faaf0b58ba21a510117e8d0051a3/library/core/src/ptr/mod.rs:514:1
  24:     0x61ce8ebf5ad7 - core::ptr::drop_in_place<candle_core::cuda_backend::CudaStorage>::hc8a04298e2362e61
                               at /rustc/9d5cdf75aa42faaf0b58ba21a510117e8d0051a3/library/core/src/ptr/mod.rs:514:1
  25:     0x61ce8ebf406c - core::ptr::drop_in_place<candle_core::storage::Storage>::h9a86953112c3957c
                               at /rustc/9d5cdf75aa42faaf0b58ba21a510117e8d0051a3/library/core/src/ptr/mod.rs:514:1
  26:     0x61ce8ebf855b - core::ptr::drop_in_place<core::cell::UnsafeCell<candle_core::storage::Storage>>::hf62c89b6f9218f5e
                               at /rustc/9d5cdf75aa42faaf0b58ba21a510117e8d0051a3/library/core/src/ptr/mod.rs:514:1
  27:     0x61ce8ebf8e7f - core::ptr::drop_in_place<std::sync::rwlock::RwLock<candle_core::storage::Storage>>::h390c39dc983e637a
                               at /rustc/9d5cdf75aa42faaf0b58ba21a510117e8d0051a3/library/core/src/ptr/mod.rs:514:1
  28:     0x61ce8ec7d85f - alloc::sync::Arc<T,A>::drop_slow::h8d79124923f6a52a
                               at /rustc/9d5cdf75aa42faaf0b58ba21a510117e8d0051a3/library/alloc/src/sync.rs:1804:18
  29:     0x61ce8ec9b302 - <alloc::sync::Arc<T,A> as core::ops::drop::Drop>::drop::h4554c7a45f9a01c6
                               at /rustc/9d5cdf75aa42faaf0b58ba21a510117e8d0051a3/library/alloc/src/sync.rs:2462:13
  30:     0x61ce8ebec02b - core::ptr::drop_in_place<alloc::sync::Arc<std::sync::rwlock::RwLock<candle_core::storage::Storage>>>::h4b9b02fc6af06a2b
                               at /rustc/9d5cdf75aa42faaf0b58ba21a510117e8d0051a3/library/core/src/ptr/mod.rs:514:1
  31:     0x61ce8ebf3e5e - core::ptr::drop_in_place<candle_core::tensor::Tensor_>::h969af278012e4074
                               at /rustc/9d5cdf75aa42faaf0b58ba21a510117e8d0051a3/library/core/src/ptr/mod.rs:514:1
  32:     0x61ce8ec7d8ff - alloc::sync::Arc<T,A>::drop_slow::hc6742b7c6f00a1f3
                               at /rustc/9d5cdf75aa42faaf0b58ba21a510117e8d0051a3/library/alloc/src/sync.rs:1804:18
  33:     0x61ce8ec9b282 - <alloc::sync::Arc<T,A> as core::ops::drop::Drop>::drop::h225f1d395d75af8e
                               at /rustc/9d5cdf75aa42faaf0b58ba21a510117e8d0051a3/library/alloc/src/sync.rs:2462:13
  34:     0x61ce8ebf7b7b - core::ptr::drop_in_place<alloc::sync::Arc<candle_core::tensor::Tensor_>>::h35190ac780d405bd
                               at /rustc/9d5cdf75aa42faaf0b58ba21a510117e8d0051a3/library/core/src/ptr/mod.rs:514:1
  35:     0x61ce8ebf37eb - core::ptr::drop_in_place<candle_core::tensor::Tensor>::h9db9f54123674e57
                               at /rustc/9d5cdf75aa42faaf0b58ba21a510117e8d0051a3/library/core/src/ptr/mod.rs:514:1
  36:     0x61ce8dff8ceb - mistralrs_core::models::quantized_llama::LayerWeights::forward_attn::h2c8f2cb2120c3544
                               at /root/.cargo/git/checkouts/mistral.rs-0a2607fe9768eac5/a702c6d/mistralrs-core/src/models/quantized_llama.rs:213:5
  37:     0x61ce8e007a22 - mistralrs_core::models::quantized_llama::ModelWeights::forward::h926345e9b57a5875
                               at /root/.cargo/git/checkouts/mistral.rs-0a2607fe9768eac5/a702c6d/mistralrs-core/src/models/quantized_llama.rs:657:24
  38:     0x61ce8e2329b5 - <mistralrs_core::pipeline::gguf::GGUFPipeline as mistralrs_core::pipeline::Pipeline>::forward_inputs::h243c075b9689b2e3
                               at /root/.cargo/git/checkouts/mistral.rs-0a2607fe9768eac5/a702c6d/mistralrs-core/src/pipeline/gguf.rs:664:40
  39:     0x61ce8e0d25e0 - mistralrs_core::pipeline::Pipeline::step::{{closure}}::hd84f65b1125e8bcf
                               at /root/.cargo/git/checkouts/mistral.rs-0a2607fe9768eac5/a702c6d/mistralrs-core/src/pipeline/mod.rs:327:38
  40:     0x61ce8df6fc04 - <core::pin::Pin<P> as core::future::future::Future>::poll::h9130a87a00b49342
                               at /rustc/9d5cdf75aa42faaf0b58ba21a510117e8d0051a3/library/core/src/future/future.rs:123:9
  41:     0x61ce8e0fb837 - mistralrs_core::engine::Engine::run::{{closure}}::hfe4716020979fa45
                               at /root/.cargo/git/checkouts/mistral.rs-0a2607fe9768eac5/a702c6d/mistralrs-core/src/engine/mod.rs:234:34
  42:     0x61ce8e261989 - mistralrs_core::MistralRs::new::{{closure}}::{{closure}}::h00d8b6d8f22531e1
                               at /root/.cargo/git/checkouts/mistral.rs-0a2607fe9768eac5/a702c6d/mistralrs-core/src/lib.rs:332:30
  43:     0x61ce8df6fa97 - <core::pin::Pin<P> as core::future::future::Future>::poll::h5d182315eb8fc5ad
                               at /rustc/9d5cdf75aa42faaf0b58ba21a510117e8d0051a3/library/core/src/future/future.rs:123:9
  44:     0x61ce8e191146 - tokio::runtime::park::CachedParkThread::block_on::{{closure}}::h41751e0de0933807
                               at /root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/park.rs:281:63
  45:     0x61ce8e190a7b - tokio::runtime::coop::with_budget::h48a841bb411d59bc
                               at /root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/coop.rs:107:5
  46:     0x61ce8e190a7b - tokio::runtime::coop::budget::hc5aa5ffdea92f45b
                               at /root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/coop.rs:73:5
  47:     0x61ce8e190a7b - tokio::runtime::park::CachedParkThread::block_on::hc7505029a4e7c65f
                               at /root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/park.rs:281:31
  48:     0x61ce8e02c534 - tokio::runtime::context::blocking::BlockingRegionGuard::block_on::hc8996903afa00fb9
                               at /root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/context/blocking.rs:66:9
  49:     0x61ce8e14c18f - tokio::runtime::scheduler::multi_thread::MultiThread::block_on::{{closure}}::hed9b49b2adfd1ed5
                               at /root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/scheduler/multi_thread/mod.rs:87:13
  50:     0x61ce8dfa1dd3 - tokio::runtime::context::runtime::enter_runtime::h8241746f6a1640be
                               at /root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/context/runtime.rs:65:16
  51:     0x61ce8e14c01a - tokio::runtime::scheduler::multi_thread::MultiThread::block_on::hd77c8424eaba7afe
                               at /root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/scheduler/multi_thread/mod.rs:86:9
  52:     0x61ce8e0e030a - tokio::runtime::runtime::Runtime::block_on_inner::h60341cd647cf4b48
                               at /root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/runtime.rs:363:45
  53:     0x61ce8e0e091b - tokio::runtime::runtime::Runtime::block_on::he30078dafd5026ae
                               at /root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/runtime.rs:333:13
  54:     0x61ce8e26167d - mistralrs_core::MistralRs::new::{{closure}}::hed7f0cfc4955b5ad
                               at /root/.cargo/git/checkouts/mistral.rs-0a2607fe9768eac5/a702c6d/mistralrs-core/src/lib.rs:320:13
  55:     0x61ce8e1f6fb6 - std::sys_common::backtrace::__rust_begin_short_backtrace::h2587321759660118
                               at /rustc/9d5cdf75aa42faaf0b58ba21a510117e8d0051a3/library/std/src/sys_common/backtrace.rs:155:18
  56:     0x61ce8df62eb1 - std::thread::Builder::spawn_unchecked_::{{closure}}::{{closure}}::hceae8b5bc8aedaff
                               at /rustc/9d5cdf75aa42faaf0b58ba21a510117e8d0051a3/library/std/src/thread/mod.rs:523:17
  57:     0x61ce8e1cce41 - <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once::h70f0fff85dfd44ed
                               at /rustc/9d5cdf75aa42faaf0b58ba21a510117e8d0051a3/library/core/src/panic/unwind_safe.rs:272:9
  58:     0x61ce8e185ad1 - std::panicking::try::do_call::h671fb002c7e08351
                               at /rustc/9d5cdf75aa42faaf0b58ba21a510117e8d0051a3/library/std/src/panicking.rs:552:40
  59:     0x61ce8e1cc3db - __rust_try
  60:     0x61ce8e184f02 - std::panicking::try::h4a59a8198e7f3a4d
                               at /rustc/9d5cdf75aa42faaf0b58ba21a510117e8d0051a3/library/std/src/panicking.rs:516:19
  61:     0x61ce8df627d5 - std::panic::catch_unwind::hbe987f130fbd4512
                               at /rustc/9d5cdf75aa42faaf0b58ba21a510117e8d0051a3/library/std/src/panic.rs:149:14
  62:     0x61ce8df627d5 - std::thread::Builder::spawn_unchecked_::{{closure}}::h90ae6a7629d754ea
                               at /rustc/9d5cdf75aa42faaf0b58ba21a510117e8d0051a3/library/std/src/thread/mod.rs:522:30
  63:     0x61ce8df05a4f - core::ops::function::FnOnce::call_once{{vtable.shim}}::h3f854c64849c60cc
                               at /rustc/9d5cdf75aa42faaf0b58ba21a510117e8d0051a3/library/core/src/ops/function.rs:250:5
  64:     0x61ce90cd624b - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::h5cf039e566d31df2
                               at /rustc/9d5cdf75aa42faaf0b58ba21a510117e8d0051a3/library/alloc/src/boxed.rs:2018:9
  65:     0x61ce90cd624b - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::h5b8a7e7667fbf80b
                               at /rustc/9d5cdf75aa42faaf0b58ba21a510117e8d0051a3/library/alloc/src/boxed.rs:2018:9
  66:     0x61ce90cd624b - std::sys::pal::unix::thread::Thread::new::thread_start::h47ad6cb551091e6a
                               at /rustc/9d5cdf75aa42faaf0b58ba21a510117e8d0051a3/library/std/src/sys/pal/unix/thread.rs:108:17
  67:     0x7d57f3c70ac3 - <unknown>
  68:     0x7d57f3d01a04 - __clone
  69:                0x0 - <unknown>
thread '<unnamed>' panicked at library/core/src/panicking.rs:223:5:
panic in a destructor during cleanup
thread caused non-unwinding panic. aborting.

I tried the latest commit from today, using Llama3_1_8bInstruct.

mistralrs={git="https://github.com/EricLBuehler/mistral.rs.git", features=["cuda", "cudnn"], optional=true, rev="a702c6dd2944aaf75800b11f4dfeec6fe5a9b068"}
