
Mistral-Nemo-Instruct-2407 Q8_0 GGUF: Model failed with error: ShapeMismatchBinaryOp { lhs: [1, 26, 4096], rhs: [26, 32, 160], op: "reshape" } #643

Closed
Remember20240719 opened this issue Jul 28, 2024 · 4 comments
Labels: bug (Something isn't working), models (Additions to model or architectures), urgent

Comments


Remember20240719 commented Jul 28, 2024

Describe the bug

Hello, I saw in PR #595 that Mistral Nemo Instruct 2407 is supported, and it works really well (using ISQ on HF safetensors).

Are GGUF models supported too?

Using the Q8_0 from https://huggingface.co/bartowski/Mistral-Nemo-Instruct-2407-GGUF/tree/main:

Server:

./target/release/./mistralrs-server --port 1234 --throughput gguf --quantized-model-id $D/models/Mistral-Nemo-Instruct-2407 --quantized-filename Mistral-Nemo-Instruct-2407.Q8_0.gguf
2024-07-28T18:13:38.820102Z  INFO mistralrs_server: avx: true, neon: false, simd128: false, f16c: true
2024-07-28T18:13:38.820123Z  INFO mistralrs_server: Sampling method: penalties -> temperature -> topk -> topp -> minp -> multinomial
2024-07-28T18:13:38.820136Z  INFO mistralrs_server: Model kind is: quantized from gguf (no adapters)
2024-07-28T18:13:38.820259Z  INFO mistralrs_core::pipeline::paths: Loading `Mistral-Nemo-Instruct-2407.Q8_0.gguf` locally at `$D/models/Mistral-Nemo-Instruct-2407/Mistral-Nemo-Instruct-2407.Q8_0.gguf`
2024-07-28T18:13:38.820396Z  INFO mistralrs_core::pipeline::gguf: Loading `generation_config.json` at `$D/models/Mistral-Nemo-Instruct-2407`
2024-07-28T18:13:38.820404Z  INFO mistralrs_core::pipeline::gguf: Loading `generation_config.json` locally at `$D/models/Mistral-Nemo-Instruct-2407/generation_config.json`
2024-07-28T18:13:38.820445Z  INFO mistralrs_core::pipeline::gguf: Loading model `$D/models/Mistral-Nemo-Instruct-2407` on cpu.
2024-07-28T18:13:39.392118Z  INFO mistralrs_core::pipeline::gguf: Model config:
general.architecture: llama
general.basename: models-mistralai-Mistral-Nemo
general.file_type: 7
general.finetune: Instruct
general.languages: en, fr, de, es, it, pt, ru, zh, ja
general.license: apache-2.0
general.name: Models Mistralai Mistral Nemo Instruct 2407
general.quantization_version: 2
general.size_label: 12B
general.type: model
general.version: 2407
llama.attention.head_count: 32
llama.attention.head_count_kv: 8
llama.attention.key_length: 128
llama.attention.layer_norm_rms_epsilon: 0.00001
llama.attention.value_length: 128
llama.block_count: 40
llama.context_length: 1024000
llama.embedding_length: 5120
llama.feed_forward_length: 14336
llama.rope.dimension_count: 128
llama.rope.freq_base: 1000000
llama.vocab_size: 131072
quantize.imatrix.chunks_count: 73
quantize.imatrix.dataset: group_40.txt
quantize.imatrix.entries_count: 280
quantize.imatrix.file: ./Mistral-Nemo-Instruct-2407-GGUF_imatrix.dat
2024-07-28T18:13:39.393241Z  INFO mistralrs_core::pipeline::gguf: Debug is enabled, wrote the names and information about each tensor to `mistralrs_gguf_tensors.txt`.
2024-07-28T18:13:39.698852Z  INFO mistralrs_core::gguf::gguf_tokenizer: GGUF tokenizer model is `gpt2`, kind: `Bpe`, num tokens: 131072, num added tokens: 0, num merges: 269443, num scores: 0
2024-07-28T18:13:39.698873Z  INFO mistralrs_core::gguf::gguf_tokenizer: Tokenizer: Tokenizer(TokenizerImpl { normalizer: None, pre_tokenizer: Some(ByteLevel(ByteLevel { add_prefix_space: false, trim_offsets: true, use_regex: true })), model: BPE(BPE { dropout: None, unk_token: Some("<unk>"), continuing_subword_prefix: None, end_of_word_suffix: None, fuse_unk: false, byte_fallback: false, vocab: 131072, merges: 269443, ignore_merges: false }), post_processor: Some(Template(TemplateProcessing { single: Template([SpecialToken { id: "<s>", type_id: 0 }, Sequence { id: A, type_id: 0 }]), pair: Template([SpecialToken { id: "<s>", type_id: 0 }, Sequence { id: A, type_id: 0 }, Sequence { id: B, type_id: 1 }]), added_single: 1, added_pair: 1, special_tokens: Tokens({"<s>": SpecialToken { id: "<s>", ids: [1], tokens: ["<s>"] }}) })), decoder: Some(ByteLevel(ByteLevel { add_prefix_space: true, trim_offsets: true, use_regex: true })), added_vocabulary: AddedVocabulary { added_tokens_map: {"</s>": 2, "<unk>": 0, "<s>": 1}, added_tokens_map_r: {2: AddedToken { content: "</s>", single_word: false, lstrip: false, rstrip: false, normalized: false, special: true }, 1: AddedToken { content: "<s>", single_word: false, lstrip: false, rstrip: false, normalized: false, special: true }, 0: AddedToken { content: "<unk>", single_word: false, lstrip: false, rstrip: false, normalized: false, special: true }}, added_tokens: [], special_tokens: [AddedToken { content: "<s>", single_word: false, lstrip: false, rstrip: false, normalized: false, special: true }, AddedToken { content: "</s>", single_word: false, lstrip: false, rstrip: false, normalized: false, special: true }, AddedToken { content: "<unk>", single_word: false, lstrip: false, rstrip: false, normalized: false, special: true }], special_tokens_set: {"<unk>", "</s>", "<s>"}, split_trie: (AhoCorasick(dfa::DFA(
D 000000: \x00-\x0E => 0
F 000016:
* 000032: \x00-\x0E => 0
 matches: 1
* 000048: \x00-\x0E => 0
 matches: 2
* 000064: \x00-\x0E => 0
 matches: 0
 >000080: \x00-\x02 => 80, \x03 => 208, \x04-\x0E => 80
  000096: \x00-\x02 => 0, \x03 => 208, \x04-\x0E => 0
  000112: \x00-\x02 => 80, \x03 => 208, \x04-\n => 80, \x0B => 128, \x0C-\x0E => 80
  000128: \x00-\x02 => 80, \x03 => 208, \x04 => 80, \x05 => 32, \x06-\x0E => 80
  000144: \x00-\x02 => 80, \x03 => 208, \x04 => 80, \x05 => 64, \x06-\x0E => 80
  000160: \x00-\x02 => 80, \x03 => 208, \x04-\x08 => 80, \t => 176, \n-\x0E => 80
  000176: \x00-\x02 => 80, \x03 => 208, \x04-\x06 => 80, \x07 => 192, \x08-\x0E => 80
  000192: \x00-\x02 => 80, \x03 => 208, \x04 => 80, \x05 => 48, \x06-\x0E => 80
  000208: \x00 => 80, \x01 => 112, \x02 => 80, \x03 => 208, \x04-\n => 80, \x0B => 144, \x0C => 80, \r => 160, \x0E => 80
match kind: LeftmostLongest
prefilter: true
state length: 14
pattern length: 3
shortest pattern length: 3
longest pattern length: 5
alphabet length: 15
stride: 16
byte classes: ByteClasses(0 => [0-46], 1 => [47], 2 => [48-59], 3 => [60], 4 => [61], 5 => [62], 6 => [63-106], 7 => [107], 8 => [108-109], 9 => [110], 10 => [111-114], 11 => [115], 12 => [116], 13 => [117], 14 => [118-255])
memory usage: 992
)
), [1, 2, 0]), split_normalized_trie: (AhoCorasick(dfa::DFA(
D 000000: \x00 => 0
F 000001:
 >000002: \x00 => 2
  000003: \x00 => 0
match kind: LeftmostLongest
prefilter: false
state length: 4
pattern length: 0
shortest pattern length: 18446744073709551615
longest pattern length: 0
alphabet length: 1
stride: 1
byte classes: ByteClasses(0 => [0-255])
memory usage: 16
)
), []), encode_special_tokens: false }, truncation: None, padding: None })
2024-07-28T18:13:39.718452Z  INFO mistralrs_core::gguf::chat_template: Discovered and using GGUF chat template: `{%- if messages[0]['role'] == 'system' %}\n    {%- set system_message = messages[0]['content'] %}\n    {%- set loop_messages = messages[1:] %}\n{%- else %}\n    {%- set loop_messages = messages %}\n{%- endif %}\n\n{{- bos_token }}\n{%- for message in loop_messages %}\n    {%- if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}\n        {{- raise_exception('After the optional system message, conversation roles must alternate user/assistant/user/assistant/...') }}\n    {%- endif %}\n    {%- if message['role'] == 'user' %}\n        {%- if loop.last and system_message is defined %}\n            {{- '[INST] ' + system_message + '\n\n' + message['content'] + '[/INST]' }}\n        {%- else %}\n            {{- '[INST] ' + message['content'] + '[/INST]' }}\n        {%- endif %}\n    {%- elif message['role'] == 'assistant' %}\n        {{- ' ' + message['content'] + eos_token}}\n    {%- else %}\n        {{- raise_exception('Only user and assistant roles are supported, with the exception of an initial optional system message!') }}\n    {%- endif %}\n{%- endfor %}\n`
2024-07-28T18:14:42.854153Z  INFO mistralrs_core::pipeline::paths: Using literal chat template.
2024-07-28T18:14:43.281455Z  INFO mistralrs_core::pipeline::chat_template: bos_toks = "<s>", eos_toks = "</s>", unk_tok = <unk>
2024-07-28T18:14:43.300792Z  INFO mistralrs_server: Model loaded.
2024-07-28T18:14:43.302137Z  INFO mistralrs_server: Serving on http://0.0.0.0:1234.
2024-07-28T18:17:36.124399Z ERROR mistralrs_core::engine: prompt step - Model failed with error: ShapeMismatchBinaryOp { lhs: [1, 26, 4096], rhs: [26, 32, 160], op: "reshape" }

Client:

python3 examples/server/chat.py
Enter system prompt >>>                            
>>> Is 22.20 greater than 22.6?
Traceback (most recent call last):
  File "mistral.rs/examples/server/chat.py", line 47, in <module>
    completion = openai.chat.completions.create(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "mistral.rs/venv/lib/python3.12/site-packages/openai/_utils/_utils.py", line 277, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "mistral.rs/venv/lib/python3.12/site-packages/openai/resources/chat/completions.py", line 643, in create
    return self._post(
           ^^^^^^^^^^^
  File "mistral.rs/venv/lib/python3.12/site-packages/openai/_base_client.py", line 1266, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "mistral.rs/venv/lib/python3.12/site-packages/openai/_base_client.py", line 942, in request
    return self._request(
           ^^^^^^^^^^^^^^
  File "mistral.rs/venv/lib/python3.12/site-packages/openai/_base_client.py", line 1031, in _request
    return self._retry_request(
           ^^^^^^^^^^^^^^^^^^^^
  File "mistral.rs/venv/lib/python3.12/site-packages/openai/_base_client.py", line 1079, in _retry_request
    return self._request(
           ^^^^^^^^^^^^^^
  File "mistral.rs/venv/lib/python3.12/site-packages/openai/_base_client.py", line 1031, in _request
    return self._retry_request(
           ^^^^^^^^^^^^^^^^^^^^
  File "mistral.rs/venv/lib/python3.12/site-packages/openai/_base_client.py", line 1079, in _retry_request
    return self._request(
           ^^^^^^^^^^^^^^
  File "mistral.rs/venv/lib/python3.12/site-packages/openai/_base_client.py", line 1046, in _request
    raise self._make_status_error_from_response(err.response) from None
openai.InternalServerError: Error code: 500 - {'message': 'shape mismatch in reshape, lhs: [1, 26, 4096], rhs: [26, 32, 160]', 'partial_response': {'id': '2', 'choices': [{'finish_reason': 'error', 'index': 0, 'message': {'content': '', 'role': 'assistant'}, 'logprobs': None}], 'created': 1722190658, 'model': '$D/models/Mistral-Nemo-Instruct-2407', 'system_fingerprint': 'local', 'object': 'chat.completion', 'usage': {'completion_tokens': 0, 'prompt_tokens': 26, 'total_tokens': 26, 'avg_tok_per_sec': 1733.3334, 'avg_prompt_tok_per_sec': None, 'avg_compl_tok_per_sec': None, 'total_time_sec': 0.015, 'total_prompt_time_sec': 0.0, 'total_completion_time_sec': 0.0}}}
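
For context: these shapes are consistent with the attention head dimension being derived as embedding_length / head_count instead of being read from llama.attention.key_length. Per the model config above, the Q projection emits head_count × key_length = 32 × 128 = 4096 features per token (the lhs [1, 26, 4096] for the 26-token prompt), while 5120 / 32 = 160 yields the bad target shape [26, 32, 160]. A minimal sketch of the distinction, with a hypothetical struct whose fields mirror the GGUF keys (not the actual mistral.rs types):

```rust
// Hypothetical metadata struct; field names mirror the GGUF keys in the log.
struct LlamaGgufMeta {
    head_count: usize,         // llama.attention.head_count   = 32
    embedding_length: usize,   // llama.embedding_length       = 5120
    key_length: Option<usize>, // llama.attention.key_length   = 128
}

// Prefer the explicit key_length; fall back to the classic derivation only
// when the key is absent (for most older llama-arch GGUFs the two coincide).
fn head_dim(meta: &LlamaGgufMeta) -> usize {
    meta.key_length
        .unwrap_or(meta.embedding_length / meta.head_count)
}

fn main() {
    let nemo = LlamaGgufMeta {
        head_count: 32,
        embedding_length: 5120,
        key_length: Some(128),
    };
    assert_eq!(head_dim(&nemo), 128); // 32 * 128 = 4096, matching the lhs
    assert_eq!(nemo.embedding_length / nemo.head_count, 160); // the bad rhs
}
```

This would also explain why the same model works via ISQ on safetensors, where head_dim comes from the HF config rather than being derived.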

Latest commit or version

Latest commit 38fb942

@Remember20240719 Remember20240719 added the bug Something isn't working label Jul 28, 2024
@EricLBuehler EricLBuehler added urgent models Additions to model or architectures labels Jul 28, 2024
EricLBuehler (Owner) commented:

Hey @Remember20240719! Can you please run with RUST_BACKTRACE=1?

EricLBuehler (Owner) commented Jul 31, 2024

@Remember20240719 I fixed this in #657! Please confirm that it works now after a git pull.

EricLBuehler (Owner) commented:

@Remember20240719 closing as complete via #657.

Remember20240719 (Author) commented Aug 2, 2024

That was quick, thanks!

This time, the model loads normally, but errors out as soon as it is given a prompt:

git HEAD a9b8b2e

RUST_BACKTRACE=1 ./target/release/./mistralrs-server -i gguf --quantized-model-id $D/models/Mistral-Nemo-Instruct-2407-GGUF --quantized-filename Mistral-Nemo-Instruct-2407-Q4_K_L.gguf
2024-08-02T02:19:34.960621Z  INFO mistralrs_server: avx: true, neon: false, simd128: false, f16c: true
2024-08-02T02:19:34.960657Z  INFO mistralrs_server: Sampling method: penalties -> temperature -> topk -> topp -> minp -> multinomial
2024-08-02T02:19:34.960674Z  INFO mistralrs_server: Model kind is: quantized from gguf (no adapters)
2024-08-02T02:19:34.960804Z  INFO mistralrs_core::pipeline::paths: Loading `Mistral-Nemo-Instruct-2407-Q4_K_L.gguf` locally at `$D/models/Mistral-Nemo-Instruct-2407-GGUF/Mistral-Nemo-Instruct-2407-Q4_K_L.gguf`
2024-08-02T02:19:34.960895Z  INFO mistralrs_core::pipeline::gguf: Loading model `$D/models/Mistral-Nemo-Instruct-2407-GGUF` on cpu.
2024-08-02T02:19:35.540106Z  INFO mistralrs_core::pipeline::gguf: Model config:
general.architecture: llama
general.basename: Mistral-Nemo
general.file_type: 15
general.finetune: Instruct
general.languages: en, fr, de, es, it, pt, ru, zh, ja
general.license: apache-2.0
general.name: Mistral Nemo Instruct 2407
general.quantization_version: 2
general.size_label: 12B
general.type: model
general.version: 2407
llama.attention.head_count: 32
llama.attention.head_count_kv: 8
llama.attention.key_length: 128
llama.attention.layer_norm_rms_epsilon: 0.00001
llama.attention.value_length: 128
llama.block_count: 40
llama.context_length: 1024000
llama.embedding_length: 5120
llama.feed_forward_length: 14336
llama.rope.dimension_count: 128
llama.rope.freq_base: 1000000
llama.vocab_size: 131072
quantize.imatrix.chunks_count: 128
quantize.imatrix.dataset: /training_dir/calibration_datav3.txt
quantize.imatrix.entries_count: 280
quantize.imatrix.file: /models_out/Mistral-Nemo-Instruct-2407-GGUF/Mistral-Nemo-Instruct-2407.imatrix
2024-08-02T02:19:35.540602Z  INFO mistralrs_core::pipeline::gguf: Debug is enabled, wrote the names and information about each tensor to `mistralrs_gguf_tensors.txt`.
2024-08-02T02:19:35.874385Z  INFO mistralrs_core::gguf::gguf_tokenizer: GGUF tokenizer model is `gpt2`, kind: `Bpe`, num tokens: 131072, num added tokens: 0, num merges: 269443, num scores: 0
2024-08-02T02:19:35.874402Z  INFO mistralrs_core::gguf::gguf_tokenizer: Tokenizer: [debug dump identical to the first log above, elided]
2024-08-02T02:19:35.897140Z  INFO mistralrs_core::gguf::chat_template: Discovered and using GGUF chat template: [identical to the first log above, elided]
2024-08-02T02:20:29.804507Z  INFO mistralrs_core::pipeline::paths: Using literal chat template.
2024-08-02T02:20:30.167965Z  INFO mistralrs_core::pipeline::chat_template: bos_toks = "<s>", eos_toks = "</s>", unk_tok = <unk>
2024-08-02T02:20:30.194591Z  INFO mistralrs_server: Model loaded.
2024-08-02T02:20:30.194725Z  INFO mistralrs_server::interactive_mode: Starting interactive loop with sampling params: SamplingParams { temperature: Some(0.1), top_k: Some(32), top_p: Some(0.1), min_p: Some(0.05), top_n_logprobs: 0, frequency_penalty: Some(0.1), presence_penalty: Some(0.1), stop_toks: None, max_len: Some(4096), logits_bias: None, n_choices: 1 }
> Dags. Do you like dags?
2024-08-02T02:20:44.803643Z ERROR mistralrs_core::engine: prompt step - Model failed with error: WithBacktrace { inner: ShapeMismatchBinaryOp { lhs: [1, 71, 32, 128], rhs: [1, 71, 5120], op: "reshape" }, backtrace: Backtrace [{ fn: "candle_core::error::Error::bt" }, { fn: "candle_core::tensor::Tensor::reshape" }, { fn: "mistralrs_core::models::quantized_llama::ModelWeights::forward" }, { fn: "<mistralrs_core::pipeline::gguf::GGUFPipeline as mistralrs_core::pipeline::Pipeline>::forward_inputs" }, { fn: "mistralrs_core::pipeline::Pipeline::step::{{closure}}" }, { fn: "mistralrs_core::engine::Engine::run::{{closure}}" }, { fn: "std::sys_common::backtrace::__rust_begin_short_backtrace" }, { fn: "core::ops::function::FnOnce::call_once{{vtable.shim}}" }, { fn: "std::sys::pal::unix::thread::Thread::new::thread_start" }, { fn: "start_thread" }, { fn: "__GI___clone3" }] }
2024-08-02T02:20:44.804331Z ERROR mistralrs_server::interactive_mode: Got a model error: "shape mismatch in reshape, lhs: [1, 71, 32, 128], rhs: [1, 71, 5120]
   0: candle_core::error::Error::bt
   1: candle_core::tensor::Tensor::reshape
   2: mistralrs_core::models::quantized_llama::ModelWeights::forward
   3: <mistralrs_core::pipeline::gguf::GGUFPipeline as mistralrs_core::pipeline::Pipeline>::forward_inputs
   4: mistralrs_core::pipeline::Pipeline::step::{{closure}}
   5: mistralrs_core::engine::Engine::run::{{closure}}
   6: std::sys_common::backtrace::__rust_begin_short_backtrace
   7: core::ops::function::FnOnce::call_once{{vtable.shim}}
   8: std::sys::pal::unix::thread::Thread::new::thread_start
   9: start_thread
  10: __GI___clone3", response: ChatCompletionResponse { id: "0", choices: [Choice { finish_reason: "error", index: 0, message: ResponseMessage { content: Some(""), role: "assistant", tool_calls: [] }, logprobs: None }], created: 1722565244, model: "$D/models/Mistral-Nemo-Instruct-2407-GGUF", system_fingerprint: "local", object: "chat.completion", usage: Usage { completion_tokens: 0, prompt_tokens: 71, total_tokens: 71, avg_tok_per_sec: 1203.3899, avg_prompt_tok_per_sec: inf, avg_compl_tok_per_sec: NaN, total_time_sec: 0.059, total_prompt_time_sec: 0.0, total_completion_time_sec: 0.0 } }
