Mac M2 build error #716

Closed
trickster opened this issue Aug 27, 2024 · 7 comments
Labels
bug (Something isn't working), build (Issues relating to building mistral.rs)

Comments

@trickster

Minimum reproducible example

The minimum example needed to reproduce the error. Simpler examples make it easier and faster to fix!

cargo build --release -F metal

Error

   = note: the matched value is of type `quantized::GgmlDType`
help: ensure that all possible cases are being handled by adding a match arm with a wildcard pattern or an explicit pattern as shown
    |
105 ~             },
106 +             quantized::GgmlDType::BF16 => todo!()
    |

error[E0004]: non-exhaustive patterns: `quantized::GgmlDType::BF16` not covered
   --> /Users/user/.cargo/git/checkouts/candle-c6a149c3b35a488f/f706ef2/candle-core/src/quantized/metal.rs:229:15
    |
229 |         match value {
    |               ^^^^^ pattern `quantized::GgmlDType::BF16` not covered
    |
note: `quantized::GgmlDType` defined here
   --> /Users/user/.cargo/git/checkouts/candle-c6a149c3b35a488f/f706ef2/candle-core/src/quantized/mod.rs:145:10
    |
145 | pub enum GgmlDType {
    |          ^^^^^^^^^
...
148 |     BF16,
    |     ---- not covered
    = note: the matched value is of type `quantized::GgmlDType`
help: ensure that all possible cases are being handled by adding a match arm with a wildcard pattern or an explicit pattern as shown
    |
243 ~             GgmlDType::F32 => candle_metal_kernels::GgmlDType::F32,
244 ~             quantized::GgmlDType::BF16 => todo!(),
    |

For more information about this error, try `rustc --explain E0004`.
error: could not compile `candle-core` (lib) due to 2 previous errors
warning: build failed, waiting for other jobs to finish...
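
For context, E0004 means a match over an enum does not cover every variant; the compiler suggestion above is to add an arm for the newly introduced BF16 variant. A minimal, self-contained sketch of the same error shape (hypothetical enum and function, not the actual candle-core code):

// Hypothetical stand-in for quantized::GgmlDType; the real enum has many
// more variants.
#[derive(Debug)]
enum GgmlDType {
    F32,
    F16,
    BF16, // newly added variant that older match statements do not cover
}

fn kernel_suffix(dtype: &GgmlDType) -> &'static str {
    // Omitting the BF16 arm here fails to compile with E0004, exactly like
    // the match in quantized/metal.rs above.
    match dtype {
        GgmlDType::F32 => "f32",
        GgmlDType::F16 => "f16",
        GgmlDType::BF16 => "bf16", // adding this arm resolves the error
    }
}

fn main() {
    for dt in [GgmlDType::F32, GgmlDType::F16, GgmlDType::BF16] {
        println!("{:?} -> {}", dt, kernel_suffix(&dt));
    }
}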

Latest commit or version

8ddf258

@trickster trickster added the bug and build labels on Aug 27, 2024
@EricLBuehler
Owner

@trickster thanks for reporting this. I think #719 should fix this. Can you please run:

git pull
git switch metal_build_quant

And try to build again?

@jawshoeadan

It did not fix it for me

@EricLBuehler
Copy link
Owner

@jawshoeadan do you have the same error?

@trickster
Author

trickster commented Aug 29, 2024

@EricLBuehler
In f17f113/candle-metal-kernels/src/lib.rs on line 1963 there's a typo: F16 is used twice, where one of the occurrences should be GgmlDType::BF16.
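
A minimal sketch of the kind of fix being described, using hypothetical names rather than the actual code at that line:

// Hypothetical stand-ins for the two dtype enums being converted between;
// the real types in candle-metal-kernels differ.
#[derive(Debug, Clone, Copy)]
enum SrcDType { F16, BF16 }

#[derive(Debug, Clone, Copy)]
enum KernelDType { F16, BF16 }

fn to_kernel_dtype(dtype: SrcDType) -> KernelDType {
    match dtype {
        SrcDType::F16 => KernelDType::F16,
        // The typo described above amounts to writing KernelDType::F16 here
        // a second time; the BF16 arm should map to the BF16 kernel dtype.
        SrcDType::BF16 => KernelDType::BF16,
    }
}

fn main() {
    println!("{:?}", to_kernel_dtype(SrcDType::F16));
    println!("{:?}", to_kernel_dtype(SrcDType::BF16));
}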

@trickster
Author

trickster commented Aug 29, 2024

So, after I fixed it locally,

cargo build --release --features metal
./target/release/mistralrs-server -i plain -m microsoft/Phi-3.5-mini-instruct -a phi3

This works and loads all the layers, but it's not running on the GPU and inference is insanely slow. Do you know why?

The gguf models work, but for some reason chat_template is not applied?

./target/release/mistralrs-server --port 8711 --chat-template chat_templates/phi3.json -i gguf  --quantized-model-id bartowski/Phi-3.5-mini-instruct-GGUF --quantized-filename Phi-3.5-mini-instruct-Q8_0.gguf

Params { temperature: Some(0.1), top_k: Some(32), top_p: Some(0.1), min_p: Some(0.05), top_n_logprobs: 0, frequency_penalty: Some(0.1), presence_penalty: Some(0.1), stop_toks: None, max_len: Some(4096), logits_bias: None, n_choices: 1, dry_params: Some(DrySamplingParams { sequence_breakers: ["\n", ":", "\"", "*"], multiplier: 0.0, base: 1.75, allowed_length: 2 }) }
> hi
Hello! How can I assist you today?<|end|><|assistant|> Hello! I'm Phi, an AI language model. How can I help you today?<|end|><|assistant|> Hello! I'm Phi, an AI language model developed by Microsoft.

@EricLBuehler
Owner

@trickster thanks for letting me know. #719 has been updated accordingly.

This works and loads all the layers, but it's not on GPU and the inference is insanely slow. Do you know why?

So, it's not using the GPU at all? Can you please attach the full output of running the model?
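
As a quick sanity check that the Metal backend is usable at all in a given build, here is a minimal candle-core snippet (a sketch, assuming candle-core is compiled with its metal feature; not part of mistral.rs itself):

use candle_core::{DType, Device, Tensor};

fn main() -> candle_core::Result<()> {
    // This errors out if Metal support was not compiled in or no Metal
    // device is available.
    let device = Device::new_metal(0)?;
    let t = Tensor::ones((2, 2), DType::F32, &device)?;
    // If this prints, the tensor was allocated on and read back from the GPU.
    println!("{:?}", t.to_vec2::<f32>()?);
    Ok(())
}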

The gguf models work, but for some reason chat_template is not applied?

Perhaps the chat template is incorrect? I just merged #734, which adds the Phi 3.5 chat template, but perhaps using the GGUF built-in template or sourcing the tokenizer and chat template from the official HF repository would be better.

The following works:

cargo run --features cuda -- -i gguf -m ../gguf-models/ -f Phi-3.5-mini-instruct-Q4_K_M.gguf -t microsoft/Phi-3.5-mini-instruct

But I can reproduce the issue with the GGUF tokenizer.

@EricLBuehler
Owner

@trickster I'm closing this as the build error has been fixed. Can you please open a separate issue for the GGUF template behavior?
