llama: use sliding window for phi3 #8627
Conversation
if (data_swa) {
data_swa is also modified by the code above (line 14158). It is used by gemma2. Overwriting it here may break gemma2.
Sorry, I'm not familiar with gemma2, so I haven't tested this PR on gemma2. I have only tested it with Phi3 on CPU.
I do not understand when the mask should be padded to GGML_KQ_MASK_PAD and when not. I pad data_swa to GGML_KQ_MASK_PAD because it looks like the original code simply forgot to do so. In practice, padding data_swa or not does not affect the correctness of Phi3 on CPU.
I'm confused by the original code: data is explicitly padded to GGML_KQ_MASK_PAD but data_swa is not. Is this the intended behavior? If so, I'm happy to revert the change (padding data_swa to GGML_KQ_MASK_PAD), but I'd still like someone to explain what GGML_KQ_MASK_PAD actually means.
@ngxson I agree with @FanShupei here. I think data_swa should also be padded; I don't see why not, since the ranges of data written here and above do not overlap. Not sure why this worked before, though. Padding data_swa seems saner than leaving the values uninitialized.
Both should be padded. The padding is necessary so that GPU kernels (such as the Metal Flash-Attention kernel) do not have to perform extra checks for out-of-bounds access when working on chunks of data.
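To illustrate the point above, here is a minimal, hypothetical sketch of padding a KQ mask (the function name, buffer layout, and the value of GGML_KQ_MASK_PAD are illustrative, not the actual llama.cpp code): rows beyond the real token count are filled with -INFINITY, so kernels can process whole padded chunks and the extra rows vanish after softmax.

```cpp
#include <cmath>
#include <vector>

// Illustrative value only; the real constant lives in llama.cpp.
constexpr int GGML_KQ_MASK_PAD = 32;

// Hypothetical sketch: round the mask's row count up to a multiple of
// GGML_KQ_MASK_PAD. Padded rows are set to -INFINITY so that GPU kernels
// (e.g. a Flash-Attention kernel) can read full chunks without
// out-of-bounds checks; -INFINITY rows contribute nothing after softmax.
static std::vector<float> pad_kq_mask(const std::vector<float> &mask,
                                      int n_tokens, int n_kv) {
    const int n_padded =
        ((n_tokens + GGML_KQ_MASK_PAD - 1) / GGML_KQ_MASK_PAD) * GGML_KQ_MASK_PAD;
    std::vector<float> padded(static_cast<size_t>(n_padded) * n_kv, -INFINITY);
    // Copy the real rows; rows [n_tokens, n_padded) stay at -INFINITY.
    for (int i = 0; i < n_tokens; ++i) {
        for (int j = 0; j < n_kv; ++j) {
            padded[static_cast<size_t>(i) * n_kv + j] =
                mask[static_cast<size_t>(i) * n_kv + j];
        }
    }
    return padded;
}
```

The same treatment would apply to both data and data_swa, which is why padding only one of them looks like an oversight.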
* use sliding window for phi3
* fix typo, "data_swa" -> "data"
* [convert_hf_to_gguf.py] add phi3 sliding window
This PR broke loading of all existing Phi models, even those published by Microsoft. What gives? From LM Studio:
From the Ollama maintainers:
Looks like this PR was a fix for an issue where Phi was outputting garbage after 2k tokens, and any model that fails to load would be broken by 2k tokens if we allowed it to load anyway. There are a number of fine-tuned Phi variants out there; I guess they all need to be separately updated before they can be used again? That's no small task, and it kinda fragments the model ecosystem. Is there anything in place to ease the process or notify owners of fine-tuned variants? Not even the official ones seem to have been updated yet.
Me too. I can't load phi-3-mini-4k-instruct-gguf.
Same for me.
I spent a lot of effort finding Phi 3 Mini versions specialized in specific languages. None of them work now. A cursory check shows there are no updates for these models, and those updates probably won't come. Is there perhaps a parameter that can force the models to load anyway?
Whoa! You actually created an upstream fix! I'm so impressed! And it works too! Thank you so much for this. Gobsmacked.
Related issue report: #7709
This PR switches the Phi3 model to use sliding window attention. After this PR, it no longer generates broken output after 2,048 tokens. Tested on the "phi3-mini-4k-instruct" model.
TODO (DONE): convert_hf_to_gguf.py changes
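As a rough sketch of what sliding window attention means for the mask (hypothetical helper, not the PR's actual code): a query token i may attend to a key token j only if j <= i (causal) and i - j < n_swa (inside the window). Everything else is masked with -INFINITY, which is why a model expecting a 2k window produces garbage once the context exceeds it without this mask.

```cpp
#include <cmath>
#include <vector>

// Hypothetical sketch of building a sliding-window causal attention mask.
// mask[i * n_tokens + j] == 0.0f  -> token i may attend to token j
// mask[i * n_tokens + j] == -inf  -> attention is masked out
static std::vector<float> build_swa_mask(int n_tokens, int n_swa) {
    std::vector<float> mask(static_cast<size_t>(n_tokens) * n_tokens, -INFINITY);
    for (int i = 0; i < n_tokens; ++i) {
        // Causal: only j <= i; sliding window: only the last n_swa positions.
        for (int j = 0; j <= i; ++j) {
            if (i - j < n_swa) {
                mask[static_cast<size_t>(i) * n_tokens + j] = 0.0f;
            }
        }
    }
    return mask;
}
```

With n_swa equal to the model's configured window (4,096 minus one token of slack for phi3-mini-4k, per its config), each row keeps only the most recent n_swa positions visible, independent of total context length.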