
Add BPE pre-tokenization for Command-R/R+. #7063

Merged

merged 3 commits into ggerganov:master from bpe-pretok-command-r-2 on May 5, 2024

Conversation

dranger003
Contributor

This replaces PR #7033 as a result of merging PR #6511.

Closes #7030 and #7040.
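
For context, BPE pre-tokenization first splits raw text into chunks with a regex, and BPE merges then run only within each chunk. Below is a minimal Python sketch of the idea; the pattern is an illustrative GPT-2-style one with a leading \p{N} alternative so numbers split into individual digits, not the exact regex merged in this PR.

# Illustrative sketch of BPE pre-tokenization with individual-digit splitting.
# The pattern is an assumption for demonstration, not the regex added here.
import regex  # third-party 'regex' package, supports \p{...} classes

PRETOKENIZE = regex.compile(
    r"\p{N}"                      # a single digit (individual_digits=True behavior)
    r"|'s|'t|'re|'ve|'m|'ll|'d"   # common English contractions
    r"| ?\p{L}+"                  # optional space + letters
    r"| ?[^\s\p{L}\p{N}]+"        # optional space + punctuation/symbols
    r"|\s+(?!\S)|\s+"             # whitespace runs
)

def pre_tokenize(text: str) -> list[str]:
    # BPE merges would then run independently within each returned chunk.
    return PRETOKENIZE.findall(text)

print(pre_tokenize("Price: 1234 dollars"))
# ['Price', ':', ' ', '1', '2', '3', '4', ' dollars']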

Contributor

github-actions bot commented May 3, 2024

📈 llama.cpp server for bench-server-baseline on Standard_NC4as_T4_v3 for phi-2-q4_0: 536 iterations 🚀

  • Concurrent users: 8, duration: 10m
  • HTTP request: avg=8756.89ms p(95)=21734.99ms fails=, finish reason: stop=469 truncated=67
  • Prompt processing (pp): avg=103.54tk/s p(95)=469.12tk/s
  • Token generation (tg): avg=32.32tk/s p(95)=46.55tk/s
  • ggml-org/models/phi-2/ggml-model-q4_0.gguf parallel=8 ctx-size=16384 ngl=33 batch-size=2048 ubatch-size=256 pp=1024 pp+tg=2048 branch=bpe-pretok-command-r-2 commit=f5806b2d09ba2dcf60d8d66046ed5853234f28de

[chart: llamacpp:prompt_tokens_seconds — bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 536 iterations]

[chart: llamacpp:predicted_tokens_seconds — bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 536 iterations]

[chart: llamacpp:kv_cache_usage_ratio — bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 536 iterations]

[chart: llamacpp:requests_processing — bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 536 iterations]

@slaren
Collaborator

slaren commented May 3, 2024

This and #7041 have different regex. Which one is correct?

@dranger003
Contributor Author

> also has 'Digits' and individual_digits=True, so making an assumption there now.

@slaren There is mention of an assumption about digits, which I haven't included but can add if needed. The regex in this PR has been tested with test-tokenizer-0, which I presume does not cover all scenarios?

@araleza

araleza commented May 3, 2024

Hi, does this mean that Command-R was always running at reduced quality, and we just didn't know until recently? Or have the recent Llama 3 changes to the llama.cpp tokenizer resulted in this update being needed to get it back to where it was before the Llama 3 changes went in?

@eskeletor97

> There is mention of an assumption about digits, which I haven't included but can add if needed. The regex in this PR has been tested with test-tokenizer-0, which I presume does not cover all scenarios?

I haven't really tested Command-R with any math or numbers before, but isn't it a similar issue to Llama 3, where digits were grouped and tokenized incorrectly?
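
Roughly, yes: if the converter records a pre-tokenizer that groups digit runs while the model was trained on individual digits (or vice versa), the same text maps to different token IDs and quality silently degrades. A toy comparison in Python (illustrative patterns, not the exact ones from either model):

import regex

grouped    = regex.compile(r"\p{N}+")  # digit runs kept as one chunk
individual = regex.compile(r"\p{N}")   # one digit per chunk

print(grouped.findall("12345"))     # ['12345']
print(individual.findall("12345"))  # ['1', '2', '3', '4', '5']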

@ggerganov
Owner

I had to update to a newer transformers:

diff --git a/requirements/requirements-convert.txt b/requirements/requirements-convert.txt
index a3d6ecec..5520ba73 100644
--- a/requirements/requirements-convert.txt
+++ b/requirements/requirements-convert.txt
@@ -1,5 +1,5 @@
 numpy~=1.24.4
 sentencepiece~=0.1.98
-transformers>=4.35.2,<5.0.0
+transformers>=4.40.1,<5.0.0
 gguf>=0.1.0
 protobuf>=4.21.0,<5.0.0

Otherwise, I got this error:

python3 convert-hf-to-gguf-update.py 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
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Traceback (most recent call last):
  File "/Users/ggerganov/development/github/llama.cpp/convert-hf-to-gguf-update.py", line 135, in <module>
    tokenizer = AutoTokenizer.from_pretrained(f"models/tokenizers/{name}")
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/transformers/models/auto/tokenization_auto.py", line 784, in from_pretrained
    raise ValueError(
ValueError: Tokenizer class CohereTokenizer does not exist or is not currently imported.
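
A quick way to confirm the installed transformers satisfies the bumped requirement before running the script — a small sketch, assuming the packaging library is available:

# Sanity check before running convert-hf-to-gguf-update.py.
import transformers
from packaging import version

if version.parse(transformers.__version__) < version.parse("4.40.1"):
    raise RuntimeError(
        f"transformers {transformers.__version__} is too old; "
        "upgrade with: pip install 'transformers>=4.40.1,<5.0.0'"
    )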

@ggerganov
Owner

Let's rebase on latest master and I will run some extra tests to check if the regexes are correct
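
For reference, tokenizer regressions are typically checked by regenerating the vocab files and running llama.cpp's tokenizer test against them; a plausible invocation (file names assumed from the convert-hf-to-gguf-update.py naming convention) would be:

python3 convert-hf-to-gguf-update.py <hf_token>
./tests/test-tokenizer-0 ./models/ggml-vocab-command-r.gguf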

@dranger003
Contributor Author

@ggerganov Thanks, the PR has been rebased and I added the transformers change.

@ggerganov ggerganov merged commit 889bdd7 into ggerganov:master May 5, 2024
63 checks passed
nopperl pushed a commit to nopperl/llama.cpp that referenced this pull request May 5, 2024
* Add BPE pre-tokenization for Command-R/R+.

* Bump transformers convert requirement.

* command-r : add individual digits regex

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
teleprint-me pushed a commit to teleprint-me/llama.cpp that referenced this pull request May 7, 2024
* Add BPE pre-tokenization for Command-R/R+.

* Bump transformers convert requirement.

* command-r : add individual digits regex

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Successfully merging this pull request may close these issues:

  • Command-R GGUF conversion no longer working