
ggml : fix loongson compile warnings #7537

Merged
merged 3 commits into master from gg/ci-loongson
May 31, 2024
Conversation

@ggerganov (Owner) commented May 25, 2024

Adding a ggml-101-loongson node to ggml-ci - still some compile warnings left to fix

@github-actions bot added the "ggml" label (changes relating to the ggml tensor library for machine learning) May 25, 2024
@ggerganov (Owner, Author):

@junchao-loongson I've configured the loongson node to run ggml-ci. Currently, there are a few build errors:

https://github.com/ggml-org/ci/blob/results/llama.cpp/80/787c2a26c54998fff5f621e5aa7ae9866d0bfd/ggml-101-loongson/stdall

In file included from /home/loongson/work/llama.cpp/ggml-impl.h:464,
                 from /home/loongson/work/llama.cpp/ggml-quants.c:5:
/home/loongson/work/llama.cpp/ggml-quants.c: In function ‘ggml_vec_dot_q3_K_q8_K’:
/home/loongson/work/llama.cpp/ggml-quants.c:6841:90: error: invalid argument to built-in function
             const __m256i q3h_0 = __lasx_xvslli_h(__lasx_xvsrli_h(__lasx_xvandn_v(hbits, __lasx_xvslli_h(mone, bit)), bit), 2);
                                                                                          ^~~~~~~~~~~~~~~
/home/loongson/work/llama.cpp/ggml-quants.c:6841:35: error: invalid argument to built-in function
             const __m256i q3h_0 = __lasx_xvslli_h(__lasx_xvsrli_h(__lasx_xvandn_v(hbits, __lasx_xvslli_h(mone, bit)), bit), 2);
                                   ^~~~~~~~~~~~~~~
/home/loongson/work/llama.cpp/ggml-quants.c:6845:90: error: invalid argument to built-in function
             const __m256i q3h_1 = __lasx_xvslli_h(__lasx_xvsrli_h(__lasx_xvandn_v(hbits, __lasx_xvslli_h(mone, bit)), bit), 2);
                                                                                          ^~~~~~~~~~~~~~~
/home/loongson/work/llama.cpp/ggml-quants.c:6845:35: error: invalid argument to built-in function
             const __m256i q3h_1 = __lasx_xvslli_h(__lasx_xvsrli_h(__lasx_xvandn_v(hbits, __lasx_xvslli_h(mone, bit)), bit), 2);
                                   ^~~~~~~~~~~~~~~
/home/loongson/work/llama.cpp/ggml-quants.c:6849:90: error: invalid argument to built-in function
             const __m256i q3h_2 = __lasx_xvslli_h(__lasx_xvsrli_h(__lasx_xvandn_v(hbits, __lasx_xvslli_h(mone, bit)), bit), 2);
                                                                                          ^~~~~~~~~~~~~~~
/home/loongson/work/llama.cpp/ggml-quants.c:6849:35: error: invalid argument to built-in function
             const __m256i q3h_2 = __lasx_xvslli_h(__lasx_xvsrli_h(__lasx_xvandn_v(hbits, __lasx_xvslli_h(mone, bit)), bit), 2);
                                   ^~~~~~~~~~~~~~~
/home/loongson/work/llama.cpp/ggml-quants.c:6853:90: error: invalid argument to built-in function
             const __m256i q3h_3 = __lasx_xvslli_h(__lasx_xvsrli_h(__lasx_xvandn_v(hbits, __lasx_xvslli_h(mone, bit)), bit), 2);
                                                                                          ^~~~~~~~~~~~~~~
/home/loongson/work/llama.cpp/ggml-quants.c:6853:35: error: invalid argument to built-in function
             const __m256i q3h_3 = __lasx_xvslli_h(__lasx_xvsrli_h(__lasx_xvandn_v(hbits, __lasx_xvslli_h(mone, bit)), bit), 2);
                                   ^~~~~~~~~~~~~~~
/home/loongson/work/llama.cpp/ggml-quants.c: In function ‘ggml_vec_dot_q5_K_q8_K’:
/home/loongson/work/llama.cpp/ggml-quants.c:8051:35: error: invalid argument to built-in function
             const __m256i q5h_0 = __lasx_xvslli_h(__lasx_xvsrli_h(__lasx_xvand_v(hbits, hmask), bit++), 4);
                                   ^~~~~~~~~~~~~~~
/home/loongson/work/llama.cpp/ggml-quants.c:8056:35: error: invalid argument to built-in function
             const __m256i q5h_1 = __lasx_xvslli_h(__lasx_xvsrli_h(__lasx_xvand_v(hbits, hmask), bit++), 4);
                                   ^~~~~~~~~~~~~~~
make[2]: *** [CMakeFiles/ggml.dir/build.make:118:CMakeFiles/ggml.dir/ggml-quants.c.o] Error 1
make[2]: *** Waiting for unfinished jobs....
make[1]: *** [CMakeFiles/Makefile2:820:CMakeFiles/ggml.dir/all] Error 2
make: *** [Makefile:146:all] Error 2

Could you suggest a fix for these?
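
For context, these errors come from a LASX constraint: the immediate-shift intrinsics (__lasx_xvslli_h, __lasx_xvsrli_h) require the shift count to be a compile-time constant, but bit in these loops is a runtime variable. A minimal sketch of the distinction, assuming GCC's lasxintrin.h (illustrative, not code from this PR):

    #include <lasxintrin.h>

    // Immediate form: the count must be a compile-time constant.
    static __m256i shl_const(__m256i v) {
        return __lasx_xvslli_h(v, 2);             // OK: 2 is an immediate
    }

    // __lasx_xvslli_h(v, bit) with a runtime bit is rejected with
    // "invalid argument to built-in function". The register form takes
    // the shift count from a vector instead:
    static __m256i shl_var(__m256i v, int bit) {
        __m256i cnt = __lasx_xvreplgr2vr_h(bit); // broadcast count to all halfword lanes
        return __lasx_xvsll_h(v, cnt);           // shift left by register
    }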

github-actions bot (Contributor) commented May 25, 2024

📈 llama.cpp server for bench-server-baseline on Standard_NC4as_T4_v3 for phi-2-q4_0: 551 iterations 🚀

Details (shown for performance-related PRs only):
  • Concurrent users: 8, duration: 10m
  • HTTP request: avg=8488.66ms p(95)=19849.83ms fails=, finish reason: stop=503 truncated=48
  • Prompt processing (pp): avg=101.93tk/s p(95)=447.04tk/s
  • Token generation (tg): avg=33.84tk/s p(95)=46.25tk/s
  • ggml-org/models/phi-2/ggml-model-q4_0.gguf parallel=8 ctx-size=16384 ngl=33 batch-size=2048 ubatch-size=256 pp=1024 pp+tg=2048 branch=gg/ci-loongson commit=fd5de67bb7835011e3caa21ea21cc49657fb1198

[Benchmark charts: llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 551 iterations — panels: prompt_tokens_seconds, predicted_tokens_seconds, kv_cache_usage_ratio, requests_processing]

@mofosyne added the "Review Complexity : Medium" label (generally requires more time to grok but manageable by beginner to medium expertise level) May 27, 2024
@junchao-loongson (Contributor) commented May 29, 2024 via email

@junchao-loongson (Contributor):

Hello~ @ggerganov
I have fixed this issue in PR #7636. You can extract the code that handles the bit variables from PR #7636 and add it to #7537.

Compilation passes with cmake -DCMAKE_BUILD_TYPE=Debug.
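
For illustration, the shape of such a fix is to swap the constant-shift intrinsics for their register-shift counterparts wherever the count is a variable. A hypothetical rewrite of the failing q5_K line (the merged code in #7636 may differ):

    __m256i vbit = __lasx_xvreplgr2vr_h(bit++);   // broadcast the runtime shift count
    const __m256i q5h_0 = __lasx_xvslli_h(
            __lasx_xvsrl_h(__lasx_xvand_v(hbits, hmask), vbit), 4);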

@ggerganov (Owner, Author):

@junchao-loongson Thanks. There are some connection issues from this node - I just sent you an e-mail. PTAL

@junchao-loongson (Contributor) commented May 31, 2024

In the first PR, I made a mistake while rebasing that caused a unit test to fail.
Fixed in #7646.

Fix unexpected error introduced during rebase.
@github-actions bot added the "testing" label (everything test related) May 31, 2024
@ggerganov ggerganov marked this pull request as ready for review May 31, 2024 11:16
@ggerganov ggerganov merged commit 0c27e6f into master May 31, 2024
7 of 15 checks passed
Labels: ggml, Review Complexity : Medium, testing
3 participants