Cheaper hardware to run bigger model #18
As I mentioned in #8, I think some extra performance could be gained by properly implementing the SIMD routines in
I don't think the current implementation is optimal - it was just something I hacked together to make it run on the RPi4. In any case, this will probably give a few tens of percent improvement at best. I'm not sure what your expectation is for these devices - large model inference will always take at least an order of magnitude longer than the audio length on a phone.

I don't know what GPUs are available on modern mobile devices, but I don't plan on supporting them. It usually involves some complex framework (e.g. CUDA, OpenCL, Metal, etc.), and it takes a lot of expertise and experience to use these efficiently.

Regarding the algorithm improvement:
I have an idea for reducing the memory usage of the large model even further that I want to experiment with at some point, but most likely it will fail. So I don't think the algorithm can be improved in any significant way.
An alternative solution could be to retrain a small model for a specific data domain, task, or language. I assume you want to use the large model only because of its quality, so it may be worth considering additional training options for smaller models - training them until their quality is satisfactory. With Whisper this will not be an easy task right now, since there are no official scripts for training or pre-training, only a couple written by community enthusiasts. I'm currently trying a solution combined from two such scripts, but so far the quality is worse than the original: CER decreases on most of the pre-training dataset, but recognition gets worse on real examples.
@ekorudi You might want to give it a try using the latest
Compilation succeeds for Android, but fails for Linux on Intel.
I changed
I have the same issue when trying to compile on a server.
Fix build on Windows
Referring to our discussion in #8: I can run ggml-large.bin, and for the same 120-second (2-minute) input audio it takes around 54 minutes on a Samsung A52. What is your suggestion for optimization to run a bigger model on cheaper hardware?
I would be happy if you could share resources I can learn from to achieve that goal.