Cheaper hardware to run bigger model #18
As I mentioned in #8, I think some extra performance could be gained by properly implementing the SIMD routines in
I don't think the current implementation is optimal - it was just something I hacked together to make it run on the RPi4. In any case, this will probably give a few tens of percent improvement at best. I'm not sure what your expectation is for these devices - large model inference will always take at least an order of magnitude longer than the audio length on a phone.

I don't know what GPUs are available on modern mobile devices, but I don't plan on supporting them. It usually involves some complex framework (e.g. CUDA, OpenCL, Metal, etc.), and it takes a lot of expertise and experience to use these efficiently.

Regarding the algorithm improvement:
I have an idea for reducing the memory usage of the large model even further that I want to experiment with at some point, but most likely it will fail. So I don't think the algorithm can be improved in any significant way.
An alternative solution could be to retrain a small model for a specific data domain, task, or language. I assume you want to use the large model only because of its quality, so it may be worth considering additional training options for smaller models - training them until their quality is satisfactory. With Whisper this will not be an easy task right now, since there are no official scripts for training or pre-training, only a couple written by community enthusiasts. I'm currently trying a solution combined from two such scripts, but so far the quality is worse than the original: CER decreases on most of the pre-training dataset, but recognition gets worse on real examples.
@ekorudi You might want to give it a try using the latest
Compilation succeeds for Android, but fails for Linux on Intel.
I changed
I have the same issue when trying to compile on a server.
Fix build on Windows
Referring to our discussion in #8: I can run ggml-large.bin, and for the same 120-second (2-minute) input audio it takes around 54 minutes on a Samsung A52. What is your suggestion for optimization to run a bigger model on cheaper hardware?
I would be happy if you could share resources I can learn from to achieve that goal.