
whisper.swiftui example not working #1720

Closed
jeybee opened this issue Jan 3, 2024 · 24 comments
Labels: help wanted (Extra attention is needed)

@jeybee commented Jan 3, 2024

When running the whisper.swiftui example, compiled in Xcode, transcription fails with the following log:

About to run whisper_full
whisper_full_with_state: failed to encode
Failed to run the model

This is using the ggml-base.en.bin model. The whisper.objc sample, run on the same machine with the same model, works fine.

Tested on an M1 MacBook Pro (16 GB).

@ggerganov (Owner)

We need to apply a similar fix to the one we made in ggerganov/llama.cpp#4754.

@ggerganov added the good first issue label Jan 4, 2024
@ggerganov (Owner)

Pinging @singularity-s0 as they fixed the build in llama.cpp

Here it might be more difficult because the example uses the whisper.cpp Swift package, which in turn depends on the ggml Swift package. I tried to add a similar build rule but couldn't figure out the details, so help would be appreciated.

@singularity-s0

Xcode build rules don't seem to apply to Swift packages. In fact, this post suggests custom build behavior for Swift packages might not be supported by Xcode at all. Need to find another way around this.

@ggerganov (Owner)

Ok, pinging @1-ashraful-islam as well

@ggerganov added the help wanted label and removed the good first issue label Jan 4, 2024
@jeybee (Author) commented Jan 4, 2024

Just wanted to flag that even after manually changing ggml to look for the default.metallib that does exist, with the Metal device successfully initialised, the same error log still occurs.

@ggerganov (Owner)

Did you change ggml-metal.m in whisper.cpp or in ggml? You have to change it in the latter because that is what the Swift package uses.

@zshannon commented Jan 4, 2024

ggml_metal_init: allocating
ggml_metal_init: found device: Apple M1 Max
ggml_metal_init: picking default device: Apple M1 Max
ggml_metal_init: ggml.metallib not found, loading from source
ggml_metal_init: GGML_METAL_PATH_RESOURCES = nil
ggml_metal_init: error: could not use bundle path to find ggml-metal.metal, falling back to trying cwd
ggml_metal_init: loading 'ggml-metal.metal'
ggml_metal_init: error: Error Domain=NSCocoaErrorDomain Code=260 "The file “ggml-metal.metal” couldn’t be opened because there is no such file." UserInfo={NSFilePath=ggml-metal.metal, NSUnderlyingError=0x600002e4e280 {Error Domain=NSPOSIXErrorDomain Code=2 "No such file or directory"}}
whisper_backend_init: ggml_backend_metal_init() failed

So it seems like moving ggml to an SPM dependency broke loading the metal file?
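
The log above shows the loader trying GGML_METAL_PATH_RESOURCES before falling back to the cwd. A minimal Swift sketch of a stopgap, assuming the ggml-metal.metal source has been copied into the app bundle by hand (that copy step is an assumption, not something the sample does):

```swift
import Foundation

// Hypothetical workaround: if ggml-metal.metal has been copied into the app
// bundle (e.g. via a "Copy Bundle Resources" phase -- an assumption, not part
// of the sample), point GGML_METAL_PATH_RESOURCES at the bundle's resource
// directory before any whisper/ggml context is created, so ggml_metal_init
// never reaches the cwd fallback seen in the log above.
if let resources = Bundle.main.resourcePath {
    _ = setenv("GGML_METAL_PATH_RESOURCES", resources, 1)
}
```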

@jeybee (Author) commented Jan 4, 2024

> Did you change ggml-metal.m in whisper.cpp or in ggml? You have to change it in the latter because that is what the Swift package uses.

Yes, I changed it in ggml, and in the logs I can see it successfully loads and allocates the Metal buffers, but transcription still fails.

@ggerganov (Owner)

@jeybee I figured it out - there was a divergence in the ggml API because of commit a3d0aa7.

After syncing it back to the ggml repo (ggerganov/ggml@9a867f1), the SwiftUI example now works correctly (make sure to update the Swift packages to the latest version: Xcode -> File -> Packages -> Update to latest package version).

Still, the problem with ggml.metallib remains unresolved, so the example will fall back to CPU transcription if it cannot load ggml.metallib.

@zshannon commented Jan 4, 2024

It might be a limitation of SPM that bundle resources can't be copied from dependencies, so, e.g., bundling ggml-metal.metal in ggml and then depending on ggml in whisper.cpp/llama.cpp doesn't pull the metal file into the final build. You worked around it in the swiftui example in llama.cpp by adding a build step in Xcode, but that was only viable because the metal file is present in the llama.cpp repo, which sorta blows up the value of using SPM to consume the package... (looking into workarounds now)

@zshannon commented Jan 4, 2024

@ggerganov will the metal files always be present in the respective whisper/llama repos and synced with the ggml repo? Perhaps bundling the metal files in each whisper/llama Swift package, while depending on the ggml Swift package's compiled ggml lib (but not its metal files), solves the twin problems of needing the metal files and avoiding the duplicate-symbols compiler error when using both libs?

@ggerganov (Owner)

Yes, the metal files will always be present in the downstream Swift packages. Probably this is the way to go, then.

@jeybee (Author) commented Jan 4, 2024

> @jeybee I figured it out - there was a divergence in the ggml API because of commit a3d0aa7.
>
> After syncing it back to the ggml repo (ggerganov/ggml@9a867f1), the SwiftUI example now works correctly (make sure to update the Swift packages to the latest version: Xcode -> File -> Packages -> Update to latest package version).
>
> Still, the problem with ggml.metallib remains unresolved, so the example will fall back to CPU transcription if it cannot load ggml.metallib.

That did fix the issue, thanks! For now, I'm just updating ggml to look for default.metallib. Is there some reason you couldn't also just change it to do that?

@zshannon commented Jan 4, 2024

Ok, reverting the change at ggml/src/ggml-metal.m:260 (searching for "ggml.metallib" instead of "default.metallib", introduced in llama#4705) fixes this for me, but I'm assuming you made that change to fix something else, @ggerganov?

Alternatively, we could probably create a Swift Package Plugin for ggml with a build step that both copies and compiles the metal file without combining it into a single metallib (as is the Xcode default), but that seems like overkill to me, perhaps because I don't understand why the fallback that searches for the uncompiled metal file is there...

@1-ashraful-islam (Contributor)

I have forked versions of ggml and whisper.cpp from December 30th, and everything seems to work fine and loads Metal. This is with a whisper.cpp Swift package declaration that uses ggml as a dependency. Here's a screenshot showing default.metallib being loaded:

[screenshot]

@1-ashraful-islam (Contributor) commented Jan 5, 2024

I can also confirm the observation reported by @zshannon and @jeybee regarding ggml-metal.m. Reverting to default.metallib instead of ggml.metallib solves the issue.

During the build process for the ggml package, the .metal file gets compiled into default.metallib by default.

Is it possible to revert this change?
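
For context, that default.metallib behaviour comes from declaring the shader source as a processed resource; a minimal Package.swift sketch (illustrative layout, not the actual ggml manifest):

```swift
// swift-tools-version: 5.5
import PackageDescription

let package = Package(
    name: "ggml",
    targets: [
        .target(
            name: "ggml",
            // With Swift tools >= 5.3, Xcode compiles a processed .metal
            // resource into default.metallib inside the target's resource
            // bundle (ggml_ggml.bundle); the output name is not configurable
            // from the manifest.
            resources: [.process("ggml-metal.metal")]
        )
    ]
)
```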

@singularity-s0

Does that mean the extra build step added to the llama.cpp swiftui project was also unnecessary?

@ggerganov (Owner)

We can obviously revert to searching for default.metallib, but it seems too hacky. What if some other project also uses the same approach - we would have a default.metallib collision. I'd like to see if there is a way to fix this properly before reverting.

@zshannon commented Jan 5, 2024

It's my understanding from the research I did today that Xcode bundles all the Metal code into a single default.metallib, so, yeah, if there are other libs with Metal code, they'll be merged with ggml's into a single compiled binary (I could be wrong), and ggml will still have access to its own logic.
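
For reference, default.metallib is what Metal itself treats as a bundle's default library, which is why the merged library still resolves; a minimal Swift sketch (illustrative only, not ggml's actual loader, which lives in ggml-metal.m):

```swift
import Metal

// Illustrative: Metal resolves default.metallib as the bundle's "default
// library", so no explicit filename lookup is needed to reach the kernels.
guard let device = MTLCreateSystemDefaultDevice() else {
    fatalError("no Metal device available")
}
do {
    let library = try device.makeDefaultLibrary(bundle: .main)
    print("loaded kernels: \(library.functionNames)")
} catch {
    print("could not load default.metallib: \(error)")
}
```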

@1-ashraful-islam (Contributor) commented Jan 5, 2024

I concur with @zshannon, and came to a similar understanding after reading through the documentation and forums for a few hours. Based on what I understand, this is the default behavior of Swift Package Manager. To get custom metallib filenames, we would need to either add extra build steps or add build tool plugins. Prior to Swift tools version 5.3, it seems developers had to manually compile the metal files into a metallib.

Also, looking through the application bundle, I see a default.metallib inside both ggml_ggml.bundle and whisper_whisper.bundle.

See also:
swiftlang/swift-package-manager#5822
swiftlang/swift-package-manager#5823
swiftlang/swift-package-manager#6124

https://github.com/schwa/MetalCompilerPlugin
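
Along the same lines, a short sketch for reproducing that bundle inspection at runtime (bundle names follow SPM's <package>_<target> convention seen in the log below; the enumeration itself is an assumption about the host app layout):

```swift
import Foundation

// Sketch: list the SPM resource bundles shipped inside the app
// (ggml_ggml.bundle, whisper_whisper.bundle, ...) and report which
// ones contain a compiled default.metallib.
let fm = FileManager.default
let appURL = Bundle.main.bundleURL
let entries = (try? fm.contentsOfDirectory(at: appURL,
                                           includingPropertiesForKeys: nil)) ?? []
for url in entries where url.pathExtension == "bundle" {
    let lib = url.appendingPathComponent("default.metallib")
    if fm.fileExists(atPath: lib.path) {
        print("\(url.lastPathComponent) contains default.metallib")
    }
}
```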

@1-ashraful-islam (Contributor)

Additionally, I have both whisper.cpp and llama.cpp loading default.metallib from ggml_ggml.bundle without error in a single Swift project (application name and identifier omitted from the log):

.....Loading WhisperState........
whisper_init_from_file_with_params_no_state: loading model from '/private/var/containers/Bundle/Application/----/models/ggml-base.en.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51864
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head  = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 512
whisper_model_load: n_text_head   = 8
whisper_model_load: n_text_layer  = 6
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 2 (base)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: n_langs       = 99
whisper_backend_init: using Metal backend
ggml_metal_init: allocating
ggml_metal_init: picking default device: Apple A14 GPU
ggml_metal_init: loading '/var/containers/Bundle/Application/-----/ggml_ggml.bundle/default.metallib'
ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size =   140.64 MiB, (  141.77)
whisper_model_load:    Metal buffer size =   147.46 MB
whisper_model_load: model size    =  147.37 MB
whisper_backend_init: using Metal backend
ggml_metal_init: allocating
ggml_metal_init: picking default device: Apple A14 GPU
ggml_metal_init: loading '/var/containers/Bundle/Application/-----/ggml_ggml.bundle/default.metallib'
ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size =    15.75 MiB, (  157.52)
whisper_init_state: kv self size  =   16.52 MB
ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size =    17.58 MiB, (  175.09)
whisper_init_state: kv cross size =   18.43 MB
ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size =     0.02 MiB, (  175.11)
whisper_init_state: compute buffer (conv)   =   14.86 MB
ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size =     0.02 MiB, (  175.12)
whisper_init_state: compute buffer (encode) =   85.99 MB
ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size =     0.02 MiB, (  175.14)
whisper_init_state: compute buffer (cross)  =    4.78 MB
ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size =     0.02 MiB, (  175.16)
whisper_init_state: compute buffer (decode) =   96.48 MB
ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size =    12.55 MiB, (  187.69)
ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size =    80.39 MiB, (  268.06)
ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size =     2.94 MiB, (  270.98)
ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size =    90.39 MiB, (  361.36)
.....Done Loading WhisperState........
.....Loading LlamaState........
llama_model_loader: loaded meta data with 20 key-value pairs and 201 tensors from /private/var/containers/Bundle/Application/-----/models/tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf (version GGUF V2)
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = py007_tinyllama-1.1b-chat-v0.3
llama_model_loader: - kv   2:                       llama.context_length u32              = 2048
llama_model_loader: - kv   3:                     llama.embedding_length u32              = 2048
llama_model_loader: - kv   4:                          llama.block_count u32              = 22
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 5632
llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 64
llama_model_loader: - kv   7:                 llama.attention.head_count u32              = 32
llama_model_loader: - kv   8:              llama.attention.head_count_kv u32              = 4
llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  10:                       llama.rope.freq_base f32              = 10000.000000
llama_model_loader: - kv  11:                          general.file_type u32              = 15
llama_model_loader: - kv  12:                       tokenizer.ggml.model str              = llama
llama_model_loader: - kv  13:                      tokenizer.ggml.tokens arr[str,32003]   = ["<unk>", "<s>", "</s>", "<0x00>", "<...
llama_model_loader: - kv  14:                      tokenizer.ggml.scores arr[f32,32003]   = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv  15:                  tokenizer.ggml.token_type arr[i32,32003]   = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv  16:                tokenizer.ggml.bos_token_id u32              = 1
llama_model_loader: - kv  17:                tokenizer.ggml.eos_token_id u32              = 2
llama_model_loader: - kv  18:            tokenizer.ggml.unknown_token_id u32              = 0
llama_model_loader: - kv  19:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:   45 tensors
llama_model_loader: - type q4_K:  135 tensors
llama_model_loader: - type q6_K:   21 tensors
llm_load_vocab: special tokens definition check successful ( 262/32003 ).
llm_load_print_meta: format           = GGUF V2
llm_load_print_meta: arch             = llama
llm_load_print_meta: vocab type       = SPM
llm_load_print_meta: n_vocab          = 32003
llm_load_print_meta: n_merges         = 0
llm_load_print_meta: n_ctx_train      = 2048
llm_load_print_meta: n_embd           = 2048
llm_load_print_meta: n_head           = 32
llm_load_print_meta: n_head_kv        = 4
llm_load_print_meta: n_layer          = 22
llm_load_print_meta: n_rot            = 64
llm_load_print_meta: n_gqa            = 8
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: n_ff             = 5632
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx  = 2048
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: model type       = 1B
llm_load_print_meta: model ftype      = Q4_K - Medium
llm_load_print_meta: model params     = 1.10 B
llm_load_print_meta: model size       = 636.18 MiB (4.85 BPW) 
llm_load_print_meta: general.name     = py007_tinyllama-1.1b-chat-v0.3
llm_load_print_meta: BOS token        = 1 '<s>'
llm_load_print_meta: EOS token        = 2 '</s>'
llm_load_print_meta: UNK token        = 0 '<unk>'
llm_load_print_meta: LF token         = 13 '<0x0A>'
llm_load_tensors: ggml ctx size       =    0.08 MiB
ggml_backend_metal_buffer_from_ptr: allocated buffer, size =   636.89 MiB, (  998.25)
llm_load_tensors: system memory used  =  636.26 MiB
......................................................................................
Using 4 threads
llama_new_context_with_model: n_ctx      = 2048
llama_new_context_with_model: freq_base  = 10000.0
llama_new_context_with_model: freq_scale = 1
ggml_metal_init: allocating
ggml_metal_init: picking default device: Apple A14 GPU
ggml_metal_init: loading '/var/containers/Bundle/Application/----/ggml_ggml.bundle/default.metallib'
ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size =    44.00 MiB, ( 1042.25)
llama_new_context_with_model: KV self size  =   44.00 MiB, K (f16):   22.00 MiB, V (f16):   22.00 MiB
ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size =     0.02 MiB, ( 1042.27)
llama_build_graph: non-view tensors processed: 466/466
llama_new_context_with_model: compute buffer total size = 147.19 MiB
ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size =   144.02 MiB, ( 1186.27)
.....Done Loading LlamaState........

@ggerganov (Owner)

Ok, thanks for investigating - I will make the changes to revert to default.metallib and remove the extra build step from the project.

@ggerganov (Owner)

Should be OK now using the latest master.

@1-ashraful-islam (Contributor)

Thanks for the quick resolution @ggerganov. I believe the issue is resolved and can be closed.
