
whisper.swiftui example not working #1720

Closed
jeybee opened this issue Jan 3, 2024 · 24 comments
Labels: help wanted (Extra attention is needed)

@jeybee commented Jan 3, 2024

When running the whisper.swiftui example, compiled in Xcode, transcription fails with the following log:

About to run whisper_full
whisper_full_with_state: failed to encode
Failed to run the model

This is using the ggml-base.en.bin model. The whisper.objc sample, run on the same machine with the same model, works fine.

Tested on an M1 MacBook Pro (16 GB).

@ggerganov (Owner)

We need to apply a similar fix to the one we made in ggerganov/llama.cpp#4754.

@ggerganov added the good first issue label Jan 4, 2024
@ggerganov (Owner)

Pinging @singularity-s0 as they fixed the build in llama.cpp

Here it might be more difficult because the example uses the whisper.cpp Swift package, which in turn depends on the ggml Swift package. I tried to add a similar build rule but couldn't figure out the details, so help would be appreciated.

@singularity-s0

Xcode build rules don't seem to apply to Swift packages. In fact, this post suggests custom build behavior for Swift packages might not be supported by Xcode at all. Need to find another way around this.

@ggerganov (Owner)

Ok, pinging @1-ashraful-islam as well

@ggerganov added the help wanted label and removed the good first issue label Jan 4, 2024
@jeybee (Author) commented Jan 4, 2024

Just wanted to flag that even after manually changing ggml to look for the default.metallib that does exist, with the Metal device successfully initialised, the same error log still occurs.

@ggerganov (Owner)

Did you change ggml-metal.m in whisper.cpp or in ggml? You have to change it in the latter because that is what the Swift package uses.

@zshannon commented Jan 4, 2024

ggml_metal_init: allocating
ggml_metal_init: found device: Apple M1 Max
ggml_metal_init: picking default device: Apple M1 Max
ggml_metal_init: ggml.metallib not found, loading from source
ggml_metal_init: GGML_METAL_PATH_RESOURCES = nil
ggml_metal_init: error: could not use bundle path to find ggml-metal.metal, falling back to trying cwd
ggml_metal_init: loading 'ggml-metal.metal'
ggml_metal_init: error: Error Domain=NSCocoaErrorDomain Code=260 "The file “ggml-metal.metal” couldn’t be opened because there is no such file." UserInfo={NSFilePath=ggml-metal.metal, NSUnderlyingError=0x600002e4e280 {Error Domain=NSPOSIXErrorDomain Code=2 "No such file or directory"}}
whisper_backend_init: ggml_backend_metal_init() failed

So it seems like moving ggml to an SPM dependency broke loading the metal file?
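
The log above shows the loader trying GGML_METAL_PATH_RESOURCES before falling back to the cwd. A minimal Swift sketch of a stopgap, assuming the ggml-metal.metal source has been copied into the app bundle by hand (that copy step is an assumption, not something the sample does):

```swift
import Foundation

// Hypothetical workaround: if ggml-metal.metal has been copied into the app
// bundle (e.g. via a "Copy Bundle Resources" phase -- an assumption, not part
// of the sample), point GGML_METAL_PATH_RESOURCES at the bundle's resource
// directory before any whisper/ggml context is created, so ggml_metal_init
// never reaches the cwd fallback seen in the log above.
if let resources = Bundle.main.resourcePath {
    _ = setenv("GGML_METAL_PATH_RESOURCES", resources, 1)
}
```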

@jeybee (Author) commented Jan 4, 2024

> Did you change ggml-metal.m in whisper.cpp or in ggml? You have to change it in the latter because that is what the Swift package uses.

Yes, I changed it in ggml, and in the logs I can see it successfully loads and allocates the Metal buffers, but transcription still fails.

@ggerganov (Owner)

@jeybee I figured it out - there was a divergence in the ggml API because of commit a3d0aa7.

After syncing it back to the ggml repo (ggerganov/ggml@9a867f1), the SwiftUI example now works correctly (make sure to update the Swift packages to the latest version: Xcode -> File -> Packages -> Update to latest package version).

Still, the problem with ggml.metallib remains unresolved, so the example will fall back to CPU transcription if it cannot load ggml.metallib.

@zshannon commented Jan 4, 2024

It might be a limitation of SPM that bundle resources can't be copied from dependencies, so, e.g., bundling ggml-metal.metal in ggml and then depending on ggml in whisper.cpp/llama.cpp doesn't pull the metal file into the final build. You worked around it in the swiftui example in llama.cpp by adding a build step in Xcode, but that was only viable because the metal file is present in the llama.cpp repo, which sorta blows up the value of using SPM to consume the package... (looking into workarounds now)

@zshannon commented Jan 4, 2024

@ggerganov will the metal files always be present in the respective whisper/llama repos and synced with the ggml repo? Perhaps bundling the metal files in each whisper/llama Swift package, while depending on the ggml Swift package's compiled ggml lib (but not its metal files), solves the twin problems of needing the metal files and avoiding the duplicate-symbols compiler error when using both libs?

@ggerganov (Owner)

Yes, the metal files will always be present in the downstream Swift packages. Probably this is the way to go, then.

@jeybee (Author) commented Jan 4, 2024

> @jeybee I figured it out - there was a divergence in the ggml API because of commit a3d0aa7.
>
> After syncing it back to the ggml repo (ggerganov/ggml@9a867f1), the SwiftUI example now works correctly (make sure to update the Swift packages to the latest version: Xcode -> File -> Packages -> Update to latest package version).
>
> Still, the problem with ggml.metallib remains unresolved, so the example will fall back to CPU transcription if it cannot load ggml.metallib.

That did fix the issue, thanks! For now, I'm just updating ggml to look for default.metallib. Is there some reason you couldn't also just change it to do that?

@zshannon commented Jan 4, 2024

Ok, reverting the change at ggml/src/ggml-metal.m:260 (searching for "ggml.metallib" instead of "default.metallib", introduced in llama#4705) fixes this for me, but I'm assuming you made that change to fix something else, @ggerganov?

Alternatively, we could probably create a Swift Package Plugin for ggml with a build step that both copies and compiles the metal file without combining it into a single metallib (as is the Xcode default), but that seems like overkill to me, perhaps because I don't understand why the fallback that searches for the uncompiled metal file is there...

@1-ashraful-islam (Contributor)

I have forked versions of ggml and whisper.cpp from December 30th, and everything seems to work fine and loads Metal. This is with a whisper.cpp Swift package declaration that uses ggml as a dependency. Here's a screenshot showing default.metallib being loaded:

[screenshot]

@1-ashraful-islam (Contributor) commented Jan 5, 2024

I can also confirm the observation reported by @zshannon and @jeybee regarding ggml-metal.m. Reverting to default.metallib instead of ggml.metallib solves the issue.

During the build process for the ggml package, the .metal file gets compiled into default.metallib by default.

Is it possible to revert this change?
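
For context, that default.metallib behaviour comes from declaring the shader source as a processed resource; a minimal Package.swift sketch (illustrative layout, not the actual ggml manifest):

```swift
// swift-tools-version: 5.5
import PackageDescription

let package = Package(
    name: "ggml",
    targets: [
        .target(
            name: "ggml",
            // With Swift tools >= 5.3, Xcode compiles a processed .metal
            // resource into default.metallib inside the target's resource
            // bundle (ggml_ggml.bundle); the output name is not configurable
            // from the manifest.
            resources: [.process("ggml-metal.metal")]
        )
    ]
)
```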

@singularity-s0

Does that mean the extra build step added to the llama.cpp swiftui project was also unnecessary?

@ggerganov (Owner)

We can obviously revert to searching for default.metallib, but it seems too hacky. What if some other project also uses the same approach - we would have a default.metallib collision. I'd like to see if there is a way to fix this properly before reverting.

@zshannon commented Jan 5, 2024

It's my understanding from the research I did today that Xcode bundles all the Metal code into a single default.metallib, so, yeah, if there are other libs with Metal code, they'll be merged with ggml's into a single compiled binary (I could be wrong), and ggml will still have access to its own logic.
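
For reference, default.metallib is what Metal itself treats as a bundle's default library, which is why the merged library still resolves; a minimal Swift sketch (illustrative only, not ggml's actual loader, which lives in ggml-metal.m):

```swift
import Metal

// Illustrative: Metal resolves default.metallib as the bundle's "default
// library", so no explicit filename lookup is needed to reach the kernels.
guard let device = MTLCreateSystemDefaultDevice() else {
    fatalError("no Metal device available")
}
do {
    let library = try device.makeDefaultLibrary(bundle: .main)
    print("loaded kernels: \(library.functionNames)")
} catch {
    print("could not load default.metallib: \(error)")
}
```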

@1-ashraful-islam (Contributor) commented Jan 5, 2024

I concur with @zshannon, and came to a similar understanding after reading through the documentation and forums for a few hours. Based on what I understand, this is the default behavior of Swift Package Manager. To get custom metallib filenames, we would need to either add extra build steps or add build tool plugins. Prior to Swift tools version 5.3, it seems developers had to manually compile the metal files into a metallib.

Also, looking through the application bundle, I see a default.metallib inside both ggml_ggml.bundle and whisper_whisper.bundle.

See also:
swiftlang/swift-package-manager#5822
swiftlang/swift-package-manager#5823
swiftlang/swift-package-manager#6124

https://github.com/schwa/MetalCompilerPlugin
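
Along the same lines, a short sketch for reproducing that bundle inspection at runtime (bundle names follow SPM's <package>_<target> convention seen in the log below; the enumeration itself is an assumption about the host app layout):

```swift
import Foundation

// Sketch: list the SPM resource bundles shipped inside the app
// (ggml_ggml.bundle, whisper_whisper.bundle, ...) and report which
// ones contain a compiled default.metallib.
let fm = FileManager.default
let appURL = Bundle.main.bundleURL
let entries = (try? fm.contentsOfDirectory(at: appURL,
                                           includingPropertiesForKeys: nil)) ?? []
for url in entries where url.pathExtension == "bundle" {
    let lib = url.appendingPathComponent("default.metallib")
    if fm.fileExists(atPath: lib.path) {
        print("\(url.lastPathComponent) contains default.metallib")
    }
}
```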

@1-ashraful-islam (Contributor)

Additionally, I have both whisper.cpp and llama.cpp loading default.metallib from ggml_ggml.bundle without error in a single Swift project (application name and identifier omitted from the log):

.....Loading WhisperState........
whisper_init_from_file_with_params_no_state: loading model from '/private/var/containers/Bundle/Application/----/models/ggml-base.en.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51864
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head  = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 512
whisper_model_load: n_text_head   = 8
whisper_model_load: n_text_layer  = 6
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 2 (base)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: n_langs       = 99
whisper_backend_init: using Metal backend
ggml_metal_init: allocating
ggml_metal_init: picking default device: Apple A14 GPU
ggml_metal_init: loading '/var/containers/Bundle/Application/-----/ggml_ggml.bundle/default.metallib'
ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size =   140.64 MiB, (  141.77)
whisper_model_load:    Metal buffer size =   147.46 MB
whisper_model_load: model size    =  147.37 MB
whisper_backend_init: using Metal backend
ggml_metal_init: allocating
ggml_metal_init: picking default device: Apple A14 GPU
ggml_metal_init: loading '/var/containers/Bundle/Application/-----/ggml_ggml.bundle/default.metallib'
ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size =    15.75 MiB, (  157.52)
whisper_init_state: kv self size  =   16.52 MB
ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size =    17.58 MiB, (  175.09)
whisper_init_state: kv cross size =   18.43 MB
ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size =     0.02 MiB, (  175.11)
whisper_init_state: compute buffer (conv)   =   14.86 MB
ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size =     0.02 MiB, (  175.12)
whisper_init_state: compute buffer (encode) =   85.99 MB
ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size =     0.02 MiB, (  175.14)
whisper_init_state: compute buffer (cross)  =    4.78 MB
ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size =     0.02 MiB, (  175.16)
whisper_init_state: compute buffer (decode) =   96.48 MB
ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size =    12.55 MiB, (  187.69)
ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size =    80.39 MiB, (  268.06)
ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size =     2.94 MiB, (  270.98)
ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size =    90.39 MiB, (  361.36)
.....Done Loading WhisperState........
.....Loading LlamaState........
llama_model_loader: loaded meta data with 20 key-value pairs and 201 tensors from /private/var/containers/Bundle/Application/-----/models/tinyllama-1.1b-chat-v0.3.Q4_K_M.gguf (version GGUF V2)
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = py007_tinyllama-1.1b-chat-v0.3
llama_model_loader: - kv   2:                       llama.context_length u32              = 2048
llama_model_loader: - kv   3:                     llama.embedding_length u32              = 2048
llama_model_loader: - kv   4:                          llama.block_count u32              = 22
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 5632
llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 64
llama_model_loader: - kv   7:                 llama.attention.head_count u32              = 32
llama_model_loader: - kv   8:              llama.attention.head_count_kv u32              = 4
llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  10:                       llama.rope.freq_base f32              = 10000.000000
llama_model_loader: - kv  11:                          general.file_type u32              = 15
llama_model_loader: - kv  12:                       tokenizer.ggml.model str              = llama
llama_model_loader: - kv  13:                      tokenizer.ggml.tokens arr[str,32003]   = ["<unk>", "<s>", "</s>", "<0x00>", "<...
llama_model_loader: - kv  14:                      tokenizer.ggml.scores arr[f32,32003]   = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv  15:                  tokenizer.ggml.token_type arr[i32,32003]   = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv  16:                tokenizer.ggml.bos_token_id u32              = 1
llama_model_loader: - kv  17:                tokenizer.ggml.eos_token_id u32              = 2
llama_model_loader: - kv  18:            tokenizer.ggml.unknown_token_id u32              = 0
llama_model_loader: - kv  19:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:   45 tensors
llama_model_loader: - type q4_K:  135 tensors
llama_model_loader: - type q6_K:   21 tensors
llm_load_vocab: special tokens definition check successful ( 262/32003 ).
llm_load_print_meta: format           = GGUF V2
llm_load_print_meta: arch             = llama
llm_load_print_meta: vocab type       = SPM
llm_load_print_meta: n_vocab          = 32003
llm_load_print_meta: n_merges         = 0
llm_load_print_meta: n_ctx_train      = 2048
llm_load_print_meta: n_embd           = 2048
llm_load_print_meta: n_head           = 32
llm_load_print_meta: n_head_kv        = 4
llm_load_print_meta: n_layer          = 22
llm_load_print_meta: n_rot            = 64
llm_load_print_meta: n_gqa            = 8
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: n_ff             = 5632
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx  = 2048
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: model type       = 1B
llm_load_print_meta: model ftype      = Q4_K - Medium
llm_load_print_meta: model params     = 1.10 B
llm_load_print_meta: model size       = 636.18 MiB (4.85 BPW) 
llm_load_print_meta: general.name     = py007_tinyllama-1.1b-chat-v0.3
llm_load_print_meta: BOS token        = 1 '<s>'
llm_load_print_meta: EOS token        = 2 '</s>'
llm_load_print_meta: UNK token        = 0 '<unk>'
llm_load_print_meta: LF token         = 13 '<0x0A>'
llm_load_tensors: ggml ctx size       =    0.08 MiB
ggml_backend_metal_buffer_from_ptr: allocated buffer, size =   636.89 MiB, (  998.25)
llm_load_tensors: system memory used  =  636.26 MiB
......................................................................................
Using 4 threads
llama_new_context_with_model: n_ctx      = 2048
llama_new_context_with_model: freq_base  = 10000.0
llama_new_context_with_model: freq_scale = 1
ggml_metal_init: allocating
ggml_metal_init: picking default device: Apple A14 GPU
ggml_metal_init: loading '/var/containers/Bundle/Application/----/ggml_ggml.bundle/default.metallib'
ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size =    44.00 MiB, ( 1042.25)
llama_new_context_with_model: KV self size  =   44.00 MiB, K (f16):   22.00 MiB, V (f16):   22.00 MiB
ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size =     0.02 MiB, ( 1042.27)
llama_build_graph: non-view tensors processed: 466/466
llama_new_context_with_model: compute buffer total size = 147.19 MiB
ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size =   144.02 MiB, ( 1186.27)
.....Done Loading LlamaState........

@ggerganov (Owner)

Ok, thanks for investigating - I will make the changes to revert to default.metallib and remove the extra build step from the project.

@ggerganov (Owner)

Should be OK now using the latest master.

@1-ashraful-islam (Contributor)

Thanks for the quick resolution @ggerganov. I believe the issue is resolved and can be closed.
