Arabic Transcription #24

doit-ceo · 2024-12-14T14:27:31Z

I followed all the steps to generate the tflite and bin files, and set the forced decoder IDs for Arabic:

forced_decoder_ids = processor.get_decoder_prompt_ids(language="ar", task="transcribe")

Arabic text starts to show up, but about 50% of the letters are missing:

Mel spectrogram is calculated...!
2024-12-13 13:00:37.722 17057-17091 WhisperEngineJava       com.whispertflite                    D  output_len: 451
2024-12-13 13:00:37.723 17057-17091 WhisperEngineJava       com.whispertflite                    D  Skipping token: 50258, word: <|startoftranscript|>
2024-12-13 13:00:37.723 17057-17091 WhisperEngineJava       com.whispertflite                    D  Skipping token: 50272, word: <|ar|>
2024-12-13 13:00:37.723 17057-17091 WhisperEngineJava       com.whispertflite                    D  It is Transcription...
2024-12-13 13:00:37.723 17057-17091 WhisperEngineJava       com.whispertflite                    D  Skipping token: 50359, word: <|transcribe|>
2024-12-13 13:00:37.724 17057-17091 WhisperEngineJava       com.whispertflite                    D  Skipping token: 50363, word: <|notimestamps|>
2024-12-13 13:00:37.724 17057-17091 WhisperEngineJava       com.whispertflite                    D  Adding token: 21136, word: ĠØ§ÙĦØ³
2024-12-13 13:00:37.724 17057-17091 WhisperEngineJava       com.whispertflite                    D  Adding token: 37440, word: ÙĦØ§Ùħ
2024-12-13 13:00:37.724 17057-17091 WhisperEngineJava       com.whispertflite                    D  Adding token: 25894, word: ĠØ¹ÙĦÙĬ
2024-12-13 13:00:37.724 17057-17091 WhisperEngineJava       com.whispertflite                    D  Adding token: 24793, word: ÙĥÙħ
2024-12-13 13:00:37.725 17057-17091 WhisperEngineJava       com.whispertflite                    D  Inference is executed...!
2024-12-13 13:00:37.726 17057-17091 MainActivity            com.whispertflite                    D  Result: ?ا�?س�?ا�??ع�?�?�?�?
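
A note on the `Result` line: the token strings in the log (`ĠØ§ÙĦØ³` and so on) look like GPT-2 style byte-level BPE tokens, where every character stands for one raw byte. The sketch below is only a guess at what might be going wrong, not a reading of the app's Java code: it assumes each character is forced into a single Latin-1 byte, so characters above U+00FF turn into `?`, and the remaining bytes are then read as UTF-8. The function name `lossy_decode` is made up for illustration.

```python
# Token strings copied from the "Adding token" lines in the log.
tokens = ["ĠØ§ÙĦØ³", "ÙĦØ§Ùħ", "ĠØ¹ÙĦÙĬ", "ÙĥÙħ"]


def lossy_decode(token: str) -> str:
    """Hypothetical lossy conversion (an assumption, not the app's code)."""
    # Characters above U+00FF (Ġ, Ħ, ħ, Ĭ, ĥ) do not fit in Latin-1
    # and are replaced by "?".
    raw = token.encode("latin-1", errors="replace")
    # The surviving bytes are read as UTF-8; incomplete sequences
    # become U+FFFD (the replacement character).
    return raw.decode("utf-8", errors="replace")


result = "".join(lossy_decode(t) for t in tokens)
print(result)
```

Under that assumption the printed string has the same shape as the logged result: a letter such as ا survives because both of its bytes fall in the Latin-1 range, while ل, م, ي and ك each collapse into a replacement character followed by `?`. That would account for roughly half of the letters going missing.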

I asked ChatGPT about the problem and got to this point, but I can't make any more progress. I don't think it's a Unicode issue; more likely the vocabulary file is dropping about 50% of the Arabic characters. I also tried using the files in Python, but I didn't manage to see any Arabic text at all.
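
One thing that may be worth ruling out before blaming the vocabulary file: the token strings in the log can be turned back into Arabic if they are treated as GPT-2 style byte-level BPE tokens. The sketch below rebuilds the standard byte-to-character table used by that scheme, inverts it, and decodes the four logged tokens. It assumes the vocabulary in the bin file follows this convention; `decode_tokens` and `UNICODE_TO_BYTE` are illustrative names, not part of the app or of any library.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Byte -> printable character table used by GPT-2 style byte-level BPE."""
    # Bytes that are already printable map to themselves.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    # Every other byte is shifted to a code point starting at U+0100.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


# Inverse table: token character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_tokens(tokens: list[str]) -> str:
    """Map every token character back to its byte, then decode as UTF-8."""
    raw = bytes(UNICODE_TO_BYTE[ch] for token in tokens for ch in token)
    return raw.decode("utf-8", errors="replace")


# Token strings copied from the "Adding token" lines in the log.
tokens = ["ĠØ§ÙĦØ³", "ÙĦØ§Ùħ", "ĠØ¹ÙĦÙĬ", "ÙĥÙħ"]
text = decode_tokens(tokens)
print(text)  # " السلام عليكم" (with a leading space from Ġ)
```

The bytes of all tokens are joined before decoding because a multi-byte UTF-8 character can be split across two tokens. Decoding each token on its own would risk corrupting such characters. If the same mapping is applied on the Java side, the vocabulary file itself may not need any change.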


vilassn · 2024-12-18T04:55:53Z

Can you try with the base or small model?
