Arabic starts to show up, but with 50% of the letters missing
Mel spectrogram is calculated...!
2024-12-13 13:00:37.722 17057-17091 WhisperEngineJava com.whispertflite D output_len: 451
2024-12-13 13:00:37.723 17057-17091 WhisperEngineJava com.whispertflite D Skipping token: 50258, word: <|startoftranscript|>
2024-12-13 13:00:37.723 17057-17091 WhisperEngineJava com.whispertflite D Skipping token: 50272, word: <|ar|>
2024-12-13 13:00:37.723 17057-17091 WhisperEngineJava com.whispertflite D It is Transcription...
2024-12-13 13:00:37.723 17057-17091 WhisperEngineJava com.whispertflite D Skipping token: 50359, word: <|transcribe|>
2024-12-13 13:00:37.724 17057-17091 WhisperEngineJava com.whispertflite D Skipping token: 50363, word: <|notimestamps|>
2024-12-13 13:00:37.724 17057-17091 WhisperEngineJava com.whispertflite D Adding token: 21136, word: ĠاÙĦس
2024-12-13 13:00:37.724 17057-17091 WhisperEngineJava com.whispertflite D Adding token: 37440, word: ÙĦاÙħ
2024-12-13 13:00:37.724 17057-17091 WhisperEngineJava com.whispertflite D Adding token: 25894, word: ĠعÙĦÙĬ
2024-12-13 13:00:37.724 17057-17091 WhisperEngineJava com.whispertflite D Adding token: 24793, word: ÙĥÙħ
2024-12-13 13:00:37.725 17057-17091 WhisperEngineJava com.whispertflite D Inference is executed...!
2024-12-13 13:00:37.726 17057-17091 MainActivity com.whispertflite D Result: ?ا�?س�?ا�??ع�?�?�?�?
I worked through the problem with ChatGPT and got to this point, but I can't make any more progress. I don't think it is a Unicode issue; more likely, the way the vocabulary file is handled drops about 50% of the Arabic characters. I also tried using the files from Python, but I didn't manage to see any Arabic text at all.
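One thing that may be worth checking: the `word` values in the log look like GPT-2 style byte-level BPE symbols (for example `Ġ` standing in for a leading space), where each character represents one raw byte rather than one letter. Below is a minimal sketch, assuming the vocabulary really uses that byte-to-character table, of how such pieces could be mapped back to bytes and decoded as UTF-8 only after all pieces are joined. The names `decode_tokens` and `to_vocab_string` are hypothetical helpers for illustration, not part of the app or of any library, and the sample pieces are rebuilt from the phrase the log appears to spell rather than read from the real vocabulary file.

```python
def bytes_to_unicode():
    """Build the GPT-2 style table mapping each byte value (0-255) to a printable character."""
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    # Printable bytes map to themselves.
    table = {b: chr(b) for b in printable}
    # Every remaining byte is shifted to a code point starting at U+0100.
    shift = 0
    for b in range(256):
        if b not in table:
            table[b] = chr(256 + shift)
            shift += 1
    return table


BYTE_TO_CHAR = bytes_to_unicode()
CHAR_TO_BYTE = {c: b for b, c in BYTE_TO_CHAR.items()}


def decode_tokens(pieces):
    """Join vocabulary pieces, map each character back to its byte, then decode as UTF-8."""
    # Raises KeyError for characters outside the table (assumption: special
    # tokens such as <|ar|> were already skipped, as in the log).
    raw = bytes(CHAR_TO_BYTE[ch] for piece in pieces for ch in piece)
    return raw.decode("utf-8", errors="replace")


def to_vocab_string(text):
    """Hypothetical helper: render text the way a byte-level vocabulary would store it."""
    return "".join(BYTE_TO_CHAR[b] for b in text.encode("utf-8"))


# Simulated pieces for the four "Adding token" lines (an assumption, see above).
pieces = [to_vocab_string(p) for p in [" الس", "لام", " علي", "كم"]]
print(decode_tokens(pieces))
```

If the app instead treats each piece as Latin-1 text and converts it one character at a time, the bytes that the table shifts above U+00FF (such as `Ħ` for byte 0x84) would have no single-byte equivalent, which might explain why only some of the letters survive.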
I followed all the steps to generate the tflite and bin files, and included the forced decoder ids:
forced_decoder_ids = processor.get_decoder_prompt_ids(language="ar", task="transcribe")