Feature/add hotwords #731
Conversation
@nguyendc-systran Hello, please check out this PR.
@jax-explorer, hello. Can you please provide an example of your test cases?
OK. "ComfyUI" is a new word; it is the most powerful and modular Stable Diffusion GUI and backend. The test video is https://www.youtube.com/watch?v=Ybu6qTbEsew. Without hotwords it is incorrectly recognized as "Conf UI"; with hotwords added, it is recognized correctly.
@trungkienbkhn hello,
@jax-explorer, thanks for your PR. LGTM.
OK, thanks.
@jax-explorer I encountered an issue when using faster-whisper where a person's name in initial_prompt only takes effect in the first part. Can your method solve this problem? How should I use it? Thanks.
@RichardQin1 Yes, this PR will solve your problem.
I tested the patch and it does seem to improve recognition of the vocabulary when given appropriate words.
Edit: I've noticed a small side effect: when the model is hallucinating, it will output the hotwords. I can personally clean it up with a post-processor, but it's worth mentioning. The hallucinated line is an exact copy of the hotwords given.
Edit 2: After longer testing, the hallucination still happens, and there is variety to it. Sometimes it's an exact copy of the hotwords; other times it's a slight variation of them.
@arabcoders Hello. It is true that hallucinated output is affected by this setting: it changes from repeating the last few sentences of the previous window to hotword-related sentences. But I don't think this is a side effect, because when a hallucination occurs we shouldn't be concerned with the content of the hallucinated output, but rather with resolving the hallucination itself, e.g. by using a VAD.
Hi, this was with Silero VAD filtering out the silence segments. I noticed it occurs when the voice pitch changes, e.g. right before a song. This rarely happens without this patch. The prompt resets when that happens, and because this patch adds hotwords whenever the prompt is empty, the issue occurs more frequently. I suggest you implement a more state-aware injection instead of blindly adding the hotwords when the prompt is empty. Thank you.
@arabcoders Hi, got it: hotwords can trigger hallucinations where they didn't originally appear. Is there a link to an audio file that can reproduce this problem? I'll try to modify and test.
Sure, try this partial clip. I couldn't upload the entire thing as it's 2h+, but this 10-minute clip of that concert shows the problem I'm describing. You can download this clip. These are the options I used:

```json
{
  "task": "translate",
  "language": "Japanese",
  "temperature": [
    0.0,
    0.2,
    0.4,
    0.6000000000000001,
    0.8,
    1.0
  ],
  "best_of": 5,
  "beam_size": 5,
  "patience": 2,
  "length_penalty": null,
  "suppress_tokens": "-1",
  "initial_prompt": null,
  "condition_on_previous_text": true,
  "compression_ratio_threshold": 2.4,
  "logprob_threshold": -1.0,
  "no_speech_threshold": 0.6,
  "word_timestamps": false,
  "prepend_punctuations": "\"'“¿([{-",
  "append_punctuations": "\"'.。,,!!??::”)]}、"
}
```
Thanks for your PR! It's very useful to me.
@JH90iOS Thanks for the affirmation. I've been busy lately and haven't followed up on the earlier feedback about hallucinations.
@jax-explorer Thanks for your PR! This helps me a lot. But I also encountered these two problems; please tell me if there is any solution:
Hi!
And how do I add hotwords in a terminal command? The segment_resolution option doesn't work in Python usage, and the hotwords are not working on the command line.
Hello!
During transcription I often encounter proprietary or newly coined vocabulary that Whisper cannot handle well. I searched for solutions, and the community offered two options:
1. Fine-tuning the model: this approach is costly, and it's not practical to fine-tune the model every time a new term emerges.
2. Using initial_prompt: however, initial_prompt only conditions the first window. If the specialized terms don't appear at the beginning, this method is ineffective.
Looking at other transcription models, supporting hotwords is common practice, so I implemented this feature. My approach is to add a hotword-related prompt before each transcription window. Since there is a maximum prompt length, the hotwords occupy the space previously used by the prefix; they take effect only when no prefix is set. In my testing, this resolved the specialized-vocabulary issue in my scenario.
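The per-window injection described above could be sketched roughly as follows. This is a minimal illustration, not the actual implementation: the function name and the character-based length budget are my own assumptions (the real code works on tokens), but it captures the stated design, i.e. hotwords share the prefix's slot and so are injected into every window, unlike initial_prompt, which only conditions the first one.

```python
# Hypothetical sketch of the per-window prompt logic described above.
# Assumptions: character-based budget instead of tokens; names are invented.
def build_window_prompt(previous_text, prefix=None, hotwords=None,
                        max_prompt_chars=224):
    """Build the text that conditions one transcription window.

    previous_text: context carried over from the prior window (may be "").
    prefix: when set, it wins; hotwords are ignored because they share its slot.
    hotwords: injected at the front of every window when no prefix is given.
    """
    lead = prefix if prefix is not None else (hotwords or "")
    # Trim the carried-over context so lead + context fits the budget;
    # the hotwords/prefix are kept intact, and the oldest context is dropped.
    budget = max_prompt_chars - len(lead) - (1 if lead and previous_text else 0)
    context = previous_text[-budget:] if budget > 0 else ""
    return " ".join(part for part in (lead, context) if part)
```

For example, on a fresh window with an empty prompt, `build_window_prompt("", hotwords="ComfyUI")` yields just the hotwords, while passing a `prefix` suppresses them entirely.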
The following is the community discussion on this issue:
openai/whisper#1477
https://discuss.huggingface.co/t/adding-custom-vocabularies-on-whisper/29311
https://stackoverflow.com/questions/73833916/how-can-i-give-some-hint-phrases-to-openais-whisper-asr
Since my project uses faster-whisper, I will submit it to this project first. If the submission is approved, I will port it to the whisper project as well.