Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make batched suppress_tokens behaviour same as in sequential #1194

Merged
merged 1 commit into from
Dec 11, 2024

Conversation

Purfview
Copy link
Contributor

@Purfview Purfview commented Dec 9, 2024

Make batched suppress_tokens behaviour same as in sequential mode.

Make batched suppress_tokens behaviour same as in sequential
@MahmoudAshraf97
Copy link
Collaborator

They should have the same behavior, the else case is already handled in the function itself

elif suppress_tokens is None or len(suppress_tokens) == 0:
suppress_tokens = [] # interpret empty string as an empty list

I'd suggest modifying the sequential code instead for consistency

@Purfview
Copy link
Contributor Author

Purfview commented Dec 11, 2024

They should have the same behavior, the else case is already handled in the function itself

Not same, as the function populates suppress_tokens.
I think original author had in mind an option to have suppress_tokens empty.

Look at #898

@MahmoudAshraf97
Copy link
Collaborator

I see, but I think the batched behavior is still the correct one, as you can see here, there are a certain set of tokens that are always suppressed regardless of the user request, these tokens are task tokens or prompt tokens so we should not expect them to be generated in any case

@MahmoudAshraf97
Copy link
Collaborator

@jordimas

@jordimas
Copy link
Contributor

jordimas commented Dec 11, 2024

Hello!
I have very little time to check the code, but from what I remember there are 3 uses cases:

  • User provides None tokens (nothing is applied)
  • User provides -1 and then the defaults are used
  • The user provides a list and then the defaults + user's list is used

Consider also that adding suppress is not called is the if evaluates to False (e.g. None)
https://github.com/openai/whisper/blob/90db0de1896c23cbfaf0c58bc2d30665f709f170/whisper/decoding.py#L557C25-L557C40

What I can tell is that I test extensively OpenAI and faster-whisper to behave the same in all cases when I did the PR.
I think that @Purfview is right.

@MahmoudAshraf97
Copy link
Collaborator

Thanks all, I will merge then

@MahmoudAshraf97 MahmoudAshraf97 merged commit f32c0e8 into SYSTRAN:master Dec 11, 2024
3 checks passed
@Purfview Purfview deleted the patch-2 branch December 11, 2024 12:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants