Wrong EOS Masking? #103

Open
sarvam31 opened this issue Jan 8, 2025 · 0 comments

sarvam31 commented Jan 8, 2025

I noticed that eos_token_id is unmasked only when no other token_id is unmasked (see get_next_token_acceptance_for_single_stack in IncrementalTokenRecognizer).

Now suppose we have a grammar that generates a one- or two-digit number:
root ::= [0-9] ( [0-9] | )

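Initially, the top of the stack allows 0-9, which is expected. But after one time step, once the LLM has sampled a token for the first digit, the stack is updated to point only at the second [0-9], and no stack entry accounts for the epsilon (empty) alternative. Because digit tokens are still unmasked at that point, eos_token_id stays masked, so a one-digit number cannot be generated even though the grammar allows it.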

I expected this to be handled by keeping an additional stack entry for the epsilon alternative. That entry would point to the next byte that can be generated if epsilon is chosen for the current rule. If epsilon is one of the last alternatives in the grammar, so that nothing remains to be matched after it, the EOS token should be unmasked too.
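
To make the proposal concrete, here is a minimal, self-contained sketch of the idea. It is a toy character-level model, not the library's code: the grammar encoding and every name in it (GRAMMAR, expand, advance, allowed_chars, and the two EOS rules) are hypothetical, and the "described" rule is only a paraphrase of the behaviour reported above. It assumes a grammar without left recursion.

```python
DIGITS = frozenset("0123456789")

# Toy grammar (hypothetical encoding): root ::= [0-9] tail ; tail ::= [0-9] | <empty>
# Rule names are str; terminals are frozensets of characters.
GRAMMAR = {
    "root": [[DIGITS, "tail"]],
    "tail": [[DIGITS], []],  # the empty list is the epsilon alternative
}


def expand(stack):
    """Replace a leading rule reference with each of its alternatives.

    A stack is a tuple of pending symbols, next symbol first. The result is a
    set of stacks that each start with a terminal or are empty (parse complete).
    """
    if not stack or not isinstance(stack[0], str):
        return {stack}
    stacks = set()
    for alternative in GRAMMAR[stack[0]]:
        stacks |= expand(tuple(alternative) + stack[1:])
    return stacks


def advance(stacks, ch):
    """Consume one character on every stack that accepts it."""
    result = set()
    for stack in stacks:
        if stack and ch in stack[0]:
            result |= expand(stack[1:])
    return result


def allowed_chars(stacks):
    """Union of the characters accepted by the non-empty stacks."""
    chars = set()
    for stack in stacks:
        if stack:
            chars |= stack[0]
    return chars


def eos_only_if_nothing_else(stacks):
    """Rule as described in this issue: EOS only when no other token is allowed."""
    return bool(stacks) and not allowed_chars(stacks)


def eos_if_any_stack_empty(stacks):
    """Proposed rule: EOS whenever some stack has nothing left to match."""
    return any(len(stack) == 0 for stack in stacks)


# State after the first digit: one stack expects a second digit, one is empty (epsilon).
stacks = advance(expand(("root",)), "7")
print(eos_only_if_nothing_else(stacks))  # False: digits are still allowed
print(eos_if_any_stack_empty(stacks))    # True: the epsilon stack is complete
```

In this sketch the epsilon alternative survives as an empty stack next to the stack that expects a second digit, so after one digit both another digit and EOS are acceptable. Whether the real recognizer can track the empty stack in the same way is something the maintainers would need to confirm.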

Please let me know whether I have understood this correctly, or if you need any further information.
