FIX: Generating with mixed adapter batches and with beam search enabled #2287

Open
wants to merge 5 commits into
base: main


BenjaminBossan
Member

See #2283

Right now, using mixed adapter batches (introduced in #1558) with beam search generations does not work. This is because users need to pass the adapter names associated with each sample, i.e. the number of adapter names should be identical to the number of samples in the input.
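To illustrate the one-name-per-sample requirement, here is a minimal sketch of such a length check. The helper `check_adapter_names` and the adapter names used are hypothetical, made up for illustration; this is not PEFT's actual validation code.

```python
def check_adapter_names(adapter_names, batch_size):
    """Return adapter_names unchanged if there is exactly one name per sample.

    Hypothetical helper for illustration only, not part of PEFT.
    """
    if len(adapter_names) != batch_size:
        raise ValueError(
            f"Got {len(adapter_names)} adapter names for {batch_size} samples; "
            "expected one adapter name per sample."
        )
    return adapter_names


# Three samples, three adapter names: the check passes.
names = check_adapter_names(["adapter_a", "adapter_b", "adapter_a"], batch_size=3)
```

With beam search, the effective batch size during generation grows while the list stays the same length, so a check of this kind would fail.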

When applying beam search, transformers internally repeats the samples once per beam (or so it looks like). Therefore, we have more samples during generation than samples in the input. Consequently, the adapter names have to be extended accordingly. This is now taken care of.
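A rough sketch of what extending the adapter names could look like. It assumes that each sample is repeated `num_beams` times in consecutive positions (sample 0's beams first, then sample 1's, and so on); that ordering is an assumption here, in line with the hedge above. The function name `extend_adapter_names` is hypothetical and not necessarily what this PR uses.

```python
def extend_adapter_names(adapter_names, num_beams):
    """Repeat each adapter name num_beams times, keeping repeats adjacent.

    Assumption: the beams of one sample occupy consecutive rows of the
    expanded batch. Hypothetical helper, for illustration only.
    """
    if num_beams < 1:
        raise ValueError("num_beams must be at least 1")
    return [name for name in adapter_names for _ in range(num_beams)]


# Two input samples and three beams give six rows during generation.
extended = extend_adapter_names(["adapter_a", "adapter_b"], num_beams=3)
assert len(extended) == 2 * 3
```

With `num_beams=1` the list is returned unchanged, so the same code path would also cover generation without beam search.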

For encoder-decoder models, we need to be careful. It seems like only the decoder needs to be extended, whereas the encoder receives the original number of inputs. Therefore, when an encoder-decoder model is identified, the extension is only applied to the decoder part.
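The encoder/decoder distinction could be sketched roughly as follows. Everything in this snippet is an assumption for illustration: the helper `names_for_module` is hypothetical, and telling decoder modules apart by a `decoder` component in their dotted module name is a simplification, not necessarily how this PR detects them.

```python
def names_for_module(adapter_names, num_beams, is_encoder_decoder, module_name):
    """Pick the adapter names a given module should see during beam search.

    Hypothetical helper. Assumes the beams of one sample are adjacent and
    that decoder modules have "decoder" as a part of their dotted name.
    """
    extended = [name for name in adapter_names for _ in range(num_beams)]
    if not is_encoder_decoder:
        # Decoder-only model: every module sees the expanded batch.
        return extended
    in_decoder = "decoder" in module_name.split(".")
    # The encoder still receives the original, non-expanded batch.
    return extended if in_decoder else list(adapter_names)


names = ["adapter_a", "adapter_b"]
enc = names_for_module(names, 2, True, "model.encoder.block.0.layer.0.q")
dec = names_for_module(names, 2, True, "model.decoder.block.0.layer.0.q")
assert len(enc) == 2 and len(dec) == 4
```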

See huggingface#2283

Right now, using mixed adapter batches with beam search generations does
not work. This is because users need to pass the adapter names
associated with each sample, i.e. the number of adapter names should be
identical to the number of samples in the input.

When applying beam search, transformers internally repeats the samples
once per beam (or so it looks like). Therefore, we have more samples
during generation than samples in the input. Consequently, the adapter
names have to be extended accordingly. This is now taken care of.

Unfortunately, this does not work for encoder-decoder models yet. With
these models, there is always a size mismatch, whether adapter names are
extended or not. What I suspect is happening is that only the decoder
needs to be extended, but right now I don't see a way to implement this
distinction in PEFT. Therefore, encoder-decoder + beam search
generation is not supported for the time being.
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.


2 participants