
Fix kv-caching and bsz > 1 in eval recipe #1622

Merged
SalmanMohammadi merged 6 commits into pytorch:main from eval_bs_fix on Sep 19, 2024

Conversation

SalmanMohammadi
Collaborator

@SalmanMohammadi SalmanMohammadi commented Sep 19, 2024

Context

What is the purpose of this PR? Is it to

  • add a new feature
  • fix a bug
  • update tests and/or documentation
  • other (please add here)

Please link to any issues this PR addresses.
closes #1600

Changelog

What are the changes made in this PR?

  • The eval harness returns batch_size prompts per batch until the last batch, which may contain fewer samples.

  • This breaks our static KV-caching, which is only set up for a single, fixed batch size.

  • To address this, we can manually pad the last batch out to the full batch size and then remove the padded samples after generation (see the sketch below).

  • This could be wasteful for large batch sizes where the last batch may be mostly empty. We mitigate this by padding with self._tokenizer.eos_id so that early stopping in generation is triggered correctly.

  • Expected test values have changed because limit has changed.

  • Since swapping between KV-cache inference and regular inference isn't supported yet, I've guarded against mixing generation tasks with non-generation tasks. This will be addressed in a follow-up.

  • I've also moved the cache-setup logic outside of _model_generate; it only needs to be called once if we're expecting a generation task.

Thanks to @baberabb for the advice here.
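
A minimal sketch of the pad-then-trim approach described above, assuming a 2D context tensor of shape [batch, seq_len]. The names here (generate_with_padding, model_generate, batch_size, eos_id) are illustrative stand-ins for the recipe's self._model_generate, self._batch_size, and self._tokenizer.eos_id, not the merged implementation:

```python
import torch
import torch.nn.functional as F


def generate_with_padding(model_generate, context: torch.Tensor,
                          batch_size: int, eos_id: int) -> torch.Tensor:
    """Illustrative sketch: pad a short final batch up to the fixed batch size
    the static KV-cache expects, generate, then drop the padded rows."""
    curr_batch_size = context.size(0)
    if curr_batch_size < batch_size:
        # F.pad pads from the last dim backwards, so (0, 0, 0, n) adds n rows
        # along the batch dimension. Padding with eos_id means the extra rows
        # trigger early stopping in generation rather than wasting compute.
        context = F.pad(
            context,
            (0, 0, 0, batch_size - curr_batch_size),
            value=eos_id,
        )
    output = model_generate(context)
    # Remove the padded samples before handing results back to the harness.
    return output[:curr_batch_size]
```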

Test plan

Please make sure to do each of the following if applicable to your PR. If you're unsure about any one of these just ask and we will happily help. We also have a contributing page for some guidance on contributing.

  • run pre-commit hooks and linters (make sure you've first installed via pre-commit install)
  • add unit tests for any new functionality
  • update docstrings for any new or updated methods or classes
  • run unit tests via pytest tests
  • run recipe tests via pytest tests -m integration_test
  • manually run any new or modified recipes with sufficient proof of correctness
  • include relevant commands and any other artifacts in this summary (pastes of loss curves, eval results, etc.)

UX

If your function changed a public API, please add a dummy example of what the user experience will look like when calling it.
Here is a docstring example
and a tutorial example

  • I did not change any public API
  • I have added an example to docs or docstrings


pytorch-bot bot commented Sep 19, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/1622

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 3ff80a2 with merge base c5db813:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Sep 19, 2024
@SalmanMohammadi SalmanMohammadi marked this pull request as draft September 19, 2024 13:05
@SalmanMohammadi SalmanMohammadi marked this pull request as ready for review September 19, 2024 13:24
Comment on lines 126 to 130
# Pad the (possibly smaller) final batch out to self._batch_size along the
# batch dim, using eos_id so padded rows stop generating immediately.
maybe_padded_context = torch.nn.functional.pad(
    context,
    (0, 0, 0, self._batch_size - curr_batch_size),
    value=self._tokenizer.eos_id,
)
Contributor

nit: add a comment explaining this, because otherwise it's not super obvious

Contributor

@ebsmothers ebsmothers left a comment

One small comment, otherwise looks good to me

@SalmanMohammadi SalmanMohammadi merged commit 5dd045e into pytorch:main Sep 19, 2024
17 checks passed
@SalmanMohammadi SalmanMohammadi deleted the eval_bs_fix branch September 19, 2024 17:36
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fix KV-cacheing + bsz > 1 with eval recipe
3 participants