
single GPU automatic batching logic #394

Merged: 12 commits merged into EleutherAI:master on May 3, 2023

Conversation

@fattorib (Contributor) commented Mar 9, 2023

This PR addresses the single GPU component of https://github.com/EleutherAI/lm-eval2/issues/7.

The current method adds extra logic to determine a maximum batch size based on the longest sample across all provided tasks, using find_executable_batch_size from Accelerate to find the largest batch size that runs without OOM. Tested with both the gpt2 and hf-causal model classes on both loglikelihood_rolling and _loglikelihood_tokens requests.

Example use:

python main.py \
	--model gpt2 \
	--tasks lambada_openai,wikitext \
	--batch_size auto 

or

python main.py \
	--model hf-causal \
	--model_args pretrained=EleutherAI/pythia-70m \
	--tasks lambada_openai,wikitext \
	--batch_size auto

Output for both looks like:

...
Running loglikelihood requests
Passed argument batch_size = auto. Detecting largest batch size
Determined Largest batch size: 128
...
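For context, the detection relies on Accelerate's find_executable_batch_size utility, which reruns a decorated function with a halved batch size whenever it raises an out-of-memory error. Below is a minimal standalone sketch of the idea; the helper name detect_batch_size and its arguments are hypothetical, and the model, context length, and device are illustrative rather than the harness's actual internals.

    import torch
    import torch.nn.functional as F
    from accelerate.utils import find_executable_batch_size

    def detect_batch_size(model, max_length, device, starting_batch_size=512):
        # Retries with batch_size halved each time the body raises a CUDA OOM error.
        @find_executable_batch_size(starting_batch_size=starting_batch_size)
        def forward_batch(batch_size):
            # Dummy batch at the maximum context length, i.e. the worst case.
            test_batch = torch.ones((batch_size, max_length), device=device).long()
            # Assumes a Hugging Face causal LM whose forward pass returns .logits.
            F.log_softmax(model(test_batch).logits, dim=-1).cpu()
            return batch_size

        return forward_batch()

A call such as detect_batch_size(model, 2048, "cuda") would then return the largest batch size that completes a full-context forward pass without OOM.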

@haileyschoelkopf (Collaborator)
Thank you very much for this PR! I'll try to review this today.

@haileyschoelkopf (Collaborator)
What is the expected behavior / output of this PR when you test it?

When I test on my local machine, I get a RuntimeError: CUDA error: an illegal memory access was encountered, or a similar error message. I tried decreasing the starting batch size to one I knew would fit, but I got the same errors. So either the Accelerate decorator is causing an OOM that does not occur at the same batch size without it, or something else has gone wrong, possibly on my end, but I'm not sure.

@fattorib fattorib marked this pull request as draft March 14, 2023 15:13
@fattorib (Contributor, Author)

Marking this as draft until MultipleChoiceTask task types (e.g., Piqa) have been debugged.

@haileyschoelkopf (Collaborator)
Hi @fattorib , how's this PR going? Happy to take a look if things have changed in it, or to help debug / talk it through if you need!

@fattorib fattorib marked this pull request as ready for review April 17, 2023 09:15
@fattorib (Contributor, Author)
Updated the batch size logic to improve robustness. Tested and confirmed no OOMs across the following models on two different setups with a clean install:

  • EleutherAI/pythia-1b
  • EleutherAI/pythia-160m
  • gpt2

@CLAassistant commented Apr 23, 2023

CLA assistant check
All committers have signed the CLA.

@haileyschoelkopf (Collaborator)
I'll most likely be merging this PR over the weekend. After discussion with @fattorib, it seems that although I get errors in this mode on my local setup, this PR has been robust in his testing and in my remote testing across machines.

@juletx (Contributor) left a comment

I have tested this code and it works properly for me. All the results in this issue were calculated using this code: #443. I have added some comments to extend it to the greedy_until method in huggingface.py.

lm_eval/base.py (outdated review thread)
Comment on lines +191 to +206

# automatic batch size detection for vectorization
adaptive_batch_size = None
if self.batch_size == 'auto':
    # using rolling window with maximum context
    print('Passed argument batch_size = auto. Detecting largest batch size')

    @find_executable_batch_size(starting_batch_size=512)  # if OOM, then halves batch_size and tries again
    def forward_batch(batch_size):
        test_batch = torch.ones((batch_size, self.max_length), device=self.device).long()
        for _ in range(5):
            out = F.log_softmax(self._model_call(test_batch), dim=-1).cpu()
        return batch_size

    batch_size = forward_batch()
    print(f"Determined Largest batch size: {batch_size}")
    adaptive_batch_size = batch_size
@juletx (Contributor)

I have tested this and it works correctly. You can also add this logic to the greedy_until method in huggingface.py and it works.

@fattorib (Contributor, Author)

I added this in. What tasks did you test this on? Just want to make sure everything works.

@juletx (Contributor)

I just realized that you added the wrong code to the greedy_until method in huggingface.py. The correct code is the one in the loglikelihood_rolling method.

lm_eval/base.py (review thread)
@fattorib (Contributor, Author) commented May 2, 2023

Quoting @juletx: "I have tested this code and it works properly for me. All the results in this issue were calculated using this code: #443. I have added some comments to extend it to the greedy_until method in huggingface.py."

Thank you for the revisions and the extra verification that the code works. Will add these changes in now.

@StellaAthena (Member) left a comment

Excellent contribution! Thank you.

@StellaAthena StellaAthena merged commit 9a87719 into EleutherAI:master May 3, 2023
@fattorib fattorib deleted the auto-batching branch May 3, 2023 12:47
juletx added a commit to juletx/lm-evaluation-harness that referenced this pull request on May 3, 2023:
  Fix automatic batching for greedy_until
  Fix stop sequences bug for greedy_until
  Fix max length bug when model max length is not specified in tokenizer

StellaAthena added a commit that referenced this pull request on May 3, 2023:
  Fix bugs introduced in #394 #406 and max length bug

qmdnls pushed commits to qmdnls/lm-evaluation-harness that referenced this pull request on Aug 17, 2023

LZY-the-boys pushed commits to LZY-the-boys/lm-evaluation-harness-fast that referenced this pull request on Sep 12, 2023