Add balanced strategies for device_map in from_pretrained #18349
What does this PR do?
This PR brings to Transformers the functionality introduced in huggingface/accelerate#534.
Basically, `device_map` can now take several options:

- `"sequential"`, which corresponds to the current `"auto"` behavior: fill each GPU in order (so if the user has plenty of GPU memory, some GPUs are not used at all)
- `"balanced"`, which splits the model evenly across GPUs
- `"balanced_low_0"`, which splits the model evenly across GPUs while leaving the most available memory on GPU 0, since that GPU might hold more tensors when the outputs are used for some form of post-processing (`generate` and `use_cache`, for instance)
- `"auto"`, which now defaults to `"balanced"`

When the user does not have enough GPU memory to accommodate the model, all the options are equivalent.
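To make the difference between the options concrete, here is a toy sketch of how each strategy might place layers on GPUs. It is not the Accelerate implementation: `assign_layers`, its parameters, and the integer "size units" are all hypothetical names invented for illustration, and every GPU is assumed to have the same capacity.

```python
def assign_layers(layer_sizes, gpu_capacity, num_gpus, strategy="auto"):
    """Toy placement: return the GPU index chosen for each layer, in order.

    Hypothetical helper for illustration only; sizes are integers in
    arbitrary units and all GPUs share the same capacity.
    """
    if strategy == "auto":
        strategy = "balanced"  # "auto" now defaults to "balanced"

    total = sum(layer_sizes)
    if strategy == "sequential":
        # Every GPU may be filled up to its full capacity, in order.
        budgets = [gpu_capacity] * num_gpus
    elif strategy == "balanced":
        # Cap each GPU at an even share of the model (ceiling division).
        share = -(-total // num_gpus)
        budgets = [min(gpu_capacity, share)] * num_gpus
    elif strategy == "balanced_low_0":
        if num_gpus < 2:
            raise ValueError("balanced_low_0 needs at least two GPUs")
        # Keep GPU 0 free and share the model among the remaining GPUs.
        share = -(-total // (num_gpus - 1))
        budgets = [0] + [min(gpu_capacity, share)] * (num_gpus - 1)
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")

    # Let the last GPU absorb rounding leftovers up to its real capacity.
    budgets[-1] = gpu_capacity

    placement, gpu, used = [], 0, 0
    for size in layer_sizes:
        # Move on to the next GPU once the current budget is exhausted.
        while gpu < num_gpus and used + size > budgets[gpu]:
            gpu, used = gpu + 1, 0
        if gpu == num_gpus:
            raise MemoryError("model does not fit in this toy setup")
        placement.append(gpu)
        used += size
    return placement


layers = [1] * 8  # eight equally sized layers
sequential = assign_layers(layers, gpu_capacity=8, num_gpus=4, strategy="sequential")
balanced = assign_layers(layers, gpu_capacity=8, num_gpus=4, strategy="balanced")
low_0 = assign_layers(layers, gpu_capacity=8, num_gpus=4, strategy="balanced_low_0")
```

With eight unit-sized layers and four GPUs of capacity 8, the sequential placement puts everything on GPU 0, the balanced placement puts two layers on each GPU, and the low-0 placement leaves GPU 0 empty. In actual use the strategy is selected by passing the string to `from_pretrained`, for example `device_map="balanced_low_0"`.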