[AIR] Make `Checkpoint.get_preprocessor` faster #32350
Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>
Minor questions
python/ray/air/checkpoint.py
Outdated
    # The preprocessor will either be stored in an in-memory dict or
    # written to storage. In either case, it will use the PREPROCESSOR_KEY key.

    def _get_preprocessor(self) -> Optional["Preprocessor"]:
Do we need this internal function? We don't seem to use it anywhere else.
It's used below! I can rename it or extract it from the class.
python/ray/air/checkpoint.py
Outdated
            else:
                preprocessor = load_preprocessor_from_dir(checkpoint_path)
        else:
            preprocessor = self._get_preprocessor()
Suggested change:
    - preprocessor = self._get_preprocessor()
    + checkpoint_dict = self.to_dict()
    + preprocessor = checkpoint_dict.get(PREPROCESSOR_KEY, None)
Shouldn't this be enough?
We may have a bytes checkpoint that was created from a directory, in which case the plain `PREPROCESSOR_KEY` lookup on the dict would fail.
Thanks!
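To illustrate the point in the thread above, here is a minimal toy sketch of why a plain dictionary lookup can miss the preprocessor. It does not use Ray's API. It assumes that a directory-based checkpoint is packed into one opaque blob when converted to a dict, so the preprocessor ends up inside the blob rather than under `PREPROCESSOR_KEY`. The names `PACKED_DIR_KEY`, `PREPROCESSOR_FILE`, `pack_dir`, and `dir_checkpoint_to_dict` are hypothetical, and the string value given to `PREPROCESSOR_KEY` is a stand-in.

```python
import pickle
import tempfile
from pathlib import Path

PREPROCESSOR_KEY = "preprocessor"        # stand-in value for the real constant
PACKED_DIR_KEY = "packed_dir"            # hypothetical key for the packed directory
PREPROCESSOR_FILE = "preprocessor.pkl"   # hypothetical file name inside the directory


def pack_dir(path: Path) -> bytes:
    """Toy packing: serialize every file in the directory into one blob."""
    files = {p.name: p.read_bytes() for p in path.iterdir() if p.is_file()}
    return pickle.dumps(files)


def dir_checkpoint_to_dict(path: Path) -> dict:
    """Assumed behavior: the whole directory ends up under a single key."""
    return {PACKED_DIR_KEY: pack_dir(path)}


def naive_get_preprocessor(checkpoint_dict: dict):
    """The two-line version from the suggestion: top-level lookup only."""
    return checkpoint_dict.get(PREPROCESSOR_KEY, None)


def fallback_get_preprocessor(checkpoint_dict: dict):
    """Top-level lookup first, then look inside the packed directory."""
    preprocessor = checkpoint_dict.get(PREPROCESSOR_KEY, None)
    if preprocessor is None and PACKED_DIR_KEY in checkpoint_dict:
        files = pickle.loads(checkpoint_dict[PACKED_DIR_KEY])
        raw = files.get(PREPROCESSOR_FILE)
        if raw is not None:
            preprocessor = pickle.loads(raw)
    return preprocessor


with tempfile.TemporaryDirectory() as tmp:
    ckpt_dir = Path(tmp)
    (ckpt_dir / PREPROCESSOR_FILE).write_bytes(pickle.dumps({"scale": 2.0}))
    (ckpt_dir / "model.bin").write_bytes(b"\x00" * 1024)
    # Simulate a "bytes checkpoint created from a directory".
    checkpoint_bytes = pickle.dumps(dir_checkpoint_to_dict(ckpt_dir))

restored = pickle.loads(checkpoint_bytes)
naive_result = naive_get_preprocessor(restored)        # preprocessor not found
fallback_result = fallback_get_preprocessor(restored)  # found inside the blob
```

In this toy model the naive lookup returns `None` because the preprocessor only exists as a file inside the packed directory, which is the case the reply describes.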
Previously, `get_preprocessor` would always serialize the Checkpoint into a dictionary first. This is wasteful and causes very high memory usage and long runtimes with large directory-based Checkpoints. This PR changes the logic to first check whether a directory Checkpoint needs to be loaded into a dictionary at all in order to obtain the preprocessor.

Context: I ran into this when trying to run predictions with a 25 GB Hugging Face model. `HuggingFacePredictor` calls `get_preprocessor` internally, which took ages to complete and almost caused an OOM for me, and all of that is unnecessary as the preprocessor has to be loaded from a file anyway.

Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>
Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>
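The change described above can be sketched roughly as follows. This is a simplified toy model under stated assumptions, not the code from the PR: `ToyCheckpoint`, `DICT_FILE`, `PREPROCESSOR_FILE`, and the `to_dict_calls` counter are hypothetical, and `load_preprocessor_from_dir` here is a stand-in written for the example rather than Ray's function of the same name. The idea it shows is that a directory-backed checkpoint reads the preprocessor file directly and only falls back to `to_dict()` when the directory actually holds a serialized dict.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Any, Optional

PREPROCESSOR_KEY = "preprocessor"        # stand-in value for the real constant
PREPROCESSOR_FILE = "preprocessor.pkl"   # hypothetical file name
DICT_FILE = "dict_checkpoint.pkl"        # hypothetical marker for a dict saved to disk


def load_preprocessor_from_dir(path: Path) -> Optional[Any]:
    """Stand-in loader: read only the small preprocessor file."""
    pre_file = path / PREPROCESSOR_FILE
    return pickle.loads(pre_file.read_bytes()) if pre_file.exists() else None


class ToyCheckpoint:
    def __init__(self, local_path: Optional[Path] = None,
                 data_dict: Optional[dict] = None):
        self.local_path = local_path
        self.data_dict = data_dict
        self.to_dict_calls = 0  # counts how often the expensive path runs

    def to_dict(self) -> dict:
        """Expensive for directories: reads every file into memory."""
        self.to_dict_calls += 1
        if self.data_dict is not None:
            return dict(self.data_dict)
        dict_file = self.local_path / DICT_FILE
        if dict_file.exists():
            return pickle.loads(dict_file.read_bytes())
        return {"files": {p.name: p.read_bytes()
                          for p in self.local_path.iterdir() if p.is_file()}}

    def get_preprocessor(self) -> Optional[Any]:
        if self.local_path is not None:
            if (self.local_path / DICT_FILE).exists():
                # The directory holds a serialized dict: load the dict.
                return self.to_dict().get(PREPROCESSOR_KEY)
            # Plain directory checkpoint: skip to_dict() entirely.
            return load_preprocessor_from_dir(self.local_path)
        # In-memory dict checkpoint.
        return self.to_dict().get(PREPROCESSOR_KEY)


with tempfile.TemporaryDirectory() as tmp:
    ckpt_dir = Path(tmp)
    (ckpt_dir / "model.bin").write_bytes(b"\x00" * 1_000_000)  # "large" model
    (ckpt_dir / PREPROCESSOR_FILE).write_bytes(pickle.dumps({"scale": 2.0}))
    dir_ckpt = ToyCheckpoint(local_path=ckpt_dir)
    dir_result = dir_ckpt.get_preprocessor()
    dir_to_dict_calls = dir_ckpt.to_dict_calls

dict_ckpt = ToyCheckpoint(data_dict={PREPROCESSOR_KEY: {"scale": 3.0}})
dict_result = dict_ckpt.get_preprocessor()
```

In this sketch the directory checkpoint never calls `to_dict()`, so the large model file is never read into memory just to fetch the preprocessor.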
Why are these changes needed?
Previously, `get_preprocessor` would always serialize the Checkpoint into a dictionary first. This is wasteful and causes very high memory usage and long runtimes with large directory-based Checkpoints. This PR changes the logic to first check whether a directory Checkpoint needs to be loaded into a dictionary at all in order to obtain the preprocessor.

Context: I ran into this when trying to run predictions with a 25 GB Hugging Face model. `HuggingFacePredictor` calls `get_preprocessor` internally, which took ages to complete and almost caused an OOM for me, and all of that is unnecessary as the preprocessor has to be loaded from a file anyway.

Related issue number
Checks
I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
I've run `scripts/format.sh` to lint the changes in this PR.