Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AIR] Make Checkpoint.get_preprocessor faster #32350

Merged
merged 4 commits into from
Feb 15, 2023

Conversation

Yard1
Copy link
Member

@Yard1 Yard1 commented Feb 9, 2023

Signed-off-by: Antoni Baum antoni.baum@protonmail.com

Why are these changes needed?

Previously, get_preprocessor would always serialize the Checkpoint into a dictionary first. This is incredibly wasteful and causes huge memory usage and runtime with large directory-based Checkpoints. This PR changes the logic to first see if a directory Checkpoint should be loaded into a dictionary or not in order to obtain the preprocessor.

Context: I had ran into it when trying to do predictions with 25 GB Hugging Face model. HuggingFacePredictor calls get_preprocessor internally, and that takes ages to complete and almost caused an OOM for me - and all of that is unnecessary as the preprocessor has to be loaded from a file anyway.

Related issue number

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>
Copy link
Contributor

@krfricke krfricke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor questions

# The preprocessor will either be stored in an in-memory dict or
# written to storage. In either case, it will use the PREPROCESSOR_KEY key.

def _get_preprocessor(self) -> Optional["Preprocessor"]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this internal function? We don't use it anywhere else?

Copy link
Member Author

@Yard1 Yard1 Feb 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's used below! I can rename it or extract it from the class.

else:
preprocessor = load_preprocessor_from_dir(checkpoint_path)
else:
preprocessor = self._get_preprocessor()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
preprocessor = self._get_preprocessor()
checkpoint_dict = self.to_dict()
preprocessor = checkpoint_dict.get(PREPROCESSOR_KEY, None)

Shouldn't this be enough?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may have a bytes checkpoint created from a directory, in which case this would fail.

@krfricke krfricke added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Feb 10, 2023
Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>
@Yard1 Yard1 removed the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Feb 10, 2023
@Yard1 Yard1 requested a review from krfricke February 10, 2023 19:44
Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>
Copy link
Contributor

@krfricke krfricke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@krfricke krfricke merged commit e74dcfc into ray-project:master Feb 15, 2023
@Yard1 Yard1 deleted the checkpoint_get_preprocessor_speed branch February 15, 2023 01:26
edoakes pushed a commit to edoakes/ray that referenced this pull request Mar 22, 2023
Previously, `get_preprocessor` would always serialize the Checkpoint into a dictionary first. This is incredibly wasteful and causes huge memory usage and runtime with large directory-based Checkpoints. This PR changes the logic to first see if a directory Checkpoint should be loaded into a dictionary or not in order to obtain the preprocessor.

Context: I had ran into it when trying to do predictions with 25 GB Hugging Face model. `HuggingFacePredictor` calls `get_preprocessor` internally, and that takes ages to complete and almost caused an OOM for me - and all of that is unnecessary as the preprocessor has to be loaded from a file anyway.

Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>
Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>
elliottower pushed a commit to elliottower/ray that referenced this pull request Apr 22, 2023
Previously, `get_preprocessor` would always serialize the Checkpoint into a dictionary first. This is incredibly wasteful and causes huge memory usage and runtime with large directory-based Checkpoints. This PR changes the logic to first see if a directory Checkpoint should be loaded into a dictionary or not in order to obtain the preprocessor.

Context: I had ran into it when trying to do predictions with 25 GB Hugging Face model. `HuggingFacePredictor` calls `get_preprocessor` internally, and that takes ages to complete and almost caused an OOM for me - and all of that is unnecessary as the preprocessor has to be loaded from a file anyway.

Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>
Signed-off-by: elliottower <elliot@elliottower.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants