Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RLlib] Fix ConnectorPipelineV2 restoring from checkpoint (by writing information about individual connector pieces to the ctor_args_and_kwargs file). #48213

Conversation

simonsays1980
Copy link
Collaborator

@simonsays1980 simonsays1980 commented Oct 23, 2024

Why are these changes needed?

At the moment a ConnectorPipelineV2 needs to be build first to load its state from a checkpoint. This is due to the fact that for a state to be restored connectors that should be loaded from the state need to be instantiated first in the pipeline.

This PR proposes a get_ctor_args_and_kwargs override in the ConnectorPipelineV2 that stores for each connector in the pipeline a 3-tuples, namely class, args, and kwargs. When building the pipeline from checkpoint the connector tuples are passed into the class constructor and all connectors are build before their state is loaded.

Related issue number

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

…rectly from a checkpoint.

Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
…ensure clean class inheritance. Instead 'get_ctor_args_and_kwargs' returns a list of 3-tuples with class, args, and kwargs to rebuild the connector instances before loading their state.

Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
Signed-off-by: Sven Mika <sven@anyscale.io>
Copy link
Contributor

@sven1977 sven1977 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks @simonsays1980

@sven1977 sven1977 marked this pull request as ready for review October 27, 2024 19:21
@sven1977 sven1977 changed the title [RLlib] - Add restore_from_path to `ConnectorPipelineV2' to build directly from checkpoint. [RLlib] Fix ConnectorPipelineV2 restoring from checkpoint (by writing information about individual connector pieces to the ctor_args_and_kwargs file). Oct 27, 2024
@sven1977 sven1977 enabled auto-merge (squash) October 27, 2024 19:22
@github-actions github-actions bot added the go add ONLY when ready to merge, run all tests label Oct 27, 2024
@sven1977 sven1977 merged commit 6878aa1 into ray-project:master Oct 27, 2024
6 of 7 checks passed
edoakes pushed a commit to edoakes/ray that referenced this pull request Oct 30, 2024
…ng information about individual connector pieces to the `ctor_args_and_kwargs` file). (ray-project#48213)
Jay-ju pushed a commit to Jay-ju/ray that referenced this pull request Nov 5, 2024
…ng information about individual connector pieces to the `ctor_args_and_kwargs` file). (ray-project#48213)
JP-sDEV pushed a commit to JP-sDEV/ray that referenced this pull request Nov 14, 2024
…ng information about individual connector pieces to the `ctor_args_and_kwargs` file). (ray-project#48213)

Signed-off-by: JP-sDEV <jon.pablo80@gmail.com>
mohitjain2504 pushed a commit to mohitjain2504/ray that referenced this pull request Nov 15, 2024
…ng information about individual connector pieces to the `ctor_args_and_kwargs` file). (ray-project#48213)

Signed-off-by: mohitjain2504 <mohit.jain@dream11.com>
@simonsays1980 simonsays1980 deleted the make-connector-pipelinev2-a-checkpointable branch November 22, 2024 10:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
go add ONLY when ready to merge, run all tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants