Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[tune] Refactor TrialRunner to separate out executor calls #33448

Merged
merged 10 commits into from
Mar 21, 2023

Conversation

krfricke
Copy link
Contributor

@krfricke krfricke commented Mar 20, 2023

Why are these changes needed?

We are planning to refactor the Ray Tune trial lifecycle management by offering an improved execution engine. The TrialRunner currently encapsulates most of the execution logic. When switching the Ray execution backend, we would like to retain most of this logic. Thus, we prepare the refactor by restructuring the TrialRunner:

  • The TrialRunner is split into a _TrialRunnerBase and TrialRunner class
  • The _TrialRunnerBase contains all lifecycle logic that is independent from execution. For instance, processing trial results, and scheduling trial training
  • The TrialRunner class contains the communication with the RayTrialExecutor
  • Our future TuneController (which is prototyped) will also inherit from _TrialRunnerBase

Please note that this PR does not change any logic. Is only renames some internal methods and changes the class structure.

Related issue number

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Kai Fricke added 8 commits March 10, 2023 18:21
Signed-off-by: Kai Fricke <kai@anyscale.com>
…-refactor

# Conflicts:
#	python/ray/tune/execution/ray_trial_executor.py
#	python/ray/tune/execution/trial_runner.py
#	python/ray/tune/experiment/trial.py
Signed-off-by: Kai Fricke <kai@anyscale.com>
Signed-off-by: Kai Fricke <kai@anyscale.com>
Signed-off-by: Kai Fricke <kai@anyscale.com>
Signed-off-by: Kai Fricke <kai@anyscale.com>
Signed-off-by: Kai Fricke <kai@anyscale.com>
Signed-off-by: Kai Fricke <kai@anyscale.com>
@krfricke krfricke marked this pull request as ready for review March 20, 2023 13:58
@krfricke krfricke requested a review from xwjiang2010 March 20, 2023 13:58
Kai Fricke added 2 commits March 20, 2023 09:01
Signed-off-by: Kai Fricke <kai@anyscale.com>
@krfricke krfricke added the tests-ok The tagger certifies test failures are unrelated and assumes personal liability. label Mar 20, 2023
Copy link
Contributor

@xwjiang2010 xwjiang2010 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approve to unblock the remaining one PR, since this PR is mostly splitting the logic in TrialRunner into executor-independent, and executor-dependent one (to be replaced by the new controller).

Overall feedback is that TrialRunner (minus anything searcher/scheduler related) and Trial possibly should become "AIR concept" rather than "Tune concept".

@krfricke krfricke merged commit eeda97c into ray-project:master Mar 21, 2023
@krfricke krfricke deleted the tune/trial-runner-refactor branch March 21, 2023 01:07
edoakes pushed a commit to edoakes/ray that referenced this pull request Mar 22, 2023
…ct#33448)

We are planning to refactor the Ray Tune trial lifecycle management by offering an improved execution engine. The TrialRunner currently encapsulates most of the execution logic. When switching the Ray execution backend, we would like to retain most of this logic. Thus, we prepare the refactor by restructuring the TrialRunner:

- The TrialRunner is split into a _TrialRunnerBase and TrialRunner class
- The _TrialRunnerBase contains all lifecycle logic that is independent from execution. For instance, processing trial results, and scheduling trial training
- The TrialRunner class contains the communication with the RayTrialExecutor
- Our future TuneController (which is prototyped) will also inherit from _TrialRunnerBase

Please note that this PR does not change any logic. Is only renames some internal methods and changes the class structure.

Signed-off-by: Kai Fricke <kai@anyscale.com>
Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>
clarng pushed a commit to clarng/ray that referenced this pull request Mar 23, 2023
…ct#33448)

We are planning to refactor the Ray Tune trial lifecycle management by offering an improved execution engine. The TrialRunner currently encapsulates most of the execution logic. When switching the Ray execution backend, we would like to retain most of this logic. Thus, we prepare the refactor by restructuring the TrialRunner:

- The TrialRunner is split into a _TrialRunnerBase and TrialRunner class
- The _TrialRunnerBase contains all lifecycle logic that is independent from execution. For instance, processing trial results, and scheduling trial training
- The TrialRunner class contains the communication with the RayTrialExecutor
- Our future TuneController (which is prototyped) will also inherit from _TrialRunnerBase

Please note that this PR does not change any logic. Is only renames some internal methods and changes the class structure.

Signed-off-by: Kai Fricke <kai@anyscale.com>
elliottower pushed a commit to elliottower/ray that referenced this pull request Apr 22, 2023
…ct#33448)

We are planning to refactor the Ray Tune trial lifecycle management by offering an improved execution engine. The TrialRunner currently encapsulates most of the execution logic. When switching the Ray execution backend, we would like to retain most of this logic. Thus, we prepare the refactor by restructuring the TrialRunner:

- The TrialRunner is split into a _TrialRunnerBase and TrialRunner class
- The _TrialRunnerBase contains all lifecycle logic that is independent from execution. For instance, processing trial results, and scheduling trial training
- The TrialRunner class contains the communication with the RayTrialExecutor
- Our future TuneController (which is prototyped) will also inherit from _TrialRunnerBase

Please note that this PR does not change any logic. Is only renames some internal methods and changes the class structure.

Signed-off-by: Kai Fricke <kai@anyscale.com>
Signed-off-by: elliottower <elliot@elliottower.com>
ProjectsByJackHe pushed a commit to ProjectsByJackHe/ray that referenced this pull request May 4, 2023
…ct#33448)

We are planning to refactor the Ray Tune trial lifecycle management by offering an improved execution engine. The TrialRunner currently encapsulates most of the execution logic. When switching the Ray execution backend, we would like to retain most of this logic. Thus, we prepare the refactor by restructuring the TrialRunner:

- The TrialRunner is split into a _TrialRunnerBase and TrialRunner class
- The _TrialRunnerBase contains all lifecycle logic that is independent from execution. For instance, processing trial results, and scheduling trial training
- The TrialRunner class contains the communication with the RayTrialExecutor
- Our future TuneController (which is prototyped) will also inherit from _TrialRunnerBase

Please note that this PR does not change any logic. Is only renames some internal methods and changes the class structure.

Signed-off-by: Kai Fricke <kai@anyscale.com>
Signed-off-by: Jack He <jackhe2345@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
tests-ok The tagger certifies test failures are unrelated and assumes personal liability.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants