Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RLlib] Preparatory PR for multi-agent multi-GPU learner (alpha-star style) #03 #21652

Merged

Conversation

sven1977
Copy link
Contributor

@sven1977 sven1977 commented Jan 17, 2022

Preparatory PR for multi-agent multi-GPU learner (alpha-star style) #3

  • Use min_time_s_per_reporting instead of min_iter_time_s (deprecated previously).
  • Added Trainer._get_env_creator_from_env_id([env_id]) convenience method.
  • ARS and ES algos: Use Trainer.setup() instead of Trainer._init() (deprecated previously) and rename self._workers into self.workers to match all other algos in RLlib.

Why are these changes needed?

Related issue number

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Copy link
Member

@avnishn avnishn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments. LMK what you think :)

self.validate_config(config)

# Generate `self.env_creator` callable to create an env instance.
self._get_env_creator_from_env_id(self._env_id)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for whatever reason this function isn't available on master, but I also didn't find its definition in this pr diff, but only when checking out this branch. Strange :(

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be a mistake by me when splitting my local branch (which contained more changes).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

self.validate_config(config)

# Generate `self.env_creator` callable to create an env instance.
self._get_env_creator_from_env_id(self._env_id)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we change the name of this function to _set_env_creator_from_env_id. This function sets the self.env_creator variable, but it doesn't return anything. That, or we could return the env creator ourselves and set the attribute ourselves:

self.env_creator = self._get_env_creator_from_env_id(self._env_id)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the catch. I'll check. ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@sven1977 sven1977 requested a review from gjoliver as a code owner January 21, 2022 09:32
Copy link
Member

@gjoliver gjoliver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very nice cleanup PR.
a couple of minor questions.
on a high level, maybe break this into multiple PRs in the future, so simple changes like config param renaming can be merged much faster.
thanks.

policy_mapping_fn=policy_mapping_fn,
policies_to_train=policies_to_train,
)
worker.add_policy(**kwargs)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now that this is a 1-line statement, why bother with the inline fn?
maybe cleaner to just call worker.add_policy(**kwargs) directly below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to pass a function/callable to foreach_worker() below anyways. So it's better to share that code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. get it.

if workers:
workers.stop()
# Stop all optimizers.
if hasattr(self, "optimizer") and self.optimizer:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why get rid of this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seemed really outdated code.
Trainers do not have self.optimizer (anymore), only ES and ARS and those two optimizers do NOT have a stop() method, so this would actually produce errors here.

If users want their Trainers to have a self.optimizer, they can just override Trainer.cleanup() and implement the necessary logic.

@sven1977 sven1977 merged commit d5bfb7b into ray-project:master Jan 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants