
[Retiarii] Policy-based RL Strategy #3650

Merged (7 commits into microsoft:master on May 25, 2021)

Conversation


@ultmaster (Contributor) commented May 17, 2021

This PR adds a family of RL strategies based on tianshou. The default built-in algorithm is PPO; a usage sketch follows the TODO list below.

TODOs:

  • logging
  • tests
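
For orientation, a minimal sketch of how the new strategy might be invoked. The class name ``PolicyBasedRL`` is inferred from the PR title and the exact import path is an assumption; of the arguments shown, only ``policy_fn`` and ``asynchronous`` are confirmed by the docstring quoted in the review below:

    # Hypothetical usage sketch; ``PolicyBasedRL`` and the import path are
    # assumptions based on this PR's title, not a verified API.
    import nni.retiarii.strategy as strategy

    # Use the built-in PPO policy; pass policy_fn=... to supply a custom
    # tianshou policy, and asynchronous=False to make each collection step
    # wait for every environment.
    rl = strategy.PolicyBasedRL(asynchronous=True)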

@ultmaster ultmaster added the NAS label May 17, 2021
@ultmaster ultmaster self-assigned this May 17, 2021
@ultmaster ultmaster marked this pull request as ready for review May 20, 2021 05:43
Takes ``ModelEvaluationEnv`` as input and returns a policy. See ``_default_policy_fn`` for an example.
asynchronous : bool
If true, in each step, the collector won't wait for all the envs to complete.
This should generally not affect the result, but might affect the efficiency. Note that slightly more trials
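
A rough sketch of what a custom ``policy_fn`` could return, using tianshou's standard discrete-action nets. This is not the PR's ``_default_policy_fn``; the gym-style space attributes used here are assumptions, and real observations from ``ModelEvaluationEnv`` may need a custom preprocessor:

    import torch
    from tianshou.policy import PPOPolicy
    from tianshou.utils.net.common import Net
    from tianshou.utils.net.discrete import Actor, Critic

    def my_policy_fn(env):
        # Assumed: env exposes gym-style observation_space/action_space.
        net = Net(env.observation_space.shape, hidden_sizes=[64, 64])
        actor = Actor(net, env.action_space.n)
        critic = Critic(net)
        # The set union dedupes the parameters of the shared preprocess net.
        optim = torch.optim.Adam(
            set(actor.parameters()) | set(critic.parameters()), lr=1e-4)
        # Categorical distribution over discrete architecture choices.
        return PPOPolicy(actor, critic, optim, torch.distributions.Categorical)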
Contributor

I don't understand: why does asynchronous mode not affect the result?

ultmaster (Contributor, Author)

Synchronous doesn't mean single-process sampling; both synchronous and asynchronous modes have parallelism. "Asynchronous" adds a mechanism that gives up on an environment within a step when it hasn't finished yet.

Refer to https://tianshou.readthedocs.io/en/master/tutorials/cheatsheet.html#parallel-sampling if you're interested. It's a bit involved, and I don't think I can make it clear here in a few words.
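
As a concrete illustration of the two modes from that cheatsheet, a sketch using tianshou's vectorized environments (CartPole is a stand-in task here, not this PR's NAS environment):

    import gym
    from tianshou.env import SubprocVectorEnv

    # Eight parallel copies of a toy environment.
    env_fns = [lambda: gym.make('CartPole-v0') for _ in range(8)]

    # Synchronous: every step waits for all 8 environments to finish.
    sync_venv = SubprocVectorEnv(env_fns)

    # Asynchronous: a step returns once 4 environments finish (or 0.2 s pass),
    # giving up on the stragglers for that step.
    async_venv = SubprocVectorEnv(env_fns, wait_num=4, timeout=0.2)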

@ultmaster ultmaster merged commit 122b5b8 into microsoft:master May 25, 2021