[RLlib] SAC/DQN activate multi-agent learning tests and small bug fix in MultiAgentEpisode #45542
Conversation
LGTM. Awesome work! Off-policy now learns!
rllib/BUILD (outdated)

@@ -303,6 +292,15 @@ py_test(
     args = ["--as-test", "--enable-new-api-stack"]
 )

 py_test(
Yey! And we are there! :)
rllib/BUILD (outdated)

@@ -452,6 +450,15 @@ py_test(
     args = ["--as-test", "--enable-new-api-stack"]
 )

 py_test(
Next step: synchronized sampling.
rllib/algorithms/algorithm.py (outdated)

    EPISODE_RETURN_MIN,
    EPISODE_RETURN_MAX,
]:
    if must_have not in results[ENV_RUNNER_RESULTS]:
Yup, understood. But this means that if a user collects custom metrics via a callback and runs Tune on top of this, an error could result from these metrics not being available in every iteration. We should document this somewhere - maybe in the MetricsLogger API ...
Correct, that's why I put the comment there. We should actually fix Tune; there is no other solution to this problem, imo.
Tune probably didn't account for this b/c in SL, you don't have "strangely behaving" episodes that sometimes don't deliver data.
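To illustrate the failure mode discussed above, here is a minimal, self-contained sketch (not actual RLlib/Tune code; the metric name and dict layout are illustrative) of how a custom episode metric that is only reported in some iterations can break a consumer that looks it up unconditionally:

```python
import math

# Hypothetical per-iteration results dict. "my_custom_metric" is only
# logged when at least one episode completed, so it may be absent here.
results = {"env_runners": {"episode_return_mean": 12.3}}

# A consumer (e.g., a stopper keyed on that metric) that looks it up
# unconditionally fails on such an iteration:
try:
    value = results["env_runners"]["my_custom_metric"]
except KeyError:
    # Defensive fallback: treat the metric as NaN and skip it downstream.
    value = math.nan

print(value)  # nan -> the metric simply wasn't reported this iteration
```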
rllib/env/multi_agent_episode.py (outdated)

@@ -873,7 +873,12 @@ def concat_episode(self, other: "MultiAgentEpisode") -> None:
 )

 # Concatenate the env- to agent-timestep mappings.
-self.env_t_to_agent_t[agent_id].extend(other.env_t_to_agent_t[agent_id])
+j = self.env_t
+for i, val in enumerate(other.env_t_to_agent_t[agent_id][1:]):
This is one of the nasty parts of the MAE. Awesome catch!
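For readers outside this diff, here is a simplified, self-contained sketch of the re-indexing idea (not the actual MultiAgentEpisode code; the skip marker and boundary handling are simplified). The successor chunk repeats the boundary step as its first entry and numbers agent timesteps locally from 0, so a plain list.extend() would duplicate the boundary and restart the counter:

```python
SKIP = "S"  # stand-in for RLlib's "agent did not act at this env step" marker

def concat_mappings(first, second):
    # Offset all real agent timesteps of `second` by the boundary agent
    # timestep of `first`, and drop `second`'s duplicated boundary entry.
    boundary = max(t for t in first if t is not SKIP)
    out = list(first)
    for val in second[1:]:
        out.append(val if val is SKIP else val + boundary)
    return out

# First chunk: agent acted at env steps 0 and 2 (agent steps 0 and 1).
# Second chunk: entry 0 repeats the boundary; its local agent step 1
# globally becomes agent step 2.
print(concat_mappings([0, SKIP, 1], [0, SKIP, 1]))  # [0, 'S', 1, 'S', 2]
```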
.rl_module(
     # Settings identical to old stack.
     model_config_dict={
         "fcnet_hiddens": [256],
-        "fcnet_activation": "relu",
+        "fcnet_activation": "tanh",
beasty
model_config_dict={
    "fcnet_hiddens": [256, 256],
    "fcnet_activation": "relu",
    # "post_fcnet_hiddens": [],
This part is the old-stack equivalent.
got it!
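As a usage note, such old-stack-equivalent model settings are passed into a new-API-stack config via .rl_module(). A sketch only; the env choice and surrounding config are illustrative, not from this PR:

```python
from ray.rllib.algorithms.sac import SACConfig

# Mirror the old-stack default model settings shown in the diff above.
config = (
    SACConfig()
    .environment("Pendulum-v1")  # illustrative env, not from this PR
    .rl_module(
        model_config_dict={
            "fcnet_hiddens": [256, 256],
            "fcnet_activation": "relu",
        },
    )
)
```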
@@ -131,7 +131,7 @@ def add(
     """
     episodes: List["MultiAgentEpisode"] = force_list(episodes)

-    new_episode_ids: List[str] = {eps.id_ for eps in episodes}
+    new_episode_ids: Set[str] = {eps.id_ for eps in episodes}
Great catch!
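Minor note for context: the value was already a set comprehension, so only the annotation was wrong; a set is also the right structure here, since duplicate IDs collapse and membership checks are O(1) on average. A tiny illustrative sketch (the IDs are made up):

```python
# Duplicate episode IDs collapse in the set; lookups are O(1) on average.
new_episode_ids: set[str] = {"ep_a", "ep_b", "ep_a"}
assert len(new_episode_ids) == 2
assert "ep_a" in new_episode_ids
```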
 # TODO (sven, simon): Is there always a mapping? What if not?
 #  Is then module_id == agent_id?
-module_id = ma_episode._agent_to_module_mapping[agent_id]
+module_id = ma_episode.module_for(agent_id)
Sweet, this method is just clean!
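For readers following along, this is the "compute on first access, then cache" accessor pattern; here is a minimal standalone sketch (illustrative class and names, not the actual RLlib implementation):

```python
class EpisodeSketch:
    """Toy stand-in for MultiAgentEpisode's agent-to-module lookup."""

    def __init__(self, mapping_fn):
        self._mapping_fn = mapping_fn
        self._agent_to_module_mapping = {}

    def module_for(self, agent_id):
        # Map each agent once, cache the result, and keep the private
        # dict out of caller code.
        if agent_id not in self._agent_to_module_mapping:
            self._agent_to_module_mapping[agent_id] = self._mapping_fn(agent_id)
        return self._agent_to_module_mapping[agent_id]

episode = EpisodeSketch(lambda aid: "shared_module")
print(episode.module_for("agent_0"))  # -> "shared_module"
```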
Why are these changes needed?

- Activates the multi-agent learning tests for SAC and DQN.
- Fixes a small bug in MultiAgentEpisode.concat(): self.env_t_to_agent_t is not properly built in the resulting episode object.

Related issue number

Checks

- I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
- I've run scripts/format.sh to lint the changes in this PR.
- I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.