
[RLlib] SAC/DQN activate multi-agent learning tests and small bug fix in MultiAgentEpisode. #45542

Merged: 25 commits merged into ray-project:master on Jun 23, 2024

Conversation

sven1977 (Contributor) commented on May 24, 2024

  • SAC/DQN: activates multi-agent learning tests for CartPole (DQN) and Pendulum (SAC).
  • Small bug fix in MultiAgentEpisode.concat(): self.env_t_to_agent_t was not properly built in the resulting episode object.

Why are these changes needed?

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

simonsays1980 (Collaborator) left a comment:
LGTM. Awesome work! Off-policy now learns!

rllib/BUILD Outdated
@@ -303,6 +292,15 @@ py_test(
args = ["--as-test", "--enable-new-api-stack"]
)

py_test(

simonsays1980 (Collaborator): Yey! And we are there! :)

rllib/BUILD Outdated
@@ -452,6 +450,15 @@ py_test(
args = ["--as-test", "--enable-new-api-stack"]
)

py_test(

simonsays1980 (Collaborator): Next step: synchronized sampling.

EPISODE_RETURN_MIN,
EPISODE_RETURN_MAX,
]:
if must_have not in results[ENV_RUNNER_RESULTS]:

simonsays1980 (Collaborator): Yup, understood. But this means that if a user collects custom metrics via a callback and uses Tune on top of them ... an error could result due to those metrics not being available in some iterations. We should document this somewhere - maybe at the MetricsLogger API ...

sven1977 (Contributor, Author): Correct, that's why I put the comment there. We should actually fix Tune; there is no other solution to this problem, imo.

Tune probably didn't account for this b/c in SL, you don't have "strangely behaving" episodes that sometimes don't deliver data.
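Until such a fix lands, a user-side workaround is to look up possibly-missing metrics defensively. The sketch below is a hedged illustration, not RLlib/Tune code; the metric name "my_custom_metric" and the "env_runners" sub-dict key are assumptions for the example.

    # Hedged sketch: a user-side trial stopper that tolerates metrics which are
    # not reported in every iteration.
    def stop_fn(trial_id: str, result: dict) -> bool:
        # Custom metrics logged via a callback may be missing in iterations that
        # finish no episodes, so look them up with .get() instead of indexing.
        value = result.get("env_runners", {}).get("my_custom_metric")
        if value is None:
            return False  # Metric not reported this iteration; keep the trial running.
        return value >= 100.0

Such a callable can then be passed as the stop criterion of a Tune run (e.g., stop=stop_fn).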

@@ -873,7 +873,12 @@ def concat_episode(self, other: "MultiAgentEpisode") -> None:
)

# Concatenate the env- to agent-timestep mappings.
self.env_t_to_agent_t[agent_id].extend(other.env_t_to_agent_t[agent_id])
j = self.env_t
for i, val in enumerate(other.env_t_to_agent_t[agent_id][1:]):

simonsays1980 (Collaborator): This is one of the nasty parts of the MAE. Awesome catch!
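For intuition, here is a purely hypothetical, self-contained sketch of the idea behind the fix (not RLlib's actual implementation): when concatenating two env-step-to-agent-step mappings, the agent-step indices coming from the second episode chunk must be shifted by the number of agent steps already recorded in the first chunk, rather than appended verbatim.

    SKIP = None  # Sentinel for env steps in which this agent did not act.

    def concat_env_t_to_agent_t(first, second):
        # Count how many agent steps the first chunk already contains.
        offset = sum(1 for v in first if v is not SKIP)
        out = list(first)
        # Re-offset the second chunk's agent-step indices instead of copying them.
        for v in second:
            out.append(SKIP if v is SKIP else v + offset)
        return out

    # The agent acted at env steps 0 and 2 of the first chunk ...
    first = [0, SKIP, 1]
    # ... and at the first two env steps of the second chunk.
    second = [0, 1, SKIP]
    assert concat_env_t_to_agent_t(first, second) == [0, SKIP, 1, 2, 3, SKIP]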

.rl_module(
# Settings identical to old stack.
model_config_dict={
"fcnet_hiddens": [256],
"fcnet_activation": "relu",
"fcnet_activation": "tanh",

simonsays1980 (Collaborator): beasty

model_config_dict={
"fcnet_hiddens": [256, 256],
"fcnet_activation": "relu",
# "post_fcnet_hiddens": [],

simonsays1980 (Collaborator): This part is the old-stack equivalent.

sven1977 (Contributor, Author): got it!
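For context, the model settings shown in the snippets above are passed on the new API stack via .rl_module(). The sketch below is illustrative only (not the exact tuned example in this PR); the environment name and the surrounding config are assumptions.

    from ray.rllib.algorithms.sac import SACConfig

    # Mirror the old-stack model settings on the new API stack.
    config = (
        SACConfig()
        .environment("Pendulum-v1")
        .rl_module(
            model_config_dict={
                "fcnet_hiddens": [256, 256],
                "fcnet_activation": "relu",
            }
        )
    )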

@@ -131,7 +131,7 @@ def add(
"""
episodes: List["MultiAgentEpisode"] = force_list(episodes)

new_episode_ids: List[str] = {eps.id_ for eps in episodes}
new_episode_ids: Set[str] = {eps.id_ for eps in episodes}

simonsays1980 (Collaborator): Great catch!
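A small illustration of why the annotation fix above is correct: a brace comprehension builds a set (deduplicated), not a list, so Set[str] is the accurate type hint for new_episode_ids.

    from typing import Set

    episode_ids = ["a", "b", "b"]
    new_episode_ids: Set[str] = {eid for eid in episode_ids}
    assert new_episode_ids == {"a", "b"}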

# TODO (sven, simon): Is there always a mapping? What if not?
# Is then module_id == agent_id?
module_id = ma_episode._agent_to_module_mapping[agent_id]
module_id = ma_episode.module_for(agent_id)

simonsays1980 (Collaborator): Sweet, this method is just clean!
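Regarding the TODO above ("Is there always a mapping? What if not?"), here is a purely hypothetical sketch of a defensive agent-to-module lookup, not RLlib's implementation: if no explicit mapping exists, it falls back to using the agent ID itself as the module ID.

    def module_for(agent_to_module: dict, agent_id: str) -> str:
        # Fall back to the agent ID when the agent is not explicitly mapped.
        return agent_to_module.get(agent_id, agent_id)

    assert module_for({"agent_1": "shared_policy"}, "agent_1") == "shared_policy"
    assert module_for({}, "agent_2") == "agent_2"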

(Subsequent commits, all signed off by sven1977 <svenmika1977@gmail.com>: master was merged into the sac_dqn_multi_agent_debugging branch twice, resolving conflicts first in rllib/utils/metrics/stats.py and later in rllib/BUILD, rllib/algorithms/sac/torch/sac_torch_learner.py, rllib/tuned_examples/sac/multi_agent_pendulum_sac.py, and rllib/utils/metrics/stats.py.)
sven1977 enabled auto-merge (squash) on June 22, 2024, 09:29
github-actions bot added the "go" label (add ONLY when ready to merge, run all tests) on Jun 22, 2024
github-actions bot disabled auto-merge on June 22, 2024, 12:24
sven1977 enabled auto-merge (squash) on June 23, 2024, 10:20
sven1977 merged commit c942d60 into ray-project:master on Jun 23, 2024
7 checks passed
sven1977 deleted the sac_dqn_multi_agent_debugging branch on June 23, 2024, 13:39