Action buffer #4612
Conversation
* Fix Gym for ActionSpec
* Fix TF policy test
```python
        respectively.
        """

    def __init__(self, continuous: np.ndarray, discrete: np.ndarray):
```
We need some constructor that takes only continuous or only discrete actions, so the user does not have to create an empty array when using just one of the two.
Why can't the default be None, with the constructor assigning an empty array when None is passed? This is a common pattern for mutable default parameters.
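The None-default pattern being suggested can be sketched like this (a sketch with illustrative names, not the actual ml-agents implementation):

```python
import numpy as np
from typing import Optional

class ActionTuple:
    """Illustrative sketch only, not the real ml-agents class."""

    def __init__(
        self,
        continuous: Optional[np.ndarray] = None,
        discrete: Optional[np.ndarray] = None,
    ):
        # Resolve None inside the body: the standard workaround for
        # Python's shared-mutable-default pitfall.
        if continuous is None:
            n_agents = discrete.shape[0] if discrete is not None else 0
            continuous = np.zeros((n_agents, 0), dtype=np.float32)
        if discrete is None:
            discrete = np.zeros((continuous.shape[0], 0), dtype=np.int32)
        self.continuous = continuous
        self.discrete = discrete

# Discrete-only construction: the caller never builds an empty array.
at = ActionTuple(discrete=np.array([[0], [1]]))
```

With this shape convention the unused component is `(n_agents, 0)` rather than None, which also matches the "every action is hybrid" view discussed later in the thread.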
```diff
@@ -143,3 +146,15 @@ def _process_step_infos(self, step_infos: List[EnvironmentStep]) -> int:
                 step_info.environment_stats, step_info.worker_id
             )
         return len(step_infos)
+
+    @staticmethod
```
bump
```python
        decision_steps, terminal_steps = env.get_steps("RealFakeBrain")
        n_agents = len(decision_steps)
        env.set_actions("RealFakeBrain", spec.action_spec.empty_action(n_agents) - 1)
        env.step()
```
bump
Co-authored-by: Vincent-Pierre BERGES <vincentpierre@unity3d.com>
```diff
+            self.behavior_spec.action_spec.discrete_size
         )
-        self.previous_action_dict: Dict[str, np.array] = {}
+        self.previous_action_dict: Dict[str, Dict[str, np.ndarray]] = {}
```
Why not `Dict[str, ActionTuple]`? A dictionary with two hardcoded keys (in `make_empty_previous_action()`) doesn't make much sense.
Good callout. I was doing this because it made writing to the buffer easier (e.g. using the string), but in reality we only ever use the previous action if it's discrete. I've updated the PR to only save the discrete actions.
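The updated approach described here can be sketched as follows (hypothetical names; the real ml-agents bookkeeping lives on the agent processor):

```python
from typing import Dict, List
import numpy as np

# Keep only the discrete branch of each agent's last action, since
# previous actions are only ever consumed for discrete control.
previous_action_dict: Dict[str, np.ndarray] = {}

def save_previous_action(agent_ids: List[str], discrete_actions: np.ndarray) -> None:
    # discrete_actions has shape (n_agents, discrete_size)
    for idx, agent_id in enumerate(agent_ids):
        previous_action_dict[agent_id] = discrete_actions[idx]

save_previous_action(["agent-0", "agent-1"], np.array([[0, 2], [1, 0]]))
```

Storing a plain `np.ndarray` per agent avoids both the hardcoded-key inner dict and the need to carry an empty continuous component around.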
```diff
@@ -270,6 +271,14 @@ def get_action(
         )

         self.save_memories(global_agent_ids, run_out.get("memory_out"))
+        # For Compatibility with buffer changes for hybrid action support
+        if "log_probs" in run_out:
```
I know it's not new to this PR, but should we move these strings to constants/enums? "pre_action" and "actions_pre" have me scared, and "continuous" is generally pretty typo-prone.
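One way to do what this comment suggests is a `str`-backed enum (the key names below are illustrative, not the actual ml-agents buffer keys):

```python
from enum import Enum

# Hypothetical constants replacing the raw buffer-key strings.
class BufferKey(str, Enum):
    LOG_PROBS = "log_probs"
    CONTINUOUS_ACTION = "continuous_action"
    DISCRETE_ACTION = "discrete_action"

run_out = {"log_probs": -0.5}

# A str mixin means members compare equal to the raw strings, so
# existing code keeps working, while a typo like BufferKey.ACTIONS_PRE
# fails loudly with an AttributeError instead of silently missing a key.
assert BufferKey.LOG_PROBS == "log_probs"
value = run_out[BufferKey.LOG_PROBS.value]
```

The `.value` access keeps lookups unambiguous even if the dict was built with plain strings.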
…nologies/ml-agents into develop-action-buffer
```python
    def __init__(
        self,
        continuous: Optional[np.ndarray] = None,
```
I think I would prefer having this not be Optional. When the action is discrete only, the continuous action becomes shape (n_agents, 0). I think this is slightly more general than having None when there is no action, because this way we can eventually consider all actions to be hybrid; some of them just happen to have 0 discrete or continuous actions. @chriselion what do you think? I might be missing something.
I think for initializer arguments, keeping them Optional is nice. If you want to ensure that the fields themselves are non-None, you could "allocate" the np array on construction if they weren't provided (though I'm not sure that's meaningful, since there would be 0 elements).
As a side note, I think we're assuming that `continuous.shape[0] == discrete.shape[0]` - might be good to check that too.
`_validate_actions` will check this.
```python
    ) -> UnityInputProto:
        rl_in = UnityRLInputProto()
        for b in vector_action:
            n_agents = len(self._env_state[b][0])
            if n_agents == 0:
                continue
            for i in range(n_agents):
                action = AgentActionProto(vector_actions=vector_action[b][i])
                # TODO: This check will be removed when the oroto supports hybrid actions
```
Suggested change:
```diff
-                # TODO: This check will be removed when the oroto supports hybrid actions
+                # TODO: This check will be removed when the proto supports hybrid actions
```
I think the remaining issues will end up being fixed by #4651, this part LGTM
```diff
-        _expected_shape = (n_agents, _size)
-        if actions.shape != _expected_shape:
+        _expected_shape = (n_agents, self.continuous_size)
+        if self.continuous_size > 0 and actions.continuous.shape != _expected_shape:
```
Why check `if self.continuous_size > 0`? Same question on line 36 with discrete.
This is removed in the ActionModel because the defaults make it unnecessary
```python
    ) -> UnityInputProto:
        rl_in = UnityRLInputProto()
        for b in vector_action:
            n_agents = len(self._env_state[b][0])
            if n_agents == 0:
                continue
            for i in range(n_agents):
                action = AgentActionProto(vector_actions=vector_action[b][i])
                # TODO: This check will be removed when the oroto supports hybrid actions
                if vector_action[b].continuous.shape[1] > 0:
```
Why check `shape[1]` rather than `action_specs.continuous_size`?
ActionSpec is not available here, so this is the only way I can think of to determine whether the action space is continuous or discrete in this function. Critically, this is only a temporary shim so the communication protocol keeps working with the old proto until the new protos are in place.
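The shape-based heuristic being defended here can be illustrated in isolation (`FakeActionTuple` and `pick_vector_action` are stand-in names, not the actual ml-agents code):

```python
import numpy as np
from collections import namedtuple

# Minimal stand-in for the real ActionTuple.
FakeActionTuple = namedtuple("FakeActionTuple", ["continuous", "discrete"])

def pick_vector_action(action: "FakeActionTuple", agent_index: int) -> np.ndarray:
    # Without an ActionSpec in scope, a non-zero second dimension is
    # the only signal for which component actually carries the action.
    if action.continuous.shape[1] > 0:
        return action.continuous[agent_index]
    return action.discrete[agent_index]

# A discrete-only action: continuous is (n_agents, 0), discrete is populated.
discrete_only = FakeActionTuple(np.zeros((2, 0)), np.array([[1], [0]]))
```

This works only because the old proto is single-branch; once the hybrid proto lands, the real ActionSpec replaces the heuristic.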
Proposed change(s)
Describe the changes made in this PR.
Useful links (Github issues, JIRA tickets, ML-Agents forum threads etc.)
Types of change(s)
Checklist
Other comments