Policy output actiontuple #4651

andrewcoh · 2020-11-13T16:45:01Z

Proposed change(s)

Useful links (Github issues, JIRA tickets, ML-Agents forum threads etc.)

Types of change(s)

Checklist

Added tests that prove my fix is effective or that my feature works
Updated the changelog (if applicable)
Updated the documentation (if applicable)
Updated the migration guide (if applicable)

Other comments

…ctiontuple

chriselion · 2020-11-13T20:01:20Z

ml-agents-envs/mlagents_envs/base_env.py

-        self._discrete = discrete
+    def __init__(self):
+        self._continuous = None
+        self._discrete = None


How about keeping the continuous and discrete arguments, and adding e.g.

    if continuous is not None:
        self.add_continuous(continuous)
    # TODO same for discrete

?
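A minimal sketch of what this suggestion could look like, assuming the class keeps optional continuous and discrete initializer arguments and forwards them to the add methods. The add_discrete method, the read-only properties, and the float32/int32 conversions are assumptions made for illustration; this is not the code from the PR.

```python
from typing import Optional

import numpy as np


class ActionTuple:
    """Illustrative stand-in for the class under review, not the ml-agents source."""

    def __init__(
        self,
        continuous: Optional[np.ndarray] = None,
        discrete: Optional[np.ndarray] = None,
    ) -> None:
        self._continuous: Optional[np.ndarray] = None
        self._discrete: Optional[np.ndarray] = None
        # Keep the initializer arguments, but route them through the add_* methods
        # so both construction paths share the same conversion logic.
        if continuous is not None:
            self.add_continuous(continuous)
        if discrete is not None:
            self.add_discrete(discrete)

    def add_continuous(self, continuous: np.ndarray) -> None:
        # Assumption: continuous actions are stored as float32.
        self._continuous = np.asarray(continuous, dtype=np.float32)

    def add_discrete(self, discrete: np.ndarray) -> None:
        # Assumption: discrete actions are stored as int32.
        self._discrete = np.asarray(discrete, dtype=np.int32)

    @property
    def continuous(self) -> Optional[np.ndarray]:
        return self._continuous

    @property
    def discrete(self) -> Optional[np.ndarray]:
        return self._discrete
```

With this shape, ActionTuple() followed by add_continuous(...) and ActionTuple(continuous=...) end up in the same state, so call sites can use whichever form is shorter.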

chriselion · 2020-11-13T20:09:20Z

ml-agents/mlagents/trainers/torch/action_log_probs.py

@@ -5,6 +5,38 @@
 from mlagents.trainers.torch.utils import ModelUtils


+class LogProbsTuple:


If this is identical to ActionTuple in functionality, but you want to make sure that they aren't accidentally used interchangeably, you could make them both derive from e.g. _ActionTupleBase

chriselion

I like this a lot better than the dictionary, although the class still feels clunky. I think you can keep the initializer args, which should shrink the code a bit.

ervteng · 2020-11-13T21:27:31Z

ml-agents-envs/mlagents_envs/base_env.py

        continuous = np.random.uniform(
            low=-1.0, high=1.0, size=(n_agents, self.continuous_size)
        )
-        discrete = np.zeros((n_agents, self.discrete_size), dtype=np.int32)
+        action_tuple.add_continuous(continuous)


If we add back the continuous and discrete parameters to ActionTuple, we can do something like:

Suggested change

-        action_tuple.add_continuous(continuous)
+        _continuous = continuous
+        _discrete = np.column_stack() if self.discrete_size > 0 else None
+        action_tuple = ActionTuple(_continuous, _discrete)
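The suggestion does not show the arguments to np.column_stack, so the following is only a sketch of one plausible reading: sample one integer column per discrete branch and stack the columns. The function name random_action_arrays and the discrete_branches parameter are hypothetical; the resulting pair could then be passed to a constructor that accepts both arrays.

```python
from typing import Optional, Sequence, Tuple

import numpy as np


def random_action_arrays(
    n_agents: int,
    continuous_size: int,
    discrete_branches: Sequence[int],
    rng: Optional[np.random.Generator] = None,
) -> Tuple[np.ndarray, Optional[np.ndarray]]:
    """Build random continuous and discrete action arrays for n_agents agents."""
    rng = np.random.default_rng() if rng is None else rng
    # Continuous actions: uniform in [-1, 1), shape (n_agents, continuous_size).
    continuous = rng.uniform(low=-1.0, high=1.0, size=(n_agents, continuous_size))
    if len(discrete_branches) > 0:
        # One column per branch; values in [0, branch_size) for that branch.
        discrete = np.column_stack(
            [
                rng.integers(0, branch_size, size=n_agents, dtype=np.int32)
                for branch_size in discrete_branches
            ]
        )
    else:
        # Mirrors the "else None" in the suggested change.
        discrete = None
    return continuous, discrete
```

Returning None for the discrete part only works if the receiving class treats None as "no discrete actions", which is what the optional initializer arguments discussed in this review would provide.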

andrewcoh added 4 commits November 12, 2020 15:05

default actions are np.array of shape (n_agents, 0)

4fc60d5

broken, policy outputs action tuple

82f559c

Merge branch 'develop-hybrid-actions-singleton' into develop-output-actiontuple

73a24e0

action log probs tuple

19dbc6d

chriselion reviewed Nov 13, 2020

ervteng reviewed Nov 13, 2020

andrewcoh mentioned this pull request Nov 13, 2020

Action buffer #4612

Merged

10 tasks

andrewcoh added 2 commits November 16, 2020 12:55

add _ActionTupleBase

27dffc8

add get_discrete_dtype

c2b703f

ervteng approved these changes Nov 16, 2020

andrewcoh merged commit e49f68b into develop-hybrid-actions-singleton Nov 16, 2020

delete-merged-branch bot deleted the develop-output-actiontuple branch November 16, 2020 19:55

github-actions bot locked as resolved and limited conversation to collaborators Nov 16, 2021
