
Action buffer #4612

Merged (89 commits), Nov 16, 2020
Changes from 76 commits

Commits (89)
ba8bdcf
add ActionSpec; test_simple_rl torch passes
andrewcoh Oct 20, 2020
6c93137
remove uneccesary type from set_actions
andrewcoh Oct 20, 2020
03f7e47
ignoring Instance of 'AbstractContextManager' has no 'enter_context' …
andrewcoh Oct 20, 2020
87d2049
fixing tensorflow tests
andrewcoh Oct 20, 2020
a889e8f
use proper spec in environment.py
andrewcoh Oct 20, 2020
24be76f
fix tf bc test
andrewcoh Oct 20, 2020
12fa45a
fix mlagents-envs tests
andrewcoh Oct 20, 2020
ff4f3b8
[bug-fix] Fix Gym and some Policy tests for ActionSpec (#4590)
Oct 21, 2020
3e807c6
remove *_action_* from function names
andrewcoh Oct 22, 2020
fed5c20
make_fake_trajectory/step take ActionSpec arg
andrewcoh Oct 22, 2020
d119c1a
remove ActionType
andrewcoh Oct 22, 2020
5a37dfe
remove self.action_spec from policy/bc
andrewcoh Oct 22, 2020
1e5e440
fix action_spec refs
andrewcoh Oct 22, 2020
194505e
Add __eq__ and __str__ to ActionSpec
andrewcoh Oct 22, 2020
4baaa7a
add static method to create continuous/discrete
andrewcoh Oct 22, 2020
60337b8
fix recurrent sac test
andrewcoh Oct 23, 2020
841110a
fix yamato
andrewcoh Oct 23, 2020
ebd50b2
resolve conflicts
andrewcoh Oct 23, 2020
2e4dcf2
Merge branch 'master' into develop-action-spec
andrewcoh Oct 23, 2020
1337d07
fix entropy_sum after merge
andrewcoh Oct 23, 2020
c05c40e
fix yamato
andrewcoh Oct 23, 2020
1b96170
moved type and shape checking into ActionSpec
andrewcoh Oct 23, 2020
c940d41
removed action_spec.size
andrewcoh Oct 23, 2020
b0d9a48
fix specs in torch util
andrewcoh Oct 23, 2020
d2bb5d0
fixed tests/ -> single validate_action func
andrewcoh Oct 23, 2020
ad144c3
make is_discrete/is_continuous strict
andrewcoh Oct 23, 2020
9090821
add docstrings
andrewcoh Oct 23, 2020
f23e395
rename make_x to creat_x/remove redundant properties
andrewcoh Oct 24, 2020
785848e
make validate action private
andrewcoh Oct 24, 2020
9af9ee9
fix advanced vis encoder simple rl
andrewcoh Oct 24, 2020
600d307
fix recurrent/advanced ppo tests
andrewcoh Oct 24, 2020
42bdfce
fix recurrent sac
andrewcoh Oct 24, 2020
754f5b8
reduce visual advanced steps
andrewcoh Oct 24, 2020
a8813fc
reduce recurrent step/increase batch size
andrewcoh Oct 25, 2020
64091cc
add ActionBuffers and utils
andrewcoh Oct 25, 2020
a8204bd
reduce steps_per_update recurrent sac
andrewcoh Oct 26, 2020
b5ca548
fix AgentExperience typing
andrewcoh Oct 26, 2020
ed11b10
recurrent sac passes locally but fails on CI for inexplicable reasons
andrewcoh Oct 26, 2020
442f29a
increase seq length
andrewcoh Oct 26, 2020
8733ec1
rename create random to random action
andrewcoh Oct 26, 2020
199d15b
rename create empty to empty action
andrewcoh Oct 26, 2020
00a824c
Merge branch 'develop-action-spec' into develop-action-buffer
andrewcoh Oct 26, 2020
b0ed241
Merge branch 'master' into develop-action-buffer
andrewcoh Oct 26, 2020
bfaa249
action buffer passes continuous
andrewcoh Oct 27, 2020
d927497
discrete runs/cont passes
andrewcoh Oct 27, 2020
0d33e1f
debugging discrete
andrewcoh Oct 27, 2020
8f06a67
2d discrete passes
andrewcoh Oct 27, 2020
da1c85a
sac continuous and discrete train
andrewcoh Oct 28, 2020
080f3eb
bc tests pass
andrewcoh Oct 29, 2020
f872359
torch reward providers all pass
andrewcoh Oct 29, 2020
5886f74
fixed bug in discrete
andrewcoh Oct 29, 2020
fe8fdd9
test_simple_rl/reward providers pass tf/torch
andrewcoh Oct 29, 2020
9479a65
ml-agents-envs pass
andrewcoh Nov 3, 2020
3a90973
Merge branch 'master' into develop-action-buffer
andrewcoh Nov 3, 2020
dbf819c
rename extract to from_dict
andrewcoh Nov 3, 2020
d1e2b97
agent processor tests
andrewcoh Nov 4, 2020
e87effe
fix demo loader tests
andrewcoh Nov 4, 2020
e0418dc
test_trajectory fixed
andrewcoh Nov 4, 2020
5f571a1
fixed recurrent prev_action issue
andrewcoh Nov 4, 2020
9089e63
fix test_tf_policy
andrewcoh Nov 5, 2020
f8d85fa
fix torch test_ppo
andrewcoh Nov 5, 2020
c21d223
fix torch utils test
andrewcoh Nov 5, 2020
f0f4249
discrete/contionuous unity envs train
andrewcoh Nov 5, 2020
d6eaf8d
agent processor tests
andrewcoh Nov 5, 2020
e9848b1
fix torch test policy
andrewcoh Nov 5, 2020
b25fc3d
remove unused import
andrewcoh Nov 6, 2020
10944f1
add docstrings to AgentAction and ActionLogProbs
andrewcoh Nov 6, 2020
6fcdd3f
revert demo
andrewcoh Nov 6, 2020
6d4738b
Remove print from ppo tf opti
andrewcoh Nov 6, 2020
5c8ec2d
rename to ActionTuple
andrewcoh Nov 9, 2020
0441118
Merge branch 'develop-action-buffer' of https://github.com/Unity-Tech…
andrewcoh Nov 9, 2020
86b6d71
Update ml-agents/mlagents/trainers/torch/utils.py
andrewcoh Nov 9, 2020
2bf004c
ActionTuple default is now np.array, not None
andrewcoh Nov 9, 2020
aaf6c59
fix set_actions_for_agent
andrewcoh Nov 9, 2020
056cf6d
fix action mask in trajectory
andrewcoh Nov 9, 2020
5691f60
Update ml-agents-envs/mlagents_envs/environment.py
andrewcoh Nov 9, 2020
b567fcd
revert demo
andrewcoh Nov 9, 2020
116580a
Merge branch 'develop-action-buffer' of https://github.com/Unity-Tech…
andrewcoh Nov 9, 2020
b152511
fix default random action
andrewcoh Nov 10, 2020
bb9988c
fix reward provider tests
andrewcoh Nov 10, 2020
c488e8e
add defaults to ActionTuple constructor
andrewcoh Nov 10, 2020
589907a
remove unused line in traj
andrewcoh Nov 10, 2020
c8ae8da
save only discrete actions as prev
andrewcoh Nov 10, 2020
c651ebc
update make_empty docstring
andrewcoh Nov 10, 2020
0dc4396
reuse action dict in torch policy for pre_action
andrewcoh Nov 10, 2020
434f210
add back removed part of test_envs
andrewcoh Nov 10, 2020
714b444
fix mock brain prev action
andrewcoh Nov 10, 2020
65d17fe
default ActionTuple to None
andrewcoh Nov 12, 2020
4fc60d5
default actions are np.array of shape (n_agents, 0)
andrewcoh Nov 12, 2020

98 changes: 63 additions & 35 deletions ml-agents-envs/mlagents_envs/base_env.py

@@ -244,6 +244,32 @@ def empty(spec: "BehaviorSpec") -> "TerminalSteps":
         )
 
 
+class ActionTuple:
+    """
+    An object whose fields correspond to actions of different types.
+    Continuous and discrete actions are numpy arrays of type float32 and
+    int32, respectively, and are type checked on construction.
+    Dimensions are (n_agents, continuous_size) and (n_agents, discrete_size),
+    respectively.
+    """
+
+    def __init__(self, continuous: np.ndarray, discrete: np.ndarray):

Review thread on the constructor:

Contributor: We need some constructor that will take only continuous or only discrete, so the user does not have to create an empty array when using only discrete or only continuous.

Contributor: Why can't the default be None, with the constructor assigning an empty array when None is specified? This is a common pattern for mutable default parameters.
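A minimal sketch of the pattern the reviewer is suggesting (hypothetical signature, not the code in this PR; defaults were later addressed in the "add defaults to ActionTuple constructor" commit):

    from typing import Optional
    import numpy as np

    class ActionTuple:
        def __init__(
            self,
            continuous: Optional[np.ndarray] = None,
            discrete: Optional[np.ndarray] = None,
        ):
            # Substitute an empty, correctly typed array when a field is omitted,
            # so callers with a purely continuous or purely discrete space do not
            # have to build the unused array themselves.
            if continuous is None:
                continuous = np.zeros((0, 0), dtype=np.float32)
            if discrete is None:
                discrete = np.zeros((0, 0), dtype=np.int32)
            self._continuous = continuous.astype(np.float32, copy=False)
            self._discrete = discrete.astype(np.int32, copy=False)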

+        if continuous.dtype != np.float32:
+            continuous = continuous.astype(np.float32, copy=False)
+        self._continuous = continuous
+        if discrete.dtype != np.int32:
+            discrete = discrete.astype(np.int32, copy=False)
+        self._discrete = discrete
+
+    @property
+    def continuous(self) -> np.ndarray:
+        return self._continuous
+
+    @property
+    def discrete(self) -> np.ndarray:
+        return self._discrete
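For orientation, a small usage sketch of the class as defined above (sizes are illustrative):

    import numpy as np

    # Mixed action space: 2 continuous dimensions, 3 discrete branches, 4 agents.
    continuous = np.random.uniform(-1.0, 1.0, size=(4, 2)).astype(np.float32)
    discrete = np.zeros((4, 3), dtype=np.int64)  # wrong dtype on purpose

    action = ActionTuple(continuous, discrete)
    assert action.continuous.shape == (4, 2)
    assert action.discrete.dtype == np.int32  # cast to int32 on construction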


 class ActionSpec(NamedTuple):
     """
     A NamedTuple containing utility functions and information about the action spaces

@@ -287,62 +313,61 @@ def discrete_size(self) -> int:
         """
         return len(self.discrete_branches)
 
-    def empty_action(self, n_agents: int) -> np.ndarray:
+    def empty_action(self, n_agents: int) -> ActionTuple:
         """
-        Generates a numpy array corresponding to an empty action (all zeros)
+        Generates an ActionTuple corresponding to an empty action (all zeros)
         for a number of agents.
         :param n_agents: The number of agents that will have actions generated
         """
-        if self.is_continuous():
-            return np.zeros((n_agents, self.continuous_size), dtype=np.float32)
-        return np.zeros((n_agents, self.discrete_size), dtype=np.int32)
+        continuous = np.zeros((n_agents, self.continuous_size), dtype=np.float32)
+        discrete = np.zeros((n_agents, self.discrete_size), dtype=np.int32)
+        return ActionTuple(continuous, discrete)
 
-    def random_action(self, n_agents: int) -> np.ndarray:
+    def random_action(self, n_agents: int) -> ActionTuple:
         """
-        Generates a numpy array corresponding to a random action (either discrete
+        Generates an ActionTuple corresponding to a random action (either discrete
         or continuous) for a number of agents.
         :param n_agents: The number of agents that will have actions generated
         """
-        if self.is_continuous():
-            action = np.random.uniform(
-                low=-1.0, high=1.0, size=(n_agents, self.continuous_size)
-            ).astype(np.float32)
-        else:
-            branch_size = self.discrete_branches
-            action = np.column_stack(
+        continuous = np.random.uniform(
+            low=-1.0, high=1.0, size=(n_agents, self.continuous_size)
+        )
+        discrete = np.array([])
+        if self.discrete_size > 0:
+            discrete = np.column_stack(
                 [
                     np.random.randint(
                         0,
-                        branch_size[i],  # type: ignore
+                        self.discrete_branches[i],  # type: ignore
                         size=(n_agents),
                         dtype=np.int32,
                     )
                     for i in range(self.discrete_size)
                 ]
             )
-        return action
+        return ActionTuple(continuous, discrete)

     def _validate_action(
-        self, actions: np.ndarray, n_agents: int, name: str
-    ) -> np.ndarray:
+        self, actions: ActionTuple, n_agents: int, name: str
+    ) -> ActionTuple:
         """
         Validates that action has the correct action dim
         for the correct number of agents and ensures the type.
         """
-        if self.continuous_size > 0:
-            _size = self.continuous_size
-        else:
-            _size = self.discrete_size
-        _expected_shape = (n_agents, _size)
-        if actions.shape != _expected_shape:
+        _expected_shape = (n_agents, self.continuous_size)
+        if actions.continuous.shape != _expected_shape:
             raise UnityActionException(
+                f"The behavior {name} needs a continuous input of dimension "
+                f"{_expected_shape} for (<number of agents>, <action size>) but "
+                f"received input of dimension {actions.continuous.shape}"
+            )
+        _expected_shape = (n_agents, self.discrete_size)
+        if actions.discrete.shape != _expected_shape:
+            raise UnityActionException(
-                f"The behavior {name} needs an input of dimension "
+                f"The behavior {name} needs a discrete input of dimension "
                 f"{_expected_shape} for (<number of agents>, <action size>) but "
-                f"received input of dimension {actions.shape}"
+                f"received input of dimension {actions.discrete.shape}"
             )
-        _expected_type = np.float32 if self.is_continuous() else np.int32
-        if actions.dtype != _expected_type:
-            actions = actions.astype(_expected_type)
         return actions
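To make the new behavior concrete, a quick sketch of the per-field validation (illustrative sizes and behavior name; _validate_action is called internally by set_actions):

    import numpy as np
    from mlagents_envs.base_env import ActionSpec, ActionTuple

    spec = ActionSpec.create_continuous(3)
    ok = ActionTuple(np.zeros((5, 3), dtype=np.float32), np.zeros((5, 0), dtype=np.int32))
    spec._validate_action(ok, n_agents=5, name="MyBehavior")  # passes

    bad = ActionTuple(np.zeros((5, 2), dtype=np.float32), np.zeros((5, 0), dtype=np.int32))
    # spec._validate_action(bad, 5, "MyBehavior") raises UnityActionException,
    # since the behavior needs a continuous input of dimension (5, 3).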

     @staticmethod

@@ -420,27 +445,30 @@ def behavior_specs(self) -> MappingType[str, BehaviorSpec]:
"""

@abstractmethod
def set_actions(self, behavior_name: BehaviorName, action: np.ndarray) -> None:
def set_actions(self, behavior_name: BehaviorName, action: ActionTuple) -> None:
"""
Sets the action for all of the agents in the simulation for the next
step. The Actions must be in the same order as the order received in
the DecisionSteps.
:param behavior_name: The name of the behavior the agents are part of
:param action: A two dimensional np.ndarray corresponding to the action
(either int or float)
:param action: ActionTuple tuple of continuous and/or discrete action.
Actions are np.arrays with dimensions (n_agents, continuous_size) and
(n_agents, discrete_size), respectively.
"""

     @abstractmethod
     def set_action_for_agent(
-        self, behavior_name: BehaviorName, agent_id: AgentId, action: np.ndarray
+        self, behavior_name: BehaviorName, agent_id: AgentId, action: ActionTuple
     ) -> None:
         """
         Sets the action for one of the agents in the simulation for the next
         step.
         :param behavior_name: The name of the behavior the agent is part of
         :param agent_id: The id of the agent the action is set for
-        :param action: A one dimensional np.ndarray corresponding to the action
-            (either int or float)
+        :param action: An ActionTuple of continuous and/or discrete actions.
+            Actions are np.arrays with dimensions (1, continuous_size) and
+            (1, discrete_size), respectively. The leading dimension of 1 is because
+            this action is meant for a single agent.
         """

     @abstractmethod
21 changes: 15 additions & 6 deletions ml-agents-envs/mlagents_envs/environment.py

@@ -18,6 +18,7 @@
     DecisionSteps,
     TerminalSteps,
     BehaviorSpec,
+    ActionTuple,
     BehaviorName,
     AgentId,
     BehaviorMapping,

@@ -236,7 +237,7 @@ def __init__(
 
         self._env_state: Dict[str, Tuple[DecisionSteps, TerminalSteps]] = {}
         self._env_specs: Dict[str, BehaviorSpec] = {}
-        self._env_actions: Dict[str, np.ndarray] = {}
+        self._env_actions: Dict[str, ActionTuple] = {}
         self._is_first_message = True
         self._update_behavior_specs(aca_output)

@@ -336,7 +337,7 @@ def _assert_behavior_exists(self, behavior_name: str) -> None:
                 f"agent group in the environment"
             )
 
-    def set_actions(self, behavior_name: BehaviorName, action: np.ndarray) -> None:
+    def set_actions(self, behavior_name: BehaviorName, action: ActionTuple) -> None:
         self._assert_behavior_exists(behavior_name)
         if behavior_name not in self._env_state:
             return

@@ -346,7 +347,7 @@ def set_actions(self, behavior_name: BehaviorName, action: np.ndarray) -> None:
         self._env_actions[behavior_name] = action
 
     def set_action_for_agent(
-        self, behavior_name: BehaviorName, agent_id: AgentId, action: np.ndarray
+        self, behavior_name: BehaviorName, agent_id: AgentId, action: ActionTuple
     ) -> None:
         self._assert_behavior_exists(behavior_name)
         if behavior_name not in self._env_state:

@@ -366,7 +367,10 @@ def set_action_for_agent(
                     agent_id
                 )
             ) from ie
-        self._env_actions[behavior_name][index] = action
+        if action_spec.continuous_size > 0:
+            self._env_actions[behavior_name].continuous[index] = action.continuous[0, :]
+        if action_spec.discrete_size > 0:
+            self._env_actions[behavior_name].discrete[index] = action.discrete[0, :]
 
     def get_steps(
         self, behavior_name: BehaviorName

@@ -410,15 +414,20 @@ def _close(self, timeout: Optional[int] = None) -> None:
 
     @timed
     def _generate_step_input(
-        self, vector_action: Dict[str, np.ndarray]
+        self, vector_action: Dict[str, ActionTuple]
     ) -> UnityInputProto:
         rl_in = UnityRLInputProto()
         for b in vector_action:
             n_agents = len(self._env_state[b][0])
             if n_agents == 0:
                 continue
             for i in range(n_agents):
-                action = AgentActionProto(vector_actions=vector_action[b][i])
+                # TODO: extend to AgentBuffers
Review thread on the TODO:

Contributor: What does this TODO mean?

Contributor (author): Meant as a TODO for the C# changes that change the proto to accept both continuous and discrete. Poorly worded on my part.

+                if vector_action[b].continuous is not None:
+                    _act = vector_action[b].continuous[i]
+                else:
+                    _act = vector_action[b].discrete[i]
+                action = AgentActionProto(vector_actions=_act)
                 rl_in.agent_actions[b].value.extend([action])
         rl_in.command = STEP
         rl_in.side_channel = bytes(
5 changes: 0 additions & 5 deletions ml-agents-envs/mlagents_envs/tests/test_envs.py

@@ -97,11 +97,6 @@ def test_step(mock_communicator, mock_launcher):
     env.step()
     with pytest.raises(UnityActionException):
         env.set_actions("RealFakeBrain", spec.action_spec.empty_action(n_agents - 1))
-    decision_steps, terminal_steps = env.get_steps("RealFakeBrain")
-    n_agents = len(decision_steps)
-    env.set_actions("RealFakeBrain", spec.action_spec.empty_action(n_agents) - 1)
-    env.step()
-
Review thread on the removed lines:

Contributor: Why was this piece of test removed?

Contributor: bump

     env.close()
     assert isinstance(decision_steps, DecisionSteps)
     assert isinstance(terminal_steps, TerminalSteps)
27 changes: 19 additions & 8 deletions ml-agents-envs/mlagents_envs/tests/test_steps.py

@@ -81,24 +81,35 @@ def test_specs():
     assert specs.discrete_branches == ()
     assert specs.discrete_size == 0
     assert specs.continuous_size == 3
-    assert specs.empty_action(5).shape == (5, 3)
-    assert specs.empty_action(5).dtype == np.float32
+    assert specs.empty_action(5).continuous.shape == (5, 3)
+    assert specs.empty_action(5).continuous.dtype == np.float32
 
     specs = ActionSpec.create_discrete((3,))
     assert specs.discrete_branches == (3,)
     assert specs.discrete_size == 1
     assert specs.continuous_size == 0
-    assert specs.empty_action(5).shape == (5, 1)
-    assert specs.empty_action(5).dtype == np.int32
+    assert specs.empty_action(5).discrete.shape == (5, 1)
+    assert specs.empty_action(5).discrete.dtype == np.int32
+
+    specs = ActionSpec(3, (3,))
+    assert specs.continuous_size == 3
+    assert specs.discrete_branches == (3,)
+    assert specs.discrete_size == 1
+    assert specs.empty_action(5).continuous.shape == (5, 3)
+    assert specs.empty_action(5).continuous.dtype == np.float32
+    assert specs.empty_action(5).discrete.shape == (5, 1)
+    assert specs.empty_action(5).discrete.dtype == np.int32
 
 
 def test_action_generator():
     # Continuous
     action_len = 30
     specs = ActionSpec.create_continuous(action_len)
-    zero_action = specs.empty_action(4)
+    zero_action = specs.empty_action(4).continuous
     assert np.array_equal(zero_action, np.zeros((4, action_len), dtype=np.float32))
-    random_action = specs.random_action(4)
-    print(specs.random_action(4))
+    random_action = specs.random_action(4).continuous
+    print(random_action)
     assert random_action.dtype == np.float32
     assert random_action.shape == (4, action_len)
     assert np.min(random_action) >= -1

@@ -107,10 +118,10 @@ def test_action_generator():
     # Discrete
     action_shape = (10, 20, 30)
     specs = ActionSpec.create_discrete(action_shape)
-    zero_action = specs.empty_action(4)
+    zero_action = specs.empty_action(4).discrete
     assert np.array_equal(zero_action, np.zeros((4, len(action_shape)), dtype=np.int32))
 
-    random_action = specs.random_action(4)
+    random_action = specs.random_action(4).discrete
     assert random_action.dtype == np.int32
     assert random_action.shape == (4, len(action_shape))
     assert np.min(random_action) >= 0
19 changes: 15 additions & 4 deletions ml-agents/mlagents/trainers/agent_processor.py

@@ -2,6 +2,7 @@
 from typing import List, Dict, TypeVar, Generic, Tuple, Any, Union
 from collections import defaultdict, Counter
 import queue
+import numpy as np
 
 from mlagents_envs.base_env import (
     DecisionSteps,

@@ -129,14 +130,24 @@ def _process_step(
         done = terminated  # Since this is an ongoing step
         interrupted = step.interrupted if terminated else False
         # Add the outputs of the last eval
-        action = stored_take_action_outputs["action"][idx]
+        action_dict = stored_take_action_outputs["action"]
+        action: Dict[str, np.ndarray] = {}
+        for act_type, act_array in action_dict.items():
+            action[act_type] = act_array[idx]
         if self.policy.use_continuous_act:
             action_pre = stored_take_action_outputs["pre_action"][idx]
         else:
             action_pre = None
-        action_probs = stored_take_action_outputs["log_probs"][idx]
+        action_probs_dict = stored_take_action_outputs["log_probs"]
+        action_probs: Dict[str, np.ndarray] = {}
+        for prob_type, prob_array in action_probs_dict.items():
+            action_probs[prob_type] = prob_array[idx]
+
         action_mask = stored_decision_step.action_mask
-        prev_action = self.policy.retrieve_previous_action([global_id])[0, :]
+        prev_action = self.policy.retrieve_previous_action([global_id])
+        prev_action_dict: Dict[str, np.ndarray] = {}
+        for _prev_act_type, _prev_act in prev_action.items():
+            prev_action_dict[_prev_act_type] = _prev_act[0, :]
         experience = AgentExperience(
             obs=obs,
             reward=step.reward,

@@ -145,7 +156,7 @@ def _process_step(
             action_probs=action_probs,
             action_pre=action_pre,
             action_mask=action_mask,
-            prev_action=prev_action,
+            prev_action=prev_action_dict,
             interrupted=interrupted,
             memory=memory,
         )
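For intuition, the per-agent slicing introduced above follows this pattern (key names and shapes are illustrative; the stored outputs are batched arrays keyed by action type):

    import numpy as np

    # Batch of 8 agents: one continuous array and one discrete array.
    outputs = {
        "continuous_action": np.zeros((8, 2), dtype=np.float32),
        "discrete_action": np.zeros((8, 3), dtype=np.int32),
    }
    idx = 5  # index of one agent within the batch
    per_agent = {name: batch[idx] for name, batch in outputs.items()}
    assert per_agent["continuous_action"].shape == (2,)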
2 changes: 1 addition & 1 deletion ml-agents/mlagents/trainers/buffer.py

@@ -22,7 +22,7 @@ class AgentBuffer(dict):
 
 class AgentBufferField(list):
     """
-    AgentBufferField is a list of numpy arrays. When an agent collects a field, you can add it to his
+    AgentBufferField is a list of numpy arrays. When an agent collects a field, you can add it to its
     AgentBufferField with the append method.
     """
 
9 changes: 8 additions & 1 deletion ml-agents/mlagents/trainers/demo_loader.py

@@ -66,7 +66,14 @@ def make_demo_buffer(
     for i, obs in enumerate(split_obs.visual_observations):
         demo_raw_buffer["visual_obs%d" % i].append(obs)
     demo_raw_buffer["vector_obs"].append(split_obs.vector_observations)
-    demo_raw_buffer["actions"].append(current_pair_info.action_info.vector_actions)
+    if behavior_spec.action_spec.is_continuous():
+        demo_raw_buffer["continuous_action"].append(
+            current_pair_info.action_info.vector_actions
+        )
+    else:
+        demo_raw_buffer["discrete_action"].append(
+            current_pair_info.action_info.vector_actions
+        )
     demo_raw_buffer["prev_action"].append(previous_action)
     if next_done:
         demo_raw_buffer.resequence_and_append(