Action buffer #4612
Conversation
* Fix Gym for ActionSpec
* Fix TF policy test
```python
        respectively.
        """

    def __init__(self, continuous: np.ndarray, discrete: np.ndarray):
```
We need some constructor that takes only continuous or only discrete actions, so the user does not have to create an empty array when using just one of the two.
Why can't the default be None, with the constructor assigning an empty array when None is passed? This is a common pattern for mutable default parameters.
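The None-default pattern being suggested can be sketched like this (a sketch with illustrative names, not the actual ml-agents implementation):

```python
import numpy as np
from typing import Optional

class ActionTuple:
    """Illustrative sketch only, not the real ml-agents class."""

    def __init__(
        self,
        continuous: Optional[np.ndarray] = None,
        discrete: Optional[np.ndarray] = None,
    ):
        # Resolve None inside the body: the standard workaround for
        # Python's shared-mutable-default pitfall.
        if continuous is None:
            n_agents = discrete.shape[0] if discrete is not None else 0
            continuous = np.zeros((n_agents, 0), dtype=np.float32)
        if discrete is None:
            discrete = np.zeros((continuous.shape[0], 0), dtype=np.int32)
        self.continuous = continuous
        self.discrete = discrete

# Discrete-only construction: the caller never builds an empty array.
at = ActionTuple(discrete=np.array([[0], [1]]))
```

With this shape convention the unused component is `(n_agents, 0)` rather than None, which also matches the "every action is hybrid" view discussed later in the thread.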
```diff
@@ -143,3 +146,15 @@ def _process_step_infos(self, step_infos: List[EnvironmentStep]) -> int:
                 step_info.environment_stats, step_info.worker_id
             )
         return len(step_infos)
+
+    @staticmethod
```
bump
```python
        decision_steps, terminal_steps = env.get_steps("RealFakeBrain")
        n_agents = len(decision_steps)
        env.set_actions("RealFakeBrain", spec.action_spec.empty_action(n_agents) - 1)
        env.step()
```
bump
Co-authored-by: Vincent-Pierre BERGES <vincentpierre@unity3d.com>
```diff
+            self.behavior_spec.action_spec.discrete_size
         )
-        self.previous_action_dict: Dict[str, np.array] = {}
+        self.previous_action_dict: Dict[str, Dict[str, np.ndarray]] = {}
```
Why not `Dict[str, ActionTuple]`? A dictionary with two hardcoded keys (in `make_empty_previous_action()`) doesn't make much sense.
Good callout. I was doing this because it made writing to the buffer easier (e.g. using the string), but in reality we only ever use the previous action if it's discrete. I've updated the PR to only save the discrete actions.
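The updated approach described here can be sketched as follows (hypothetical names; the real ml-agents bookkeeping lives on the agent processor):

```python
from typing import Dict, List
import numpy as np

# Keep only the discrete branch of each agent's last action, since
# previous actions are only ever consumed for discrete control.
previous_action_dict: Dict[str, np.ndarray] = {}

def save_previous_action(agent_ids: List[str], discrete_actions: np.ndarray) -> None:
    # discrete_actions has shape (n_agents, discrete_size)
    for idx, agent_id in enumerate(agent_ids):
        previous_action_dict[agent_id] = discrete_actions[idx]

save_previous_action(["agent-0", "agent-1"], np.array([[0, 2], [1, 0]]))
```

Storing a plain `np.ndarray` per agent avoids both the hardcoded-key inner dict and the need to carry an empty continuous component around.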
```diff
@@ -270,6 +271,14 @@ def get_action(
         )

         self.save_memories(global_agent_ids, run_out.get("memory_out"))
+        # For Compatibility with buffer changes for hybrid action support
+        if "log_probs" in run_out:
```
I know it's not new to this PR, but should we move these strings to constants/enums? "pre_action" and "actions_pre" have me scared, and "continuous" is generally pretty typo-prone.
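One way to do what this comment suggests is a `str`-backed enum (the key names below are illustrative, not the actual ml-agents buffer keys):

```python
from enum import Enum

# Hypothetical constants replacing the raw buffer-key strings.
class BufferKey(str, Enum):
    LOG_PROBS = "log_probs"
    CONTINUOUS_ACTION = "continuous_action"
    DISCRETE_ACTION = "discrete_action"

run_out = {"log_probs": -0.5}

# A str mixin means members compare equal to the raw strings, so
# existing code keeps working, while a typo like BufferKey.ACTIONS_PRE
# fails loudly with an AttributeError instead of silently missing a key.
assert BufferKey.LOG_PROBS == "log_probs"
value = run_out[BufferKey.LOG_PROBS.value]
```

The `.value` access keeps lookups unambiguous even if the dict was built with plain strings.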
…nologies/ml-agents into develop-action-buffer
```python
    def __init__(
        self,
        continuous: Optional[np.ndarray] = None,
```
I think I would prefer having this not be Optional. When the action is discrete only, the continuous action becomes shape (n_agents, 0). I think this is slightly more general than having None when there is no action, because this way we can eventually consider all actions to be hybrid; some of them just happen to have 0 discrete or continuous actions. @chriselion what do you think? I might be missing something.
I think for initializer arguments, keeping them Optional is nice. If you want to ensure that the fields themselves are non-None, you could "allocate" the np array on construction if they weren't provided (though I'm not sure that's meaningful, since there would be 0 elements).
As a side note, I think we're assuming that `continuous.shape[0] == discrete.shape[0]` - might be good to check that too.
`_validate_actions` will check this.
```python
    ) -> UnityInputProto:
        rl_in = UnityRLInputProto()
        for b in vector_action:
            n_agents = len(self._env_state[b][0])
            if n_agents == 0:
                continue
            for i in range(n_agents):
                action = AgentActionProto(vector_actions=vector_action[b][i])
                # TODO: This check will be removed when the oroto supports hybrid actions
```
Suggested change:
```diff
-                # TODO: This check will be removed when the oroto supports hybrid actions
+                # TODO: This check will be removed when the proto supports hybrid actions
```
I think the remaining issues will end up being fixed by #4651, this part LGTM
```diff
-        _expected_shape = (n_agents, _size)
-        if actions.shape != _expected_shape:
+        _expected_shape = (n_agents, self.continuous_size)
+        if self.continuous_size > 0 and actions.continuous.shape != _expected_shape:
```
Why check `if self.continuous_size > 0`? Same question on line 36 with discrete.
This is removed in the ActionModel because the defaults make it unnecessary
```python
    ) -> UnityInputProto:
        rl_in = UnityRLInputProto()
        for b in vector_action:
            n_agents = len(self._env_state[b][0])
            if n_agents == 0:
                continue
            for i in range(n_agents):
                action = AgentActionProto(vector_actions=vector_action[b][i])
                # TODO: This check will be removed when the oroto supports hybrid actions
                if vector_action[b].continuous.shape[1] > 0:
```
Why check `shape[1]` rather than `action_specs.continuous_size`?
ActionSpec is not available here, so this is the only way I can think of to determine whether the action space is continuous or discrete in this function. Critically, this is only a temporary shim so the communication protocol keeps working with the old proto until the new protos are in place.
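The shape-based heuristic being defended here can be illustrated in isolation (`FakeActionTuple` and `pick_vector_action` are stand-in names, not the actual ml-agents code):

```python
import numpy as np
from collections import namedtuple

# Minimal stand-in for the real ActionTuple.
FakeActionTuple = namedtuple("FakeActionTuple", ["continuous", "discrete"])

def pick_vector_action(action: "FakeActionTuple", agent_index: int) -> np.ndarray:
    # Without an ActionSpec in scope, a non-zero second dimension is
    # the only signal for which component actually carries the action.
    if action.continuous.shape[1] > 0:
        return action.continuous[agent_index]
    return action.discrete[agent_index]

# A discrete-only action: continuous is (n_agents, 0), discrete is populated.
discrete_only = FakeActionTuple(np.zeros((2, 0)), np.array([[1], [0]]))
```

This works only because the old proto is single-branch; once the hybrid proto lands, the real ActionSpec replaces the heuristic.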
Proposed change(s)
Describe the changes made in this PR.
Useful links (Github issues, JIRA tickets, ML-Agents forum threads etc.)
Types of change(s)
Checklist
Other comments