Adds reset and step methods to the BaseEnv class (#239)
# Description

The current `omni.isaac.orbit.envs.BaseEnv` does not include `reset` and `step` methods, while
`RLTaskEnv` adds that functionality on top. This PR unifies the structure of an `Env` by adding
these core methods to `BaseEnv` as well.
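
For context, a minimal standalone loop built on the unified interface could look like the sketch below. The config class, its import path, and the launched-app boilerplate are illustrative assumptions and not part of this PR; only `BaseEnv`, `reset`, and `step` come from the change itself.

```python
import torch

from omni.isaac.orbit.envs import BaseEnv

# Hypothetical environment configuration; any BaseEnvCfg subclass would do.
from my_tasks.cartpole import CartpoleBaseEnvCfg  # assumed to exist for this example

# NOTE: this assumes the Isaac Sim app has already been launched (e.g. via AppLauncher).
env = BaseEnv(cfg=CartpoleBaseEnvCfg())
obs, extras = env.reset(seed=42)  # reset() is added by this PR
for _ in range(100):
    # zero actions with shape (num_envs, action_dim); total_action_dim is an assumed attribute name
    actions = torch.zeros(env.num_envs, env.action_manager.total_action_dim, device=env.device)
    obs, extras = env.step(actions)  # step() is added by this PR
env.close()
```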


## Type of change

- New feature (non-breaking change which adds functionality)
- This change requires a documentation update

## Checklist

- [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with
`./orbit.sh --format`
- [ ] I have made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [x] I have added tests that prove my fix is effective or that my
feature works
- [x] I have updated the changelog and the corresponding version in the
extension's `config/extension.toml` file
- [x] I have added my name to the `CONTRIBUTORS.md` or my name already
exists there

---------

Co-authored-by: Mayank Mittal <mittalma@leggedrobotics.com>
pascal-roth and Mayankm96 authored Nov 16, 2023
1 parent 849d9b4 commit 39f4e96
Showing 6 changed files with 681 additions and 70 deletions.
2 changes: 1 addition & 1 deletion source/extensions/omni.isaac.orbit/config/extension.toml
@@ -1,7 +1,7 @@
[package]

# Note: Semantic Versioning is used: https://semver.org/
version = "0.9.43"
version = "0.9.44"

# Description
title = "ORBIT framework for Robot Learning"
10 changes: 10 additions & 0 deletions source/extensions/omni.isaac.orbit/docs/CHANGELOG.rst
@@ -1,6 +1,16 @@
Changelog
---------

0.9.44 (2023-11-16)
~~~~~~~~~~~~~~~~~~~

Added
^^^^^

* Added methods :meth:`reset` and :meth:`step` to the :class:`omni.isaac.orbit.envs.BaseEnv`. This unifies
the environment interface for simple standalone applications with the class.


0.9.43 (2023-11-16)
~~~~~~~~~~~~~~~~~~~

118 changes: 118 additions & 0 deletions source/extensions/omni.isaac.orbit/omni/isaac/orbit/envs/base_env.py
@@ -6,6 +6,8 @@
from __future__ import annotations

import builtins
import torch
from typing import Any, Dict, Sequence, Union

import omni.isaac.core.utils.torch as torch_utils

@@ -16,6 +18,29 @@

from .base_env_cfg import BaseEnvCfg

VecEnvObs = Dict[str, Union[torch.Tensor, Dict[str, torch.Tensor]]]
"""Observation returned by the environment.

The observations are stored in a dictionary. The keys are the group to which the observations belong.
This is useful for various setups such as reinforcement learning with asymmetric actor-critic or
multi-agent learning. For non-learning paradigms, this may include observations for different components
of a system.

Within each group, the observations can be stored either as a dictionary with keys as the names of each
observation term in the group, or a single tensor obtained from concatenating all the observation terms.
For example, for asymmetric actor-critic, the observation for the actor and the critic can be accessed
using the keys ``"policy"`` and ``"critic"`` respectively.

Note:
    By default, most learning frameworks deal with default and privileged observations in different ways.
    This handling must be taken care of by the wrapper around the :class:`RLTaskEnv` instance.

    For included frameworks (RSL-RL, RL-Games, skrl), the observations must have the key "policy". If the
    key "critic" is also present, then the critic observations are taken from the "critic" group.
    Otherwise, they are the same as the "policy" group.
"""


class BaseEnv:
"""The base environment encapsulates the simulation scene and the environment managers.
@@ -112,6 +137,9 @@ def __init__(self, cfg: BaseEnvCfg):
# if no window, then we don't need to store the window
self._window = None

# allocate dictionary to store metrics
self.extras = {}

def __del__(self):
"""Cleanup for the environment."""
self.close()
@@ -171,6 +199,66 @@ def load_managers(self):
Operations - MDP.
"""

def reset(self, seed: int | None = None, options: dict[str, Any] | None = None) -> tuple[VecEnvObs, dict]:
    """Resets all the environments and returns observations.

    Args:
        seed: The seed to use for randomization. Defaults to None, in which case the seed is not set.
        options: Additional information to specify how the environment is reset. Defaults to None.

            Note:
                This argument is used for compatibility with Gymnasium environment definition.

    Returns:
        A tuple containing the observations and extras.
    """
    # set the seed
    if seed is not None:
        self.seed(seed)
    # reset state of scene
    indices = torch.arange(self.num_envs, dtype=torch.int64, device=self.device)
    self._reset_idx(indices)
    # return observations
    return self.observation_manager.compute(), self.extras

def step(self, action: torch.Tensor) -> VecEnvObs:
    """Execute one time-step of the environment's dynamics.

    The environment steps forward at a fixed time-step, while the physics simulation is
    decimated at a lower time-step. This is to ensure that the simulation is stable. These two
    time-steps can be configured independently using the :attr:`BaseEnvCfg.decimation` (number of
    simulation steps per environment step) and the :attr:`BaseEnvCfg.sim.dt` (physics time-step).
    Based on these parameters, the environment time-step is computed as the product of the two.

    Args:
        action: The actions to apply on the environment. Shape is ``(num_envs, action_dim)``.

    Returns:
        A tuple containing the observations and extras.
    """
    # process actions
    self.action_manager.process_action(action)
    # perform physics stepping
    for _ in range(self.cfg.decimation):
        # set actions into buffers
        self.action_manager.apply_action()
        # set actions into simulator
        self.scene.write_data_to_sim()
        # simulate
        self.sim.step(render=False)
        # update buffers at sim dt
        self.scene.update(dt=self.physics_dt)
    # perform rendering if gui is enabled
    if self.sim.has_gui():
        self.sim.render()

    # post-step: step interval randomization
    if "interval" in self.randomization_manager.available_modes:
        self.randomization_manager.randomize(mode="interval", dt=self.step_dt)

    # return observations and extras
    return self.observation_manager.compute(), self.extras
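
To make the time-step relationship concrete, the following arithmetic sketch uses assumed values for `sim.dt` and `decimation` (they are not defaults taken from this diff):

```python
# Physics advances at sim.dt; the environment advances at decimation * sim.dt.
physics_dt = 0.005   # assumed physics time-step (200 Hz)
decimation = 4       # assumed number of physics steps per environment step
step_dt = decimation * physics_dt
print(step_dt)       # 0.02 -> the environment (control) loop runs at 50 Hz
```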

@staticmethod
def seed(seed: int = -1) -> int:
"""Set the seed for the environment.
@@ -202,3 +290,33 @@ def close(self):
self._window = None
# update closing status
self._is_closed = True

"""
Helper functions.
"""

def _reset_idx(self, env_ids: Sequence[int]):
"""Reset environments based on specified indices.
Args:
env_ids: List of environment ids which must be reset
"""
# reset the internal buffers of the scene elements
self.scene.reset(env_ids)
# randomize the MDP for environments that need a reset
if "reset" in self.randomization_manager.available_modes:
self.randomization_manager.randomize(env_ids=env_ids, mode="reset")

# iterate over all managers and reset them
# this returns a dictionary of information which is stored in the extras
# note: This is order-sensitive! Certain things need be reset before others.
self.extras["log"] = dict()
# -- observation manager
info = self.observation_manager.reset(env_ids)
self.extras["log"].update(info)
# -- action manager
info = self.action_manager.reset(env_ids)
self.extras["log"].update(info)
# -- randomization manager
info = self.randomization_manager.reset(env_ids)
self.extras["log"].update(info)
source/extensions/omni.isaac.orbit/omni/isaac/orbit/envs/rl_task_env.py
@@ -9,40 +9,16 @@
import math
import numpy as np
import torch
from typing import Any, ClassVar, Dict, Sequence, Tuple, Union
from typing import Any, ClassVar, Dict, Sequence, Tuple

from omni.isaac.version import get_version

from omni.isaac.orbit.command_generators import CommandGeneratorBase
from omni.isaac.orbit.managers import CurriculumManager, RewardManager, TerminationManager

from .base_env import BaseEnv
from .base_env import BaseEnv, VecEnvObs
from .rl_task_env_cfg import RLTaskEnvCfg

VecEnvObs = Dict[str, Union[torch.Tensor, Dict[str, torch.Tensor]]]
"""Observation returned by the environment.
The observations are stored in a dictionary. The keys are the group to which the observations belong.
This is useful for various learning setups beyond vanilla reinforcement learning, such as asymmetric
actor-critic, multi-agent, or hierarchical reinforcement learning.
For example, for asymmetric actor-critic, the observation for the actor and the critic can be accessed
using the keys ``"policy"`` and ``"critic"`` respectively.
Within each group, the observations can be stored either as a dictionary with keys as the names of each
observation term in the group, or a single tensor obtained from concatenating all the observation terms.
Note:
By default, most learning frameworks deal with default and privileged observations in different ways.
This handling must be taken care of by the wrapper around the :class:`RLTaskEnv` instance.
For included frameworks (RSL-RL, RL-Games, skrl), the observations must have the key "policy". In case,
the key "critic" is also present, then the critic observations are taken from the "critic" group.
Otherwise, they are the same as the "policy" group.
"""


VecEnvStepReturn = Tuple[VecEnvObs, torch.Tensor, torch.Tensor, torch.Tensor, Dict]
"""The environment signals processed at the end of each step.
@@ -76,6 +52,14 @@ class RLTaskEnv(BaseEnv, gym.Env):
environment. Thus, to reduce complexity, we directly use the :class:`gym.Env` over
here and leave it up to library-defined wrappers to take care of wrapping this
environment for their agents.

Note:
    For vectorized environments, it is recommended to **only** call the :meth:`reset`
    method once before the first call to :meth:`step`, i.e. after the environment is created.
    After that, the :meth:`step` function handles the reset of terminated sub-environments.
    This is because the simulator does not support resetting individual sub-environments
    in a vectorized environment.
"""

is_vector_env: ClassVar[bool] = True
@@ -107,8 +91,6 @@ def __init__(self, cfg: RLTaskEnvCfg, render_mode: str | None = None, **kwargs):
self.common_step_counter = 0
# -- init buffers
self.episode_length_buf = torch.zeros(self.num_envs, device=self.device, dtype=torch.long)
# -- allocate dictionary to store metrics
self.extras = {}

# setup the action and observation spaces for Gym
self._configure_gym_env_spaces()
@@ -158,48 +140,18 @@ def load_managers(self):
Operations - MDP
"""

def reset(self, seed: int | None = None, options: dict[str, Any] | None = None) -> tuple[VecEnvObs, dict]:
"""Resets all the environments and returns observations and extras.
Note:
This function (if called) must **only** be called before the first call to :meth:`step`, i.e.
after the environment is created. After that, the :meth:`step` function handles the reset
of terminated sub-environments.
Args:
seed: The seed to use for randomization. Defaults to None, in which case the seed is not set.
options: Additional information to specify how the environment is reset. Defaults to None.
Note:
This is not used in the current implementation. It is mostly there for compatibility with
Gymnasium environment definition.
Returns:
A tuple containing the observations and extras.
"""
# set the seed
if seed is not None:
gym.Env.reset(self, seed=seed)
self.seed(seed)
# reset state of scene
indices = torch.arange(self.num_envs, dtype=torch.int64, device=self.device)
self._reset_idx(indices)
# return observations
return self.observation_manager.compute(), self.extras

def step(self, action: torch.Tensor) -> VecEnvStepReturn:
"""Run one timestep of the environment's dynamics and reset terminated environments.
"""Execute one time-step of the environment's dynamics and reset terminated environments.
The environment dynamics may comprise of many steps of the physics engine. The number of steps
is controlled by the :attr:`RLTaskEnvCfg.decimation` parameter in the configuration. This means
that the agent control can happen at a slower rate than the physics simulation. This is useful
for real-time control of the robot, where the control loop may be slower than the frequency of
the actual dynamics.
Unlike the :meth:`BaseEnv.step` method, the function performs the following operations:
The function also handles resetting of the terminated environments, at the end of the physics
stepping and computation of the reward and terminated signals. This is because it is not
possible to reset the sub-environments individually due to the vectorized implementation
of sub-environments in the simulator.
1. Process the actions.
2. Perform physics stepping.
3. Perform rendering if gui is enabled.
4. Update the environment counters and compute the rewards and terminations.
5. Reset the environments that terminated.
6. Compute the observations.
7. Return the observations, rewards, resets and extras.
Args:
action: The actions to apply on the environment. Shape is ``(num_envs, action_dim)``.
@@ -255,12 +207,12 @@ def render(self) -> np.ndarray | None:
By convention, if mode is:
- **human**: render to the current display and return nothing. Usually for human consumption.
- **human**: Render to the current display and return nothing. Usually for human consumption.
- **rgb_array**: Return an numpy.ndarray with shape (x, y, 3), representing RGB values for an
x-by-y pixel image, suitable for turning into a video.
Returns:
The rendered image as a numpy array if mode is "rgb_array".
The rendered image as a numpy array if mode is "rgb_array". Otherwise, returns None.
Raises:
RuntimeError: If mode is set to "rgb_data" and simulation render mode does not support it.