Release maro-0.3.2a1 🚀 · microsoft/maro

Refine RL workflow
- Add **kwargs to support more problem settings (e.g., graph-based ones) (#589)
  - add **kwargs to RL models' forward funcs and _shape_check()
  - add **kwargs to RL policies' get_action related funcs and _post_check()
  - add **kwargs to choose_actions of AbsEnvSampler; it remains None in the current sample() and eval()
- Add the detached loss to the return values of update_critic() and update_actor() in the current TrainOps; add early_stop (default False) to update_actor() in the current TrainOps (#589)
- Refine random seed setting logic in RL workflow (#584)
- Refine rollout workflow (#577) to support:
  - Run a specific number of steps in rollout
  - Run a specific number of episodes during evaluation with num_eval_episodes
  - Flexible metrics management during rollout with AbsEnvSampler.metrics
- Add AbsEnvSampler.metrics to support flexible metrics management during rollout (#577)
- Add Callback as a general interface to support customized operations in each phase of the workflow.
  - Two implementations, Checkpoint and MetricsRecorder, are added. (#577)
  - Add customized_callbacks to RLComponentBundle. (#589)
- Re-organize RL job's output paths. (#577)
- Fix several RL algorithm bugs. (#577, #589)
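
The **kwargs pass-through listed above can be illustrated with a minimal, self-contained sketch. It does not use MARO's classes: ToyGraphModel, ToyPolicy, the adjacency keyword, and this standalone choose_actions function are all hypothetical names chosen for illustration. It only shows the general pattern of forwarding optional, problem-specific inputs (here, graph structure) from the sampler level down to the model's forward function without changing any intermediate signature.

```python
from typing import Any, Dict, List


class ToyGraphModel:
    """Hypothetical model (not a MARO class) whose forward() accepts **kwargs."""

    def forward(self, states: List[List[float]], **kwargs: Any) -> List[float]:
        # Optional graph information arrives through **kwargs.
        adjacency: Dict[int, List[int]] = kwargs.get("adjacency")
        scores = [sum(state) for state in states]
        if adjacency is None:
            # Non-graph problems simply omit the extra argument.
            return scores
        # Graph-based problems: add each node's neighbour scores to its own.
        return [
            scores[i] + sum(scores[j] for j in adjacency.get(i, []))
            for i in range(len(scores))
        ]


class ToyPolicy:
    """Hypothetical policy that forwards **kwargs to its model untouched."""

    def __init__(self, model: ToyGraphModel) -> None:
        self._model = model

    def get_actions(self, states: List[List[float]], **kwargs: Any) -> List[int]:
        values = self._model.forward(states, **kwargs)
        # Action 1 if the value is positive, otherwise action 0.
        return [1 if value > 0 else 0 for value in values]


def choose_actions(policy: ToyPolicy, states: List[List[float]], **kwargs: Any) -> List[int]:
    """Hypothetical sampler-level entry point; extra inputs pass straight through."""
    return policy.get_actions(states, **kwargs)


if __name__ == "__main__":
    policy = ToyPolicy(ToyGraphModel())
    states = [[1.0], [-2.0], [0.5]]
    print(choose_actions(policy, states))
    print(choose_actions(policy, states, adjacency={0: [1], 1: [0, 2], 2: []}))
```

Callers that pass no extra keyword arguments get the same behavior as before, which is why a change of this kind can stay backward compatible.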
Replace NumPy data types with common Python data types across the whole project (#571)
Add an RL benchmark on MuJoCo as a module under tests/, compared against the Spinning Up benchmark; performance results can be found in tests/rl/performance.md (#575, #577, #583, #584)
Other minor code refinements
