How to implement parallel training for model-based RL #292

Answered by Gaiejj
jintaoXue asked this question in Q&A

Parallelizing the training is feasible in principle. There are two complementary approaches:

  • A3C-style parallelism follows the logic of the existing parallel code in the on-policy algorithms (see policy_gradient.py, which contains the core of A3C). The key changes are scaling steps_per_epoch by the number of parallel workers and averaging the gradients of both the actor-critic and the world model before each optimizer step.
  • Environment parallelism uses a vectorized environment for parallel data collection. See omnisafe/envs/safety_gymnasium.py for how to enable a vector environment.
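To make the gradient-averaging step in the first bullet concrete, here is a minimal single-process sketch. It simulates what an all-reduce across A3C workers computes: each worker's copy of the model holds its own gradients, and before the optimizer step every parameter's gradient is replaced by the mean across workers. The function name and the use of explicit model copies are illustrative assumptions, not OmniSafe's actual API (which would do this via torch.distributed across processes).

```python
import torch

def average_gradients(worker_models: list[torch.nn.Module]) -> None:
    """Replace each parameter's gradient with the mean across worker copies.

    Hypothetical single-process stand-in for an all-reduce: in a real A3C
    setup each model lives in its own process and the averaging is done
    with torch.distributed.all_reduce before optimizer.step().
    """
    n = len(worker_models)
    for params in zip(*(m.parameters() for m in worker_models)):
        grads = [p.grad for p in params if p.grad is not None]
        if len(grads) < n:
            continue  # skip parameters that have no gradient yet
        mean = torch.stack(grads).mean(dim=0)
        for p in params:
            p.grad = mean.clone()
```

After this call, stepping any worker's optimizer applies the same averaged update, which is what keeps the worker copies synchronized.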
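For the environment-parallel route, the sketch below shows the core mechanics of a synchronous vector environment: N sub-environments are stepped in lockstep with a batch of actions, and any episode that terminates is auto-reset. This mirrors what gymnasium.vector.SyncVectorEnv (the API behind omnisafe/envs/safety_gymnasium.py) does; ToyEnv and this minimal class are hypothetical stand-ins, not the library's implementation.

```python
import numpy as np

class ToyEnv:
    """Stand-in environment: observation counts steps; episode ends after 5 steps."""
    def reset(self):
        self.t = 0
        return np.array([0.0])

    def step(self, action):
        self.t += 1
        done = self.t >= 5
        return np.array([float(self.t)]), 1.0, done

class SyncVectorEnv:
    """Minimal synchronous vector env: steps N sub-envs in lockstep and
    auto-resets any that finish, mirroring gymnasium.vector.SyncVectorEnv."""
    def __init__(self, env_fns):
        self.envs = [fn() for fn in env_fns]

    def reset(self):
        return np.stack([env.reset() for env in self.envs])

    def step(self, actions):
        obs, rews, dones = [], [], []
        for env, a in zip(self.envs, actions):
            o, r, d = env.step(a)
            if d:
                o = env.reset()  # auto-reset finished episodes
            obs.append(o)
            rews.append(r)
            dones.append(d)
        return np.stack(obs), np.array(rews), np.array(dones)
```

Because every `step` returns a batch of transitions, the data-collection loop fills the buffer N times faster per wall-clock step, which is the payoff of this approach.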

Initial attempts at implementing parallelization were made, but subsequent tests suggested that the overhead of planning might o…

Replies: 1 comment 1 reply

Answer selected by jintaoXue