Hi, I see that DI-Engine aims to support IMPALA for both continuous and discrete action spaces; however, there are no config examples for continuous action tasks. One thing I notice is that IMPALA uses arg-max for evaluation, which I suppose is designed specifically for discrete action tasks? I am wondering if we could have some demos of IMPALA on continuous action tasks.
I tried to make one, but it fails to run:
from easydict import EasyDict

bipedalwalker_impala_config = dict(
    exp_name='bipedalwalker_impala_seed0',
    env=dict(
        env_id='BipedalWalker-v3',
        collector_env_num=8,
        evaluator_env_num=5,
        # (bool) Scale output action into legal range.
        act_scale=True,
        n_evaluator_episode=5,
        stop_value=300,
        rew_clip=True,
        # The path to save the game replay
        # replay_path='./bipedalwalker_ppo_seed0/video',
    ),
    policy=dict(
        cuda=False,
        action_space='continuous',
        model=dict(
            action_space='continuous',
            obs_shape=24,
            action_shape=4,
        ),
        learn=dict(
            # (int) collect n_sample data, train model update_per_collect times
            # here we follow the ppo serial pipeline
            update_per_collect=4,
            # (int) the number of data samples for one training iteration
            batch_size=16,
            learning_rate=0.0005,
            # (float) loss weight of the value network; the weight of the policy network is set to 1
            value_weight=0.5,
            # (float) loss weight of the entropy regularization; the weight of the policy network is set to 1
            entropy_weight=0.0001,
            # (float) discount factor for future reward, defaults in [0, 1]
            discount_factor=0.9,
            # (float) additional discounting parameter
            lambda_=0.95,
            # (int) the trajectory length to calculate the v-trace target
            unroll_len=32,
            # (float) clip ratio of importance weights
            rho_clip_ratio=1.0,
            # (float) clip ratio of importance weights
            c_clip_ratio=1.0,
            # (float) clip ratio of importance sampling
            rho_pg_clip_ratio=1.0,
        ),
        collect=dict(
            # (int) collect n_sample data, train model n_iteration times
            n_sample=16,
            # (int) the trajectory length to calculate the v-trace target
            unroll_len=32,
            # (float) discount factor for future reward, defaults in [0, 1]
            discount_factor=0.9,
            gae_lambda=0.95,
            collector=dict(collect_print_freq=1000, ),
        ),
        eval=dict(evaluator=dict(eval_freq=200, )),
        other=dict(replay_buffer=dict(
            replay_buffer_size=1000,
            max_use=16,
        ), ),
    ),
)
bipedalwalker_impala_config = EasyDict(bipedalwalker_impala_config)
main_config = bipedalwalker_impala_config

bipedalwalker_impala_create_config = dict(
    env=dict(
        type='bipedalwalker',
        import_names=['dizoo.box2d.bipedalwalker.envs.bipedalwalker_env'],
    ),
    env_manager=dict(type='base'),
    policy=dict(type='impala'),
)
bipedalwalker_impala_create_config = EasyDict(bipedalwalker_impala_create_config)
create_config = bipedalwalker_impala_create_config

if __name__ == "__main__":
    # or you can enter `ding -m serial_onpolicy -c bipedalwalker_impala_config.py -s 0`
    from ding.entry import serial_pipeline_onpolicy
    serial_pipeline_onpolicy([main_config, create_config], seed=0)
The current implementation only supports discrete action spaces; we will add a continuous version and related examples next week. You can continue to follow this issue.
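For reference, here is a minimal sketch (not DI-engine's actual implementation; the class names and shapes below are illustrative assumptions) of why arg-max evaluation only makes sense for discrete actions: a discrete head can arg-max over logits, while a continuous Gaussian head has no arg-max and instead uses the distribution mean as its deterministic evaluation action.

import torch
import torch.nn as nn


class DiscreteHead(nn.Module):
    """Hypothetical discrete policy head (obs -> logits over n_actions)."""

    def __init__(self, obs_dim: int, n_actions: int):
        super().__init__()
        self.logits = nn.Linear(obs_dim, n_actions)

    def eval_action(self, obs: torch.Tensor) -> torch.Tensor:
        # Discrete case: deterministic evaluation picks the highest logit (arg-max).
        return self.logits(obs).argmax(dim=-1)


class GaussianHead(nn.Module):
    """Hypothetical continuous policy head (obs -> mean of a Gaussian policy)."""

    def __init__(self, obs_dim: int, action_dim: int):
        super().__init__()
        self.mu = nn.Linear(obs_dim, action_dim)
        # log_sigma would only be used when sampling actions during collection.
        self.log_sigma = nn.Parameter(torch.zeros(action_dim))

    def eval_action(self, obs: torch.Tensor) -> torch.Tensor:
        # Continuous case: arg-max is undefined, so the deterministic
        # evaluation action is the mean of the Gaussian, squashed by tanh
        # to stay inside the legal action range [-1, 1].
        return torch.tanh(self.mu(obs))


if __name__ == "__main__":
    obs = torch.randn(1, 24)
    print(DiscreteHead(24, 4).eval_action(obs))   # e.g. tensor([2])
    print(GaussianHead(24, 4).eval_action(obs))   # a 4-dim action in [-1, 1]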