Robust Deep RL: a Soft Actor-Critic approach with adversarial perturbation on state observations
I designed a new robust deep RL method: a Soft Actor-Critic approach trained under adversarial perturbations on state observations. My work is based on the SA-MDP proposed by Zhang et al. (2020). For a more detailed explanation, please see the attached PDF file.

**2022 Spring Semester, Personal Project Research, Kyungphil Park**
SA-MDP assumes that a fixed adversarial attack represents the worst case: the perturbation of the state observation that minimizes the Q value. Zhang et al. (2020) formally define this setting as the state-adversarial MDP (SA-MDP).
In our work, we need to solve a minimax problem: minimize the policy loss under the worst-case perturbation of the state observation.

- Objective function
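A plausible form of this objective, assuming the SA regularizer of Zhang et al. (2020) is added to the standard SAC policy loss ($\kappa$, $B_{\epsilon}(s)$, and the divergence $\mathrm{D}$ are notation I introduce here for illustration):

$$
\min_{\theta}\; \mathbb{E}_{s \sim \mathcal{D}} \left[ \mathcal{L}_{\mathrm{SAC}}(\theta; s) + \kappa \max_{\hat{s} \in B_{\epsilon}(s)} \mathrm{D}\big(\pi_{\theta}(\cdot \mid s) \,\|\, \pi_{\theta}(\cdot \mid \hat{s})\big) \right]
$$

Here $B_{\epsilon}(s)$ is the $\ell_{\infty}$ ball of radius $\epsilon$ around $s$, and the inner maximization is approximated with PGD.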
I designed robust deep RL with a Soft Actor-Critic approach in a discrete action space, and I tested SA-SAC in several Atari Gym environments. The SAC code is based on **bernomone**'s GitHub code.
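For reference, a minimal sketch of the discrete-action SAC policy loss this implementation builds on; the tensor names (`policy_logits`, `q1`, `q2`, `alpha`) are my own illustration, not identifiers from the repo:

```python
import torch
import torch.nn.functional as F

def sac_policy_loss(policy_logits, q1, q2, alpha):
    """Discrete-action SAC policy loss:
    E_s[ sum_a pi(a|s) * (alpha * log pi(a|s) - min(Q1, Q2)(s, a)) ]."""
    log_pi = F.log_softmax(policy_logits, dim=-1)  # (batch, n_actions)
    pi = log_pi.exp()
    min_q = torch.min(q1, q2)  # clipped double-Q estimates, (batch, n_actions)
    # With discrete actions the expectation over actions is computed exactly,
    # so no reparameterization trick is needed.
    return (pi * (alpha * log_pi - min_q)).sum(dim=-1).mean()
```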
First, create three new directories: `saved_models`, `vidoes`, and `Logs`.
- Before you start training, set `n_steps`, `memory_size`, `train_start`, `reg_train_start`, … in the `config01.json` file.
  - `n_steps`: total number of steps you want to train.
  - `memory_size`: replay buffer size.
  - `train_start`: number of steps after which training begins.
  - `reg_train_start`: number of steps after which training with the SA-regularizer begins.
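For illustration, a hypothetical `config01.json` covering the fields above; the values are placeholders, not the repo's defaults:

```json
{
  "n_steps": 1000000,
  "memory_size": 100000,
  "train_start": 20000,
  "reg_train_start": 500000
}
```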
`train.py`

- `--config=config01.json` (default)
- `--new=1` (default) # set 0 when you load pretrained models
- `--game=BeamRider` (default) # set any Atari game environment
- example: `python train.py`, `python train.py --game=Assault`
`robust_train.py`

- `--config=config01.json` (default)
- `--new=1` (default) # set 0 when you load pretrained models
- `--game=BeamRider` (default) # set any Atari game environment
- example: `python robust_train.py`, `python robust_train.py --game=Assault`
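A minimal sketch of the SA-regularizer that robust training presumably enables after `reg_train_start` steps, assuming the formulation of Zhang et al. (2020): PGD searches an $\ell_\infty$ ball around each state for the perturbation that maximizes the KL divergence between the clean and perturbed policies, and the resulting worst-case KL is added to the SAC loss. Names and defaults are illustrative, not the repo's:

```python
import torch
import torch.nn.functional as F

def sa_regularizer(policy_net, states, epsilon=1/255, pgd_steps=10):
    """Worst-case KL( pi(.|s) || pi(.|s_hat) ) over an l_inf ball around s,
    with the inner maximization approximated by PGD."""
    with torch.no_grad():
        clean_pi = F.softmax(policy_net(states), dim=-1)  # target distribution
    step_size = 2 * epsilon / pgd_steps
    delta = torch.empty_like(states).uniform_(-epsilon, epsilon)
    for _ in range(pgd_steps):
        delta.requires_grad_(True)
        adv_log_pi = F.log_softmax(policy_net(states + delta), dim=-1)
        kl = F.kl_div(adv_log_pi, clean_pi, reduction="batchmean")
        (grad,) = torch.autograd.grad(kl, delta)
        # Gradient ascent on the KL: step toward the most damaging perturbation.
        # (Pixel-range clipping of states + delta is omitted for brevity.)
        delta = (delta + step_size * grad.sign()).clamp(-epsilon, epsilon).detach()
    adv_log_pi = F.log_softmax(policy_net(states + delta), dim=-1)
    return F.kl_div(adv_log_pi, clean_pi, reduction="batchmean")
```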
- Render an Atari game video with your trained models.

`generate_match_video.py`

- `--config=config01.json` (default)
- `--seed=0` (default)
- `--game=BeamRider` (default) # set any Atari game environment
- `--random=False` (default) # set 1 when you want to test random actions
- example: `python generate_match_video.py`, `python generate_match_video.py --game=Assault --random=1`
(+ PGD attack: adversarial perturbation on state observations)

- Render an Atari game video with your trained models under attack.

`PGD_generate_video.py`

- `--config=config01.json` (default)
- `--seed=0` (default)
- `--game=BeamRider` (default) # set any Atari game environment
- `--steps=10` (default) # set the number of PGD attack steps
- example: `python PGD_generate_video.py`, `python PGD_generate_video.py --game=Assault`
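A minimal sketch of a PGD observation attack of the kind these scripts apply, assuming an untargeted attack that pushes the policy away from the action it chooses on the clean state (names and defaults are illustrative):

```python
import torch
import torch.nn.functional as F

def pgd_attack(policy_net, state, epsilon=1/255, steps=10):
    """Perturb the observation within an l_inf ball so the policy
    moves away from the action it takes on the clean state."""
    with torch.no_grad():
        clean_action = policy_net(state).argmax(dim=-1)
    step_size = 2 * epsilon / steps
    delta = torch.empty_like(state).uniform_(-epsilon, epsilon)
    for _ in range(steps):
        delta.requires_grad_(True)
        logits = policy_net(state + delta)
        # Ascend the cross-entropy of the clean action,
        # i.e. minimize the probability of taking it.
        loss = F.cross_entropy(logits, clean_action)
        (grad,) = torch.autograd.grad(loss, delta)
        delta = (delta + step_size * grad.sign()).clamp(-epsilon, epsilon).detach()
    return (state + delta).detach()
```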
- Test trained models for several episodes.

`evalulation.py`

- `--config=config01.json` (default)
- `--seed=0` (default)
- `--game=BeamRider` (default) # set any Atari game environment
- `--iter=10` (default) # set the iteration number (total number of episodes)
- example: `python evalulation.py`, `python evalulation.py --game=Assault --iter=30`
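For orientation, a sketch of the kind of evaluation loop such a script runs; `agent.act` and the environment id are assumptions, not the repo's actual interface:

```python
import gym
import numpy as np

def evaluate(agent, game="BeamRider", iters=10, seed=0):
    """Run the trained policy for `iters` episodes and report mean/std return."""
    env = gym.make(f"{game}NoFrameskip-v4")  # classic Gym API (4-tuple step)
    env.seed(seed)
    returns = []
    for _ in range(iters):
        obs, done, total = env.reset(), False, 0.0
        while not done:
            action = agent.act(obs)  # greedy action from the trained policy
            obs, reward, done, _ = env.step(action)
            total += reward
        returns.append(total)
    return np.mean(returns), np.std(returns)
```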
(+ PGD attack: adversarial perturbation on state observations)

- Test trained models for several episodes under attack.

`pgd_evalulation.py`

- `--config=config01.json` (default)
- `--seed=0` (default)
- `--game=BeamRider` (default) # set any Atari game environment
- `--iter=10` (default) # set the iteration number (total number of episodes)
- example: `python pgd_evalulation.py`, `python pgd_evalulation.py --game=Assault --iter=30`