2020 "Good Robot!" Paper Release - Stack and Row Models

Released by @ahundt on 06 Oct 17:21 (commit 8b05df4)

"Good Robot!" Efficient Reinforcement Learning for Multi-Step Visual Tasks with Sim to Real Transfer

Andrew Hundt, Benjamin Killeen, Nicholas Greene, Hongtao Wu, Heeyeon Kwon, Chris Paxton, and Gregory D. Hager

Video: "Good Robot!": Efficient Reinforcement Learning for Multi Step Visual Tasks via Reward Shaping

Paper, Abstract, and Citations

Good Robot! Paper on IEEE Xplore
Good Robot! Paper on arXiv

@article{hundt2020good,
	title="“Good Robot!”: Efficient Reinforcement Learning for Multi-Step Visual Tasks with Sim to Real Transfer",
	author="Andrew {Hundt} and Benjamin {Killeen} and Nicholas {Greene} and Hongtao {Wu} and Heeyeon {Kwon} and Chris {Paxton} and Gregory D. {Hager}",
	journal="IEEE Robotics and Automation Letters (RA-L)",
	volume="5",
	number="4",
	pages="6724--6731",
	year="2020",
	url={https://arxiv.org/abs/1909.11730}
}

Abstract— Current Reinforcement Learning (RL) algorithms struggle with long-horizon tasks where time can be wasted exploring dead ends and task progress may be easily reversed. We develop the SPOT framework, which explores within action safety zones, learns about unsafe regions without exploring them, and prioritizes experiences that reverse earlier progress to learn with remarkable efficiency.

The SPOT framework successfully completes simulated trials of a variety of tasks, improving a baseline trial success rate from 13% to 100% when stacking 4 cubes, from 13% to 99% when creating rows of 4 cubes, and from 84% to 95% when clearing toys arranged in adversarial patterns. Efficiency with respect to actions per trial typically improves by 30% or more, while training takes just 1-20k actions, depending on the task.

Furthermore, we demonstrate direct sim to real transfer. We are able to create real stacks in 100% of trials with 61% efficiency and real rows in 100% of trials with 59% efficiency by directly loading the simulation-trained model on the real robot with no additional real-world fine-tuning. To our knowledge, this is the first instance of reinforcement learning with successful sim to real transfer applied to long term multi-step tasks such as block-stacking and row-making with consideration of progress reversal. Code is available at https://github.com/jhu-lcsr/good_robot.
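The idea of exploring only within action safety zones can be illustrated with a small masked action-selection routine. This is a minimal sketch, not the code from the repository: the function name `select_action`, the boolean `safe_mask`, and the epsilon-greedy exploration rule are all assumptions made for illustration.

```python
import numpy as np


def select_action(q_values, safe_mask, explore_rate, rng):
    """Return a flat action index chosen only among actions marked safe.

    q_values and safe_mask are hypothetical arrays of the same shape;
    safe_mask is True where an action is considered safe to try.
    """
    safe_indices = np.flatnonzero(safe_mask)
    if safe_indices.size == 0:
        raise ValueError("safe_mask excludes every action")
    if rng.random() < explore_rate:
        # Explore: sample uniformly, but only inside the safety zone.
        return int(rng.choice(safe_indices))
    # Exploit: take the best Q-value among the safe actions only.
    masked_q = np.where(safe_mask, q_values, -np.inf)
    return int(np.argmax(masked_q))


rng = np.random.default_rng(0)
q_values = np.array([0.9, 0.2, 0.5])
safe_mask = np.array([False, True, True])  # action 0 is treated as unsafe

greedy_choice = select_action(q_values, safe_mask, explore_rate=0.0, rng=rng)
print("greedy choice among safe actions:", greedy_choice)
```

In this toy case the highest Q-value belongs to an action outside the mask, so the greedy choice falls back to the best safe action instead.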

Raw data for key final models

Stacking Run Model with Trial Reward and SPOT-Q

Figure: training success plot (2020-05-13-12-51-39_Sim-Stack-SPOT-Trial-Reward-Masked-Training-Sim-Stack-SPOT-Trial-Reward-Masked-Training_success_plot)

SIM TO REAL TESTING STACK - TEST - SPOT-Q-MASKED - COMMON SENSE - TRIAL REWARD - FULL FEATURED RUN - SORT TRIAL REWARD - REWARD SCHEDULE 0.1, 1, 1 - costar 2020-05-13 - test on costar 2020-06-05
----------------------------------------------------------------------------------------
export CUDA_VISIBLE_DEVICES="0" && python3 main.py --num_obj 8  --push_rewards --experience_replay --explore_rate_decay --trial_reward --common_sense --check_z_height --place --future_reward_discount 0.65 --is_testing --random_seed 1238 --max_test_trials 10 --save_visualizations --random_actions --snapshot_file /media/costar/f5f1f858-3666-4832-beea-b743127f1030/real_good_robot/logs/2020-05-13-12-51-39_Sim-Stack-SPOT-Trial-Reward-Masked-Training/models/snapshot.reinforcement_action_efficiency_best_value.pth
/media/costar/f5f1f858-3666-4832-beea-b743127f1030/real_good_robot/logs/2020-05-13-12-51-39_Sim-Stack-SPOT-Trial-Reward-Masked-Training/models/snapshot.reinforcement_action_efficiency_best_value.pth
Commit: cb55d6b8a6e8abfb1185dd945c0689ddf40546b0



Creating data logging session: /media/costar/f5f1f858-3666-4832-beea-b743127f1030/real_good_robot/logs/2020-06-05-18-28-46_Real-Stack-SPOT-Trial-Reward-Masked-Testing
Testing Complete! Dir: /media/costar/f5f1f858-3666-4832-beea-b743127f1030/real_good_robot/logs/2020-06-05-18-28-46_Real-Stack-SPOT-Trial-Reward-Masked-Testing
Testing results: 
 {'trial_success_rate_best_value': 1.0, 'trial_success_rate_best_index': 108, 'grasp_success_rate_best_value': 0.703125, 'grasp_success_rate_best_index': 108, 'place_success_rate_best_value': 0.8888888888888888, 'place_success_rate_best_index': 110, 'action_efficiency_best_value': 0.6111111111111112, 'action_efficiency_best_index': 110}
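The results line above is a printed Python dictionary, so it can be read back and turned into percentages for comparison with the abstract (100% trial success and 61% efficiency for real stacks). The snippet below is a small sketch for doing that; the helper name `summarize` is hypothetical and not part of the repository.

```python
import ast

# Results line copied verbatim from the real-robot stacking test log.
RESULTS_LINE = (
    "{'trial_success_rate_best_value': 1.0, 'trial_success_rate_best_index': 108, "
    "'grasp_success_rate_best_value': 0.703125, 'grasp_success_rate_best_index': 108, "
    "'place_success_rate_best_value': 0.8888888888888888, 'place_success_rate_best_index': 110, "
    "'action_efficiency_best_value': 0.6111111111111112, 'action_efficiency_best_index': 110}"
)


def summarize(results_text):
    """Parse a printed results dict and return each best value as a percentage."""
    stats = ast.literal_eval(results_text.strip())
    suffix = "_best_value"
    return {
        key[: -len(suffix)]: round(100.0 * value, 1)
        for key, value in stats.items()
        if key.endswith(suffix)
    }


summary = summarize(RESULTS_LINE)
for name, percent in summary.items():
    print(f"{name}: {percent}%")
```

`ast.literal_eval` is used rather than `json.loads` because the log prints the dictionary with single quotes, which is not valid JSON.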


Row Model with Progress Reward and SPOT-Q


SIM TO REAL ROW - TEST - Task Progress SPOT-Q MASKED - REWARD SCHEDULE 0.1, 1, 1 - workstation named spot 2020-06-03 - test on costar 2020-06-07
----------------------------------------------------------------------------------------
export CUDA_VISIBLE_DEVICES="0" && python3 main.py --num_obj 4 --push_rewards --experience_replay --explore_rate_decay --check_row --check_z_height --place --future_reward_discount 0.65  --is_testing --random_seed 1238 --max_test_trials 10 --random_actions --save_visualizations --common_sense --snapshot_file "/home/costar/src/real_good_robot/logs/2020-06-03-12-05-28_Sim-Rows-Two-Step-Reward-Masked-Training/models/snapshot.reinforcement_trial_success_rate_best_value.pth"
SIM export CUDA_VISIBLE_DEVICES="1" && python3 main.py --is_sim --obj_mesh_dir objects/blocks --num_obj 4 --push_rewards --experience_replay --explore_rate_decay --check_row --tcp_port 19998 --place --future_reward_discount 0.65 --max_train_actions 20000 --random_actions --common_sense
SIM on spot workstation Creating data logging session: /home/ahundt/src/real_good_robot/logs/2020-06-03-12-05-28_Sim-Rows-Two-Step-Reward-Masked-Training
SIM Commit: 12d9481717486342dbfcaff191ddb1428f102406  release tag:v0.16.1
SIM GPU 1, Tab 1, port 19998, center left v-rep window, v-rep tab 8

SIM Random Testing Complete! Dir: /home/ahundt/src/real_good_robot/logs/2020-06-03-12-05-28_Sim-Rows-Two-Step-Reward-Masked-Training/2020-06-06-21-34-07_Sim-Rows-Two-Step-Reward-Masked-Testing
SIM Random Testing results: {'trial_success_rate_best_value': 1.0, 'trial_success_rate_best_index': 667, 'grasp_success_rate_best_value': 0.850415512465374, 'grasp_success_rate_best_index': 667, 'place_success_rate_best_value': 0.7752442996742671, 'place_success_rate_best_index': 667, 'action_efficiency_best_value': 0.9265367316341829, 'action_efficiency_best_index': 667}
"snapshot_file": "/home/ahundt/src/real_good_robot/logs/2020-06-03-12-05-28_Sim-Rows-Two-Step-Reward-Masked-Training/models/snapshot.reinforcement_trial_success_rate_best_value.pth"

Pre-trained model snapshot loaded from: /home/costar/src/real_good_robot/logs/2020-06-03-12-05-28_Sim-Rows-Two-Step-Reward-Masked-Training/models/snapshot.reinforcement_trial_success_rate_best_value.pth
Creating data logging session: /media/costar/f5f1f858-3666-4832-beea-b743127f1030/real_good_robot/logs/2020-06-07-17-19-34_Real-Rows-Two-Step-Reward-Masked-Testing

Note: on trial 8 or 9, a row was completed correctly but the row detector did not register it, so I slid the blocks into the middle of the workspace while keeping their exact relative positions so the row would be scored correctly. This added one extra action.

    > STACK:  trial: 11 actions/partial: 3.0714285714285716  actions/full stack: 7.818181818181818 (lower is better)  Grasp Count: 52, grasp success rate: 0.6538461538461539 place_on_stack_rate: 0.8235294117647058 place_attempts: 34  partial_stack_successes: 28  stack_successes: 11 trial_success_rate: 1.0 stack goal: None current_height: 0.3236363636363636
    > Move to Home Position Complete
    > Move to Home Position Complete
    > trial_complete_indices: [ 7.  9. 17. 21. 30. 50. 54. 59. 69. 73. 85.]
    > Max trial success rate: 1.0, at action iteration: 82. (total of 84 actions, max excludes first 82 actions)
    > Max grasp success rate: 0.68, at action iteration: 83. (total of 84 actions, max excludes first 82 actions)
    > Max place success rate: 0.8181818181818182, at action iteration: 83. (total of 84 actions, max excludes first 82 actions)
    > Max action efficiency: 0.8780487804878049, at action iteration: 84. (total of 85 actions, max excludes first 82 actions)
    > saving trial success rate: /media/costar/f5f1f858-3666-4832-beea-b743127f1030/real_good_robot/logs/2020-06-07-17-19-34_Real-Rows-Two-Step-Reward-Masked-Testing/transitions/trial-success-rate.log.csv
    > saving grasp success rate: /media/costar/f5f1f858-3666-4832-beea-b743127f1030/real_good_robot/logs/2020-06-07-17-19-34_Real-Rows-Two-Step-Reward-Masked-Testing/transitions/grasp-success-rate.log.csv
    > saving place success rate: /media/costar/f5f1f858-3666-4832-beea-b743127f1030/real_good_robot/logs/2020-06-07-17-19-34_Real-Rows-Two-Step-Reward-Masked-Testing/transitions/place-success-rate.log.csv
    > saving action efficiency: /media/costar/f5f1f858-3666-4832-beea-b743127f1030/real_good_robot/logs/2020-06-07-17-19-34_Real-Rows-Two-Step-Reward-Masked-Testing/transitions/action-efficiency.log.csv
    > saving plot: 2020-06-07-17-19-34_Real-Rows-Two-Step-Reward-Masked-Testing-Real-Rows-Two-Step-Reward-Masked-Testing_success_plot.png
    > saving best stats to: /media/costar/f5f1f858-3666-4832-beea-b743127f1030/real_good_robot/logs/2020-06-07-17-19-34_Real-Rows-Two-Step-Reward-Masked-Testing/data/best_stats.json
    > saving best stats to: /media/costar/f5f1f858-3666-4832-beea-b743127f1030/real_good_robot/logs/2020-06-07-17-19-34_Real-Rows-Two-Step-Reward-Masked-Testing/best_stats.json
    > Testing Complete! Dir: /media/costar/f5f1f858-3666-4832-beea-b743127f1030/real_good_robot/logs/2020-06-07-17-19-34_Real-Rows-Two-Step-Reward-Masked-Testing
    > Testing results:
    > {'trial_success_rate_best_value': 1.0, 'trial_success_rate_best_index': 82, 'grasp_success_rate_best_value': 0.68, 'grasp_success_rate_best_index': 83, 'place_success_rate_best_value': 0.8181818181818182, 'place_success_rate_best_index': 83, 'action_efficiency_best_value': 0.8780487804878049, 'action_efficiency_best_index': 84}
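The ratios in the "> STACK:" status line of this log can be reproduced from the counts printed on the same line, which is a quick way to see how they relate. The sketch below rests on stated assumptions: that every action in this run was a grasp or a place, that each successful grasp was followed by one place attempt, and that `trial_complete_indices` are zero-based action iterations. None of this is confirmed by the log itself beyond the numbers agreeing.

```python
# Counts copied from the "> STACK:" status line of the real-robot row test.
grasp_count = 52
place_attempts = 34
partial_stack_successes = 28
stack_successes = 11
trial_complete_indices = [7, 9, 17, 21, 30, 50, 54, 59, 69, 73, 85]

# Assumption: every action in this run was a grasp or a place (no pushes).
total_actions = grasp_count + place_attempts

actions_per_partial = total_actions / partial_stack_successes
actions_per_full = total_actions / stack_successes
place_on_stack_rate = partial_stack_successes / place_attempts
# Assumption: each successful grasp was followed by exactly one place attempt.
grasp_success_rate = place_attempts / grasp_count

# Assumption: indices are zero-based, so the first trial starts at action 0.
boundaries = [-1] + trial_complete_indices
actions_per_trial = [end - start for start, end in zip(boundaries, boundaries[1:])]

print(f"total actions: {total_actions}")
print(f"actions/partial: {actions_per_partial}")
print(f"actions/full stack: {actions_per_full}")
print(f"place_on_stack_rate: {place_on_stack_rate}")
print(f"grasp success rate: {grasp_success_rate}")
print(f"actions per trial: {actions_per_trial}")
```

Under these assumptions the total comes to 86 actions, which matches both the logged actions-per-full-stack value times 11 trials and the sum of the per-trial action counts.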