Training - Transfer grasp to stack - any toy - check_z_height - Situation Removal - EfficientNet-B0 - V0.6
Pre-release
This is the training progress for transferring grasp-any-toy weights to stack-any-toy weights with situation removal enabled, where any action that undoes past progress is given a reward of 0.
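As a rough illustration of the situation removal rule, the sketch below zeroes the reward whenever an action lowers task progress. The function name `apply_situation_removal` and the idea of measuring progress as a single number (for example, the current stack height) are assumptions made for this example, not the names used in the repository.

```python
def apply_situation_removal(reward, prev_progress, new_progress):
    """Return 0 when an action undoes past progress, else the original reward.

    Hypothetical helper: progress is assumed to be one comparable number,
    such as the number of stacked objects before and after the action.
    """
    if new_progress < prev_progress:
        # The action reversed earlier progress, so it earns nothing.
        return 0.0
    return reward


# A place action that topples a 2-high stack down to 1 gets no reward.
print(apply_situation_removal(0.78, prev_progress=2, new_progress=1))
# An action that keeps or raises progress keeps its reward.
print(apply_situation_removal(0.78, prev_progress=1, new_progress=2))
```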
This release also supports continuous z height stacking based on depth image data projected onto a height map. The stacking progress height is the maximum value of the height map after a 5x5 median blur is applied.
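The height measurement can be sketched in plain Python as follows. This is a minimal sketch, not the repository's implementation: the names `median_blur` and `stack_height` are made up for the example, the height map is assumed to be a 2D list of heights, and border pixels simply use whichever neighbors are in bounds, which may differ from how an image library pads its borders.

```python
from statistics import median


def median_blur(heightmap, ksize=5):
    """Median-filter a 2D list of heights with a ksize x ksize window.

    Assumption: windows are clipped at the borders instead of padded.
    """
    rows, cols = len(heightmap), len(heightmap[0])
    r = ksize // 2
    blurred = []
    for y in range(rows):
        out_row = []
        for x in range(cols):
            window = [
                heightmap[j][i]
                for j in range(max(0, y - r), min(rows, y + r + 1))
                for i in range(max(0, x - r), min(cols, x + r + 1))
            ]
            out_row.append(median(window))
        blurred.append(out_row)
    return blurred


def stack_height(heightmap, ksize=5):
    """Highest value of the height map after the median blur."""
    return max(max(row) for row in median_blur(heightmap, ksize))


# 9x9 map: a 5x5 object 0.1 high with one noisy depth pixel at 1.0.
hm = [[0.0] * 9 for _ in range(9)]
for y in range(2, 7):
    for x in range(2, 7):
        hm[y][x] = 0.1
hm[4][4] = 1.0  # single-pixel depth noise

print(max(max(row) for row in hm))  # raw maximum is dominated by the noise
print(stack_height(hm))             # blurred maximum reflects the object
```

The median blur is what keeps a single noisy depth pixel from being counted as stack height, since an isolated spike never makes up the middle of a 5x5 window.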
Here is a placement action which failed, but it looks good visually and is a nice example of the training process:
Training images of successful stacks:
Status printout:
Training iteration: 7244
Change detected: True (value: 2160)
Primitive confidence scores: 1.819177 (push), 1.674433 (grasp), 2.063278 (place)
Strategy: exploit (exploration probability: 0.117432)
Action: push at (8, 140, 132)
Executing: push at (-0.460000, 0.056000, 0.001002)
Trainer.get_label_value(): Current reward: 0.337089 Future reward: 1.960318 Expected reward: 0.337089 + 0.650000 x 1.960318 = 1.611296
Training loss: 0.004335
Experience replay 16755: history timestep index 775, action: place, surprise value: 0.590137
prev_height: 0.6741789133816113 max_z: 0.6746745066568761 goal_success: False <<<<<<<<<<<
check_stack() stack_height: 0.6746745066568761 stack matches current goal: False partial_stack_success: False Does the code think a reset is needed: False
Push motion successful (no crash, need not move blocks): True
STACK: trial: 630 actions/partial: 14.261811023622048 actions/full stack: 1035.0 (lower is better) Grasp Count: 2838, grasp success rate: 0.715292459478506 place_on_stack_rate: 0.25024630541871923 place_attempts: 2030 partial_stack_successes: 508 stack_successes: 7 trial_success_rate: 0.011111111111111112 stack goal: None
Trainer.get_label_value(): Current reward: 0.781250 Future reward: 2.106640 Expected reward: 0.781250 + 0.650000 x 2.106640 = 2.150566
Training loss: 0.002624
Time elapsed: 5.915041
Trainer iteration: 7245.000000
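The arithmetic in the printout above can be reproduced with a few lines of Python. This is a sketch for reading the log, not code from the repository: the function names are made up, and the way the `STACK:` ratios are derived from the raw counts is inferred from the numbers themselves (they match when the action count is taken to be the trainer iteration, 7245).

```python
def expected_reward(current, future, discount=0.65):
    """Current reward plus the discounted future reward, as printed by
    Trainer.get_label_value(). 0.65 is the --future_reward_discount value."""
    return current + discount * future


def stack_metrics(actions, place_attempts, partial_successes,
                  full_stacks, trials):
    """Ratios from the STACK: line (derivation inferred from the log values)."""
    return {
        "actions/partial": actions / partial_successes,
        "actions/full stack": actions / full_stacks,
        "place_on_stack_rate": partial_successes / place_attempts,
        "trial_success_rate": full_stacks / trials,
    }


# Values copied from the status printout.
print(round(expected_reward(0.337089, 1.960318), 6))
print(round(expected_reward(0.781250, 2.106640), 6))

metrics = stack_metrics(actions=7245, place_attempts=2030,
                        partial_successes=508, full_stacks=7, trials=630)
for name, value in metrics.items():
    print(name, value)
```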
Training images showing a sequence of actions which lead to a successful stack:
Initial command run, with no situation removal:
export CUDA_VISIBLE_DEVICES="0" && python3 main.py --is_sim --obj_mesh_dir 'objects/toys' --num_obj 10 --push_rewards --experience_replay --explore_rate_decay --place --check_z_height --future_reward_discount 0.65 --transfer_grasp_to_place --load_snapshot --snapshot_file '/home/costar/src/costar_visual_stacking/logs/2019-08-17.20:54:32-train-grasp-place-split-efficientnet-21k-acc-0.80/models/snapshot.reinforcement.pth'
Second (resume) command run, with situation removal. It starts from the weights of the earlier training run, which did not have situation removal:
export CUDA_VISIBLE_DEVICES="0" && python3 main.py --is_sim --obj_mesh_dir 'objects/toys' --num_obj 10 --push_rewards --experience_replay --explore_rate_decay --place --check_z_height --future_reward_discount 0.65 --load_snapshot --snapshot_file '/home/costar/src/costar_visual_stacking/logs/2019-09-08.18:13:13/models/snapshot.reinforcement.pth'