
Training + Testing - grasp and push with trial reward - EfficientNet-B0 - v0.7.1

Pre-release
@ahundt released this 15 Sep 22:05 · 20 commits to grasp_pytorch0.4+ since this release · 8c3e0ba
export CUDA_VISIBLE_DEVICES="0" && python3 main.py --is_sim --obj_mesh_dir 'objects/toys' --num_obj 10  --push_rewards --experience_replay --explore_rate_decay --trial_reward --future_reward_discount 0.65 --tcp_port 19996

The trial ended when the simulator failed to return the numpy array of image data at around iteration 16k, but the results are quite good for such a low number of iterations.
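The Expected reward lines in the log below are one-step discounted targets built with the --future_reward_discount 0.65 flag from the command above. A minimal sketch of that arithmetic, for reference only (the real label is produced inside trainer.get_label_value(), whose full code is not shown here):

def expected_reward(current_reward, predicted_future_reward, discount=0.65):
    # One-step discounted target used as the training label, e.g.
    # 0.500000 + 0.650000 x 1.596124 = 1.537481
    return current_reward + discount * predicted_future_reward

print(expected_reward(0.5, 1.596124))   # ~1.537481
print(expected_reward(0.0, 1.316792))   # ~0.855915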

Training iteration: 16685
Change detected: True (value: 1261)
Primitive confidence scores: 0.967202 (push), 1.646582 (grasp)
Strategy: exploit (exploration probability: 0.100000)
Action: grasp at (7, 83, 107)
Executing: grasp at (-0.510000, -0.058000, 0.051002)
Trainer.get_label_value(): Current reward: 0.500000 Current reward multiplier: 1.000000 Predicted Future reward: 1.596124 Expected reward: 0.500000 + 0.650000 x 1.596124 = 1.537481
Training loss: 0.428985
Experience replay 63188: history timestep index 54, action: push, surprise value: 6.170048
Training loss: 0.071536
gripper position: 0.05317854881286621
gripper position: 0.034931570291519165
gripper position: 0.0285988450050354
Experience replay 63189: history timestep index 565, action: grasp, surprise value: 1.438415
Training loss: 0.018565
Experience replay 63190: history timestep index 12019, action: grasp, surprise value: 0.444365
Training loss: 0.134922
Experience replay 63191: history timestep index 1633, action: grasp, surprise value: 0.568288
Training loss: 0.011571
Experience replay 63192: history timestep index 15569, action: grasp, surprise value: 0.642771
Grasp successful: False
Training loss: 0.465558
Grasp Count: 14151, grasp success rate: 0.8217793795491485
Experience replay 63193: history timestep index 1054, action: push, surprise value: 1.732153
Training loss: 0.016847
Time elapsed: 18.942974
Trainer iteration: 16686.000000

Training iteration: 16686
Change detected: True (value: 134)
Primitive confidence scores: 1.140632 (push), 1.452001 (grasp)
Strategy: exploit (exploration probability: 0.100000)
Action: grasp at (15, 83, 91)
Executing: grasp at (-0.542000, -0.058000, 0.050999)
Trainer.get_label_value(): Current reward: 0.000000 Current reward multiplier: 1.000000 Predicted Future reward: 1.316792 Expected reward: 0.000000 + 0.650000 x 1.316792 = 0.855915
Training loss: 0.009576
Experience replay 63194: history timestep index 5062, action: grasp, surprise value: 0.308253
Training loss: 0.106279
Experience replay 63195: history timestep index 7109, action: grasp, surprise value: 0.205714
Training loss: 0.453820
gripper position: 0.030108928680419922
gripper position: 0.026779592037200928
gripper position: 0.0063852667808532715
Experience replay 63196: history timestep index 1226, action: grasp, surprise value: 1.184422
Training loss: 0.017869
Experience replay 63197: history timestep index 347, action: grasp, surprise value: 0.265588
Training loss: 0.030336
Experience replay 63198: history timestep index 778, action: grasp, surprise value: 1.168766
Training loss: 0.008899
Experience replay 63199: history timestep index 6223, action: push, surprise value: 0.247791
Training loss: 0.817960
gripper position: 0.00013843178749084473
gripper position: 5.0634145736694336e-05
Experience replay 63200: history timestep index 14762, action: grasp, surprise value: 0.546939
Grasp successful: True
Training loss: 0.038645
ERROR: PROBLEM DETECTED IN SCENE, NO CHANGES FOR OVER 20 SECONDS, RESETTING THE OBJECTS TO RECOVER...
Traceback (most recent call last):
  File "main.py", line 1078, in <module>
    parser.add_argument('--test_preset_cases', dest='test_preset_cases', action='store_true', default=False)
  File "main.py", line 831, in main
    trainer.model = trainer.model.cuda()
  File "main.py", line 892, in get_and_save_images
    prev_color_success = nonlocal_variables['grasp_color_success']
  File "/home/ahundt/src/costar_visual_stacking/robot.py", line 420, in get_camera_data
    color_img.shape = (resolution[1], resolution[0], 3)
IndexError: list index out of range
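The IndexError above comes from robot.py assuming the simulator returned a valid image: when the remote API call yields no data, the resolution list is empty and indexing it fails. A hedged sketch of a defensive check, for illustration only (not the fix applied in the repository; the function and variable names are assumptions):

import numpy as np

def reshape_color_image(raw_color, resolution):
    # resolution is expected to be [width, height]; an empty list means the
    # simulator call returned no image and the caller should retry or reset.
    if len(resolution) < 2 or len(raw_color) == 0:
        raise RuntimeError('Simulator returned no camera data; retry the call or reset the scene.')
    color_img = np.asarray(raw_color)
    color_img.shape = (resolution[1], resolution[0], 3)
    return color_img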

Note: There were bugs in the multi-step task code at the time this run was started, but we are fairly certain they did not affect this run, since it involved pushing and grasping only.

Test command and log dir:

± export CUDA_VISIBLE_DEVICES="0" && python3 main.py --is_sim --obj_mesh_dir 'objects/toys' --num_obj 10  --push_rewards --experience_replay --explore_rate_decay --trial_reward --future_reward_discount 0.65 --tcp_port 19996 --is_testing --random_seed 1238 --load_snapshot --snapshot_file '/home/ahundt/src/costar_visual_stacking/logs/2019-09-12.18:21:37-push-grasp-16k-trial-reward/models/snapshot.reinforcement.pth'
Connected to simulation.
CUDA detected. Running with GPU acceleration.
Loaded pretrained weights for efficientnet-b0
Loaded pretrained weights for efficientnet-b0
DILATED EfficientNet models created, num_dilation: 1
/home/ahundt/.local/lib/python3.5/site-packages/torch/nn/_reduction.py:46: UserWarning: size_average and reduce args will be deprecated, please use reduction='none' instead.
  warnings.warn(warning.format(ret))
Pre-trained model snapshot loaded from: /home/ahundt/src/costar_visual_stacking/logs/2019-09-12.18:21:37-push-grasp-16k-trial-reward/models/snapshot.reinforcement.pth

Adversarial test:

Average % clearance: 92.7
Average % grasp success per clearance: 79.2
Average % action efficiency: 54.6
Average grasp to push ratio: 77.0
ahundt@femur|~/src/costar_visual_stacking on trial_reward!?
± python3 evaluate.py --session_directory /home/ahundt/src/costar_visual_stacking/logs/2019-09-16.02:11:25  --method reinforcement --num_obj_complete 6 --preset
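For reference, the percentages above are the standard pushing-and-grasping test metrics: trial clearance, grasp success within cleared trials, and action efficiency. A rough sketch of how such numbers can be aggregated from per-trial counts (evaluate.py's actual implementation may differ; the field names here are assumptions):

def summarize_trials(trials, num_obj_complete=6):
    # Each trial dict is assumed to hold: objects_cleared, grasp_successes,
    # grasp_attempts, push_attempts.
    cleared = [t for t in trials if t['objects_cleared'] >= num_obj_complete]
    clearance = 100.0 * len(cleared) / len(trials)
    grasp_success = 100.0 * sum(t['grasp_successes'] for t in cleared) / max(
        1, sum(t['grasp_attempts'] for t in cleared))
    # Action efficiency: ideal actions (one grasp per object) over actions taken.
    efficiency = 100.0 * sum(t['objects_cleared'] for t in cleared) / max(
        1, sum(t['grasp_attempts'] + t['push_attempts'] for t in cleared))
    return clearance, grasp_success, efficiency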

Random test:

Testing iteration: 1160
Change detected: True (value: 679)
Primitive confidence scores: 1.414877 (push), 2.010269 (grasp)
Strategy: exploit (exploration probability: 0.000000)
Action: grasp at (13, 75, 157)
Executing: grasp at (-0.410000, -0.074000, 0.037545)
Trainer.get_label_value(): Current reward: 0.000000 Current reward multiplier: 1.000000 Predicted Future reward: 1.946442 Expected reward: 0.000000 + 0.650000 x 1.946442 = 1.265187
Training loss: 0.018206
gripper position: 0.030663982033729553
gripper position: 0.026504114270210266
gripper position: 0.003155328333377838
gripper position: 0.0009501874446868896
Grasp successful: True
Grasp Count: 1098, grasp success rate: 0.8697632058287796
Time elapsed: 16.090325
Trainer iteration: 1161.000000

Testing iteration: 1161
There have not been changes to the objects for for a long time [push, grasp]: [0, 0], or there are not enough objects in view (value: 0)! Repositioning objects.

Testing iteration: 1161
Change detected: True (value: 6463)
Trainer.get_label_value(): Current reward: 1.000000 Current reward multiplier: 1.000000 Predicted Future reward: 2.221963 Expected reward: 1.000000 + 0.650000 x 2.221963 = 2.444276
Trial logging complete: 100 --------------------------------------------------------------
Training loss: 0.225644
ahundt@femur|~/src/costar_visual_stacking on trial_reward!?
± export CUDA_VISIBLE_DEVICES="0" && python3 main.py --is_sim --obj_mesh_dir 'objects/toys' --num_obj 10  --push_rewards --experience_replay --explore_rate_decay --trial_reward --future_reward_discount 0.65 --tcp_port 19996 --is_testing --random_seed 1238 --load_snapshot --snapshot_file '/home/ahundt/src/costar_visual_stacking/logs/2019-09-12.18:21:37-push-grasp-16k-trial-reward/models/snapshot.reinforcement.pth' --max_test_trials 10 --test_preset_cases