
Releases: jhu-lcsr/good_robot

MISLABELED - Training - grasp and push with trial reward - EfficientNet-B0 - v0.7.0

15 Sep 18:42
8c3e0ba

Note: the information here is from the incorrect folder. This is actually data from a run with push/grasp/place actions. Just ignore this release, or feel welcome to plumb the depths of the logs to figure out the correct notes.

Here we did a run of pushing and grasping with trial reward.

export CUDA_VISIBLE_DEVICES="0" && python3 main.py --is_sim --obj_mesh_dir 'objects/toys' --num_obj 10  --push_rewards --experience_replay --explore_rate_decay --trial_reward --future_reward_discount 0.65 --tcp_port 19996

The trial ended because the simulator failed to return the numpy array of image data at around iteration 16k, but the results should be OK.
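For reference, the "Expected reward" lines in the printout below are a one-step lookahead label. A minimal sketch, assuming only the 0.65 discount passed via --future_reward_discount (this is illustrative, not the repo's actual Trainer.get_label_value() code):

def expected_reward(current_reward, future_reward, discount=0.65):
    # Training label: r_t + gamma * max_a Q(s_{t+1}, a)
    return current_reward + discount * future_reward

# expected_reward(0.5, 1.596124) -> ~1.537481, matching the first iteration below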

Training iteration: 16685
Change detected: True (value: 1261)
Primitive confidence scores: 0.967202 (push), 1.646582 (grasp)
Strategy: exploit (exploration probability: 0.100000)
Action: grasp at (7, 83, 107)
Executing: grasp at (-0.510000, -0.058000, 0.051002)
Trainer.get_label_value(): Current reward: 0.500000 Current reward multiplier: 1.000000 Predicted Future reward: 1.596124 Expected reward: 0.500000 + 0.650000 x 1.596124 = 1.537481
Training loss: 0.428985
Experience replay 63188: history timestep index 54, action: push, surprise value: 6.170048
Training loss: 0.071536
gripper position: 0.05317854881286621
gripper position: 0.034931570291519165
gripper position: 0.0285988450050354
Experience replay 63189: history timestep index 565, action: grasp, surprise value: 1.438415
Training loss: 0.018565
Experience replay 63190: history timestep index 12019, action: grasp, surprise value: 0.444365
Training loss: 0.134922
Experience replay 63191: history timestep index 1633, action: grasp, surprise value: 0.568288
Training loss: 0.011571
Experience replay 63192: history timestep index 15569, action: grasp, surprise value: 0.642771
Grasp successful: False
Training loss: 0.465558
Grasp Count: 14151, grasp success rate: 0.8217793795491485
Experience replay 63193: history timestep index 1054, action: push, surprise value: 1.732153
Training loss: 0.016847
Time elapsed: 18.942974
Trainer iteration: 16686.000000

Training iteration: 16686
Change detected: True (value: 134)
Primitive confidence scores: 1.140632 (push), 1.452001 (grasp)
Strategy: exploit (exploration probability: 0.100000)
Action: grasp at (15, 83, 91)
Executing: grasp at (-0.542000, -0.058000, 0.050999)
Trainer.get_label_value(): Current reward: 0.000000 Current reward multiplier: 1.000000 Predicted Future reward: 1.316792 Expected reward: 0.000000 + 0.650000 x 1.316792 = 0.855915
Training loss: 0.009576
Experience replay 63194: history timestep index 5062, action: grasp, surprise value: 0.308253
Training loss: 0.106279
Experience replay 63195: history timestep index 7109, action: grasp, surprise value: 0.205714
Training loss: 0.453820
gripper position: 0.030108928680419922
gripper position: 0.026779592037200928
gripper position: 0.0063852667808532715
Experience replay 63196: history timestep index 1226, action: grasp, surprise value: 1.184422
Training loss: 0.017869
Experience replay 63197: history timestep index 347, action: grasp, surprise value: 0.265588
Training loss: 0.030336
Experience replay 63198: history timestep index 778, action: grasp, surprise value: 1.168766
Training loss: 0.008899
Experience replay 63199: history timestep index 6223, action: push, surprise value: 0.247791
Training loss: 0.817960
gripper position: 0.00013843178749084473
gripper position: 5.0634145736694336e-05
Experience replay 63200: history timestep index 14762, action: grasp, surprise value: 0.546939
Grasp successful: True
Training loss: 0.038645
ERROR: PROBLEM DETECTED IN SCENE, NO CHANGES FOR OVER 20 SECONDS, RESETTING THE OBJECTS TO RECOVER...
Traceback (most recent call last):
  File "main.py", line 1078, in <module>
    parser.add_argument('--test_preset_cases', dest='test_preset_cases', action='store_true', default=False)
  File "main.py", line 831, in main
    trainer.model = trainer.model.cuda()
  File "main.py", line 892, in get_and_save_images
    prev_color_success = nonlocal_variables['grasp_color_success']
  File "/home/ahundt/src/costar_visual_stacking/robot.py", line 420, in get_camera_data
    color_img.shape = (resolution[1], resolution[0], 3)
IndexError: list index out of range

Note: there were bugs in the multi-step task code at the time this run started, but we are fairly certain they did not affect this run since it involved pushing and grasping only.
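The "Experience replay ... surprise value" lines above sample past timesteps for extra training passes. A hedged sketch of surprise-prioritized sampling, assuming surprise is the absolute error between a predicted value and its training label (illustrative, not the repo's exact sampling code):

import numpy as np

def sample_replay_index(surprise_values, rng=np.random):
    # Square the surprise values to bias sampling toward the most
    # surprising past timesteps, then normalize into probabilities.
    p = np.asarray(surprise_values, dtype=np.float64) ** 2
    p = p / p.sum()
    return rng.choice(len(surprise_values), p=p)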

Training - Transfer grasp to stack - any toy - check_z_height - Situation Removal - EfficientNet-B0 - V0.6

10 Sep 15:08
1911fc2

This is training progress for transferring grasp-any-toy weights to stack-any-toy weights with situation removal enabled, where any action that undoes past progress is given a reward of 0.
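A hedged sketch of that rule (the names here are illustrative; progress could be, for example, stack height):

def situation_removal_reward(base_reward, prev_progress, curr_progress):
    # Situation removal: any action that undoes past progress earns 0 reward.
    if curr_progress < prev_progress:
        return 0.0
    return base_reward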

This release also adds support for continuous z-height stacking based on depth image data projected onto a height map: the highest value remaining after a 5x5 median blur is used as the stacking progress height.
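A minimal sketch of that height check, assuming the heightmap is a float32 array of heights in meters (illustrative, not the repo's actual check_z_height() implementation):

import numpy as np
import cv2

def stack_progress_height(depth_heightmap):
    # A 5x5 median blur rejects single-pixel depth noise before
    # taking the maximum height as the stacking progress measure.
    blurred = cv2.medianBlur(depth_heightmap.astype(np.float32), 5)
    return float(blurred.max())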

Here is a placement action which failed, but it looks good visually and is a nice example of the training process:
[Image: Screenshot from 2019-09-09 20-01-15]

Training images of successful stacks:
[color images 000777.1, 000529.1, 000228.1, 000463.1]

Status printout:

Training iteration: 7244
Change detected: True (value: 2160)
Primitive confidence scores: 1.819177 (push), 1.674433 (grasp), 2.063278 (place)
Strategy: exploit (exploration probability: 0.117432)
Action: push at (8, 140, 132)
Executing: push at (-0.460000, 0.056000, 0.001002)
Trainer.get_label_value(): Current reward: 0.337089 Future reward: 1.960318 Expected reward: 0.337089 + 0.650000 x 1.960318 = 1.611296
Training loss: 0.004335
Experience replay 16755: history timestep index 775, action: place, surprise value: 0.590137
prev_height: 0.6741789133816113 max_z: 0.6746745066568761 goal_success: False <<<<<<<<<<<
check_stack() stack_height: 0.6746745066568761 stack matches current goal: False partial_stack_success: False Does the code think a reset is needed: False
Push motion successful (no crash, need not move blocks): True
STACK:  trial: 630 actions/partial: 14.261811023622048  actions/full stack: 1035.0 (lower is better)  Grasp Count: 2838, grasp success rate: 0.715292459478506 place_on_stack_rate: 0.25024630541871923 place_attempts: 2030  partial_stack_successes: 508  stack_successes: 7 trial_success_rate: 0.011111111111111112 stack goal: None
Trainer.get_label_value(): Current reward: 0.781250 Future reward: 2.106640 Expected reward: 0.781250 + 0.650000 x 2.106640 = 2.150566
Training loss: 0.002624
Time elapsed: 5.915041
Trainer iteration: 7245.000000

Training images showing a sequence of actions leading to a successful stack:
[color images 000223.0 through 000228.0, plus 000228.1]

Initial command run, with no situation removal:

export CUDA_VISIBLE_DEVICES="0" && python3 main.py --is_sim --obj_mesh_dir 'objects/toys' --num_obj 10  --push_rewards --experience_replay --explore_rate_decay --place --check_z_height --future_reward_discount 0.65 --transfer_grasp_to_place --load_snapshot --snapshot_file '/home/costar/src/costar_visual_stacking/logs/2019-08-17.20:54:32-train-grasp-place-split-efficientnet-21k-acc-0.80/models/snapshot.reinforcement.pth' 

Second command, used to resume the run; we started from the weights of a training run which didn't have situation removal, but this run did have situation removal enabled:

export CUDA_VISIBLE_DEVICES="0" && python3 main.py --is_sim --obj_mesh_dir 'objects/toys' --num_obj 10  --push_rewards --experience_replay --explore_rate_decay --place --check_z_height --future_reward_discount 0.65 --load_snapshot --snapshot_file '/home/costar/src/costar_visual_stacking/logs/2019-09-08.18:13:13/models/snapshot.reinforcement.pth'

Transfer grasp to place blocks -- EfficientNet v0, run 25k, training data v0.5.0

10 Sep 15:09
1911fc2

Training initialized with grasping weights, which were transferred to the place action for block stacking.
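A hedged sketch of what such a weight transfer can look like, assuming PyTorch state dict keys that contain 'grasp' or 'place' in their names (illustrative only; --transfer_grasp_to_place in main.py is the real mechanism):

import torch

def transfer_grasp_to_place(model, snapshot_file):
    # Copy each trained grasp-branch tensor into the corresponding
    # place-branch slot before resuming training.
    state = torch.load(snapshot_file)
    for key in list(state.keys()):
        if 'grasp' in key:
            state[key.replace('grasp', 'place')] = state[key].clone()
    model.load_state_dict(state, strict=False)
    return model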

Status printout:

Training iteration: 25210
WARNING variable mismatch num_trials + 1: 3202 nonlocal_variables[stack].trial: 3200
Change detected: True (value: 285)
Primitive confidence scores: 2.386978 (push), 2.751169 (grasp), 4.171329 (place)
Action: place at (4, 124, 132)
Executing: place at (-0.460000, 0.024000, 0.154936)
gripper position: 0.0038686394691467285
gripper position: 0.0038346052169799805
Trainer.get_label_value(): Current reward: 1.593750 Future reward: 4.186584 Expected reward: 1.593750 + 0.500000 x 4.186584 = 3.687042
Training loss: 0.136524
Experience replay 28447: history timestep index 8051, action: place, surprise value: 0.571094
current_position: [-0.6120171   0.08751236  0.03504968]
current_obj_z_location: 0.0650496843457222
goal_position: 0.21493569494035264 goal_position_margin: 0.3149356949403527
has_moved: True near_goal: False place_success: False
check_stack() False, not enough nearby objects for a successful stack! expected at least 4 nearby objects, but only counted: 3
check_stack() current detected stack height: 3
check_stack() stack_height: 3 stack matches current goal: False partial_stack_success: False Does the code think a reset is needed: False
STACK:  trial: 3200 actions/partial: 3.4918282548476456  actions/full stack: 17.20887372013652 (lower is better)  Grasp Count: 11830, grasp success rate: 0.8363482671174979 place_on_stack_rate: 0.7297351930462906 place_attempts: 9894  partial_stack_successes: 7220  stack_successes: 1465 trial_success_rate: 0.4578125 stack goal: [3 2 0 1]
Trainer.get_label_value(): Current reward: 2.343750 Future reward: 3.842178 Expected reward: 2.343750 + 0.500000 x 3.842178 = 4.264839
Training loss: 0.030964
Time elapsed: 9.990684
Trainer iteration: 25211.000000

Training iteration: 25211
WARNING variable mismatch num_trials + 1: 3202 nonlocal_variables[stack].trial: 3200
Change detected: True (value: 928)
Primitive confidence scores: 2.417877 (push), 4.037498 (grasp), 2.971273 (place)
Strategy: exploit (exploration probability: 0.100000)
Action: grasp at (10, 156, 58)
Executing: grasp at (-0.608000, 0.088000, 0.050952)
Trainer.get_label_value(): Current reward: 0.000000 Future reward: 3.908672 Expected reward: 0.000000 + 0.500000 x 3.908672 = 1.954336
gripper position: 0.03016480803489685
gripper position: 0.025715917348861694
gripper position: 0.005510151386260986
Training loss: 0.735216
Experience replay 28448: history timestep index 7290, action: push, surprise value: 0.239789
gripper position: 0.003706216812133789
gripper position: 0.0036499202251434326
Grasp successful: True
check_stack() current detected stack height: 3
check_stack() stack_height: 3 stack matches current goal: True partial_stack_success: True Does the code think a reset is needed: False
STACK:  trial: 3200 actions/partial: 3.49196675900277  actions/full stack: 17.209556313993176 (lower is better)  Grasp Count: 11831, grasp success rate: 0.836362099568929 place_on_stack_rate: 0.7297351930462906 place_attempts: 9894  partial_stack_successes: 7220  stack_successes: 1465 trial_success_rate: 0.4578125 stack goal: [3 2 0 1]
Trainer.get_label_value(): Current reward: 0.500000 Future reward: 3.531380 Expected reward: 0.500000 + 0.500000 x 3.531380 = 2.265690
Training loss: 0.024574
Time elapsed: 9.348486
Trainer iteration: 25212.000000

Training iteration: 25212
WARNING variable mismatch num_trials + 1: 3202 nonlocal_variables[stack].trial: 3200
Change detected: True (value: 680)
Primitive confidence scores: 2.646177 (push), 3.183239 (grasp), 4.247540 (place)

Train command:

python3 main.py --is_sim --obj_mesh_dir 'objects/blocks' --num_obj 4  --push_rewards --experience_replay --explore_rate_decay --place --transfer_grasp_to_place --load_snapshot --snapshot_file '/home/ahundt/Downloads/snapshot.reinforcement.pth'

Ablation - no situation removal - Transfer grasp to stack EfficientNet V0.4

07 Sep 17:55

These are the first 3.5k actions of training on weight transfer from grasping to stacking.

Command to run:

python3 main.py --is_sim --obj_mesh_dir 'objects/blocks' --num_obj 4  --push_rewards --experience_replay --explore_rate_decay --place --transfer_grasp_to_place --load_snapshot --snapshot_file '/home/ahundt/Downloads/snapshot.reinforcement.pth'

The attached zip 2019-09-06.15.29.13-fist-grasp-to-stack-transfer-run-3k-actions-stack.zip is the TRAINING log data, NOT test data. This is just an interim run; we plan a more substantial run in the near future.

The second archive, 2019-09-08.19.19.22-check-z-height-no-decrease-threshold.zip, is a training run for stacking any objects with check_z_height() and no resets when the height decreases. This run doesn't make great progress and spends much of its time with objects spread out; it may make a good ablation comparison against a version with frequent resets.

Push Grasp - Clear Toys Adversarial - EfficientNet-B0 Test Results v0.3.2

04 Sep 16:57

Adversarial Pushing and Grasping Results v0.3.2

Average % clearance: 99.1
Average % grasp success per clearance: 62.8
Average % action efficiency: 50.9
Average grasp to push ratio: 90.8
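A hedged sketch of how such summary metrics are typically computed for VPG-style clearance tests (the field names here are assumptions, not the actual evaluate.py):

def summarize(trials):
    # Each trial dict holds: cleared (bool), grasp_successes,
    # grasp_attempts, push_attempts, num_objects, num_actions.
    n = len(trials)
    cleared = [t for t in trials if t['cleared']]
    clearance = 100.0 * len(cleared) / n
    grasp_success = 100.0 * (sum(t['grasp_successes'] for t in cleared)
                             / max(1, sum(t['grasp_attempts'] for t in cleared)))
    # Action efficiency: ideal action count (one grasp per object) over actions taken.
    efficiency = 100.0 * (sum(t['num_objects'] for t in cleared)
                          / max(1, sum(t['num_actions'] for t in cleared)))
    grasp_to_push = 100.0 * (sum(t['grasp_attempts'] for t in trials)
                             / max(1, sum(t['grasp_attempts'] + t['push_attempts'] for t in trials)))
    return clearance, grasp_success, efficiency, grasp_to_push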

Video:

CoSTAR Visual Stacking v0.2 test run video

Status printout:

Testing iteration: 994
Change detected: True (value: 1439)
Primitive confidence scores: 1.498990 (push), 1.806918 (grasp)
Strategy: exploit (exploration probability: 0.000000)
Action: grasp at (15, 100, 123)
Executing: grasp at (-0.478000, -0.024000, 0.051004)
Trainer.get_label_value(): Current reward: 1.000000 Future reward: 1.847297 Expected reward: 1.000000 + 0.500000 x 1.847297 = 1.923648
Training loss: 0.055023
gripper position: 0.030909866094589233
gripper position: 0.02564963698387146
gripper position: 0.0006891787052154541
gripper position: -0.010575711727142334
Grasp successful: False
Grasp Count: 882, grasp success rate: 0.5850340136054422
Time elapsed: 6.893619
Trainer iteration: 995.000000

Testing iteration: 995
Change detected: True (value: 876)
Primitive confidence scores: 1.365109 (push), 2.030846 (grasp)
Strategy: exploit (exploration probability: 0.000000)
Action: grasp at (9, 140, 124)
Executing: grasp at (-0.476000, 0.056000, 0.032713)
Trainer.get_label_value(): Current reward: 0.000000 Future reward: 2.069253 Expected reward: 0.000000 + 0.500000 x 2.069253 = 1.034626
Training loss: 0.076920
gripper position: 0.03028649091720581
gripper position: 0.02625584602355957
gripper position: 0.004431545734405518
gripper position: 0.003403604030609131
gripper position: 0.003234654664993286
gripper position: -0.001414567232131958
Grasp successful: True
Grasp Count: 883, grasp success rate: 0.5855039637599094
Time elapsed: 7.405238
Trainer iteration: 996.000000

Testing iteration: 996
There have not been changes to the objects for for a long time [push, grasp]: [1, 0], or there are not enough objects in view (value: 0)! Repositioning objects.
loading case file: /home/costar/src/costar_visual_stacking/simulation/test-cases/test-10-obj-10.txt

Testing iteration: 996
Change detected: True (value: 3244)
Trainer.get_label_value(): Current reward: 1.000000 Future reward: 1.899974 Expected reward: 1.000000 + 0.500000 x 1.899974 = 1.949987
Trial logging complete: 110 --------------------------------------------------------------
Training loss: 0.001421

[color images 000000.0-000005.0 (two sets), depth images 000000.0-000005.0, and grasp/push Q visualizations for actions 000000-000005]

Test command:

export CUDA_VISIBLE_DEVICES="0" && python3 main.py --is_sim --obj_mesh_dir 'objects/toys' --num_obj 10  --push_rewards --experience_replay --explore_rate_decay --load_snapshot --snapshot_file '/home/costar/src/costar_visual_stacking/logs/2019-08-17.20:54:32-train-grasp-place-split-efficientnet-21k-acc-0.80/models/snapshot.reinforcement.pth' --random_seed 1238 --is_testing --save_visualizations --test_preset_cases --max_test_trials 10

Evaluate command:

python3 evaluate.py --session_directory '/home/costar/src/costar_visual_stacking/logs/2019-08-17.20:54:32-train-grasp-place-split-efficientnet-21k-acc-0.80-TEST-ADVERSARIAL-PRESET-2019-09-04.10:29:04'  --method reinforcement --preset --preset_num_trials 10 > /home/costar/src/costar_visual_stacking/logs/2019-08-17.20:54:32-train-grasp-place-split-efficientnet-21k-acc-0.80-TEST-ADVERSARIAL-PRESET-2019-09-04.10:29:04/grasp_push_evaluation_clearance_results.txt

(Errors in data, use images/vid only) Push Grasp - Clear Toys Adversarial - EfficientNet-B0 Test Results v0.3.1

03 Sep 21:26

Please note: ERRORS IN COUNTING CODE, DO NOT USE THE COUNT INFO. VIDEOS AND IMAGES ARE OK

Push Grasp - Clear Toys Adversarial - 10 trials per scenario, 11 scenarios (110 total trials) - test preset cases

Testing iteration: 508
Change detected: True (value: 1129)
Primitive confidence scores: 1.476669 (push), 1.880263 (grasp)
Strategy: exploit (exploration probability: 0.000000)
Action: grasp at (5, 34, 140)
Executing: grasp at (-0.444000, -0.156000, 0.045688)
Trainer.get_label_value(): Current reward: 1.000000 Future reward: 1.887018 Expected reward: 1.000000 + 0.500000 x 1.887018 = 1.943509
Training loss: 0.030470
gripper position: 0.031651049852371216
gripper position: 0.02618650160729885
gripper position: 0.0022711530327796936
gripper position: 0.0028414130210876465
Grasp successful: True
Grasp Count: 445, grasp success rate: 0.5730337078651685
Time elapsed: 7.026935
Trainer iteration: 509.000000

Testing iteration: 509
Change detected: True (value: 330)
Primitive confidence scores: 1.593701 (push), 1.954336 (grasp)
Strategy: exploit (exploration probability: 0.000000)
Action: grasp at (11, 34, 108)
Executing: grasp at (-0.508000, -0.156000, 0.021827)
Trainer.get_label_value(): Current reward: 1.000000 Future reward: 1.923524 Expected reward: 1.000000 + 0.500000 x 1.923524 = 1.961762
Training loss: 0.013612
gripper position: 0.039991289377212524
gripper position: 0.027741700410842896
gripper position: 0.005253970623016357
gripper position: 0.002297341823577881
gripper position: 0.002274245023727417
gripper position: 0.002187401056289673
gripper position: 0.0003167688846588135
Grasp successful: True
Grasp Count: 446, grasp success rate: 0.5739910313901345
Time elapsed: 6.916038
Trainer iteration: 510.000000

Testing iteration: 510
There have not been changes to the objects for for a long time [push, grasp]: [0, 0], or there are not enough objects in view (value: 0)! Repositioning objects.
loading case file: /home/costar/src/costar_visual_stacking/simulation/test-cases/test-10-obj-05.txt


Test command:

export CUDA_VISIBLE_DEVICES="0" && python3 main.py --is_sim --obj_mesh_dir 'objects/toys' --num_obj 10  --push_rewards --experience_replay --explore_rate_decay --load_snapshot --snapshot_file '/home/costar/src/costar_visual_stacking/logs/2019-08-17.20:54:32-train-grasp-place-split-efficientnet-21k-acc-0.80/models/snapshot.reinforcement.pth' --random_seed 1238 --is_testing --save_visualizations --test_preset_cases --max_test_trials 10

Video:

CoSTAR Visual Stacking v0.2 test run video

Action 357: [color images 000357.0 (two renderings), grasp viz 000357, push viz 000357]

Action 360: [color images 000360.0 (two renderings), grasp viz 000360, push viz 000360]

Images of various preset scenarios:
[color images 000000, 000030, 000038, 000060, 000066, 000140, 000142, 000143, 000205, 000245, 000312, 000394 (two renderings), 000429, 000465]

Any Stack EfficientNet-B0 Test Results v0.2.1

03 Sep 18:29

This is an updated testing evaluation run with the same weights as https://github.com/jhu-lcsr/costar_visual_stacking/releases/tag/stack_any_color_test_v0.2 .

Key bugfix:

  • stack successes were previously counted as 1 success and 1 failure in certain situations.
  • note that the final printout below has the trial count incremented one too high: there have been 100 trials, but it reads 101 (because that is the number the next trial would have if the run kept going).

Testing iteration: 692
Change detected: True (value: 1143)
Primitive confidence scores: 2.290141 (push), 3.529900 (grasp), 3.480633 (place)
Strategy: exploit (exploration probability: 0.000000)
Action: grasp at (15, 132, 50)
Executing: grasp at (-0.624000, 0.040000, 0.050975)
Trainer.get_label_value(): Current reward: 2.343750 Future reward: 3.546631 Expected reward: 2.343750 + 0.500000 x 3.546631 = 4.117065
Training loss: 0.018018
gripper position: 0.029527246952056885
gripper position: 0.02606511116027832
gripper position: 0.004493534564971924
gripper position: 0.0033941268920898438
Grasp successful: True
check_stack() current detected stack height: 3
check_stack() stack_height: 3 stack matches current goal: True partial_stack_success: True Does the code think a reset is needed: False
STACK:  trial: 100 actions/partial: 2.6150943396226416  actions/full stack: 9.493150684931507 (lower is better)  Grasp Count: 346, grasp success rate: 0.9277456647398844 place_on_stack_rate: 0.828125 place_attempts: 320  partial_stack_successes: 265  stack_successes: 73 trial_success_rate: 0.73 stack goal: [3 1 0 2]
Time elapsed: 6.047638
Trainer iteration: 693.000000

Testing iteration: 693
Change detected: True (value: 742)
Primitive confidence scores: 2.079691 (push), 2.910911 (grasp), 4.339453 (place)
Action: place at (7, 50, 91)
Executing: place at (-0.542000, -0.124000, 0.154757)
gripper position: 0.003488779067993164
gripper position: 0.00347745418548584
Trainer.get_label_value(): Current reward: 1.593750 Future reward: 4.317965 Expected reward: 1.593750 + 0.500000 x 4.317965 = 3.752733
gripper position: 0.0034421682357788086
Training loss: 0.034908
current_position: [-0.53655535 -0.12567917  0.18165806]
current_obj_z_location: 0.2116580593585968
goal_position: 0.21475671135502566 goal_position_margin: 0.3147567113550257
has_moved: True near_goal: True place_success: True
check_stack() current detected stack height: 4
check_stack() stack_height: 4 stack matches current goal: True partial_stack_success: True Does the code think a reset is needed: False
STACK:  trial: 101 actions/partial: 2.6090225563909772  actions/full stack: 9.378378378378379 (lower is better)  Grasp Count: 346, grasp success rate: 0.9277456647398844 place_on_stack_rate: 0.8286604361370716 place_attempts: 321  partial_stack_successes: 266  stack_successes: 74 trial_success_rate: 0.7326732673267327 stack goal: [3 1 0 2]
Time elapsed: 11.902205
Trainer iteration: 694.000000

Testing iteration: 694
Change detected: True (value: 2809)
Trainer.get_label_value(): Current reward: 3.125000 Future reward: 2.442585 Expected reward: 3.125000 + 0.500000 x 2.442585 = 4.346292
Trial logging complete: 100 --------------------------------------------------------------
Training loss: 0.006022

Command for test run:

export CUDA_VISIBLE_DEVICES="0" && python3 main.py --is_sim --obj_mesh_dir 'objects/blocks' --num_obj 4  --push_rewards --experience_replay --explore_rate_decay --place --load_snapshot --snapshot_file '/media/costar/f5f1f858-3666-4832-beea-b743127f1030/costar_visual_stacking_logs/logs/2019-08-19.16:07:34-any-stack-v2-steps-37k/models/snapshot.reinforcement-best-stack-rate.pth' --random_seed 1238 --is_testing --save_visualizations

Video:

CoSTAR Visual Stacking v0.2 test run video

Images from first stack attempt (successful):
[color images 000000.0-000005.0 plus 000005.1; grasp/place/push Q visualizations for actions 000000-000005; a second color image set 000000.0-000005.0 plus 000005.1; depth images 000000.0-000005.0 plus 000005.1]

Ablation of Any Stack WITHOUT HEIGHT REWARD EfficientNet-B0 Test Results v0.2

30 Aug 16:14

Stacking ablation study results: the same as the main stacking algorithm, but with the stack height reward multiplier removed.
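A hedged sketch of what removing that multiplier means, based on the "Current reward multiplier" values visible in the training printouts above (illustrative names, not the repo's code):

def shaped_reward(base_reward, stack_height_multiplier, no_height_reward=False):
    # The full model scales each action's reward by stacking progress;
    # the --no_height_reward ablation pins the multiplier to 1.
    multiplier = 1.0 if no_height_reward else stack_height_multiplier
    return base_reward * multiplier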

Testing iteration: 853
Change detected: False (value: 31)
Primitive confidence scores: 1.093426 (push), 1.144373 (grasp), 1.133123 (place)
Strategy: exploit (exploration probability: 0.000000)
Action: grasp at (11, 214, 206)
Executing: grasp at (-0.312000, 0.204000, -0.000100)
Trainer.get_label_value(): Current reward: 0.000000 Future reward: 1.205803 Expected reward: 0.000000 + 0.500000 x 1.205803 = 0.602901
Training loss: 0.006264
gripper position: 0.029546404257416725
gripper position: 0.025838494300842285
gripper position: 0.0007359087467193604
gripper position: -0.023179635405540466
gripper position: -0.04147481918334961
Grasp successful: False
check_stack() False, not enough nearby objects for a successful stack! expected at least 3 nearby objects, but only counted: 1
check_stack() current detected stack height: 1
check_stack() stack_height: 1 stack matches current goal: False partial_stack_success: False Does the code think a reset is needed: True
main.py check_stack() DETECTED A MISMATCH between the goal height: 3 and current workspace stack height: 1, RESETTING the objects, goals, and action success to FALSE...
STACK:  trial: 101 actions/partial: 6.46969696969697  actions/full stack: 170.8 (lower is better)  Grasp Count: 353, grasp success rate: 0.8385269121813032 place_on_stack_rate: 0.44594594594594594 place_attempts: 296  partial_stack_successes: 132  stack_successes: 5 trial_success_rate: 0.04950495049504951 stack goal: [2 0 3 1]
Time elapsed: 10.143962
Trainer iteration: 854.000000

Testing iteration: 854
Change detected: True (value: 2846)
Trainer.get_label_value(): Current reward: 0.000000 Future reward: 1.463371 Expected reward: 0.000000 + 0.500000 x 1.463371 = 0.731685

Notice that this ablation model cannot differentiate between placing on top of stacks of varying heights:

[grasp/place/push Q visualizations for actions 000008-000010 and color images 000008.0-000010.0 (two sets)]

Command for training:

export CUDA_VISIBLE_DEVICES="0" && python3 main.py --is_sim --obj_mesh_dir 'objects/blocks' --num_obj 4  --push_rewards --experience_replay --explore_rate_decay --place --no_height_reward

Command for testing:

export CUDA_VISIBLE_DEVICES="0" && python3 main.py --is_sim --obj_mesh_dir 'objects/blocks' --num_obj 4  --push_rewards --experience_replay --explore_rate_decay --place --load_snapshot --snapshot_file '/home/costar/src/costar_visual_stacking/logs/2019-08-27.17:48:02-no-height-multiplier-35k/models/snapshot.reinforcement-best-stack-rate.pth' --random_seed 1238 --is_testing --save_visualizations

Push Grasp EfficientNet-B0 Test Results v0.3

27 Aug 21:22

Grasping Results, release v0.3.

Testing iteration: 1223
Change detected: False (value: 1)
Trainer.get_label_value(): Current reward: 0.000000 Future reward: 0.000000 Expected reward: 0.000000 + 0.500000 x 0.000000 = 0.000000
Primitive confidence scores: 1.462715 (push), 2.052955 (grasp)
Strategy: exploit (exploration probability: 0.000000)
Action: grasp at (13, 67, 148)
Training loss: 0.795837
Executing: grasp at (-0.428000, -0.090000, 0.001002)
gripper position: 0.03202284872531891
gripper position: 0.026405101642012596
gripper position: 0.0013546496629714966
gripper position: -0.021843165159225464
gripper position: -0.021720416843891144
gripper position: -0.021846182644367218
Grasp successful: True
Grasp Count: 1138, grasp success rate: 0.8717047451669596
Time elapsed: 6.122154
Trainer iteration: 1224.000000

[grasp and push Q visualizations for action 000040; color image 000040.0; color and depth heightmaps 000040.0]

Train Command:

export CUDA_VISIBLE_DEVICES="0" && python3 main.py --is_sim --obj_mesh_dir 'objects/toys' --num_obj 10  --push_rewards --experience_replay --explore_rate_decay

Test Command:

export CUDA_VISIBLE_DEVICES="0" && python3 main.py --is_sim --obj_mesh_dir 'objects/toys' --num_obj 10  --push_rewards --experience_replay --explore_rate_decay --load_snapshot --snapshot_file '/home/costar/src/costar_visual_stacking/logs/2019-08-17.20:54:32-train-grasp-place-split-efficientnet-21k-acc-0.80/models/snapshot.reinforcement.pth' --random_seed 1238 --is_testing --save_visualizations

Any Stack EfficientNet-B0 Test Results v0.2

22 Aug 22:10

Results v0.2 of CoSTAR Visual Stacking.
This test run is configured so that any stack is considered a success.
The logged data and the final model are attached.
Click on the image below for the complete test run video:

CoSTAR Visual Stacking v0.2 test run video
The v0.2 first-stacking, simulation-only numerical summary test results are as follows:

Testing iteration: 388
Change detected: True (value: 864)
Primitive confidence scores: 2.407725 (push), 3.081561 (grasp), 4.667992 (place)
Action: place at (11, 91, 115)
Executing: place at (-0.494000, -0.042000, 0.154922)
Trainer.get_label_value(): Current reward: 1.593750 Future reward: 4.841486 Expected reward: 1.593750 + 0.500000 x 4.841486 = 4.014493
gripper position: 0.0032510757446289062
gripper position: 0.0032268762588500977
Training loss: 0.028107
current_position: [-0.5070473  -0.04663853  0.18176569]
current_obj_z_location: 0.21176569044589996
goal_position: 0.21492192623705114 goal_position_margin: 0.3149219262370512
has_moved: True near_goal: True place_success: True
check_stack() current detected stack height: 4
check_stack() stack_height: 4 stack matches current goal: True partial_stack_success: True Does the code think a reset is needed: False
STACK:  trial: 102 actions/partial: 2.576158940397351  actions/full stack: 8.644444444444444 (lower is better)  Grasp Count: 201, grasp success rate: 0.9154228855721394 place_on_stack_rate: 0.8206521739130435 place_attempts: 184  partial_stack_successes: 151  stack_successes: 45 trial_success_rate: 0.4411764705882353 stack goal: [2 3 0 1]
Time elapsed: 11.492945
Trainer iteration: 389.000000

Below is an example visualization of the Q function for each action, in the order grasp, place, push; the final action actually taken was place. A sketch of one way to render such overlays follows the images:

[Q visualizations for action 000029: grasp, place, push]
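One way to render such overlays, as a hedged sketch (assuming a per-pixel Q map and a color heightmap of matching size; this is not the repo's actual visualization code):

import cv2
import numpy as np

def visualize_q(color_heightmap, q_map):
    # Scale Q values to 0-255, colorize with a jet colormap,
    # then alpha-blend the heatmap onto the color heightmap.
    q_norm = cv2.normalize(q_map, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    heat = cv2.applyColorMap(q_norm, cv2.COLORMAP_JET)
    return cv2.addWeighted(color_heightmap, 0.5, heat, 0.5, 0)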

Command used to collect this data:

export CUDA_VISIBLE_DEVICES="0" && python3 main.py --is_sim --obj_mesh_dir 'objects/blocks' --num_obj 4  --push_rewards --experience_replay --explore_rate_decay --place --load_snapshot --snapshot_file '/media/costar/f5f1f858-3666-4832-beea-b743127f1030/costar_visual_stacking_logs/logs/2019-08-19.16:07:34-any-stack-v2-steps-37k/models/snapshot.reinforcement-best-stack-rate.pth' --random_seed 1238 --is_testing --save_visualizations