Port from Stable Baselines 2 to 3 (test code only) #40
Conversation
Codecov Report
```
@@           Coverage Diff            @@
##            master      #40   +/-  ##
=========================================
  Coverage   100.00%  100.00%
=========================================
  Files           24       24
  Lines          684      683     -1
=========================================
- Hits           684      683     -1
```
LGTM
The tests are skipped by default, so I ran them on my machine, and half of them failed. I'll try running them before this change to see whether they were already flaky. I'm wondering if we should just delete these tests?
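For context, a minimal sketch of the skip-by-default pattern, assuming the expensive RL tests are gated behind a custom pytest marker (the marker name, test name, and threshold here are illustrative, not the repo's actual setup):

```python
import pytest


# Hypothetical setup: expensive RL tests carry a custom marker and are
# deselected by default (e.g. addopts = -m "not expensive" in setup.cfg).
# Run them explicitly with:  pytest -m expensive
@pytest.mark.expensive
def test_ppo_reaches_reward_threshold():
    mean_reward = run_training()
    assert mean_reward >= 1000.0  # illustrative threshold


def run_training() -> float:
    # Placeholder so the sketch is self-contained; the real test would
    # train PPO on a seals environment and return the mean episode reward.
    return 1500.0
```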
Three failures on SB2 as well. HalfCheetah and Humanoid failed in both runs; Hopper failed in SB3, and Walker2D failed in SB2. These may well vary by seed. Maybe this suggests something is broken with the environments? I don't have the bandwidth to investigate further. I suspect a combination of variation across seeds and the mean reward genuinely being lower in seals than in Gym, because fixed-horizon environments are less sample-efficient for RL and we're only training for 200k timesteps.
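For reference, roughly the kind of check under discussion, as a Stable Baselines 3 sketch; the environment ID, seed, and evaluation settings are assumptions, not the repo's exact test:

```python
import gym
import seals  # noqa: F401  -- importing registers the seals/* environment IDs
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy

# Train for the 200k-timestep budget mentioned above, then measure mean
# episode reward; with fixed-horizon environments this budget may simply
# be too small for a stable pass threshold.
env = gym.make("seals/HalfCheetah-v0")
model = PPO("MlpPolicy", env, seed=0)
model.learn(total_timesteps=200_000)

mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=10)
print(f"mean reward: {mean_reward:.1f} +/- {std_reward:.1f}")
```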
Interesting to note that Humanoid and HalfCheetah are doing much worse on the SB2 run (large negative return) than on the SB3 run (positive return).
I vaguely remember that when we introduced the MuJoCo envs we decided to skip these tests because they never passed consistently. Here's the relevant comment thread:
My vote is to leave the tests in (continuing to skip them on CI), maybe leaving a TODO, and then inspect them more thoroughly when someone has the bandwidth to investigate what's going on across multiple seeds. If we suddenly have a ton of extra time on our hands, this seems like a good place to try tracking average performance via airspeed velocity.
Airspeed Velocity seems a much better fit for this, since we care about changes over time rather than having a particular pass threshold in mind. I'm OK leaving the tests in with a TODO.
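A rough sketch of what that could look like with airspeed velocity (asv): `track_*` benchmark methods record an arbitrary numeric value per commit rather than asserting pass/fail, which suits "watch for regressions over time". The environment ID, budget, and class layout here are assumptions:

```python
import gym
import seals  # noqa: F401  -- registers the seals/* environment IDs
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy


class PPORewardSuite:
    # RL training far exceeds asv's default per-benchmark timeout.
    timeout = 3600

    def track_half_cheetah_mean_reward(self):
        # asv plots whatever a track_* method returns across commits,
        # so we return mean reward instead of asserting a threshold.
        env = gym.make("seals/HalfCheetah-v0")
        model = PPO("MlpPolicy", env, seed=0)
        model.learn(total_timesteps=200_000)
        mean_reward, _ = evaluate_policy(model, env, n_eval_episodes=10)
        return mean_reward
```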
We have a stub that does some sanity checking of our environments via RL training, using Stable Baselines 2 PPO. This PR moves it to Stable Baselines 3 so we can drop the TensorFlow dependency.
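For illustration, the core of the port at the API level, as a sketch rather than the repo's actual diff (the environment and hyperparameters are placeholders):

```python
# Before (Stable Baselines 2, TensorFlow-based):
#   from stable_baselines import PPO2
#   model = PPO2("MlpPolicy", env)
#
# After (Stable Baselines 3, PyTorch-based), dropping the TF dependency:
import gym
from stable_baselines3 import PPO

env = gym.make("CartPole-v1")
model = PPO("MlpPolicy", env)
model.learn(total_timesteps=10_000)
```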