Port from Stable Baselines 2 to 3 (test code only) #40

Merged: 4 commits into master from use-sb3 on Oct 8, 2020

Conversation

@AdamGleave (Member)

We have a stub to do some sanity checking via RL training of our environments, using Stable Baselines 2 PPO. Move to Stable Baselines 3 so we can drop the TensorFlow dependency.
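For context, the port is essentially swapping the SB2 `PPO2` training call for the SB3 `PPO` equivalent. A minimal sketch of the before/after (the env ID and hyperparameters are illustrative, not the exact test code):

```python
import gym
import seals  # noqa: F401  (importing seals registers the seals/* env IDs)

# Before (Stable Baselines 2, TensorFlow):
# from stable_baselines import PPO2
# model = PPO2("MlpPolicy", gym.make("seals/HalfCheetah-v0"), seed=0)

# After (Stable Baselines 3, PyTorch):
from stable_baselines3 import PPO

env = gym.make("seals/HalfCheetah-v0")
model = PPO("MlpPolicy", env, seed=0)
model.learn(total_timesteps=200_000)
```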

@codecov (bot) commented Oct 6, 2020

Codecov Report

Merging #40 into master will not change coverage.
The diff coverage is 100.00%.


@@            Coverage Diff            @@
##            master       #40   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           24        24           
  Lines          684       683    -1     
=========================================
- Hits           684       683    -1     
Impacted Files            Coverage            Δ
tests/test_mujoco_rl.py   100.00% <100.00%>   (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@AdamGleave AdamGleave requested review from qxcv and shwang October 6, 2020 16:43
@shwang (Member) left a comment


LGTM

@AdamGleave (Member, Author)

The tests are skipped by default, so I ran them on my machine, and half of them failed.
test_failure.log

I'll try running them before the change, to see whether they were already flaky on SB2. I'm wondering if we should just delete these tests?
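(Aside: the usual pytest pattern for opt-in tests like these, sketched with a hypothetical marker name since the exact config may differ:)

```python
import pytest

# With `addopts = -m "not expensive"` in setup.cfg, these tests are deselected
# by default; run them explicitly with: pytest -m expensive tests/test_mujoco_rl.py
@pytest.mark.expensive  # hypothetical marker name
def test_rl_half_cheetah():
    ...
```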

@AdamGleave (Member, Author)

Three failures on SB2 as well. HalfCheetah and Humanoid failed in both runs. Hopper failed in SB3, Walker2D failed in SB2. But these may well vary by seed.

pytest_sb2.log

Maybe this suggests something is broken with the environments? I don't have bandwidth to investigate further. I suspect a combination of variation across seeds and the mean reward genuinely being lower in seals than in Gym, since fixed-horizon environments are less sample-efficient for RL and we're only training for 200k timesteps.
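A cheap way to test the seed-variance hypothesis, if someone picks this up later, would be to sweep seeds and compare the spread of mean returns (a sketch only, not part of this PR; env ID illustrative):

```python
import gym
import seals  # noqa: F401
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy

returns = {}
for seed in range(5):
    env = gym.make("seals/HalfCheetah-v0")
    model = PPO("MlpPolicy", env, seed=seed)
    model.learn(total_timesteps=200_000)  # same 200k budget as the tests
    mean_ret, std_ret = evaluate_policy(model, env, n_eval_episodes=10)
    returns[seed] = mean_ret
print(returns)  # a wide spread here would support the seed-variance explanation
```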

@shwang (Member) commented Oct 7, 2020

Interesting to note that Humanoid and HalfCheetah are doing much worse on the SB2 run (large negative return) than on the SB3 run (positive return).

@shwang (Member) commented Oct 7, 2020

I vaguely remember that when we introduced the MuJoCo envs, we decided to skip these tests because they never passed consistently. Here's the relevant comment thread:

#6 (comment)

@shwang (Member) commented Oct 7, 2020

My vote is to leave the tests in (continuing to skip them on CI), maybe leave a TODO, and then inspect them more thoroughly when someone has the bandwidth to investigate what's going on across multiple seeds.

If we suddenly have a ton of extra time on our hands then this seems like a good place to try tracking average performance via airspeed velocity.

@AdamGleave (Member, Author)

Airspeed Velocity seems a much better fit for this, since we care about changes over time rather than having a particular pass threshold in mind. I'm OK leaving the tests in with a TODO.
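For later reference, an asv benchmark for this would use its track_* convention (asv records the returned number over commits); the file name, class, and env IDs below are hypothetical:

```python
# benchmarks/rl_reward.py  (hypothetical asv suite)
class TrackMujocoReward:
    params = ["seals/HalfCheetah-v0", "seals/Hopper-v0"]
    param_names = ["env_id"]
    timeout = 3600  # RL training is slow

    def track_mean_reward(self, env_id):
        import gym
        import seals  # noqa: F401
        from stable_baselines3 import PPO
        from stable_baselines3.common.evaluation import evaluate_policy

        env = gym.make(env_id)
        model = PPO("MlpPolicy", env, seed=0)
        model.learn(total_timesteps=200_000)
        mean_reward, _ = evaluate_policy(model, env, n_eval_episodes=10)
        return mean_reward
```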

@AdamGleave AdamGleave merged commit ef5046b into master Oct 8, 2020
@AdamGleave AdamGleave deleted the use-sb3 branch October 8, 2020 10:51