Support for MultiBinary / MultiDiscrete spaces #13
Even though it is a draft for now, please keep the full PR template ;)
You can push on the same branch afterward to update the PR
@araffin are the namings and overall code design ok?
Not sure I will have time to review today... and GitLab CI does not work for forks yet, so I created a branch to check the status: https://github.com/DLR-RM/stable-baselines3/tree/rolandgvc/master
some errors in the pipeline: https://gitlab.com/araffin/stable-baselines3/pipelines/144486506
Fixed the List problem.
Overall, this is good work =)
Some minor things (double checking the shapes returned)
Missing:
- running tests with the current algorithms (you can find test environments for that in identity_env.py)
- updating the changelog
PS: I recommend running the tests locally (at least the type check, which is fast to run)
Before I forget, you should also update
Thanks for the input 👍 Will have everything done in the next couple of days
Shouldn't
Like here:
yes, I think this choice was made for simplicity (at the end we use
@rolandgvc you should pull, I updated your PR to include automated tests ;)
I'm having the following error when trying to train SAC with the multidiscrete identity env:
Any hints where I could look?
Or maybe I am misunderstanding how multidiscrete observations work?
Be careful, I think you need to change the action space when using SAC with this env (because SAC and TD3 only support continuous actions, aka Box space)
And for observation space [3,3], I'm getting:
How do I change the action space?
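For readers following the question about how multidiscrete observations work: one common way to feed such an observation to a network is to one-hot encode each dimension and concatenate the results. The sketch below is illustrative only; it uses plain Python rather than the library's own preprocessing code, and the function name `one_hot_multidiscrete` and the `nvec` argument (number of categories per dimension, e.g. [3, 3]) are assumptions made for this example.

```python
from typing import List, Sequence


def one_hot_multidiscrete(obs: Sequence[int], nvec: Sequence[int]) -> List[float]:
    """Encode a multidiscrete observation as concatenated one-hot vectors.

    Hypothetical helper: `nvec` gives the number of categories per dimension.
    """
    if len(obs) != len(nvec):
        raise ValueError("obs and nvec must have the same length")
    encoded: List[float] = []
    for value, n_categories in zip(obs, nvec):
        if not 0 <= value < n_categories:
            raise ValueError(f"value {value} is outside [0, {n_categories})")
        one_hot = [0.0] * n_categories
        one_hot[value] = 1.0
        encoded.extend(one_hot)
    return encoded


# An observation from a [3, 3] space becomes a vector of length 3 + 3 = 6.
features = one_hot_multidiscrete([2, 0], nvec=[3, 3])
print(features)  # [0.0, 0.0, 1.0, 1.0, 0.0, 0.0]
```

With this encoding, the feature size seen by the network is the sum of the entries of `nvec`, not the number of dimensions, which is one place where shape mismatches can appear.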
@araffin have a look
I will try to do a full review today, but no promise. Looking at the test logs, there are some worrying warnings in "test with pytest":
yes, I was working on that now
Overall, that's good work =)
Some minor comments/improvements here and there ;)
Once you have addressed them, I will help you write tests for SAC/TD3
Ready when you are 👍
I will try to take a look today or tomorrow ;)
@rolandgvc could you give me access to your fork? I could not push some changes... See https://github.com/DLR-RM/stable-baselines3/tree/pull_13
I meant write access to your fork, so I can push my changes. (normally, you should have ticked "allow edits from maintainers" when creating this PR).
Update doc and tests
LGTM, thank you very much for the good work =)
Description
- MultiDiscrete and MultiBinary observation / action spaces for PPO and A2C
- MultiCategorical and Bernoulli distributions
- MultiCategorical and Bernoulli distributions and action spaces
Motivation and Context
- closes #5
- closes #4
Types of changes
Checklist:
- make lint
- pytest and pytype both pass.
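To make the two distributions named in the description more concrete, here is a minimal sketch of the underlying idea in plain Python. It is not the code from this PR: the function names are hypothetical, and the standard library stands in for the tensor operations a real implementation would use. A multi-categorical action is treated as several independent categorical choices whose logits are stored in one flat vector, and a Bernoulli action as several independent binary choices, so the joint log-probability is the sum over dimensions in both cases.

```python
import math
import random
from typing import List, Sequence


def split_logits(flat_logits: Sequence[float], nvec: Sequence[int]) -> List[List[float]]:
    """Split one flat logit vector into one chunk per action dimension."""
    if len(flat_logits) != sum(nvec):
        raise ValueError("expected sum(nvec) logits")
    chunks, start = [], 0
    for n in nvec:
        chunks.append(list(flat_logits[start:start + n]))
        start += n
    return chunks


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one chunk of logits."""
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_norm for x in logits]


def multicategorical_sample(flat_logits, nvec, rng: random.Random) -> List[int]:
    """Draw one category index per dimension, independently."""
    actions = []
    for logits in split_logits(flat_logits, nvec):
        probs = [math.exp(lp) for lp in log_softmax(logits)]
        actions.append(rng.choices(range(len(probs)), weights=probs, k=1)[0])
    return actions


def multicategorical_log_prob(flat_logits, nvec, actions: Sequence[int]) -> float:
    """Joint log-probability: sum of the per-dimension categorical log-probs."""
    chunks = split_logits(flat_logits, nvec)
    return sum(log_softmax(logits)[a] for logits, a in zip(chunks, actions))


def bernoulli_log_prob(logits: Sequence[float], actions: Sequence[int]) -> float:
    """Joint log-probability of independent binary actions given their logits."""
    total = 0.0
    for logit, a in zip(logits, actions):
        z = logit if a == 1 else -logit
        # log(sigmoid(z)) written as -softplus(-z) for numerical stability
        total -= max(-z, 0.0) + math.log1p(math.exp(-abs(z)))
    return total


rng = random.Random(0)
nvec = [3, 3]
logits = [0.0] * sum(nvec)  # uniform over each dimension
action = multicategorical_sample(logits, nvec, rng)
print(action, multicategorical_log_prob(logits, nvec, action))
```

With all logits at zero, each dimension is uniform, so the joint log-probability of any action in a [3, 3] space is 2 * log(1/3), and each binary action under a zero logit contributes log(0.5).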