
[Feature Request] Can PPO support graph style spaces? #1280

Open
1 task done
BlueBug12 opened this issue Jan 16, 2023 · 4 comments
Labels
enhancement New feature or request

Comments

@BlueBug12

🚀 Feature

Support graph-style data structures as observation and action spaces for RL algorithms like PPO and others.

Motivation

Since version 0.25.0, gym has supported graph-style observation and action spaces. Remarkable works such as A graph placement methodology for fast chip design have shown that PPO combined with a GNN feature extractor can achieve excellent results. Since GNNs have become a common neural-network architecture, graph spaces should be supported for environment observations and actions.
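For reference, a minimal sketch of what such a space looks like with gym's Graph space (assuming gym >= 0.25.0; gymnasium exposes the same API, and the feature shapes below are made up for illustration):

```python
import numpy as np
from gym import spaces  # requires gym >= 0.25.0; also available in gymnasium

# A graph observation space: each node carries a 3-dim feature vector,
# each edge carries a single discrete label with 4 possible values.
graph_space = spaces.Graph(
    node_space=spaces.Box(low=-1.0, high=1.0, shape=(3,), dtype=np.float32),
    edge_space=spaces.Discrete(4),
)

# Sampling returns a GraphInstance with node features, edge features,
# and an array of edge endpoints (edge_links).
sample = graph_space.sample()
print(type(sample).__name__, sample.nodes.shape)
```

An SB3 feature extractor would have to consume such GraphInstance observations (variable node and edge counts), which is exactly where a GNN comes in.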

Pitch

No response

Alternatives

No response

Additional context

No response

Checklist

  • I have checked that there is no similar issue in the repo
@BlueBug12 BlueBug12 added the enhancement New feature or request label Jan 16, 2023
@BlueBug12
Author

Thanks for the information, it's very helpful to me. I also found another repo https://github.com/YinqiangZhang/custom_stable_baselines that is very close to what I need, so I may use it directly.

@aabbas90

aabbas90 commented Feb 3, 2023

@araffin:
Carrying on the discussion about the graph issue here:

One big hurdle, IMO, could be removed by allowing the action and value modules to output the actions and values directly, instead of only outputting embeddings. That would remove the need for the 'extra linear layer' mentioned in the docs, which might not be the right thing to do in this use case.
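One possible workaround, as a rough sketch rather than an official SB3 feature: subclass ActorCriticPolicy and swap the final action_net / value_net linear layers for identities, so a custom mlp_extractor whose latent_dim_pi already matches the input size expected by the action distribution (and whose latent_dim_vf is 1) effectively outputs actions and values directly. The DirectOutputPolicy name below is made up; everything else follows the SB3 custom-policy interfaces.

```python
from torch import nn

from stable_baselines3.common.policies import ActorCriticPolicy


class DirectOutputPolicy(ActorCriticPolicy):
    """Sketch: let the custom extractor emit the distribution parameters and
    the value directly, by turning the final linear layers into identities.
    Assumes the custom mlp_extractor sets latent_dim_pi to the size expected
    by the action distribution and latent_dim_vf to 1."""

    def _build(self, lr_schedule) -> None:
        super()._build(lr_schedule)
        self.action_net = nn.Identity()
        self.value_net = nn.Identity()
        # Re-create the optimizer so it no longer tracks the discarded
        # linear layers' parameters.
        self.optimizer = self.optimizer_class(
            self.parameters(), lr=lr_schedule(1), **self.optimizer_kwargs
        )
```

It could then be used like any other policy, e.g. PPO(DirectOutputPolicy, env, ...), with the graph-aware network plugged in through a custom mlp_extractor or features extractor.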

@aabbas90

aabbas90 commented Feb 4, 2023

My approach to tackling this:

b. On reading a new graph from disk, I need to recreate/change the environment but train the same agent. An example in this direction would be good.

Would it be a good idea to create batch_size-many environment objects, but randomly load a graph from disk whenever env.reset() is called, thus changing the environment parameters? Does this seem like a good solution for training on a large dataset where each instance is a separate environment? Thanks.
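For what it's worth, here is a minimal sketch of that pattern: a single env class whose reset() loads a random instance from disk, so that with batch_size parallel copies the agent still sees the whole dataset over many episodes. All names (RandomGraphEnv, the pickled (num_nodes, 3) feature arrays, the padded Box spaces) are assumptions for illustration, and the old-style (pre-0.26) gym reset/step API is assumed.

```python
import pickle
import random

import gym
import numpy as np
from gym import spaces


class RandomGraphEnv(gym.Env):
    """Each episode trains on a different graph instance loaded from disk."""

    def __init__(self, graph_paths, max_nodes=64):
        super().__init__()
        self.graph_paths = list(graph_paths)
        self.max_nodes = max_nodes
        # Placeholder spaces: they must be valid for every instance,
        # e.g. by padding node features up to max_nodes.
        self.observation_space = spaces.Box(
            low=-np.inf, high=np.inf, shape=(max_nodes, 3), dtype=np.float32
        )
        self.action_space = spaces.Discrete(max_nodes)
        self.node_features = None

    def reset(self):
        # A new problem instance each episode; over many episodes the
        # whole dataset is covered even with several parallel envs.
        with open(random.choice(self.graph_paths), "rb") as f:
            self.node_features = pickle.load(f)  # assumed (num_nodes, 3) array
        return self._get_obs()

    def step(self, action):
        # Placeholder dynamics; the real environment logic goes here.
        return self._get_obs(), 0.0, True, {}

    def _get_obs(self):
        padded = np.zeros((self.max_nodes, 3), dtype=np.float32)
        padded[: len(self.node_features)] = self.node_features
        return padded
```

Whether this is preferable to recreating the envs depends on whether anything else (e.g. the spaces) has to change per instance; if the spaces stay fixed, reloading inside reset() keeps the same agent and VecEnv setup intact.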
