-
Notifications
You must be signed in to change notification settings - Fork 857
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Generalize rewards manager #221
Conversation
Thanks for this MR. Will review this in the following days and add feedback. Probably also want to do this for termination manager. The main blocking question is that with this, do we expect that the |
The way I implemented this at the moment, is that it will return a single Tensor if there are no reward groups and a dict of tensors when there are multiple. I'm not sure how much downstream complexity this would involve for the multi-critic case in learning frameworks other than Generally, to me it is always a bit confusing when "major" information for learning is put into the "extras" dictionary. Then it suddenly becomes a core thing, depending on the situation. |
3e7a470
to
cfcabba
Compare
@Fe1777 Thank you for the updated PR description. We've been using this internally since then and it works fine for us. |
Superseded by #729 |
Description
Generalizes the reward manager to allow for multiple "reward groups" made of multiple terms instead of a single one.
Is backwards compatible with the previous setup, if just reward terms are added to the config.
Fixes #220
Type of change
Checklist
pre-commit
checks with./orbit.sh --format
./orbit.sh --test
and they passconfig/extension.toml
fileCONTRIBUTORS.md
or my name already exists there