Current behavior: `r` is 0 even if no rewards were produced:

```
{"reward": 0.0}
```

Wanted:

```
{"reward": []}              # no rewards were produced
{"reward": [0, 3.4, 0, 5]}  # rewards in the order in which they were generated
```

or:

```
{1: [], 2: [3], 3: [0, 0, 100]}
```
Here, in the XML mission spec, the user would define an id for each reward handler: for example, 1 could be RewardForTouchingBlockType and 2 could be some other reward handler.
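For illustration, a minimal sketch of how an agent might consume the per-handler format (note JSON requires string keys, so the integer ids above would arrive as `"1"`, `"2"`, ...; the ids and message shape here are the proposal, not current behavior):

```python
import json

# Hypothetical reward message in the per-handler format suggested above;
# the ids are whatever the user assigned to each reward handler in the XML.
msg = '{"1": [], "2": [3], "3": [0, 0, 100]}'
rewards_by_handler = json.loads(msg)

# An empty list now distinguishes "no reward was produced" from a genuine 0.
for handler_id, values in sorted(rewards_by_handler.items()):
    if not values:
        print("handler %s: no rewards this step" % handler_id)
    else:
        print("handler %s: rewards %s (sum %s)" % (handler_id, values, sum(values)))
```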
- Each `RewardProducer` has a new optional int attribute `dimension`, default `0`.
- `TimestampedFloat` is renamed to `TimestampedFloats` and contains a map of `dimension:float`.
- A reward message is sent only if one of the rewards has been triggered.
- The existing parameter `reward.value` returns the reward for dimension 0, if there is one (see the sketch below).
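A rough sketch of how this could look from the Python API under the proposal; the map-style iteration over dimensions is an illustrative guess at the eventual interface, since only `reward.value` for dimension 0 is named above:

```python
import MalmoPython

agent_host = MalmoPython.AgentHost()
# ... start the mission as in the standard Malmo samples ...

world_state = agent_host.getWorldState()
for reward in world_state.rewards:  # TimestampedFloats objects under the proposal
    r0 = reward.value               # unchanged: the dimension-0 reward, if any
    # Multi-dimensional consumers would walk the dimension:float map; this
    # items()-style access is an assumption, not current MalmoPython API.
    for dimension, value in reward.items():
        print("dimension %d -> %f" % (dimension, value))
```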
This way, most of the sample code and XML files are unchanged. If the user wants multi-dimensional rewards (which several people have asked for), we support it; if the user wants to separate rewards by their `RewardProducer`, we support that too. Discrete agents with single-dimension rewards become simpler, since they only need to check whether a reward has been received, not whether it is non-zero. This also allows a reward of zero for taking a step, which is currently problematic in tabular_q_learning.py, as sketched below.
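For example, the reward-handling step in something like tabular_q_learning.py could reduce to checking whether any reward message arrived at all (a sketch, assuming `agent_host` is set up as in the samples):

```python
world_state = agent_host.getWorldState()
# Under the proposal, a message is only sent when a reward was triggered,
# so "a reward arrived" and "the reward is non-zero" are no longer conflated.
if world_state.number_of_rewards_since_last_state > 0:
    current_r = world_state.rewards[0].value  # may legitimately be 0 for a step
    # ... update the Q-table with current_r ...
```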