When calculating im_loss (such as in ICM, E3B, RIDE, and Pseudo-counts), the calculation
# use a random mask to select a subset of the training data
mask = th.rand(len(im_loss), device=self.device)
mask = (mask < self.update_proportion).type(th.FloatTensor).to(self.device)
# get the masked losses
im_loss = (im_loss * mask).sum() / th.max(
    mask.sum(), th.tensor([1], device=self.device, dtype=th.float32)
)
(as seen on line 221 in icm.py) raises an error because im_loss has shape BATCH_SIZE x N_ACTIONS while mask is a 1-D tensor of length BATCH_SIZE. Broadcasting aligns the trailing dimensions, so N_ACTIONS is compared with BATCH_SIZE and the two tensors cannot be multiplied.
As an alternative to the solution claimed in RLE-Foundation/RLeXplore#21, I have two potential solutions, depending on the intended behavior.
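Use the same mask for all actions at time t.
# repeat the per-sample mask across the action dimension (3 actions in my environment)
im_mask = mask.unsqueeze(1).repeat(1, im_loss.shape[1])
# get the masked losses
im_loss = (im_loss * im_mask).sum() / th.max(
    im_mask.sum(), th.tensor([1], device=self.device, dtype=th.float32)
)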
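The shared-mask idea can also be written without hard-coding the number of actions. The following is a minimal sketch, assuming im_loss has shape (batch, n_actions) and the mask has shape (batch,). The helper name `masked_mean_shared` and the stand-in tensors are hypothetical and not part of rllte.

```python
import torch as th


def masked_mean_shared(loss: th.Tensor, mask: th.Tensor) -> th.Tensor:
    """Average `loss` over the rows selected by a per-sample `mask`.

    Hypothetical helper: `loss` is (batch, n_actions), `mask` is (batch,).
    """
    # Expand the per-sample mask so every action in a row shares one mask value.
    full_mask = mask.unsqueeze(1).expand_as(loss)
    # Clamp the denominator to 1 so an all-zero mask does not divide by zero.
    return (loss * full_mask).sum() / full_mask.sum().clamp(min=1.0)


th.manual_seed(0)
loss = th.rand(256, 3)                       # stand-in for im_loss
mask = (th.rand(len(loss)) < 0.25).float()   # 0.25 is an arbitrary proportion

result = masked_mean_shared(loss, mask)

# Reference: the repeat-based version from this issue.
repeated = mask.unsqueeze(1).repeat(1, 3)
reference = (loss * repeated).sum() / th.max(
    repeated.sum(), th.tensor([1], dtype=th.float32)
)
```

Here `expand_as` creates a broadcast view instead of copying the mask, and `clamp(min=1.0)` plays the same role as the `th.max(..., th.tensor([1]))` guard.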
Create unique mask values for all actions (which would be different from the fm_loss mask).
# use a random mask to select a subset of the training data
im_mask = th.rand(im_loss.shape, device=self.device)
im_mask = (im_mask < self.update_proportion).type(th.FloatTensor).to(self.device)
fm_mask = th.rand(len(im_loss), device=self.device)  # or could be len(fm_loss) or fm_loss.shape
fm_mask = (fm_mask < self.update_proportion).type(th.FloatTensor).to(self.device)
# get the masked losses
im_loss = (im_loss * im_mask).sum() / th.max(
    im_mask.sum(), th.tensor([1], device=self.device, dtype=th.float32)
)
fm_loss = (fm_loss * fm_mask).sum() / th.max(
    fm_mask.sum(), th.tensor([1], device=self.device, dtype=th.float32)
)
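Because the per-element variant draws the mask from the loss's own shape, the same code could serve both losses. A minimal sketch under that assumption follows. The helper `random_masked_mean` is hypothetical, the tensors are random stand-ins, and fm_loss is assumed here to be 1-D of length batch, which this issue does not confirm.

```python
import torch as th


def random_masked_mean(loss: th.Tensor, update_proportion: float) -> th.Tensor:
    """Hypothetical helper: mean of `loss` over a random subset of its elements."""
    # Draw one independent mask value per element, whatever the shape of `loss`.
    mask = (th.rand(loss.shape, device=loss.device) < update_proportion).to(loss.dtype)
    # Clamp the denominator to 1 so an empty selection does not divide by zero.
    return (loss * mask).sum() / mask.sum().clamp(min=1.0)


th.manual_seed(0)
im_loss = th.rand(256, 3)  # stand-in, shape (batch, n_actions)
fm_loss = th.rand(256)     # stand-in, shape (batch,) is an assumption

im_value = random_masked_mean(im_loss, 0.25)  # 0.25 is an arbitrary proportion
fm_value = random_masked_mean(fm_loss, 0.25)
```

With this variant a sample can contribute some of its action dimensions but not others, which differs from the shared-mask option where a time step is either fully included or fully excluded.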
To Reproduce
RLE-Foundation/RLeXplore#21 describes a way to reproduce the error that may be simpler. I cannot reproduce the issue within the library without a large amount of code.
I was running ICM on an environment with a continuous action space of 3 actions, and I got the same result with E3B.
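The shape mismatch itself, independent of rllte, can be sketched in a few lines of PyTorch. The sizes 256 and 8 are taken from the error message in this report. The loss tensor is a random stand-in, and the `update_proportion` value is arbitrary, not a value from the library.

```python
import torch as th

batch_size, action_dim = 256, 8  # sizes taken from the error message
update_proportion = 0.25         # arbitrary illustrative value

# Stand-in for the per-element inverse-model loss, shape (256, 8).
im_loss = th.rand(batch_size, action_dim)

# Same construction as in icm.py: a 1-D mask of shape (256,).
mask = th.rand(len(im_loss))
mask = (mask < update_proportion).type(th.FloatTensor)

try:
    masked = (im_loss * mask).sum()
    error_message = None
except RuntimeError as exc:
    # Trailing dimensions 8 and 256 differ and neither is 1, so broadcasting fails.
    error_message = str(exc)

print(error_message)
```

If the mask is given a trailing singleton dimension with `mask.unsqueeze(1)`, the multiplication broadcasts without error.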
Relevant log output / Error message
`File "/home/longarm_wsl/anaconda3/envs/metaworld3.12/lib/python3.11/site-packages/rllte/xplore/reward/icm.py", line 225, in update im_loss = (im_loss * mask).sum() / th.max( ~~~~~~~~^~~~~~ RuntimeError: The size of tensor a (8) must match the size of tensor b (256) at non-singleton dimension 1`
System Info
No response
Checklist
I have checked that there is no similar issue in the repo
🐛 Bug
Croip3 claimed to have a solution in RLE-Foundation/RLeXplore#21