
[Bug]: Action Spaces > 1 not working in Rewards w/ IM Loss #52

Zach-Attach opened this issue Oct 30, 2024 · 0 comments · May be fixed by #54
Labels: bug (Something isn't working)

Zach-Attach commented Oct 30, 2024

🐛 Bug

When calculating im_loss (such as in ICM, E3B, RIDE, and Pseudo-counts), the calculation

```python
# use a random mask to select a subset of the training data
mask = th.rand(len(im_loss), device=self.device)
mask = (mask < self.update_proportion).type(th.FloatTensor).to(self.device)
# get the masked losses
im_loss = (im_loss * mask).sum() / th.max(
    mask.sum(), th.tensor([1], device=self.device, dtype=th.float32)
)
```

(as seen on line 221 in icm.py) raises an error because `im_loss` has shape `BATCH_SIZE x N_ACTIONS` while `mask` is 1-D with shape `BATCH_SIZE`, so the two tensors cannot be broadcast for the multiplication.
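The shape mismatch can be reproduced in isolation; the sizes and the 0.25 proportion below are illustrative stand-ins for the real batch size, action count, and `update_proportion`:

```python
import torch as th

batch_size, n_actions = 4, 3               # illustrative sizes
im_loss = th.rand(batch_size, n_actions)   # per-action inverse-model loss
mask = th.rand(len(im_loss))               # shape (batch_size,), as in icm.py
mask = (mask < 0.25).float()               # 0.25 stands in for update_proportion

try:
    im_loss * mask                         # (4, 3) * (4,) cannot broadcast
except RuntimeError as e:
    print(f"RuntimeError: {e}")
```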

Croip3 claimed to have a solution in RLE-Foundation/RLeXplore#21.

Alternatively, I have two potential solutions, depending on the expected implementation:

1. Use the same mask for all actions at time t:

```python
# broadcast the per-sample mask across the action dimension
# (repeat over im_loss.shape[1] rather than a hardcoded action count)
im_mask = mask.unsqueeze(1).repeat(1, im_loss.shape[1])
# get the masked losses
im_loss = (im_loss * im_mask).sum() / th.max(
    im_mask.sum(), th.tensor([1], device=self.device, dtype=th.float32)
)
```
2. Create unique mask values for every action (these would differ from the fm_loss mask):

```python
# use a random mask to select a subset of the training data
im_mask = th.rand(im_loss.shape, device=self.device)
im_mask = (im_mask < self.update_proportion).type(th.FloatTensor).to(self.device)

fm_mask = th.rand(len(im_loss), device=self.device)  # or len(fm_loss) / fm_loss.shape
fm_mask = (fm_mask < self.update_proportion).type(th.FloatTensor).to(self.device)
# get the masked losses
im_loss = (im_loss * im_mask).sum() / th.max(
    im_mask.sum(), th.tensor([1], device=self.device, dtype=th.float32)
)
fm_loss = (fm_loss * fm_mask).sum() / th.max(
    fm_mask.sum(), th.tensor([1], device=self.device, dtype=th.float32)
)
```

To Reproduce

RLE-Foundation/RLeXplore#21 describes a fairly simple way to replicate this. I have no simple means of reproducing the issue without a large amount of code.

I was running ICM on an environment with a continuous action space of 3 actions, and I have seen the same result with E3B.

Relevant log output / Error message

```
File "/home/longarm_wsl/anaconda3/envs/metaworld3.12/lib/python3.11/site-packages/rllte/xplore/reward/icm.py", line 225, in update
    im_loss = (im_loss * mask).sum() / th.max(
               ~~~~~~~~^~~~~~
RuntimeError: The size of tensor a (8) must match the size of tensor b (256) at non-singleton dimension 1
```

System Info

No response

Checklist

- [x] I have checked that there is no similar issue in the repo
- [x] I have read the documentation
- [x] I have provided a minimal working example to reproduce the bug
- [x] I've used the markdown code blocks for both code and stack traces.
@Zach-Attach Zach-Attach added the bug Something isn't working label Oct 30, 2024
@Zach-Attach Zach-Attach linked a pull request Nov 8, 2024 that will close this issue