Modifies loss storage in DDPG, TD3, and Soft Actor Critic #195
Conversation
Friendly reminder on this :)
The changes will probably not break anything, but can you elaborate on why you suggest them? If it is just for consistency, I think it is better to force recorded losses to be floats, not scalar ndarrays, for simplicity.
I think I need more data to fully justify it; I made the change mainly for consistency. That said, it may also reduce memory usage in some cases: some informal experiments showed slight improvements, though I can't be sure without a more thorough experiment.
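A rough, illustrative way to check the memory point (not part of this PR) is to compare the per-entry footprint of a Python float against a 0-d scalar ndarray, which is what the two storage styles keep around:

```python
# Rough comparison (not from the PR): per-entry footprint of a Python float
# vs. a 0-d numpy scalar array.
import sys
import numpy as np

as_float = 1.2345                 # what float(loss) / loss.item() would store
as_ndarray = np.asarray(1.2345)   # what loss.detach().cpu().numpy() would store

print(sys.getsizeof(as_float))    # size of the plain Python float object
print(sys.getsizeof(as_ndarray))  # size of the 0-d ndarray wrapper alone
```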
To be clear, we do indeed force them to be floats, right? We simply wrap the recorded value in float(...). Do you remember why we went through .detach().cpu().numpy() before casting?
Ah, sorry, I missed this.
IIRC I was just unaware that it is possible to directly cast a torch.Tensor to float. After I learned it is safe to cast (pytorch/pytorch#1129), I switched. I think it is better to choose the simpler way unless it has a memory leak issue. There also seems to be yet another way of doing this: torch.Tensor.item().
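For illustration (not from the PR itself), a minimal sketch of the three conversions, assuming a scalar loss tensor:

```python
import torch

loss = torch.tensor(0.5, requires_grad=True) * 2  # stand-in for a scalar loss

a = float(loss.detach().cpu().numpy())  # older, roundabout conversion
b = float(loss)                         # direct cast; safe for scalar tensors
c = loss.item()                         # equivalent single-call conversion

assert a == b == c and isinstance(c, float)
```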
Oh, thanks! Yes, this tells me that .item() implicitly detaches from the computation graph (which is apparently what we already do when we call detach()). Perhaps I can replace those instances with .item().
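As a quick sanity check (illustrative only), .item() returns a plain Python number and leaves the graph intact:

```python
import torch

x = torch.tensor(3.0, requires_grad=True)
loss = (x * x).sum()

v = loss.item()        # plain Python float, carries no grad_fn / graph reference
print(type(v), v)      # <class 'float'> 9.0
loss.backward()        # the graph is still intact after calling .item()
print(x.grad)          # tensor(6.)
```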
Sounds good. I think it is OK to write it that way.
/test
Successfully created a job for commit 41c0e92: |
LGTM. Sorry for the delay.
As per this discussion: #195 (comment)

I have switched many of the variable.detach().cpu().numpy() calls to variable.item() in our DDPG/TD3/SAC family of agents. I have run the training scripts for these agents for a bit and checked scores.txt; the values are indeed being written to it without issues.