RNaD Entropy Schedule #1076

Closed
spktrm opened this issue May 25, 2023 · 4 comments
Labels
bug Something isn't working fixed This is fixed internally, and will be merged in the next github sync!

Comments

@spktrm
Contributor

spktrm commented May 25, 2023

```python
s = EntropySchedule(sizes=(10,))
print([s(i) for i in range(20)])
```

```
[
  (0.0, False), (0.2, False), (0.4, False), (0.6, False), (0.8, False),
  (1.0, False), (1.0, False), (1.0, False), (1.0, False), (1.0, False),
  (0.0, True),  (0.2, False), (0.4, False), (0.6, False), (0.8, False),
  (1.0, False), (1.0, False), (1.0, False), (1.0, False), (1.0, False)
]
```

Shouldn't the alpha value be 1.0 when the regularization nets are updated?

Since it is 0, this effectively disrupts the linear interpolation between the regularization policies.
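A minimal sketch of the interpolation in question, assuming the regularisation target mixes the log-probabilities of the two most recent regularisation policies with weight alpha on the newer one (alpha = 0 gives the older policy, alpha = 1 the newer one). The helper `mixed_reg_log_policy` and the toy two-action policies are hypothetical and are not taken from the OpenSpiel source:

```python
import math
from typing import List


def mixed_reg_log_policy(alpha: float, log_pi_prev: List[float],
                         log_pi_cur: List[float]) -> List[float]:
    """Hypothetical helper: per-action alpha-weighted mix of two regularisation log-policies.

    alpha = 0 returns the older policy (pi_reg_prev); alpha = 1 returns the newer one (pi_reg_cur).
    """
    return [alpha * cur + (1.0 - alpha) * prev
            for prev, cur in zip(log_pi_prev, log_pi_cur)]


# Made-up two-action regularisation policies, stored as log-probabilities.
log_pi_reg_0 = [math.log(0.9), math.log(0.1)]
log_pi_reg_1 = [math.log(0.5), math.log(0.5)]

# Step 9 in the output above: alpha = 1.0, so the target is pi_reg_1.
before = mixed_reg_log_policy(1.0, log_pi_reg_0, log_pi_reg_1)

# Step 10: alpha = 0.0 but (under the assumption above) the nets have not been
# shifted yet, so the target snaps back to pi_reg_0 instead of staying at pi_reg_1.
at_update = mixed_reg_log_policy(0.0, log_pi_reg_0, log_pi_reg_1)

print(before == log_pi_reg_1, at_update == log_pi_reg_0)
```

Under this assumption the target moves from pi_reg_1 back to pi_reg_0 for one step, which is the break in continuity the question is about.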

@lanctot
Collaborator

lanctot commented Jun 1, 2023

@perolat, @bartdevylder: any ideas?

@perolat

perolat commented Jun 13, 2023

Hi @spktrm,

Thanks for the question.

It should become 0 after we update the two regularisation policies. Unless we missed an edge case, there should only be continuous interpolations between regularisation networks.

It should go this way:

  • alpha goes from 0 to 1 linearly over half of the interval (interpolate from pi_{reg, 0} to pi_{reg, 1}),
  • then alpha stays at one for the rest of the interval (here the regularisation policy is pi_{reg, 1}),
  • we update the networks (to interpolate between pi_{reg, 1} and pi_{reg, 2})
  • start a new interval from alpha=0 (so at the first step of this interval we start with the regularisation policy pi_{reg, 1})
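The intended behaviour listed above can be sketched as a small function of the step and the interval length. This is a hypothetical illustration: the name `intended_schedule` and the single repeated interval `size` are assumptions, not the OpenSpiel `EntropySchedule` API. The update is placed on the last step of each interval, so alpha is already 1 when the nets are shifted:

```python
from typing import Tuple


def intended_schedule(step: int, size: int) -> Tuple[float, bool]:
    """Hypothetical sketch of the intended entropy schedule (not the OpenSpiel code).

    Returns (alpha, update_target_net) for a single repeated interval of length `size`.
    """
    if size < 2:
        raise ValueError("size must be at least 2")
    pos = step % size                    # position inside the current interval
    alpha = min(2.0 * pos / size, 1.0)   # 0 -> 1 linearly over the first half, then held at 1
    update_target_net = pos == size - 1  # shift the reg nets on the last step, where alpha == 1
    return alpha, update_target_net


# After the shift, the next interval restarts at alpha = 0, i.e. at the policy
# that was the "newer" one a step earlier, so the regularisation target does not jump.
print([intended_schedule(i, 10) for i in range(12)])
```

With `size=10`, step 9 yields `(1.0, True)` and step 10 yields `(0.0, False)`, matching the sequence described in the list.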

Let me know if you see something that doesn't match this intended behaviour.

Julien

@spktrm
Contributor Author

spktrm commented Jun 13, 2023

In the current code, alpha = 0 on the steps where the regularisation nets are updated. Because the regularisation nets are updated only after alpha has been used to compute the parameter updates, the current entropy schedule implementation outputs alpha = 0 exactly when update_target_net is True.
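The ordering described above can be traced with a toy simulation. This is a sketch under stated assumptions, not the OpenSpiel learner: regularisation nets are labelled by integers k (pi_reg_k), alpha is consumed before the nets are shifted, and `effective_reg_trace` and the `shifted` schedule (update raised on the last step of each interval, where alpha is already 1.0) are hypothetical. `reported` rebuilds the schedule printed at the top of this issue:

```python
def effective_reg_trace(schedule):
    """Toy simulation of the learner ordering (hypothetical, not OpenSpiel code).

    On each step alpha is first consumed to mix (prev, cur) for the parameter
    update; only afterwards are the nets shifted if update_target_net is True.
    Returns the position of the mixed target on the k axis: prev + alpha * (cur - prev).
    """
    prev, cur = 0, 1
    trace = []
    for alpha, update_target_net in schedule:
        trace.append(prev + alpha * (cur - prev))
        if update_target_net:  # shift happens after alpha has been used
            prev, cur = cur, cur + 1
    return trace


# Schedule printed in the issue for sizes=(10,): alpha resets to 0 on step 10,
# which is also the step that raises update_target_net.
reported = [(min(2.0 * (i % 10) / 10, 1.0), i > 0 and i % 10 == 0) for i in range(20)]

# Hypothetical variant: raise update_target_net on the last step of each
# interval, where alpha is already 1.0.
shifted = [(alpha, i % 10 == 9) for i, (alpha, _) in enumerate(reported)]

print(effective_reg_trace(reported)[8:12])  # [1.0, 1.0, 0.0, 1.2]
print(effective_reg_trace(shifted)[8:12])   # [1.0, 1.0, 1.0, 1.2]
```

With the reported schedule the target falls from pi_reg_1 back to pi_reg_0 on step 10 before moving on towards pi_reg_2; with the shifted flag it never decreases. Moving the flag is only one way to remove the jump and is not necessarily how the internal fix works.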

@lanctot
Collaborator

lanctot commented Aug 31, 2023

Update: We have identified this as a bug, and Eugene Tarassov has a fix. It will be fixed in the next sync to GitHub.

@lanctot lanctot added bug Something isn't working fixed This is fixed internally, and will be merged in the next github sync! labels Aug 31, 2023