-
Notifications
You must be signed in to change notification settings - Fork 612
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Updates on Reanalyse / Sample Efficiency (Re-executing MCTS, Parallelization, Stabilization with a target model, etc.) #142
base: master
Are you sure you want to change the base?
Conversation
…strapping stabilization with a target network
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line enables saving a negligible amount of memory but is incompatible with some environments as suggested by @theword.
It should therefore be deleted.
replay_buffer.py
Outdated
# Import the game class to enable MCTS updates | ||
game_module = importlib.import_module("games." + self.config.game_filename) | ||
self.game = game_module.Game() | ||
self.game.env = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self.game.env = None |
This is great. But after loading a checkpoint and restarting the training, the reanalyse_priorities will be initialized to None. That leads to an error when self.reanalyse_priorities +=1 |
torch.squeeze(values).detach().cpu().numpy() | ||
) | ||
# re-execute MCTS to update targets (child visist and root_values) | ||
l = len(game_history.root_values) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey,
Should there not be an else on this part?
If I understand correctly, the if in line 346 is there to trigger "do not use re-executed MCTS tree roots, use updated values directly from the value function instead".
So these lines should trigger only if the if is not firing, I believe
Apologies if the mistake is mine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I'm seeing now I missed the if in line 392, which solves the problem of updating with tree roots without intending to.
For computation efficiency though, wouldn't it still be better to add the else, instead of computing both?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe there shouldn't be an "else" on this since we have to execute MCTS to obtain a "fresh" policy (used as target for training, 80% of the time as mentioned in the paper, MuZero Reanalyze appendix). Meanwhile the value function is either updated via re-running MCTS or from a target network.
This pull requests aims to provide the following additions :