-
[√] Implement and test distributed agent (agent_distributed.py)
-
Implement and test diffusion stochastic MuZero
- Add diffusion model components (Rectified Flow)
- Decide between JAX and TensorFlow, by reading mctx
- Test Stochastic MuZero (refer to mctx.stochastic_muzero_policy)
- Implement the Sampled MuZero mechanism
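Note on the Rectified Flow item: a minimal sketch of the standard rectified-flow training pair and Euler sampler, in plain Python so it does not depend on the JAX/TensorFlow decision. The function names (rf_training_pair, rf_loss, euler_sample) are hypothetical, and how this plugs into the MuZero networks is not decided here.

```python
def rf_training_pair(x0, x1, t):
    """Build one rectified-flow training example.

    x0: noise sample, x1: data sample, t: time in [0, 1].
    Returns the point on the straight line between them and the
    velocity target (x1 - x0) that the model should regress onto.
    """
    x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    target = [b - a for a, b in zip(x0, x1)]
    return x_t, target


def rf_loss(predicted_velocity, target):
    """Mean squared error between predicted and target velocity."""
    return sum((p - q) ** 2 for p, q in zip(predicted_velocity, target)) / len(target)


def euler_sample(velocity_fn, x0, num_steps=10):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with Euler steps."""
    x = list(x0)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x
```

In a real model, velocity_fn would be the learned network; here it is just any callable taking (x, t) and returning a list of the same length as x.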
-
Implement and test learning MCTS as policy improvement
-
Environments:
- [√] OpenSpiel Go
- Atari 100k
- dm_control
- safety (review)
-
Experiment with different search policies
- on OpenSpiel Go
- on stochastic multi-armed bandits
- implement ltr with processing similar to AlphaZero's
- find the equivalent equation for ltr
- Each implementation should include comprehensive testing
- Start Date: [11/6/2024]
- Target Completion: [12/15/2024]
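
Note on the stochastic multi-armed bandit experiment: a minimal baseline sketch, assuming Bernoulli arms and UCB1 as the reference search policy to compare other policies against. The function name ucb1 and its parameters are hypothetical; nothing here is taken from the existing code.

```python
import math
import random


def ucb1(arm_means, num_pulls, c=math.sqrt(2), seed=0):
    """Run UCB1 on Bernoulli arms and return the pull count per arm.

    arm_means: success probability of each arm (assumed, for simulation).
    c: exploration constant; sqrt(2) gives the usual UCB1 bonus.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(1, num_pulls + 1):
        if t <= k:
            # Pull every arm once before using the confidence bound.
            arm = t - 1
        else:
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a]
                + c * math.sqrt(math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts


if __name__ == "__main__":
    print(ucb1([0.2, 0.5, 0.8], num_pulls=5000))
```

Other search policies can reuse the same loop by swapping the key function that scores each arm.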