Supervised learning #48

Closed
MichaelGreisberger opened this issue Jun 19, 2021 · 6 comments

@MichaelGreisberger

Hello,

I am interested in implementing supervised learning in AlphaZero.jl. Since it's mentioned on the contribution page, I assume it hasn't been implemented yet? Has anyone already thought about this?

I would like to implement the following features:

  • Generate examples (I am not sure this makes sense at all: at this point, there is usually no agent good enough to generate good examples. One possibility could be to use the already implemented baseline agents (MCTS, minimax).)
  • Train with examples

Does anyone have an idea where I should best start?

@jonathan-laurent
Owner

Using supervised learning to get a good initial policy is most interesting for complicated games where:

  • Training an agent from scratch is not realistic with the compute power you have.
  • A lot of human-play data is available.

This is especially true for games such as chess and go (9x9 or 19x19) and I believe this is the only way you can plausibly train a good player with AlphaZero on a single machine.

That being said, the idea could still be tried out on simpler games and in the absence of human data by recording games from existing AIs playing against each other (minimax, alpha-beta, vanilla MCTS...) and using this data to train an initial policy; a sketch of this kind of data recording is given below. (We could also imagine a more advanced design, like that of the original AlphaGo agent, where the supervised policy is used as a second oracle within MCTS.)
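For illustration only, here is a minimal, self-contained sketch of what recording games between two existing agents could look like. The "game" is a toy Nim variant standing in for Fanorona, connect-four, etc., and none of the names below come from AlphaZero.jl; the ε-greedy noise is the kind of exploration that keeps the recorded games diverse.

```julia
# Hypothetical sketch: record games between two fixed agents to build a
# supervised dataset. Toy game: take 1-3 sticks from a pile; whoever takes
# the last stick wins.

greedy_agent(pile) = pile % 4 == 0 ? 1 : pile % 4   # optimal Nim move
random_agent(pile) = rand(1:min(3, pile))

function record_game(agents; pile=15, ε=0.1)
  trace = Tuple{Int,Int}[]                 # (state, chosen move) pairs
  turn = 1
  while pile > 0
    # ε-greedy exploration so the same game is not replayed over and over.
    move = rand() < ε ? rand(1:min(3, pile)) : agents[turn](pile)
    push!(trace, (pile, move))
    pile -= move
    turn = 3 - turn                        # alternate players
  end
  winner = 3 - turn                        # player who took the last stick
  return trace, winner
end

# Record a batch of games to use as supervised training data later on.
dataset = [record_game((greedy_agent, random_agent)) for _ in 1:1000]
```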

More generally, I would be very curious to see a blog post where someone uses a mix of AlphaZero and supervised learning to train an agent that is better than what could be achieved using either technique alone on a given resource budget. I would certainly value and advertise such a contribution.

PS: if you are interested in contributing, a great first contribution might be to demonstrate AlphaZero.jl on a game from OpenSpiel.jl, as this has been requested by multiple people already (see the open issues). It should not be too much work, especially since I've already written a wrapper for CommonRLInterface.jl that can be used as an example. (No pressure here, just a suggestion!)

@MichaelGreisberger
Author

Another use case for supervised learning could be hyperparameter tuning. Training a network with AlphaZero without knowing which parameters are useful for the game can be very inefficient, as self-play takes quite some time. This is especially true for games with high complexity.

The idea of comparing two or three different agents that all have the same resources, but have been trained differently, also sounds appealing to me.

I will look into implementing supervised learning for AlphaZero.jl. Could you point me in the right direction to get started? Do you know of any obstacles I will encounter in the process?

@jonathan-laurent
Owner

> Another use case for supervised learning could be hyperparameter tuning. Training a network with AlphaZero without knowing which parameters are useful for the game can be very inefficient, as self-play takes quite some time. This is especially true for games with high complexity.

It is true that supervised learning can be useful for tuning the architecture of the neural network. It may also be interesting to tune MCTS based on an oracle obtained from supervised learning.

> I will look into implementing supervised learning for AlphaZero.jl. Could you point me in the right direction to get started? Do you know of any obstacles I will encounter in the process?

Before thinking in terms of contributing to AlphaZero.jl, you should be thinking about a cool and meaningful experiment you want to run and write about. Then, we can look at the code of your experiment and see if there are things we want to integrate into AlphaZero.jl so that other people can have an easier time.

As a starting point, I would recommend looking at the game of connect-four, which is the canonical benchmark for AlphaZero.jl. A perfect solver is available for connect-four, so you can use it to generate a lot of experience (see the sketch below). Then, you can train a neural network policy on this data and see how it compares to a policy trained from self-play using AlphaZero. You can also compare several network architectures and see how well they perform (e.g. how useful is it to have 8 resnet blocks instead of 5? How well does a fully-connected network perform?). Finally, you can try a hybrid approach where you train an agent based on a mix of supervised learning and self-play.
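As an illustration of the solver-based idea (this is not something that exists in AlphaZero.jl), a supervised target for a position could be derived from the solver roughly as follows. The score convention is an assumption about whatever interface the connect-four solver exposes: one score per legal move, positive for wins, zero for draws, negative for losses, from the current player's perspective.

```julia
# Hypothetical sketch: turn a perfect solver's per-move scores into
# supervised (policy, value) targets.
function targets_from_solver(scores::Vector{Int})
  best = maximum(scores)
  # Target policy: uniform over the optimal moves, zero elsewhere.
  π = Float32[s == best ? 1 : 0 for s in scores]
  π ./= sum(π)
  # Target value: +1 if the position is won, 0 if drawn, -1 if lost.
  z = Float32(sign(best))
  return π, z
end

# Example: solver scores for a position with four legal moves.
π, z = targets_from_solver([-2, 0, 3, 3])   # π ≈ [0, 0, 0.5, 0.5], z = 1.0
```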

@MichaelGreisberger
Author

Besides the contribution to AlphaZero.jl, the main reason I want to use/implement supervised learning is that I need it for an experiment. I have already implemented a game (Fanorona) and want to train an agent to compete with several other MCTS-based agents that I have previously implemented for Fanorona. I already have some training data from previous tournaments I have run with my other agents. I could use at least parts of this data for supervised learning.

The idea of using the connect-four solver or trying OpenSpiel sounds great from a scientific point of view, but it is out of scope for me. I hope that my implementation of supervised learning can still be useful, but we can revisit this question after the implementation. I have already made a fork of AlphaZero.jl and will implement everything there. It should not be a problem to integrate features from this fork back into AlphaZero.jl, but the decision of whether and what to integrate is of course up to you.

With this in mind, it would be great if you could give me some advice on where to start and if you can think of any hurdles to implementing supervised learning.

@jonathan-laurent
Owner

I guess what you have to do is generate many samples of the kind stored in AlphaZero's memory buffer. You can take these samples either from human-play data or by having other players play against each other. If you do the latter, be careful to add some exploration, so that the same game is not played over and over and you get some diversity in your data. Once you have the data, you can either use the Trainer utility in learning.jl or write the training procedure yourself in Flux; a rough sketch of the latter is below.
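To make the second option concrete, here is a rough, hypothetical sketch of such a training procedure in Flux, assuming the samples have already been converted into encoded state vectors, target policies and game outcomes. The layer sizes and the dummy data are placeholders, and this is not AlphaZero.jl's actual network or Trainer.

```julia
# Minimal supervised-training sketch in Flux (not AlphaZero.jl's Trainer).
# X: encoded states, P: target policies, Z: game outcomes in [-1, 1].
using Flux, Random

state_dim, num_actions = 126, 7            # hypothetical sizes

trunk = Chain(Dense(state_dim, 128, relu), Dense(128, 128, relu))
phead = Chain(Dense(128, num_actions), softmax)   # policy head
vhead = Dense(128, 1, tanh)                        # value head in [-1, 1]

function loss(x, p, z)
  h = trunk(x)
  # Cross-entropy against the target policy plus MSE on the game outcome.
  Flux.Losses.crossentropy(phead(h), p) + Flux.Losses.mse(vec(vhead(h)), z)
end

# Dummy data standing in for samples extracted from recorded games.
X = rand(Float32, state_dim, 1024)
P = Flux.softmax(rand(Float32, num_actions, 1024))
Z = rand(Float32, 1024) .* 2 .- 1

opt = ADAM(1e-3)
ps  = Flux.params(trunk, phead, vhead)
for epoch in 1:10
  for idx in Iterators.partition(shuffle(1:1024), 64)   # minibatches
    x, p, z = X[:, idx], P[:, idx], Z[idx]
    gs = Flux.gradient(() -> loss(x, p, z), ps)
    Flux.Optimise.update!(opt, ps, gs)
  end
end
```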

@MichaelGreisberger
Author

Ok, got it. Thank you for helping me out. I guess we can close this issue for now. I will come back to you when I finish the supervised-learning implementation or have any more questions.
