Supervised learning #48

Closed
MichaelGreisberger opened this issue Jun 19, 2021 · 6 comments

@MichaelGreisberger

Hello,

I am interested in implementing supervised learning in AlphaZero.jl. Since it's mentioned on the contribution page, I assume it hasn't been implemented yet? Has anyone already thought about this?

I would like to implement the following features:

  • Generate examples (I am not sure this makes sense at all: at this point, there is usually no agent good enough to generate good examples. One possibility could be to use the already implemented baseline agents (MCTS, minimax).)
  • Train with examples

Does anyone have an idea where I should best start?

@jonathan-laurent
Owner

Using supervised learning to get a good initial policy is most interesting for complicated games where:

  • Training an agent from scratch is not realistic with the compute power you have.
  • A lot of human-play data is available.

This is especially true for games such as chess and go (9x9 or 19x19) and I believe this is the only way you can plausibly train a good player with AlphaZero on a single machine.

That being said, the idea could still be tried out on simpler games and in the absence of human data by recording games from existing AIs playing against each other (minimax, alpha-beta, vanilla MCTS...) and using this data to train an initial policy; a sketch of this kind of data recording is given below. (We could also imagine a more advanced design, like that of the original AlphaGo agent, where the supervised policy is used as a second oracle within MCTS.)
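For illustration only, here is a minimal, self-contained sketch of what recording games between two existing agents could look like. The "game" is a toy Nim variant standing in for Fanorona, connect-four, etc., and none of the names below come from AlphaZero.jl; the ε-greedy noise is the kind of exploration that keeps the recorded games diverse.

```julia
# Hypothetical sketch: record games between two fixed agents to build a
# supervised dataset. Toy game: take 1-3 sticks from a pile; whoever takes
# the last stick wins.

greedy_agent(pile) = pile % 4 == 0 ? 1 : pile % 4   # optimal Nim move
random_agent(pile) = rand(1:min(3, pile))

function record_game(agents; pile=15, ε=0.1)
  trace = Tuple{Int,Int}[]                 # (state, chosen move) pairs
  turn = 1
  while pile > 0
    # ε-greedy exploration so the same game is not replayed over and over.
    move = rand() < ε ? rand(1:min(3, pile)) : agents[turn](pile)
    push!(trace, (pile, move))
    pile -= move
    turn = 3 - turn                        # alternate players
  end
  winner = 3 - turn                        # player who took the last stick
  return trace, winner
end

# Record a batch of games to use as supervised training data later on.
dataset = [record_game((greedy_agent, random_agent)) for _ in 1:1000]
```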

More generally, I would be very curious to see a blog post where someone uses a mix of AlphaZero and supervised learning to train an agent that is better than what could be achieved using either technique alone on a given resource budget. I would certainly value and advertise such a contribution.

PS: if you are interested in contributing, a great first contribution might be to demonstrate AlphaZero.jl on a game from OpenSpiel.jl, as this has been requested by multiple people already (see the open issues). It should not be too much work, especially since I've already written a wrapper for CommonRLInterface.jl that can be used as an example. (No pressure here, just a suggestion!)

@MichaelGreisberger
Author

Another use case for supervised learning could be hyperparameter tuning. Training a network with AlphaZero without knowing which parameters are useful for the game can be very inefficient, as self-play takes quite some time. This is especially true for games with high complexity.

The idea of comparing two or three different agents that all have the same resources, but have been trained differently, also sounds appealing to me.

I will look into implementing supervised learning for AlphaZero.jl. Could you point me in the right direction to get started? Do you know of any obstacles I will encounter in the process?

@jonathan-laurent
Owner

> Another use case for supervised learning could be hyperparameter tuning. Training a network with AlphaZero without knowing which parameters are useful for the game can be very inefficient, as self-play takes quite some time. This is especially true for games with high complexity.

It is true that supervised learning can be useful for tuning the architecture of the neural network. It may also be interesting to tune MCTS based on an oracle obtained from supervised learning.

> I will look into implementing supervised learning for AlphaZero.jl. Could you point me in the right direction to get started? Do you know of any obstacles I will encounter in the process?

Before thinking in terms of contributing to AlphaZero.jl, you should be thinking about a cool and meaningful experiment you want to run and write about. Then, we can look at the code of your experiment and see if there are things we want to integrate into AlphaZero.jl so that other people can have an easier time.

As a starting point, I would recommend looking at the game of connect-four, which is the canonical benchmark for AlphaZero.jl. A perfect solver is available for connect-four, so you can use it to generate a lot of experience (see the sketch below). Then, you can train a neural network policy on this data and see how it compares to a policy trained from self-play using AlphaZero. You can also compare several network architectures and see how well they perform (e.g. how useful is it to have 8 resnet blocks instead of 5? How well does a fully-connected network perform?). Finally, you can try a hybrid approach where you train an agent based on a mix of supervised learning and self-play.
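As an illustration of the solver-based idea (this is not something that exists in AlphaZero.jl), a supervised target for a position could be derived from the solver roughly as follows. The score convention is an assumption about whatever interface the connect-four solver exposes: one score per legal move, positive for wins, zero for draws, negative for losses, from the current player's perspective.

```julia
# Hypothetical sketch: turn a perfect solver's per-move scores into
# supervised (policy, value) targets.
function targets_from_solver(scores::Vector{Int})
  best = maximum(scores)
  # Target policy: uniform over the optimal moves, zero elsewhere.
  π = Float32[s == best ? 1 : 0 for s in scores]
  π ./= sum(π)
  # Target value: +1 if the position is won, 0 if drawn, -1 if lost.
  z = Float32(sign(best))
  return π, z
end

# Example: solver scores for a position with four legal moves.
π, z = targets_from_solver([-2, 0, 3, 3])   # π ≈ [0, 0, 0.5, 0.5], z = 1.0
```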

@MichaelGreisberger
Author

Besides the contribution to AlphaZero.jl, the main reason I want to use/implement supervised learning is that I need it for an experiment. I have already implemented a game (Fanorona) and want to train an agent to compete with several other MCTS-based agents that I have previously implemented for Fanorona. I already have some training data from previous tournaments I have run with my other agents. I could use at least parts of this data for supervised learning.

The idea of using the connect-four solver or trying OpenSpiel sounds great from a scientific point of view, but it is out of scope for me. I hope that my implementation of supervised learning can still be useful, but we can revisit this question after the implementation. I have already made a fork of AlphaZero.jl and will implement everything there. It should not be a problem to integrate features from this fork back into AlphaZero.jl, but the decision of whether and what to integrate is of course up to you.

With this in mind, it would be great if you could give me some advice on where to start and if you can think of any hurdles to implementing supervised learning.

@jonathan-laurent
Owner

I guess what you have to do is generate many samples of the kind stored in AlphaZero's memory buffer. You can take these samples either from human-play data or by having other players play against each other. If you do the latter, be careful to add some exploration, so that the same game is not played over and over and you get some diversity in your data. Once you have the data, you can either use the Trainer utility in learning.jl or write the training procedure yourself in Flux; a rough sketch of the latter is below.
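To make the second option concrete, here is a rough, hypothetical sketch of such a training procedure in Flux, assuming the samples have already been converted into encoded state vectors, target policies and game outcomes. The layer sizes and the dummy data are placeholders, and this is not AlphaZero.jl's actual network or Trainer.

```julia
# Minimal supervised-training sketch in Flux (not AlphaZero.jl's Trainer).
# X: encoded states, P: target policies, Z: game outcomes in [-1, 1].
using Flux, Random

state_dim, num_actions = 126, 7            # hypothetical sizes

trunk = Chain(Dense(state_dim, 128, relu), Dense(128, 128, relu))
phead = Chain(Dense(128, num_actions), softmax)   # policy head
vhead = Dense(128, 1, tanh)                        # value head in [-1, 1]

function loss(x, p, z)
  h = trunk(x)
  # Cross-entropy against the target policy plus MSE on the game outcome.
  Flux.Losses.crossentropy(phead(h), p) + Flux.Losses.mse(vec(vhead(h)), z)
end

# Dummy data standing in for samples extracted from recorded games.
X = rand(Float32, state_dim, 1024)
P = Flux.softmax(rand(Float32, num_actions, 1024))
Z = rand(Float32, 1024) .* 2 .- 1

opt = ADAM(1e-3)
ps  = Flux.params(trunk, phead, vhead)
for epoch in 1:10
  for idx in Iterators.partition(shuffle(1:1024), 64)   # minibatches
    x, p, z = X[:, idx], P[:, idx], Z[idx]
    gs = Flux.gradient(() -> loss(x, p, z), ps)
    Flux.Optimise.update!(opt, ps, gs)
  end
end
```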

@MichaelGreisberger
Author

Ok, got it. Thank you for helping me out. I guess we can close this issue for now. I will come back to you when I finish the supervised-learning implementation or have any more questions.
