This project is the successor to Phillip. While the original Phillip used pure deep RL, this one starts with behavioral cloning on Slippi replays, which makes it play a lot more like a human. There is a Discord channel for discussion/feedback/support.
The bot is available to play via netplay on my Twitch channel.
I am hesitant to release any trained agents as I don't want people using them on ranked/unranked, so at the moment the bot isn't available to play against locally.
Phillip has played a number of top players. My YouTube channel also has some recordings and clips.
- Huge thanks to Fizzi for writing the fast-forward Gecko code that significantly speeds up RL training, for providing most of the imitation training data in the form of anonymized ranked collections (link in the Slippi Discord), and of course for giving us Slippi in the first place. Even prior to rollback netcode, Slippi replays were what rekindled my interest in Melee AI, and are what gave this repo its name.
- Big thanks also to altf4 for creating the libmelee interface to Slippi Dolphin, making Melee AI development accessible to everyone.
- Thank you to the many players who have generously shared their replays.
Phillip is trained in two stages. In the first stage, it learns to imitate human play from a large dataset of Slippi replays. The resulting imitation policy is decent, but makes a lot of mistakes. In the second stage, the imitation policy is refined by playing against itself with reinforcement learning. This results in much stronger agents that have their own style of play.
The first step is to preprocess your Slippi replays using `slippi_db/parse_local.py`. See the documentation in that file for more details.

Note: local parsing currently depends on `peppi-py` version 0.6.0, which you may need to build manually.

The output of this step will be a `Parsed` directory of preprocessed games and a `meta.json` metadata file.
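
As a rough sketch of this step (the flag names and paths below are hypothetical placeholders, not the script's real CLI; the documentation in `slippi_db/parse_local.py` is authoritative):

```bash
# Placeholder flags for illustration only; read the docstring in
# slippi_db/parse_local.py for the actual arguments.
pip install peppi-py==0.6.0   # if this fails, build peppi-py 0.6.0 manually

# Point the parser at your raw .slp replays and pick an output location.
python slippi_db/parse_local.py --input ~/replays --output ~/training_data

# Expected result: a Parsed/ directory of preprocessed games plus meta.json.
ls ~/training_data
# Parsed/  meta.json
```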
The entry point for imitation learning is `scripts/train.py`. See `scripts/imitation_example.sh` for appropriate arguments.
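
For orientation only, an invocation might look like the sketch below; the flag names are made up, so copy and adapt `scripts/imitation_example.sh` for the real arguments.

```bash
# Placeholder flags for illustration; scripts/imitation_example.sh has the
# actual arguments expected by scripts/train.py.
python scripts/train.py --dataset ~/training_data --tag my_imitation_run
```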
Metrics are logged to wandb during training. To use your own wandb account, set the `WANDB_API_KEY` environment variable. The key metric to watch is `eval.policy.loss`; once this has plateaued you can stop training. On a good GPU (e.g. a 3080Ti), imitation learning should take a few days to a week. The agent checkpoint will be periodically written to `experiments/<tag>/latest.pkl`.
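
For example, to log to your own wandb account and find the resulting checkpoint (the run tag `my_imitation_run` is just a placeholder):

```bash
# WANDB_API_KEY is the standard wandb environment variable; the key itself
# comes from https://wandb.ai/authorize.
export WANDB_API_KEY=<your-key>

# Checkpoints are written under experiments/<tag>/, e.g. for a run tagged
# "my_imitation_run" (placeholder name):
ls experiments/my_imitation_run/latest.pkl
```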
There are two entry points for RL: `slippi_ai/rl/run.py`, which trains a single agent against itself (the "ditto"), and `slippi_ai/rl/train_two.py`, which trains two agents simultaneously. The arguments are similar for both; see `scripts/rl_example.sh` for an example ditto training script.
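
A hypothetical ditto run might look like the following; the flag names are placeholders, and `scripts/rl_example.sh` shows the arguments that actually exist.

```bash
# Self-play ("ditto") RL starting from an imitation checkpoint. Flag names are
# placeholders; see scripts/rl_example.sh for the real ones.
python slippi_ai/rl/run.py --teacher experiments/my_imitation_run/latest.pkl --tag my_rl_run

# Training two distinct agents against each other instead:
python slippi_ai/rl/train_two.py --tag my_two_agent_run
```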
To play a trained agent or watch two trained agents play each other, use `scripts/eval_two.py`. To do a full evaluation of two agents against each other, use `scripts/run_evaluator.py`.
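
As a final sketch (again with made-up flags; check each script's source for its real interface):

```bash
# Watch or play against trained agents (placeholder flags):
python scripts/eval_two.py \
  --p1 experiments/my_rl_run/latest.pkl \
  --p2 experiments/my_other_run/latest.pkl

# Full head-to-head evaluation over many games (placeholder flags):
python scripts/run_evaluator.py \
  --p1 experiments/my_rl_run/latest.pkl \
  --p2 experiments/my_other_run/latest.pkl
```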