This project is the successor to Phillip. While the original Phillip used pure deep RL, this one starts with behavioral cloning on Slippi replays, which makes it play a lot more like a human. There is a Discord channel for discussion/feedback/support.
The bot is available to play via netplay on my Twitch channel.
I am hesitant to release any trained agents as I don't want people using them on ranked/unranked, so at the moment the bot isn't available to play against locally.
Phillip has played a number of top players. My YouTube channel also has some recordings and clips.
- Huge thanks to Fizzi for writing the fast-forward Gecko code that significantly speeds up RL training, for providing most of the imitation training data in the form of anonymized ranked collections (link in the Slippi Discord), and of course for giving us Slippi in the first place. Even prior to rollback netcode, Slippi replays were what rekindled my interest in Melee AI, and are what gave this repo its name.
- Big thanks also to altf4 for creating the libmelee interface to Slippi Dolphin, making Melee AI development accessible to everyone.
- Thank you to the many players who have generously shared their replays.
Phillip is trained in two stages. In the first stage, it learns to imitate human play from a large dataset of Slippi replays. The resulting imitation policy is decent, but makes a lot of mistakes. In the second stage, the imitation policy is refined by playing against itself with reinforcement learning. This results in much stronger agents that have their own style of play.
The first step is to preprocess your Slippi replays using `slippi_db/parse_local.py`. See the documentation in that file for more details.

Note: local parsing currently depends on `peppi-py` version 0.6.0, which you may need to build manually.

The output of this step will be a `Parsed` directory of preprocessed games and a `meta.json` metadata file.
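
As a rough sketch of this step (the flag names and paths below are hypothetical placeholders, not the script's real CLI; the documentation in `slippi_db/parse_local.py` is authoritative):

```bash
# Placeholder flags for illustration only; read the docstring in
# slippi_db/parse_local.py for the actual arguments.
pip install peppi-py==0.6.0   # if this fails, build peppi-py 0.6.0 manually

# Point the parser at your raw .slp replays and pick an output location.
python slippi_db/parse_local.py --input ~/replays --output ~/training_data

# Expected result: a Parsed/ directory of preprocessed games plus meta.json.
ls ~/training_data
# Parsed/  meta.json
```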
The entry point for imitation learning is `scripts/train.py`. See `scripts/imitation_example.sh` for appropriate arguments.
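
For orientation only, an invocation might look like the sketch below; the flag names are made up, so copy and adapt `scripts/imitation_example.sh` for the real arguments.

```bash
# Placeholder flags for illustration; scripts/imitation_example.sh has the
# actual arguments expected by scripts/train.py.
python scripts/train.py --dataset ~/training_data --tag my_imitation_run
```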
Metrics are logged to wandb during training. To use your own wandb account, set the `WANDB_API_KEY` environment variable. The key metric to watch is `eval.policy.loss`; once this has plateaued you can stop training. On a good GPU (e.g. a 3080Ti), imitation learning should take a few days to a week. The agent checkpoint will be periodically written to `experiments/<tag>/latest.pkl`.
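
For example, to log to your own wandb account and find the resulting checkpoint (the run tag `my_imitation_run` is just a placeholder):

```bash
# WANDB_API_KEY is the standard wandb environment variable; the key itself
# comes from https://wandb.ai/authorize.
export WANDB_API_KEY=<your-key>

# Checkpoints are written under experiments/<tag>/, e.g. for a run tagged
# "my_imitation_run" (placeholder name):
ls experiments/my_imitation_run/latest.pkl
```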
There are two entry points for RL: `slippi_ai/rl/run.py`, which trains a single agent against itself (the "ditto"), and `slippi_ai/rl/train_two.py`, which trains two agents simultaneously. The arguments are similar for both; see `scripts/rl_example.sh` for an example ditto training script.
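
A hypothetical ditto run might look like the following; the flag names are placeholders, and `scripts/rl_example.sh` shows the arguments that actually exist.

```bash
# Self-play ("ditto") RL starting from an imitation checkpoint. Flag names are
# placeholders; see scripts/rl_example.sh for the real ones.
python slippi_ai/rl/run.py --teacher experiments/my_imitation_run/latest.pkl --tag my_rl_run

# Training two distinct agents against each other instead:
python slippi_ai/rl/train_two.py --tag my_two_agent_run
```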
To play a trained agent or watch two trained agents play each other, use `scripts/eval_two.py`. To do a full evaluation of two agents against each other, use `scripts/run_evaluator.py`.
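
As a final sketch (again with made-up flags; check each script's source for its real interface):

```bash
# Watch or play against trained agents (placeholder flags):
python scripts/eval_two.py \
  --p1 experiments/my_rl_run/latest.pkl \
  --p2 experiments/my_other_run/latest.pkl

# Full head-to-head evaluation over many games (placeholder flags):
python scripts/run_evaluator.py \
  --p1 experiments/my_rl_run/latest.pkl \
  --p2 experiments/my_other_run/latest.pkl
```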