This repository contains demo programs for the "Talking Head(?) Anime from a Single Image 4: Improved Model and Its Distillation" project. Roughly, the project is about a machine learning model that can animate an anime character given only one image. However, the model is too slow to run in real time. So, the project also proposes an algorithm that uses the model to train a small machine learning model that is specialized to one character image and can animate that character in real time.
This demo code has two parts.
- Improved model. This part gives a model similar to Version 3 of the project. It has one demo program:
  - The `full_manual_poser` allows the user to manipulate a character's facial expression and body rotation through a graphical user interface.

  There are no real-time demos because the new model is too slow for that.
- Distillation. This part allows the user to train small models (which we will refer to as student models) to mimic the behavior of the full system with regard to a specific character image. It also allows the user to run these models under various interfaces. The demo programs are:
  - `distill` trains a student model given a configuration file, a $512 \times 512$ RGBA character image, and a mask of facial organs.
  - `distiller_ui` provides a user-friendly interface to `distill`, allowing you to create training configurations and providing useful documentation.
  - `character_model_manual_poser` allows the user to control trained student models with a graphical user interface.
  - `character_model_ifacialmocap_puppeteer` allows the user to control trained student models with their facial movement, which is captured by the iFacialMocap software. To run this software, you must have an iOS device and, of course, iFacialMocap.
  - `character_model_mediapipe_puppeteer` allows the user to control trained student models with their facial movement, which is captured by a web camera and processed by the Mediapipe FaceLandmarker model. To run this software, you need a web camera.
There is no such program in this release. If you want one, try the `ifacialmocap_puppeteer` of Version 3.
NO. This release does it in a more complicated way. In order to control an image, you need to create a "student model." It is a small (< 2MB) and fast machine learning model that knows how to animate that particular image. Then, the student model can be controlled with facial movement. You can find two student models in the `data/character_models` directory. The two demos on the project website feature 13 student models.
No. You can create your own student models.
- You prepare your character image according to the "Constraint on Input Images" section below.
- You prepare a black-and-white mask image that covers the eyes and the mouth of the character, like this image. You can see how I made it with GIMP by inspecting this GIMP file.
- You use `distiller_ui` to create a configuration file that specifies how the student model should be trained.
- You use `distiller_ui` or `distill` to start the training process.
- You wait several tens of hours for the student model to finish training. Last time I tried, it took about 30 hours on a computer with an Nvidia RTX A6000 GPU.
- After that, you can control the student model with `character_model_ifacialmocap_puppeteer` and `character_model_mediapipe_puppeteer`.
Version 3 is arguably easier to use because you can give it an image and control it with your facial movement immediately. However, I was not satisfied with its image quality and speed.
In this release, I explore a new way of doing things. I added a new preprocessing stage (i.e., training the student models) that has to be done one time per character image. It allows the image to be animated much faster at a higher image quality level.
In other words, it makes the user's life difficult but the engineer/researcher happy. Patient users who are willing to go through the steps, though, would be rewarded with faster animation.
No. A student model created by `distill` is a PyTorch model, which cannot run directly in the browser. It needs to be converted to the appropriate format (TensorFlow.js) first, and the web demos use the converted models. However, the conversion code is not included in this repository. I will not release it unless I change my mind.
All programs require a recent and powerful Nvidia GPU to run. I developed the programs on a machine with an Nvidia RTX A6000. However, anything after the GeForce RTX 2080 should be fine.
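One way to see which GPU PyTorch would use is a small check like the following. It is only a sketch: the function name `describe_cuda_device` is made up for this example, and it reports the device name without judging whether the GPU is fast enough.

```python
def describe_cuda_device() -> str:
    """Return a one-line summary of the first CUDA device PyTorch can see."""
    try:
        # Imported lazily so the function also works where PyTorch is absent.
        import torch
    except ImportError:
        return "PyTorch is not installed in this environment."
    if not torch.cuda.is_available():
        return "PyTorch is installed, but no CUDA device is available."
    return "CUDA device 0: " + torch.cuda.get_device_name(0)
```

Calling `print(describe_cuda_device())` inside the project's virtual environment should print the name of the GPU if the driver and CUDA setup are working.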
The `character_model_ifacialmocap_puppeteer` program requires an iOS device that is capable of computing blend shape parameters from a video feed. This means that the device must be able to run iOS 11.0 or higher and must have a TrueDepth front-facing camera. (See this page for more info.) In other words, if you have the iPhone X or something better, you should be all set. Personally, I have used an iPhone 12 mini.
The `character_model_mediapipe_puppeteer` program requires a web camera.
Please update your GPU's device driver and install the CUDA Toolkit that is compatible with your GPU and is newer than the version you will be installing in the next subsection.
All programs are written in the Python programming language. The following libraries are required:
- `python` 3.10.11
- `torch` 1.13.1 with CUDA support
- `torchvision` 0.14.1
- `tensorboard` 2.15.1
- `opencv-python` 4.8.1.78
- `wxpython` 4.2.1
- `numpy-quaternion` 2022.4.2
- `pillow` 9.4.0
- `matplotlib` 3.6.3
- `einops` 0.6.0
- `mediapipe` 0.10.3
- `numpy` 1.26.3
- `scipy` 1.12.0
- `omegaconf` 2.3.0
Instead of installing these libraries yourself, you should follow the recommended method to set up a Python environment in the next section.
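After the environment is set up, a quick check like the one below could confirm that the installed versions match the list above. This is a sketch, not part of the repository: `PINNED` and `find_mismatches` are names invented here, only a few packages are listed for brevity, and local build suffixes (for example the `+cu117` that CUDA builds of `torch` may carry) are ignored when comparing.

```python
from importlib import metadata

# A few of the versions from the list above; extend as needed.
PINNED = {
    "torch": "1.13.1",
    "torchvision": "0.14.1",
    "opencv-python": "4.8.1.78",
    "mediapipe": "0.10.3",
    "numpy": "1.26.3",
    "omegaconf": "2.3.0",
}


def find_mismatches(pinned, lookup=metadata.version):
    """Return {package: (expected, found)} for packages that differ from the pins.

    `found` is None when the package is not installed. Anything after a "+"
    in the installed version (a local build suffix) is ignored.
    """
    mismatches = {}
    for name, expected in pinned.items():
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            mismatches[name] = (expected, None)
            continue
        if found.split("+")[0] != expected:
            mismatches[name] = (expected, found)
    return mismatches
```

Running `find_mismatches(PINNED)` inside the virtual environment returns an empty dictionary when everything lines up.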
If you want to use `ifacialmocap_puppeteer`, you will also need an iOS app called iFacialMocap (a 980 yen purchase in the App Store). Your iOS device and your computer must be on the same network. For example, you may connect them to the same wireless router.
Please install Python 3.10.11.
I recommend using `pyenv` (or `pyenv-win` for Windows users) to manage multiple Python versions on your system. If you use `pyenv`, this repository has a `.python-version` file that indicates it would use Python 3.10.11. So, you will be using Python 3.10.11 automatically once you `cd` into the repository's directory.
Make sure that you can run Python from the command line.
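To confirm that the interpreter on the command line is the one the repository expects, something like the following could be used. It is a sketch under the assumption that `.python-version` holds a single version string such as `3.10.11`; the function name is hypothetical.

```python
import sys
from pathlib import Path


def matches_pinned_version(version_file, running=sys.version_info[:3]):
    """Compare the interpreter version with the one named in a pyenv version file.

    `running` defaults to the current interpreter's (major, minor, micro).
    """
    pinned = Path(version_file).read_text().strip()
    return pinned == ".".join(str(part) for part in running)
```

From the repository's directory, `matches_pinned_version(".python-version")` should return `True` when the right Python is active.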
Please install Poetry 1.7 or later. We will use it to automatically install the required libraries. Again, make sure that you can run it from the command line.
Please clone the repository to an arbitrary directory on your machine.
- Open a shell.
- `cd` to the directory you just cloned the repository to.

      cd SOMEWHERE/talking-head-anime-4-demo

- Use Python to create a virtual environment under the `venv` directory.

      python -m venv venv --prompt talking-head-anime-4-demo

- Activate the newly created virtual environment. You can either use the script I provide:

      source bin/activate-venv.sh

  or do it yourself:

      source venv/bin/activate

- Use Poetry to install libraries.

      cd poetry
      poetry install
- Open a shell.
- `cd` to the directory you just cloned the repository to.

      cd SOMEWHERE\talking-head-anime-4-demo

- Use Python to create a virtual environment under the `venv` directory.

      python -m venv venv --prompt talking-head-anime-4-demo

- Activate the newly created virtual environment. You can either use the script I provide:

      bin\activate-venv.bat

  or do it yourself:

      venv\Scripts\activate

- Use Poetry to install libraries.

      cd poetry
      poetry install
Please download this ZIP file hosted on Dropbox, and unzip it to the `data/tha4` directory under the repository's directory. In the end, the directory tree should look like the following diagram:
+ talking-head-anime-4-demo
  + data
    - character_models
    - distill_examples
    + tha4
      - body_morpher.pt
      - eyebrow_decomposer.pt
      - eyebrow_morphing_combiner.pt
      - face_morpher.pt
      - upscaler.pt
    - images
    - third_party
If you want to create your own student models, you also need to download a dataset of poses that are needed for the training process. Download this `pose_dataset.pt` file and save it to the `data` folder. The directory tree should then look like the following diagram:
+ talking-head-anime-4-demo
  + data
    - character_models
    - distill_examples
    - tha4
    - images
    - third_party
    - pose_dataset.pt
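The two directory layouts above can be verified with a short script. The following is a minimal sketch that assumes the file names shown in the diagrams; `MODEL_FILES` and `missing_data_files` are names invented for this example.

```python
from pathlib import Path

# Model files expected under data/tha4, as listed in the first diagram.
MODEL_FILES = [
    "body_morpher.pt",
    "eyebrow_decomposer.pt",
    "eyebrow_morphing_combiner.pt",
    "face_morpher.pt",
    "upscaler.pt",
]


def missing_data_files(repo_root, need_pose_dataset=False):
    """Return the expected data files (relative to repo_root) that are absent.

    Set need_pose_dataset=True when planning to train student models,
    which also requires data/pose_dataset.pt.
    """
    root = Path(repo_root)
    expected = [root / "data" / "tha4" / name for name in MODEL_FILES]
    if need_pose_dataset:
        expected.append(root / "data" / "pose_dataset.pt")
    return [p.relative_to(root).as_posix() for p in expected if not p.is_file()]
```

An empty list from `missing_data_files(".", need_pose_dataset=True)`, run from the repository's directory, suggests that both downloads ended up in the right place.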
The programs are located in the `src/tha4/app` directory. You need to run them from a shell with the provided scripts.
- Open a shell.
- `cd` to the repository's directory.

      cd SOMEWHERE/talking-head-anime-4-demo

- Run a program.

      bin/run src/tha4/app/<program-file-name>

  where `<program-file-name>` can be replaced with:
  - `character_model_ifacialmocap_puppeteer.py`
  - `character_model_manual_poser.py`
  - `character_model_mediapipe_puppeteer.py`
  - `distill.py`
  - `distiller_ui.py`
  - `full_manual_poser.py`
- Open a shell.
- `cd` to the repository's directory.

      cd SOMEWHERE\talking-head-anime-4-demo

- Run a program.

      bin\run.bat src\tha4\app\<program-file-name>

  where `<program-file-name>` can be replaced with:
  - `character_model_ifacialmocap_puppeteer.py`
  - `character_model_manual_poser.py`
  - `character_model_mediapipe_puppeteer.py`
  - `distill.py`
  - `distiller_ui.py`
  - `full_manual_poser.py`
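The two command forms above differ only in the launcher script and the path separator, so a small helper could build either one. This is an illustrative sketch: `build_command` is not part of the repository, and the file name `distiller_ui.py` is assumed from the program name `distiller_ui` used elsewhere in this document.

```python
import os

# Program names as listed above, without the ".py" suffix.
PROGRAMS = {
    "character_model_ifacialmocap_puppeteer",
    "character_model_manual_poser",
    "character_model_mediapipe_puppeteer",
    "distill",
    "distiller_ui",
    "full_manual_poser",
}


def build_command(program, windows=(os.name == "nt")):
    """Return the launcher command, as an argument list, for one demo program."""
    if program not in PROGRAMS:
        raise ValueError(f"unknown program: {program}")
    if windows:
        return ["bin\\run.bat", f"src\\tha4\\app\\{program}.py"]
    return ["bin/run", f"src/tha4/app/{program}.py"]
```

The returned list can be passed to `subprocess.run(...)` from the repository's directory, with the virtual environment set up as described earlier.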
In order for the system to work well, the input image must obey the following constraints:
- It should be of resolution 512 x 512. (If the demo programs receive an input image of any other size, they will resize the image to this resolution and also output at this resolution.)
- It must have an alpha channel.
- It must contain only one humanoid character.
- The character should be standing upright and facing forward.
- The character's hands should be below and far from the head.
- The head of the character should roughly be contained in the 128 x 128 box in the middle of the top half of the image.
- The alpha channels of all pixels that do not belong to the character (i.e., background pixels) must be 0.
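Two of the constraints above, the resolution and the alpha channel, can be checked mechanically before training. The sketch below assumes the input is a PNG file and reads only its header with the standard library, so it cannot tell whether background pixels are actually transparent. The function names are made up for this example, and `head_box` reflects one reading of "the 128 x 128 box in the middle of the top half", namely a box centered on the center of the top half.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
EXPECTED_SIZE = 512
RGBA_COLOR_TYPE = 6  # PNG color type for truecolor with alpha


def check_png_header(data):
    """Return a list of problems found in the first 33 bytes of a PNG file."""
    if len(data) < 33 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        return ["not a PNG file with a leading IHDR chunk"]
    # IHDR starts with width, height (4 bytes each), bit depth, color type.
    width, height, _bit_depth, color_type = struct.unpack(">IIBB", data[16:26])
    problems = []
    if (width, height) != (EXPECTED_SIZE, EXPECTED_SIZE):
        problems.append(f"size is {width}x{height}; the demos would resize it")
    if color_type != RGBA_COLOR_TYPE:
        problems.append(f"PNG color type {color_type} is not RGBA")
    return problems


def head_box(image_size=EXPECTED_SIZE, box_size=128):
    """Return (left, top, right, bottom) of a box centered in the top half."""
    center_x = image_size // 2
    center_y = image_size // 4
    half = box_size // 2
    return (center_x - half, center_y - half, center_x + half, center_y + half)
```

For example, `check_png_header(open("character.png", "rb").read(33))` returns an empty list for a 512 x 512 RGBA PNG (`character.png` is a placeholder file name), and `head_box()` gives `(192, 64, 320, 192)` as the region where the head should roughly sit.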
character_model_ifacialmocap_puppeteer
character_model_manual_poser
character_model_mediapipe_puppeteer
distill
distiller_ui
full_manual_poser
The author is an employee of pixiv Inc. This project is a part of his work as a researcher.
However, this project is NOT a pixiv product. The company will NOT provide any support for this project. The author will try to support the project, but there are no Service Level Agreements (SLAs) that he will maintain.
The code is released under the MIT license.
The THA4 models and the images under the `data/images` directory are released under the Creative Commons Attribution-NonCommercial 4.0 International license.
This repository redistributes a version of the Face landmark detection model from the MediaPipe project. The model has been released under the Apache License, Version 2.0.