GAN Inverter is a GAN inversion toolbox based on the PyTorch library.
We collect state-of-the-art (SOTA) inversion methods and build a unified pipeline with additional features. Different methods and training strategies are easy to add and combine. We hope this toolbox is useful for any inversion task.
- Implementations of SOTA inversion methods.
- Unified training/evaluation/inference/editing process.
- Modular and flexible configuration: set options in a config file (YAML) or on the command line.
- Additional training features:
  - Distributed training.
  - Weights & Biases (wandb).
  - Automatic training resumption.
We are working on supporting inference for more methods and on completing the benchmark.
- 2023.02: SAM is supported.
- 2023.02: V1.1. Reorganized the code (method classes, inference pipeline). Added our new work DHR, "What Decreases Editing Capability? Domain-Specific Hybrid Refinement for Improved GAN Inversion".
- 2022.11: Added more optimizers; PTI is now supported.
- 2022.10: GAN Inverter v1.0 is released, supporting pSp, e4e, and LSAP.
- 2022.09: LSAP is published on arXiv.
Previous works categorize inversion methods as "encoder-based", "optimization-based", or "hybrid", but these criteria no longer fit current methods well. Based on each method's purpose, we instead divide the inversion process into two steps, Image Embedding and Result Refinement:
- Image Embedding embeds images into latent codes, either with an encoder or by optimization.
- Result Refinement improves the initial inversion and editing results from the first step with various strategies (e.g., adjusting generator weights or intermediate features).
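To make the two steps concrete, here is a toy sketch in plain Python. It is not GAN Inverter code: the "generator" is a hypothetical two-pixel linear map G(w; θ) = [θ₀·w, θ₁·w]. Embedding optimizes the latent w with θ frozen; refinement then tunes θ with w held fixed, loosely mirroring weight-adjusting refiners such as PTI.

```python
"""Toy two-step inversion: Image Embedding, then Result Refinement.

Hypothetical setup (not GAN Inverter code): a "generator" maps a scalar latent
w to a 2-pixel "image" with per-pixel weights theta: G(w; theta) = [t * w for t in theta].
"""
from typing import List, Sequence


def generate(w: float, theta: Sequence[float]) -> List[float]:
    # Each output pixel is its generator weight times the latent code.
    return [t * w for t in theta]


def loss(image: Sequence[float], target: Sequence[float]) -> float:
    # Squared reconstruction error summed over pixels.
    return sum((p - q) ** 2 for p, q in zip(image, target))


def embed(target: Sequence[float], theta: Sequence[float],
          steps: int = 200, lr: float = 0.05) -> float:
    """Step 1 (Image Embedding): gradient descent on w with the generator frozen."""
    w = 0.0
    for _ in range(steps):
        # d/dw sum_i (t_i * w - x_i)^2 = sum_i 2 * t_i * (t_i * w - x_i)
        grad = sum(2 * t * (t * w - x) for t, x in zip(theta, target))
        w -= lr * grad
    return w


def refine(target: Sequence[float], w: float, theta: Sequence[float],
           steps: int = 200, lr: float = 0.05) -> List[float]:
    """Step 2 (Result Refinement): tune generator weights with w held fixed."""
    theta = list(theta)
    for _ in range(steps):
        # d/dt_i (t_i * w - x_i)^2 = 2 * w * (t_i * w - x_i)
        theta = [t - lr * 2 * w * (t * w - x) for t, x in zip(theta, target)]
    return theta


if __name__ == "__main__":
    target, theta0 = [2.0, 3.0], [1.0, 1.0]
    w = embed(target, theta0)
    print(round(w, 4), round(loss(generate(w, theta0), target), 4))  # 2.5 0.5
    theta = refine(target, w, theta0)
    print([round(t, 4) for t in theta], round(loss(generate(w, theta), target), 4))  # [0.8, 1.2] 0.0
```

With the generator frozen, the best single latent still leaves an error of 0.5. The refinement step removes that error by adjusting the generator around the fixed code.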
Image Embedding methods:

| | Method | Type | Repo | Paper | Source |
|---|---|---|---|---|---|
| ☑️ | pSp | E | code | paper | CVPR2021 |
| ☑️ | e4e | E | code | paper | SIGGRAPH2021 |
| ☑️ | LSAP | E | code | paper | Arxiv2022 |
| ◻️ | E2Style | E | code | paper | TIP2022 |
| ◻️ | Style Transformer | E | code | paper | CVPR2022 |
| ☑️ | StyleGAN2(LPIPS) | O | code | paper | CVPR2020 |

Note: E/O/H denote encoder-based, optimization-based, and hybrid methods, respectively.
Result Refinement methods:

| | Method | Repo | Paper | Source |
|---|---|---|---|---|
| ◻️ | HyperStyle | code | paper | CVPR2022 |
| ◻️ | HFGI | code | paper | CVPR2022 |
| ☑️ | SAM | code | paper | CVPR2022 |
| ☑️ | PTI | code | paper | TOG2022 |
| ◻️ | FeatureStyleEncoder | code | paper | ECCV2022 |
| ☑️ | Domain-Specific Hybrid Refinement (DHR) | code | paper | Arxiv2023 |

Editing methods:

| | Method | Repo | Paper | Source |
|---|---|---|---|---|
| ☑️ | InterFaceGAN | code | paper | CVPR2020 |
| ☑️ | GANSpace | code | paper | NeurIPS2020 |
| ◻️ | StyleCLIP | code | paper | ICCV2021 |

Because previous inversion works use different evaluation settings, we build a benchmark on our unified pipeline to evaluate inversion methods consistently. See Evaluation for more details.
Evaluation settings:
- Dataset: CelebA-HQ test split (2,824 images).
- ID: identity similarity measured by a face recognition model.
- LPIPS version: VGG.
- Images are generated and converted to uint8 for evaluation, except for ID and FID, which are evaluated on saved images (PNG format).
- See `scripts/test.py` for more details.

| Refinement | Embedding | PSNR | MSE | LPIPS | ID | FID |
|---|---|---|---|---|---|---|
| - | Optimization-W | - | - | - | - | - |
| - | Optimization-W+ | - | - | - | - | - |
| - | pSp | 18.0348 | 0.0345 | 0.1591 | - | - |
| - | e4e | 16.6616 | 0.0472 | 0.1974 | - | - |
| - | LSAP | 17.4958 | 0.0391 | 0.1765 | - | - |
| PTI | pivot_w | 24.6004 | 0.0082 | 0.0820 | - | - |
| PTI | e4e | - | - | - | - | - |
| HFGI | e4e | - | - | - | - | - |
| SAM | e4e | - | - | - | - | - |
| SAM | LSAP | - | - | - | - | - |
| DHR | e4e | - | - | - | - | - |
| DHR | LSAP | - | - | - | - | - |

Note: these results may differ from those reported in our paper because of implementation differences.
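The PSNR and MSE columns compare each source image with its inversion. The plain-Python sketch below follows the settings listed above: it quantizes generator output to uint8, then averages per-image scores over the dataset. Several details are assumptions: the [-1, 1] output range, the rounding rule, the [0, 1] scale used for MSE, and per-image averaging. `scripts/test.py` is the reference for what the benchmark actually does.

```python
import math
from typing import Iterable, List, Sequence, Tuple


def to_uint8(pixels: Iterable[float]) -> List[int]:
    """Quantize float pixels in [-1, 1] (assumed generator output range) to 0..255."""
    return [max(0, min(255, round((p + 1) / 2 * 255))) for p in pixels]


def mse(a: Sequence[int], b: Sequence[int]) -> float:
    """Mean squared error between two same-sized uint8 images, on a [0, 1] scale."""
    return sum(((x - y) / 255.0) ** 2 for x, y in zip(a, b)) / len(a)


def psnr(a: Sequence[int], b: Sequence[int]) -> float:
    """PSNR in dB with peak value 1.0, matching the [0, 1] scale used by mse()."""
    m = mse(a, b)
    return math.inf if m == 0 else 10 * math.log10(1.0 / m)


def evaluate(pairs: Iterable[Tuple[Sequence[int], Sequence[int]]]) -> Tuple[float, float]:
    """Average per-image PSNR and MSE over (source, inversion) pairs.

    A pixel-perfect pair has infinite PSNR, which makes the PSNR average infinite.
    """
    psnrs, mses = [], []
    for src, inv in pairs:
        psnrs.append(psnr(src, inv))
        mses.append(mse(src, inv))
    return sum(psnrs) / len(psnrs), sum(mses) / len(mses)
```

Averaging per-image PSNR is not the same as computing PSNR from the average MSE. For this reason, the PSNR and MSE columns cannot be converted into each other directly.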
We use a unified config system for training, inference, testing, and editing. All options can be set in config files, which makes it easy to configure any phase.
All options are defined in `options/`, and `options/base_options.py` contains the options shared by every phase.
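The sketch below shows one way such a layered option system can work. Config files are merged in order, with later files overriding earlier ones (an assumed rule, e.g. when passing `configs/e4e/e4e_ffhq_r50.yaml configs/pti/pti.yaml`). Any option can then be overridden on the command line. Plain dicts stand in for parsed YAML so the sketch has no dependencies. The function names and override order are hypothetical, not the actual code in `options/`.

```python
import argparse
from typing import Any, Callable, Dict, List, Optional


def merge_configs(*configs: Dict[str, Any]) -> Dict[str, Any]:
    """Merge option dicts left to right; a key in a later config overrides earlier ones."""
    merged: Dict[str, Any] = {}
    for cfg in configs:
        merged.update(cfg)
    return merged


def _str2bool(text: str) -> bool:
    # argparse's type=bool would turn the string "False" into True, so parse it explicitly.
    lowered = text.lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {text!r}")


def _caster(default: Any) -> Callable[[str], Any]:
    # Choose how to parse a command-line string from the config default's type
    # (scalar options only in this sketch).
    if isinstance(default, bool):  # check bool before int: bool is a subclass of int
        return _str2bool
    if default is None:
        return str
    return type(default)


def apply_cli_overrides(options: Dict[str, Any],
                        argv: Optional[List[str]] = None) -> Dict[str, Any]:
    """Expose every option as --<name> so command-line values override config values."""
    parser = argparse.ArgumentParser()
    for key, default in options.items():
        parser.add_argument(f"--{key}", type=_caster(default), default=default)
    return vars(parser.parse_args(argv))


if __name__ == "__main__":
    # Illustrative values only, not the repository's defaults.
    base = {"batch_size": 8, "auto_resume": False, "checkpoint_path": None}
    method = {"embed_mode": "encoder", "batch_size": 4}
    print(apply_cli_overrides(merge_configs(base, method), ["--auto_resume", "True"]))
```

In the usage example, `batch_size` comes from the later (method) config and `auto_resume` from the command line. This mirrors the `--auto_resume True` style used elsewhere in this README.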
This repository follows a two-stage inference scheme. The base inference class `TwoStageInference` is defined in `./inference/two_stage_inference.py` and implements the image embedding -> result refinement pipeline.
This unified inversion process makes it easy to combine two methods. Users can try any combination of methods, not only those used by the original authors. For example, GAN Inverter can connect ReStyle with HyperStyle via `--embed_mode restyle --refine_mode hyperstyle`, or combine PTI with e4e via `--embed_mode e4e --refine_mode pti`.
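The following hypothetical sketch shows how a two-stage pipeline can pair any embedding method with any refinement method. Each stage is looked up by its mode name, so adding a method only means registering one function. Strings stand in for images and latent codes. `EMBEDDERS`, `REFINERS`, `register`, and `invert` are illustrative names, not the `TwoStageInference` API.

```python
from typing import Any, Callable, Dict, Optional

# Hypothetical registries keyed by the mode names used on the command line.
EMBEDDERS: Dict[str, Callable[[str, Dict[str, Any]], str]] = {}
REFINERS: Dict[str, Callable[[str, str, Dict[str, Any]], str]] = {}


def register(table: Dict[str, Callable], name: str) -> Callable[[Callable], Callable]:
    """Decorator that files a function in `table` under `name`."""
    def wrap(fn: Callable) -> Callable:
        table[name] = fn
        return fn
    return wrap


@register(EMBEDDERS, "e4e")
def embed_e4e(image: str, opts: Dict[str, Any]) -> str:
    # Placeholder for an encoder forward pass that returns a latent code.
    return f"e4e({image})"


@register(EMBEDDERS, "code")
def embed_saved_code(image: str, opts: Dict[str, Any]) -> str:
    # Skip embedding: look up a previously saved latent code for this image.
    return opts["codes"][image]


@register(REFINERS, "pti")
def refine_pti(image: str, code: str, opts: Dict[str, Any]) -> str:
    # Placeholder for tuning the generator around the pivot `code`.
    return f"pti[{code}]"


def invert(image: str, embed_mode: str, refine_mode: Optional[str] = None, **opts: Any) -> str:
    """Stage 1 embeds the image; stage 2 (optional) refines the result using that code."""
    code = EMBEDDERS[embed_mode](image, opts)
    if refine_mode is None:
        return code  # embedding-only inversion
    return REFINERS[refine_mode](image, code, opts)
```

For example, `invert("img", "e4e", "pti")` composes the two stages. `invert("img", "code", "pti", codes={"img": "w_saved"})` refines a saved code without running an encoder.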
💥 You can now run any method combination by passing the corresponding config files. See Inference for more details. For example:
- e4e: `--configs configs/e4e/e4e_ffhq_r50.yaml`
- PTI + e4e: `--configs configs/e4e/e4e_ffhq_r50.yaml configs/pti/pti.yaml`
- DHR + saved latent codes: `--embed_mode code --refine_mode dhr --code_path /path/to/code/xxxx.pt`
Supported methods:

| Option | Methods | Note |
|---|---|---|
| `--embed_mode encoder` | pSp, e4e, LSAP | |
| `--embed_mode optim` | Optimization | |
| `--embed_mode code` | Saved codes | Requires `--code_path` |
| `--refine_mode pti` | PTI | Use `--embed_mode optim` to obtain the pivot code as in the original PTI |
| `--refine_mode dhr` | DHR | |

You can load a checkpoint in two ways:
- `--checkpoint_path xxxx.pt`: manually set the checkpoint path to load. Although our model architecture differs slightly from previous repositories (e.g., pSp, e4e), their weights are automatically converted to fit our architecture, so you can use the original weight files.
- `--auto_resume True`: automatically load `{exp_dir}/checkpoints/last.pt` in the training phase, or `{exp_dir}/checkpoints/best_model.pt` in other phases.
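The checkpoint rules above can be summarized as a small path-resolution function. This is a sketch: the function name is hypothetical. Giving `--checkpoint_path` precedence over `--auto_resume` when both are set is an assumption.

```python
import os
from typing import Optional


def resolve_checkpoint(exp_dir: str, phase: str,
                       checkpoint_path: Optional[str] = None,
                       auto_resume: bool = False) -> Optional[str]:
    """Return the checkpoint file to load, or None to start without one."""
    if checkpoint_path:
        # An explicitly given path wins (precedence assumed for this sketch).
        return checkpoint_path
    if auto_resume:
        # Training resumes from the latest state; other phases use the best model.
        name = "last.pt" if phase == "train" else "best_model.pt"
        return os.path.join(exp_dir, "checkpoints", name)
    return None
```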
`--train_dataset_path`, `--batch_size`, and `--workers` are used only in training. `--test_dataset_path`, `--test_batch_size`, and `--test_workers` are the default test/inference options in every phase.
Please refer to the Installation Instructions for installation details.
Please refer to the Dataset Instructions for dataset details.
Example: Train LSAP on FFHQ
python scripts/train.py -c configs/lsap/lsap_ffhq_r50.yaml
python -m torch.distributed.launch --nproc_per_node=8 --master_port=12345 scripts/train.py -c configs/lsap/lsap_ffhq_r50.yaml --gpu_num 8
Notes:
- Set `--auto_resume True` to resume training automatically.
- Batch size is the total batch size across all GPUs, so it must be a multiple of the number of GPUs.
- In our experiments, distributed training with a batch size of 8 may be much slower or only marginally faster. For example, one iteration of e4e takes 50 s on either one or two cards. However, distributed training allows a larger total batch size (batch size 8 uses 21 GB of GPU memory), and a larger batch size and learning rate may speed up convergence. When the batch size is increased to 16 on two cards (8 samples per card), the cost per iteration increases only slightly (from 50 s to under 60 s). Suggestions for improving distributed training performance are welcome.
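The batch-size rule and the throughput arithmetic behind the timing note can be sketched as follows. Eight samples in 50 s is 0.16 samples/s, while 16 samples in under 60 s is more than 0.26 samples/s. The function names are hypothetical.

```python
def per_gpu_batch_size(total_batch_size: int, gpu_num: int) -> int:
    """Split the total batch size (the value of --batch_size) evenly across GPUs."""
    if gpu_num < 1:
        raise ValueError("gpu_num must be at least 1")
    if total_batch_size % gpu_num != 0:
        raise ValueError(
            f"batch size {total_batch_size} is not a multiple of gpu_num {gpu_num}")
    return total_batch_size // gpu_num


def samples_per_second(total_batch_size: int, seconds_per_iter: float) -> float:
    """Training throughput for a given total batch size and iteration time."""
    return total_batch_size / seconds_per_iter
```

For example, `per_gpu_batch_size(16, 2)` gives 8 samples per card. `samples_per_second(8, 50)` and `samples_per_second(16, 60)` compare the single-batch and doubled-batch settings from the note.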
Example: Train HFGI with LSAP on FFHQ.
You can run inference with:
python scripts/infer.py -c /path/to/config1 /path/to/config2
or
python scripts/infer.py \
--embed_mode [embed_mode] \
--refine_mode [refine_mode] \
--test_dataset_path [/path/to/dataset] \
--output_dir [/path/to/output] \
--save_code [true/false] \
--checkpoint_path [/path/to/checkpoint]
- `--save_code`: whether to save latent codes.
- `--test_dataset_path`: an image file or folder.
- `--output_dir`: path to save inversion results. Inverted images are saved in `{output_dir}/inversion/` and latent codes in `{output_dir}/code/`. If not set, `{exp_dir}/inference/` is used by default.
- `--checkpoint_path`: model weights.
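The output-path rules for `--output_dir` and `--save_code` can be sketched as follows. The helper name is hypothetical. The sketch also assumes that the `{exp_dir}/inference/` default serves as the root for both subfolders when `--output_dir` is not set.

```python
import os
from typing import Dict, Optional


def inference_dirs(exp_dir: str, output_dir: Optional[str] = None,
                   save_code: bool = False) -> Dict[str, str]:
    """Map each kind of inference output to the folder it is written to."""
    root = output_dir if output_dir else os.path.join(exp_dir, "inference")
    dirs = {"inversion": os.path.join(root, "inversion")}
    if save_code:
        # Latent codes are written only when --save_code is enabled.
        dirs["code"] = os.path.join(root, "code")
    return dirs
```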
Example 1: LSAP on CelebA-HQ.
python scripts/infer.py -c configs/lsap/lsap_ffhq_r50.yaml
Example 2: Optimization on CelebA-HQ.
python scripts/infer.py -c configs/optim/optim_celeba-hq.yaml
`--stylegan_weights`: path to the pretrained StyleGAN weights.
Example 3: PTI + e4e
python scripts/infer.py -c configs/e4e/e4e_ffhq_r50.yaml configs/pti/pti.yaml
Editing supports three `embed_mode` values: `encoder`, `optim`, and `code`.
python scripts/edit.py -c configs/lsap/lsap_ffhq_r50.yaml --edit_mode interfacegan --edit_path editing/interfacegan_directions/age.pt --edit_factor 1.0
If you have already run inference and saved the latent codes, you can edit those codes directly without re-running inversion. We recommend this "inference -> edit" pipeline because editing with various attributes and factors then costs no extra inversion time.
python scripts/edit.py -c configs/optim/optim_celeba-hq.yaml --embed_mode code --test_dataset_path /path/to/latent/codes/ --edit_mode interfacegan --edit_path editing/interfacegan_directions/age.pt --edit_factor 1.0
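InterFaceGAN edits a latent code by moving it along an attribute direction n, scaled by a factor α: w' = w + α·n. Here `--edit_factor` presumably plays the role of α. The sketch below applies this edit to a per-layer code and reuses one saved code for several factors. This reuse is why the inference -> edit pipeline avoids repeated inversion. Two details are assumptions: applying the same direction to every layer, and the list-based code layout. The function names are hypothetical.

```python
from typing import Dict, Iterable, List, Sequence

Code = List[List[float]]  # assumed layout: one latent vector per generator layer


def interfacegan_edit(code: Sequence[Sequence[float]], direction: Sequence[float],
                      factor: float) -> Code:
    """Linear InterFaceGAN edit applied to every layer: w' = w + factor * n."""
    return [[w + factor * n for w, n in zip(layer, direction)] for layer in code]


def edit_sweep(code: Sequence[Sequence[float]], direction: Sequence[float],
               factors: Iterable[float]) -> Dict[float, Code]:
    """Edit one saved (already inverted) code with several factors; no re-inversion."""
    return {f: interfacegan_edit(code, direction, f) for f in factors}


if __name__ == "__main__":
    saved = [[0.0, 1.0], [2.0, 3.0]]  # toy 2-layer, 2-dim latent code
    age = [1.0, -1.0]                 # toy attribute direction
    for f, edited in edit_sweep(saved, age, [-1.0, 1.0]).items():
        print(f, edited)
```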
TODO
If you use this toolbox for your research, please cite our repo.
@misc{cao2022ganinverter,
author = {Pu Cao and Dongxu Liu and Lu Yang and Qing Song},
title = {GAN Inverter},
howpublished = {\url{https://github.com/caopulan/GANInverter}},
year = {2022},
}
@article{cao2022lsap,
title={LSAP: Rethinking Inversion Fidelity, Perception and Editability in GAN Latent Space},
author={Cao, Pu and Yang, Lu and Liu, Dongxv and Liu, Zhiwei and Wang, Wenguan and Li, Shan and Song, Qing},
journal={arXiv preprint arXiv:2209.12746},
year={2022}
}