Accepted at the European Conference on Computer Vision (ECCV) 2022.
In computer vision, human pose synthesis and transfer deal with probabilistic image generation of a person in a previously unseen pose from an already available observation of that person. Though researchers have recently proposed several methods to achieve this task, most of these techniques derive the target pose directly from the desired target image on a specific dataset, making the underlying process challenging to apply in real-world scenarios as the generation of the target image is the actual aim. In this paper, we first present the shortcomings of current pose transfer algorithms and then propose a novel text-based pose transfer technique to address those issues. We divide the problem into three independent stages: (a) text to pose representation, (b) pose refinement, and (c) pose rendering. To the best of our knowledge, this is one of the first attempts to develop a text-based pose transfer framework where we also introduce a new dataset DF-PASS, by adding descriptive pose annotations for the images of the DeepFashion dataset. The proposed method generates promising results with significant qualitative and quantitative scores in our experiments.
The pipeline is divided into three stages. In stage 1, we estimate the target pose keypoints from the corresponding text description embedding. In stage 2, we regressively refine the initial estimation of the facial keypoints and obtain the refined target pose keypoints. Finally, in stage 3, we render the target image by conditioning the pose transfer on the source image.
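The data flow between the three stages can be sketched as a simple function composition. This is only an illustrative sketch: `synthesize_pose` and the three toy stage functions are hypothetical names that do not come from the repository, and the toy stages merely stand in for the trained networks so that the example runs on its own.

```python
from typing import Any, Callable, List, Sequence, Tuple

Keypoints = List[Tuple[float, float]]


def synthesize_pose(
    text_embedding: Sequence[float],
    source_image: Any,
    text_to_pose: Callable[[Sequence[float]], Keypoints],
    refine_pose: Callable[[Keypoints], Keypoints],
    render_pose: Callable[[Any, Keypoints], Any],
) -> Any:
    """Chain the three stages: text -> coarse pose -> refined pose -> image."""
    coarse_keypoints = text_to_pose(text_embedding)      # stage 1
    refined_keypoints = refine_pose(coarse_keypoints)    # stage 2
    return render_pose(source_image, refined_keypoints)  # stage 3


# Toy stand-ins (assumptions for illustration, not the actual models).
def toy_text_to_pose(embedding: Sequence[float]) -> Keypoints:
    # Pair consecutive values into (x, y) coordinates.
    return [(embedding[i], embedding[i + 1]) for i in range(0, len(embedding) - 1, 2)]


def toy_refine_pose(keypoints: Keypoints) -> Keypoints:
    # Snap every coordinate to the nearest integer.
    return [(round(x), round(y)) for x, y in keypoints]


def toy_render_pose(source_image: Any, keypoints: Keypoints) -> dict:
    # Return a record instead of a rendered image.
    return {"source": source_image, "pose": keypoints}


result = synthesize_pose(
    [1.2, 3.7, 5.1, 0.4], "person.jpg",
    toy_text_to_pose, toy_refine_pose, toy_render_pose,
)
```

Because each stage only consumes the output of the previous one, any stage can in principle be replaced or evaluated in isolation, which matches the description of the stages as independent.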
Keypoints-guided methods tend to produce structurally inaccurate results when the physical appearance of the target pose reference differs significantly from that of the condition image. This happens more often for out-of-distribution target poses than for in-distribution ones. The existing text-guided method, on the other hand, occasionally misinterprets the target pose because it represents poses with a limited set of basic poses. The proposed text-guided technique addresses both issues while still generating results that are visually close to the keypoints-guided baseline.
- Clone the project repository and install dependencies.
git clone https://github.com/prasunroy/tips.git
cd tips
mkdir datasets
pip install -r requirements.txt
- Download the DF-PASS dataset from Google Drive and extract it into the datasets/DF-PASS directory.
tips
├───datasets
│   └───DF-PASS
│       ├───gaussian_heatmaps
│       ├───descriptions.csv
│       ├───encodings.csv
│       ├───test_img_keypoints.csv
│       ├───test_img_list.csv
│       ├───test_img_pairs.csv
│       ├───train_img_keypoints.csv
│       ├───train_img_list.csv
│       └───train_img_pairs.csv
└─── ...
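After extraction, a short check like the following can confirm that the dataset directory matches the layout above. It is a minimal sketch: the helper name `missing_dfpass_entries` is hypothetical and not part of the repository, and the entry names are copied from the directory listing.

```python
from pathlib import Path

# Entries expected inside datasets/DF-PASS, taken from the listing above.
EXPECTED_DFPASS_ENTRIES = (
    "gaussian_heatmaps",
    "descriptions.csv",
    "encodings.csv",
    "test_img_keypoints.csv",
    "test_img_list.csv",
    "test_img_pairs.csv",
    "train_img_keypoints.csv",
    "train_img_list.csv",
    "train_img_pairs.csv",
)


def missing_dfpass_entries(root="datasets/DF-PASS"):
    """Return the expected entries that are absent under `root`."""
    root = Path(root)
    return [name for name in EXPECTED_DFPASS_ENTRIES if not (root / name).exists()]
```

Calling `missing_dfpass_entries()` from the repository root returns an empty list when every expected entry is present.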
- Download the pretrained checkpoints and test data from Google Drive and extract them into the tips/demo directory.
tips
├───demo
│   ├───checkpoints
│   │   ├───pose2pose_260500.pth
│   │   ├───refinenet_100.pth
│   │   └───text2pose_75000.pth
│   ├───data
│   │   ├───images
│   │   ├───descriptions.csv
│   │   ├───encodings.csv
│   │   ├───img_pairs_df2df.csv
│   │   ├───img_pairs_df2rw.csv
│   │   ├───keypoints.csv
│   │   └───FreeMono.ttf
│   └─── ...
└─── ...
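Before opening the notebook, it can help to confirm that all three checkpoints are in place. The sketch below resolves each checkpoint path and fails early if one is missing. The helper name `locate_checkpoints` is hypothetical, and the mapping from pipeline stage to checkpoint file is an assumption inferred from the file names rather than something the repository states.

```python
from pathlib import Path

# Assumed stage-to-file mapping, inferred from the checkpoint names above.
STAGE_CHECKPOINTS = {
    "text_to_pose": "text2pose_75000.pth",
    "pose_refinement": "refinenet_100.pth",
    "pose_rendering": "pose2pose_260500.pth",
}


def locate_checkpoints(demo_dir="demo"):
    """Map each stage to its checkpoint path, raising if a file is missing."""
    checkpoint_dir = Path(demo_dir) / "checkpoints"
    located = {}
    for stage, filename in STAGE_CHECKPOINTS.items():
        path = checkpoint_dir / filename
        if not path.is_file():
            raise FileNotFoundError(f"Missing checkpoint for {stage}: {path}")
        located[stage] = path
    return located
```

Run from the repository root, `locate_checkpoints()` returns a dictionary of three paths, or raises `FileNotFoundError` naming the first checkpoint it cannot find.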
- Run the demo notebook from the tips/demo directory.
cd demo
jupyter notebook TIPS_demo.ipynb
@inproceedings{roy2022tips,
title = {TIPS: Text-Induced Pose Synthesis},
author = {Roy, Prasun and Ghosh, Subhankar and Bhattacharya, Saumik and Pal, Umapada and Blumenstein, Michael},
booktitle = {The European Conference on Computer Vision (ECCV)},
month = {October},
year = {2022}
}
[1] Multi-scale Attention Guided Pose Transfer (PR 2023).
[2] Scene Aware Person Image Generation through Global Contextual Conditioning (ICPR 2022).
[3] Text Guided Person Image Synthesis (CVPR 2019).
[4] Progressive Pose Attention Transfer for Person Image Generation (CVPR 2019).
[5] DeepFashion: Powering Robust Clothes Recognition and Retrieval With Rich Annotations (CVPR 2016).
Copyright 2022 by the authors
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
The DF-PASS dataset and the pretrained models are released under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.