This is the official repo for the ECCV 2020 paper "Whole-Body Human Pose Estimation in the Wild". The slides of this work can be found in Slides. The repo contains the COCO-WholeBody annotations proposed in this paper. Note that in our ECCV paper, all experiments are conducted on COCO-WholeBody V0.5. We have since improved the annotations of the dataset from V0.5 to V1.0. The benchmark results for COCO-WholeBody V1.0 can be found in MMPose, and a more detailed introduction to COCO-WholeBody V1.0 is given in our recent TPAMI paper.


[2022.08.08] Our recent paper "ZoomNAS: Searching for Whole-body Human Pose Estimation in the Wild" has been published in IEEE Transactions on Pattern Analysis and Machine Intelligence. In this paper, we propose a neural architecture search framework, termed ZoomNAS, to improve both the accuracy and the efficiency of whole-body pose estimation.

[2021.03.20] Our whole-body pose estimation model has been adopted in the top-2 winning solutions (smilelab2021 and rhythmblue) of the CVPR 2021 Looking at People Large Scale Signer Independent Isolated SLR Challenge.

[2020.10.08] We uploaded a demo video to YouTube. The demo video is generated with MMPose. Enjoy :)

[2020.09.19] MMPose now supports COCO-WholeBody. A much stronger baseline model, dark_pose_hrnet_w48+, with a WholeBody AP of 66.1% is provided for research purposes.

[2020.09.15] The evaluation tool has been integrated into the Extended COCO API (xtcocotools) for versions >= 1.5.

To install from pip:

pip install xtcocotools

What is COCO-WholeBody?

The COCO-WholeBody dataset is the first large-scale benchmark for whole-body pose estimation. It extends the COCO 2017 dataset and keeps the same train/val split as COCO.

Here is an example of one annotated image.

For each person, we annotate 4 types of bounding boxes (person box, face box, left-hand box, and right-hand box) and 133 keypoints (17 for body, 6 for feet, 68 for face and 42 for hands). The face/hand box is defined as the minimal bounding rectangle of the keypoints. The keypoint annotations are illustrated as follows.
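To make the 133-keypoint layout and the box definition concrete, here is a minimal sketch. It assumes the keypoints are ordered body, feet, face, left hand, right hand (the order used by MMPose's COCO-WholeBody loader), that each keypoint is a COCO-style (x, y, v) triple with v > 0 meaning "labeled", and that boxes use the COCO [x, y, w, h] convention. The helper names `part_ranges` and `min_bbox` are hypothetical, and the released boxes may differ in details (for example, handling of unlabeled points).

```python
# Hypothetical helpers illustrating the 133-keypoint layout and the
# "minimal bounding rectangle" definition of the face/hand boxes.
# Assumption: parts are ordered body, feet, face, left hand, right hand,
# and each keypoint is an (x, y, v) triple where v > 0 means labeled.

PART_SIZES = {"body": 17, "feet": 6, "face": 68, "left_hand": 21, "right_hand": 21}


def part_ranges(sizes=PART_SIZES):
    """Map each part name to its [start, end) index range in the 133-keypoint array."""
    ranges, start = {}, 0
    for name, n in sizes.items():  # dicts keep insertion order (Python 3.7+)
        ranges[name] = (start, start + n)
        start += n
    return ranges


def min_bbox(points):
    """Smallest axis-aligned box [x, y, w, h] enclosing the labeled points.

    Returns None if no point in `points` is labeled (v > 0).
    """
    labeled = [(x, y) for x, y, v in points if v > 0]
    if not labeled:
        return None
    xs = [p[0] for p in labeled]
    ys = [p[1] for p in labeled]
    x0, y0 = min(xs), min(ys)
    return [x0, y0, max(xs) - x0, max(ys) - y0]


ranges = part_ranges()
print(ranges["face"])            # (23, 91)
print(sum(PART_SIZES.values()))  # 133

hand = [(10.0, 20.0, 2), (30.0, 5.0, 2), (0.0, 0.0, 0)]
print(min_bbox(hand))  # [10.0, 5.0, 20.0, 15.0]
```

The unlabeled third point is ignored, so the box spans only the two labeled hand keypoints.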

How to Use?


Images can be downloaded from the COCO 2017 website.

COCO-WholeBody annotations for Train / Validation (OneDrive).

Alternatively, we also provide the BaiduPan download link for the annotation files.

BaiduPan Link:

Password: pu6j

Note: Our dataset is also available at OpenXLab.

Annotation Format

The data format is defined in DATA_FORMAT.
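The annotation files follow the COCO JSON layout, with each person's keypoints stored in separate per-part fields. The sketch below assumes the field names `keypoints`, `foot_kpts`, `face_kpts`, `lefthand_kpts` and `righthand_kpts` (the names read by MMPose's COCO-WholeBody loader); DATA_FORMAT remains the authoritative reference. The functions `wholebody_keypoints` and `load_person_annotations` are hypothetical helpers that concatenate those fields into the 133-keypoint order body, feet, face, left hand, right hand.

```python
import json

# Per-part keypoint fields of a COCO-WholeBody person annotation, in the
# order used by MMPose's loader (assumption; confirm against DATA_FORMAT).
PART_FIELDS = ["keypoints", "foot_kpts", "face_kpts", "lefthand_kpts", "righthand_kpts"]


def wholebody_keypoints(ann):
    """Concatenate the flat [x1, y1, v1, x2, ...] part lists into (x, y, v) triples."""
    flat = []
    for field in PART_FIELDS:
        flat.extend(ann[field])
    if len(flat) % 3 != 0:
        raise ValueError("keypoint lists must contain (x, y, v) triples")
    return [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]


def load_person_annotations(path):
    """Yield (image_id, keypoint triples) for every annotation in a COCO-style JSON file."""
    with open(path) as f:
        data = json.load(f)
    for ann in data["annotations"]:
        yield ann["image_id"], wholebody_keypoints(ann)


# Synthetic annotation with all-zero (unlabeled) keypoints for each part.
demo = {f: [0] * (3 * n) for f, n in zip(PART_FIELDS, [17, 6, 68, 21, 21])}
print(len(wholebody_keypoints(demo)))  # 133
```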


We provide evaluation tools for the COCO-WholeBody dataset. Our evaluation tools are built on @cocodataset/cocoapi.

We also provide an example ground-truth file (example_gt.json) and an example prediction file (example_pred.json).

Evaluate on COCO-WholeBody by running the following line:

python evaluation/ --gt_file evaluation/example_gt.json --res_file evaluation/example_pred.json
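Because the evaluation builds on cocoapi, the underlying matching score is COCO's Object Keypoint Similarity (OKS), and AP is obtained by averaging precision over OKS thresholds from 0.50 to 0.95. The sketch below reproduces the per-pair OKS formula as written in cocoapi (k_i = 2·sigma_i, scale term s^2 = ground-truth area). It is an illustration, not the repo's evaluation script: the per-keypoint sigmas for the 133 whole-body keypoints are not listed here, and returning 0.0 when no ground-truth keypoint is labeled is a simplification (cocoapi handles that case differently).

```python
import math


def oks(gt, pred, sigmas, area):
    """Object Keypoint Similarity between one ground-truth and one predicted pose.

    gt, pred: lists of (x, y, v) triples of equal length; only ground-truth
    points with v > 0 contribute. sigmas: per-keypoint constants.
    area: the ground-truth object area (the s^2 scale term).
    """
    total, count = 0.0, 0
    for (gx, gy, gv), (px, py, _), sigma in zip(gt, pred, sigmas):
        if gv <= 0:
            continue  # unlabeled ground-truth keypoints are skipped
        d2 = (gx - px) ** 2 + (gy - py) ** 2
        k2 = (2 * sigma) ** 2  # cocoapi uses k_i = 2 * sigma_i
        total += math.exp(-d2 / (2 * area * k2))
        count += 1
    # Simplification: cocoapi treats poses without labeled keypoints differently.
    return total / count if count else 0.0


gt = [(10.0, 10.0, 2), (20.0, 20.0, 2)]
print(oks(gt, gt, [0.5, 0.5], area=1.0))  # 1.0 for a perfect prediction
```

With this formula, a prediction scores 1.0 only when every labeled keypoint is exact, and the penalty for a given pixel error shrinks as the object area grows.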

Terms of Use

  1. The COCO-WholeBody dataset is ONLY for research and non-commercial use. The annotations of the COCO-WholeBody dataset belong to SenseTime Research and are licensed under a Creative Commons Attribution 4.0 License.

  2. For commercial usage of our COCO-WholeBody annotations, please contact Mr. Malon (machang[at]tetras[dot]ai) and cc Sheng Jin (jinsheng13[at]foxmail[dot]com).

  3. We do not own the copyright of the images. Use of the images must abide by the Flickr Terms of Use. The users of the images accept full responsibility for the use of the dataset, including but not limited to the use of any copies of copyrighted images that they may create from the dataset.

Comparison with other popular datasets

Overview of some popular public datasets for 2D keypoint estimation in RGB images. Kpt stands for keypoints, and #Kpt is the number of annotated keypoints. "Wild" denotes whether the dataset is collected in the wild. * denotes a head box instead of a face box.

DataSet Images #Kpt Wild Body Box Hand Box Face Box Body Kpt Hand Kpt Face Kpt Total
MPII [1] 25K 16 ✔️ ✔️ * ✔️ 40K
MPII-TRB [2] 25K 40 ✔️ ✔️ * ✔️ 40K
CrowdPose [3] 20K 14 ✔️ ✔️ ✔️ 80K
PoseTrack [4] 23K 15 ✔️ ✔️ ✔️ 150K
AI Challenger [5] 300K 14 ✔️ ✔️ ✔️ 700K
COCO [6] 200K 17 ✔️ ✔️ * ✔️ 250K
OneHand10K [7] 10K 21 ✔️ ✔️ ✔️ -
SynthHand [8] 63K 21 ✔️ ✔️ -
RHD [9] 41K 21 ✔️ ✔️ -
FreiHand [10] 130K 21 ✔️ -
MHP [11] 80K 21 ✔️ ✔️ -
GANerated [12] 330K 21 ✔️ -
Panoptic [13] 15K 21 ✔️ ✔️ -
WFLW [14] 10K 98 ✔️ ✔️ ✔️ -
AFLW [15] 25K 19 ✔️ ✔️ ✔️ -
COFW [16] 1852 29 ✔️ ✔️ ✔️ -
300W [17] 3837 68 ✔️ ✔️ ✔️ -
Halpe [27] 43K 136 ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ 50K
COCO-WholeBody 200K 133 ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ ✔️ 250K

COCO-WholeBody Benchmark (V0.5)

Whole-body pose estimation results on our WholeBody benchmark.

Method body-AP body-AR foot-AP foot-AR face-AP face-AR hand-AP hand-AR whole-AP whole-AR
OpenPose [18] 0.563 0.612 0.532 0.645 0.482 0.626 0.198 0.342 0.338 0.449
SN [19] 0.280 0.336 0.121 0.277 0.382 0.440 0.138 0.336 0.161 0.209
PAF [20] 0.266 0.328 0.100 0.257 0.309 0.362 0.133 0.321 0.141 0.185
PAF-body [20] 0.409 0.470 - - - - - - - -
AE [21] 0.405 0.464 0.077 0.160 0.477 0.580 0.341 0.435 0.274 0.350
AE-body [21] 0.582 0.634 - - - - - - - -
HRNet [22] 0.659 0.709 0.314 0.424 0.523 0.582 0.300 0.363 0.432 0.520
HRNet-body [22] 0.758 0.809 - - - - - - - -
ZoomNet 0.743 0.802 0.798 0.869 0.623 0.701 0.401 0.498 0.541 0.658

Pre-training on COCO-WholeBody V0.5 for face/hand keypoint estimation

WholeBody-Face (WBF) & WholeBody-Hand (WBH) are subsets of COCO-WholeBody.

We build WBF & WBH by extracting cropped face & hand images and annotations from COCO-WholeBody.
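A minimal sketch of the coordinate bookkeeping involved in cropping a part from a COCO-WholeBody image: the part box is rounded outward to an integer pixel window and labeled keypoints are shifted into the crop's frame. The helper `crop_part` and its input format ([x, y, w, h] box, (x, y, v) triples) are hypothetical; the exact cropping used to build WBF and WBH (any margin, resizing, or filtering of invalid boxes) is not specified here.

```python
import math


def crop_part(box, keypoints):
    """Express part keypoints in the coordinate frame of a cropped part image.

    box: [x, y, w, h] part box in full-image coordinates (hypothetical format).
    keypoints: list of (x, y, v) triples in full-image coordinates.
    Returns the integer crop window (x0, y0, x1, y1) and the shifted keypoints.
    Unlabeled keypoints (v <= 0) are reset to (0.0, 0.0, 0), following COCO.
    """
    x, y, w, h = box
    x0, y0 = math.floor(x), math.floor(y)          # round the window outward
    x1, y1 = math.ceil(x + w), math.ceil(y + h)
    shifted = [
        (kx - x0, ky - y0, v) if v > 0 else (0.0, 0.0, 0)
        for kx, ky, v in keypoints
    ]
    return (x0, y0, x1, y1), shifted


window, kps = crop_part([10.5, 20.25, 30, 40], [(15.5, 25.75, 2), (0.0, 0.0, 0)])
print(window)  # (10, 20, 41, 61)
print(kps)     # [(5.5, 5.75, 2), (0.0, 0.0, 0)]
```

The crop itself would then be the image slice `[y0:y1, x0:x1]` in row-major array indexing.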

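Face keypoint estimation on 300W (NME, lower is better). "Extra." lists extra information or data used by each method; Common, Challenging, Full and Test are the 300W evaluation subsets.

Method Extra. Common↓ Challenging↓ Full↓ Test↓
RCN [23] - 4.67 8.44 5.41 -
DAN [24] - 3.19 5.24 3.59 4.30
DCFE [25] w/3D 2.76 5.22 3.24 3.88
LAB [14] w/Boundary 2.98 5.19 3.49 -
HRNet [26] - 2.87 5.15 3.32 3.85
HRNet-Ours - 2.89 5.15 3.33 3.91
HRNet-Ours WBF 2.84 4.73 3.21 3.68

Hand keypoint estimation (lower is better). "A → B" denotes pre-training on A and then fine-tuning on B.

Train-set Test-set EPE↓ NME↓
CMU Panoptic [13] CMU Panoptic [13] 7.49 0.68
WBH → CMU Panoptic [13] CMU Panoptic [13] 7.00 0.63
WBH WBH 2.76 6.66
CMU Panoptic [13] → WBH WBH 2.70 6.49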
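The tables above report end-point error (EPE) and normalized mean error (NME). As a rough sketch of what these metrics compute: EPE is the mean Euclidean distance in pixels between predicted and ground-truth keypoints, and NME divides that error by a normalization length. The normalization used for each table is not stated here (inter-ocular distance is a common choice on 300W), so `norm` is left as a parameter; the function names are hypothetical.

```python
import math


def epe(gt, pred):
    """End-point error: mean Euclidean distance (pixels) between matched (x, y) points."""
    if len(gt) != len(pred) or not gt:
        raise ValueError("gt and pred must be non-empty and of equal length")
    return sum(math.dist(g, p) for g, p in zip(gt, pred)) / len(gt)


def nme(gt, pred, norm):
    """Normalized mean error: EPE divided by a normalization length `norm` (assumed > 0)."""
    return epe(gt, pred) / norm


gt = [(0.0, 0.0), (0.0, 0.0)]
pred = [(3.0, 4.0), (0.0, 0.0)]
print(epe(gt, pred))       # 2.5
print(nme(gt, pred, 5.0))  # 0.5
```

`math.dist` requires Python 3.8 or newer.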


If you use this dataset in your project, please cite the following papers.

@inproceedings{jin2020whole,
  title={Whole-Body Human Pose Estimation in the Wild},
  author={Jin, Sheng and Xu, Lumin and Xu, Jin and Wang, Can and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping},
  booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
  year={2020}
}

@article{xu2022zoomnas,
  title={ZoomNAS: Searching for Whole-body Human Pose Estimation in the Wild},
  author={Xu, Lumin and Jin, Sheng and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping and Wang, Xiaogang},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2022}
}


