This is a PaddlePaddle implementation of Simple Baselines for Human Pose Estimation and Tracking. This work provides baseline methods that are surprisingly simple and effective, and thus helpful for inspiring and evaluating new ideas in the field. State-of-the-art results are achieved on challenging benchmarks.
| Arch | Head | Shoulder | Elbow | Wrist | Hip | Knee | Ankle | Mean | Mean@0.1 |
|---|---|---|---|---|---|---|---|---|---|
| 256x256_pose_resnet_50_d256d256d256 | 96.658 | 95.482 | 89.041 | 83.587 | 88.333 | 84.625 | 80.090 | 88.808 | 34.241 |
- Flip test is used.
The code is developed with Python 3.7 on Ubuntu 16.04. NVIDIA GPUs are needed; the code is developed and tested on a single NVIDIA V100 GPU.
- Install paddle >= v2.1.2 (a quick install check follows these steps)
- Clone this repo; we'll refer to the cloned directory as ${POSE_ROOT}.
- Install dependencies:
  pip install -r requirements.txt
- Install COCO API dependencies:
  cd cocoapi/PythonAPI && python3 setup.py install --user
- Init the output (training model output) and log (visual log) directories:
  mkdir output
  mkdir log
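A minimal sketch to confirm the Paddle install before moving on (assumes Paddle >= 2.1.2 was installed as above; `run_check` prints a short self-test report):

```python
import paddle

# print the installed version and run Paddle's built-in install/GPU self-test
print(paddle.__version__)
paddle.utils.run_check()
```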
Your directory tree should look like this:
${POSE_ROOT}
├── data
├── experiments
├── images
├── lib
├── log
├── models
├── output
├── pose_estimation
├── README.md
└── requirements.txt
For MPII data, please download from the MPII Human Pose Dataset. The original annotation files are in MATLAB format; we have converted them into JSON format, which you also need to download from OneDrive or GoogleDrive. Extract them under ${POSE_ROOT}/data so that they look like this:
${POSE_ROOT}
|-- data
`-- |-- mpii
`-- |-- annot
| |-- gt_valid.mat
| |-- test.json
| |-- train.json
| |-- trainval.json
| `-- valid.json
`-- images
|-- 000001163.jpg
|-- 000003072.jpg
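The converted annotation files are plain JSON, so a quick sanity check that they are in place is easy. This is only an illustrative sketch run from ${POSE_ROOT}; the exact field names follow the original repo's format, so treat them as assumptions:

```python
import json

# load the converted MPII validation annotations and inspect one entry
with open("data/mpii/annot/valid.json") as f:
    annots = json.load(f)

print(len(annots), "annotations loaded")
# each entry is expected to carry the image name plus per-person center/scale and joints
print(sorted(annots[0].keys()))
```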
python pose_estimation/train.py \
--cfg experiments/mpii/resnet50/256x256_d256x3_adam_lr1e-3.yaml
The training output looks like:
2021-12-14 09:45:59 Epoch: [61][0/695] Time 1.030s (1.030s) Speed 31.1 samples/s Data 0.819s (0.819s) Lr 0.0010 (0.0010) Loss 0.00045 (0.00045) Accuracy 0.851 (0.851)
2021-12-14 09:46:49 Epoch: [61][50/695] Time 1.268s (0.992s) Speed 25.2 samples/s Data 1.026s (0.800s) Lr 0.0010 (0.0010) Loss 0.00044 (0.00047) Accuracy 0.878 (0.849)
Pretrained model link: https://pan.baidu.com/s/1gaeyGPThltbyeoEQTx7RZQ (extraction code: sxfs)
python pose_estimation/valid.py \
--cfg experiments/mpii/resnet50/256x256_d256x3_adam_lr1e-3.yaml \
--flip-test \
--model-file output/mpii/pose_resnet_50/256x256_d256x3_adam_lr1e-3/model_best.pdparams
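The --flip-test flag averages predictions from the original and a horizontally flipped image. A rough sketch of the idea, illustrative only (flip_pairs is a hypothetical list of symmetric joint indices; the repo's own implementation lives in the validation code):

```python
import numpy as np

def flip_test(model, img, flip_pairs):
    """Average heatmaps from the original and the horizontally flipped image.

    model:      callable returning heatmaps of shape (K, H, W)
    img:        image array of shape (H, W, C)
    flip_pairs: list of (left_idx, right_idx) joint pairs, e.g. ankles/knees
    """
    heat = model(img)
    heat_flip = model(img[:, ::-1, :])      # run on the left-right flipped image
    heat_flip = heat_flip[:, :, ::-1]       # flip the heatmaps back along the width axis
    for a, b in flip_pairs:                 # swap symmetric joint channels
        heat_flip[[a, b]] = heat_flip[[b, a]]
    return (heat + heat_flip) * 0.5
```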
The validation output looks like:
Test: [0/93] Time 5.032 (5.032) Loss 0.0003 (0.0003) Accuracy 0.946 (0.946)
Test: [50/93] Time 0.254 (0.391) Loss 0.0004 (0.0004) Accuracy 0.901 (0.900)
| Arch | Head | Shoulder | Elbow | Wrist | Hip | Knee | Ankle | Mean | Mean@0.1 |
|---|---|---|---|---|---|---|---|---|---|
| 256x256_pose_resnet_50_d256d256d256 | 96.658 | 95.482 | 89.041 | 83.587 | 88.333 | 84.625 | 80.090 | 88.808 | 34.241 |
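Here Mean is PCKh@0.5: a predicted joint counts as correct when its distance to the ground truth is within 0.5 of the head-segment length, and Mean@0.1 uses the stricter 0.1 threshold. A minimal sketch of the metric, as an illustration only (the reported numbers come from valid.py evaluating against gt_valid.mat):

```python
import numpy as np

def pckh(pred, gt, head_sizes, visible, thr=0.5):
    """PCKh: fraction of visible joints predicted within thr * head-segment length.

    pred, gt:   (N, K, 2) predicted and ground-truth joint coordinates
    head_sizes: (N,) head-segment length per sample
    visible:    (N, K) boolean mask of annotated joints
    """
    dist = np.linalg.norm(pred - gt, axis=-1)        # (N, K) pixel distances
    norm_dist = dist / head_sizes[:, None]           # normalize by head size
    correct = (norm_dist <= thr) & visible
    return correct.sum() / max(visible.sum(), 1)
```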
Note: Because Simple Baselines is a top-down method, you need to obtain each person's bounding box first (see infer.py line 138 for details). If there are multiple persons in an image, a detector such as Faster R-CNN or SSD should be used first to crop them out.
python pose_estimation/infer.py \
    --cfg experiments/mpii/resnet50/256x256_d256x3_adam_lr1e-3.yaml \
    --img-file images/test.jpg \
    --model-file output/mpii/pose_resnet_50/256x256_d256x3_adam_lr1e-3/model_best.pdparams
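How a detector box becomes the center/scale pair the model consumes is defined in infer.py; the sketch below only illustrates the usual convention from the original repo (scale measured in units of 200 pixels, box padded to the network's input aspect ratio), so treat the parameter choices as assumptions:

```python
import numpy as np

def box_to_center_scale(x, y, w, h, aspect_ratio=1.0, pixel_std=200.0):
    """Convert an xywh person box to a (center, scale) pair.

    aspect_ratio is input width / height (1.0 for the 256x256 config above);
    scale follows the original repo's 200-pixel convention.
    """
    center = np.array([x + w * 0.5, y + h * 0.5], dtype=np.float32)
    # pad the box so its aspect ratio matches the network input
    if w > aspect_ratio * h:
        h = w / aspect_ratio
    elif w < aspect_ratio * h:
        w = h * aspect_ratio
    scale = np.array([w / pixel_std, h / pixel_std], dtype=np.float32)
    scale *= 1.25  # enlarge slightly so the whole person fits
    return center, scale
```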
A mini MPII dataset containing 8 training images and 8 validation images is included in the repository's data folder. (The full dataset's annotations and images are not included; the mini dataset exists only to verify that training and validation run correctly, so replace it with the full dataset for real training.) After installing the dependencies as described above, ultra-fast training and validation can be run directly with the following commands:
- Unzip the mini dataset:
unzip data/mpii.zip -d data/
- Run ultra-fast training and validation:
python pose_estimation/train.py \
--cfg experiments/mpii/resnet50/256x256_d256x3_adam_lr1e-3.yaml
The on-screen output looks like:
=> load 15 samples
=> load 8 samples
2021-12-14 21:15:25 Epoch: [0][0/8] Time 1.835s (1.835s) Speed 1.1 samples/s Data 0.073s (0.073s) Lr 0.0010 (0.0010) Loss 0.55474 (0.55474) Accuracy 0.000 (0.000)
Test: [0/1] Time 1.839 (1.839) Loss 0.0195 (0.0195) Accuracy 0.023 (0.023)
| Arch | Head | Shoulder | Elbow | Wrist | Hip | Knee | Ankle | Mean | Mean@0.1 |
|---|---|---|---|---|---|---|---|---|---|
| 256x256_pose_resnet_50_d256d256d256 | 0.000 | 6.250 | 0.000 | 6.250 | 0.000 | 0.000 | 0.000 | 1.852 | 0.000 |
=> saving checkpoint to output/mpii/pose_resnet_50/256x256_d256x3_adam_lr1e-3
python pose_estimation/export.py \
--cfg experiments/mpii/resnet50/256x256_d256x3_adam_lr1e-3.yaml \
--model-file output/mpii/pose_resnet_50/256x256_d256x3_adam_lr1e-3/model_best.pdparams
The export output looks like:
2021-12-14 09:57:14 [INFO] ONNX model saved in ./output/onnx/posenet.onnx
=> Model saved as: ./output/onnx/posenet.onnx
=> Done.
Note: Deployment only supports the COCO dataset for now, and deploy.py is an example only.
python pose_estimation/deploy.py --img data/coco/images/xxx.jpg --model output/onnx/posenet.onnx --type ONNX --width 656 --height 384
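If you only want to confirm that the exported ONNX model loads and runs, a minimal check with onnxruntime is sketched below (onnxruntime is not in this repo's requirements, and the dummy input shape is an assumption; adjust it to the input size your config exports, e.g. 256x256 for the MPII config above or 384x656 for the COCO deploy example):

```python
import numpy as np
import onnxruntime as ort

# load the exported model and run one dummy NCHW batch through it
sess = ort.InferenceSession("output/onnx/posenet.onnx")
inp = sess.get_inputs()[0]
print("input:", inp.name, inp.shape)

dummy = np.random.rand(1, 3, 256, 256).astype(np.float32)  # assumed 256x256 MPII input
heatmaps = sess.run(None, {inp.name: dummy})[0]
print("heatmap output shape:", heatmaps.shape)
```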
Original project: https://github.com/microsoft/human-pose-estimation.pytorch
If you want to use another dataset such as COCO, please check the original project's experiment settings. The differences in the .yaml file are as follows (you need to specify the backbone depth and select the learning-rate schedule):
MODEL:
  NAME: pose_resnet
  BACKBONE: resnet
  DEPTH: 50
TRAIN:
  BATCH_SIZE: 128
  SHUFFLE: True
  BEGIN_EPOCH: 0
  END_EPOCH: 140
  RESUME: False
  OPTIMIZER: adam
  LR_TYPE: PolynomialDecay
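For reference, LR_TYPE: PolynomialDecay maps onto Paddle's built-in polynomial decay schedule; a minimal sketch of wiring it into Adam (the values below are placeholders, not the repo's exact settings):

```python
import paddle

# stand-in network; in the repo this would be the pose ResNet built from the config
model = paddle.nn.Linear(10, 10)

# decay the learning rate polynomially from the base LR down to end_lr over decay_steps steps
scheduler = paddle.optimizer.lr.PolynomialDecay(
    learning_rate=1e-3,     # TRAIN.LR in the yaml (placeholder)
    decay_steps=140 * 695,  # roughly END_EPOCH * iterations per epoch (placeholder)
    end_lr=1e-5,
    power=1.0,
)
optimizer = paddle.optimizer.Adam(
    learning_rate=scheduler,
    parameters=model.parameters(),
)
# scheduler.step() is then called once per iteration (or epoch) during training
```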
If you use this work in your research, please cite:
@inproceedings{xiao2018simple,
  author    = {Xiao, Bin and Wu, Haiping and Wei, Yichen},
  title     = {Simple Baselines for Human Pose Estimation and Tracking},
  booktitle = {European Conference on Computer Vision (ECCV)},
  year      = {2018}
}