Official code for "X-HRNet: Towards Lightweight Human Pose Estimation with Spatially Unidimensional Self-Attention"
The code is developed using Python 3.8 on Ubuntu 20.04. NVIDIA GPUs are needed. The code is developed and tested with 8 NVIDIA V100S GPU cards. Other platforms or GPU cards are not fully tested.
- Linux (Windows is not officially supported)
- Python 3.8
- PyTorch 1.8
- CUDA 11.1
- GCC 5+
- mmcv (Please install the latest version of mmcv-full)
- Numpy
- cv2
- json_tricks
- xtcocotools
a. Install mmcv. We recommend installing the pre-built mmcv-full as below.
pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/{cu_version}/{torch_version}/index.html
Please replace `{cu_version}` and `{torch_version}` in the URL with your desired versions. For example, to install the latest mmcv-full with CUDA 11.1 and PyTorch 1.8.0, use the following command:
pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu111/torch1.8.0/index.html
If it compiles during installation, then please check that the CUDA version and PyTorch version **exactly** match the version in the mmcv-full installation command. For example, PyTorch 1.7.0 and 1.7.1 are treated differently. See here for the versions of MMCV compatible with different PyTorch and CUDA versions.
Alternatively, you can install a specific version of mmcv-full (compiled from source if no pre-built wheel matches your environment) with the following command:
pip install mmcv-full==1.3.9
# alternative: pip install mmcv
Important: You need to run `pip uninstall mmcv` first if you have mmcv installed. If both mmcv and mmcv-full are installed, there will be a `ModuleNotFoundError`.
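If you are unsure which mmcv variant ended up in your environment, the quick check below (a minimal sketch; it only assumes mmcv-full 1.x is installed) confirms that the compiled ops are available:
# Check that mmcv-full (with compiled ops) is installed, not the lite "mmcv" package.
import mmcv
print(mmcv.__version__)

# mmcv-full ships compiled ops such as nms; this import fails with an
# ImportError/ModuleNotFoundError if only the lite "mmcv" package is present.
from mmcv.ops import nms  # noqa: F401
print("mmcv-full with compiled ops is available")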
b. Install build requirements
pip install -r requirements.txt
It is recommended to symlink the dataset root to `$LITE_HRNET/data`. If your folder structure is different, you may need to change the corresponding paths in config files.
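For illustration, the symlink can be created like this (a minimal sketch; `/path/to/coco` is a placeholder for wherever your COCO data actually lives, and it is assumed to run from the repository root):
# Link an existing COCO directory to the expected location ($LITE_HRNET/data/coco).
import os

coco_root = '/path/to/coco'          # placeholder: actual location of your COCO data
os.makedirs('data', exist_ok=True)   # $LITE_HRNET/data
link = os.path.join('data', 'coco')
if not os.path.exists(link):
    os.symlink(coco_root, link)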
For COCO data, please download from the COCO download page; 2017 Train/Val is needed for COCO keypoint training and validation. HRNet-Human-Pose-Estimation provides the person detection results on COCO val2017 to reproduce our multi-person pose estimation results. Please download them from OneDrive.
Download and extract them under `$LITE_HRNET/data`, and make them look like this:
lite_hrnet
├── configs
├── models
├── tools
└── data
    └── coco
        ├── annotations
        │   ├── person_keypoints_train2017.json
        │   └── person_keypoints_val2017.json
        ├── person_detection_results
        │   └── COCO_val2017_detections_AP_H_56_person.json
        ├── train2017
        │   ├── 000000000009.jpg
        │   ├── 000000000025.jpg
        │   ├── 000000000030.jpg
        │   └── ...
        └── val2017
            ├── 000000000139.jpg
            ├── 000000000285.jpg
            ├── 000000000632.jpg
            └── ...
All outputs (log files and checkpoints) will be saved to the working directory, which is specified by `work_dir` in the config file.
By default, we evaluate the model on the validation set after each epoch. You can change the evaluation interval by modifying the `interval` argument in the training config:
evaluation = dict(interval=5)  # This evaluates the model every 5 epochs.
According to the Linear Scaling Rule, you need to set the learning rate proportional to the batch size if you use a different number of GPUs or samples per GPU, e.g., lr=0.01 for 4 GPUs x 2 samples/gpu and lr=0.08 for 16 GPUs x 4 samples/gpu.
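The rule amounts to simple arithmetic. As a minimal sketch (an illustration only, not a script shipped with this repo), taking lr=0.01 for a total batch size of 8 from the example above as the base setting:
# Linear Scaling Rule: keep lr proportional to the total batch size
# (number of GPUs x samples per GPU).
def scale_lr(num_gpus, samples_per_gpu, base_lr=0.01, base_batch_size=8):
    total_batch_size = num_gpus * samples_per_gpu
    return base_lr * total_batch_size / base_batch_size

print(scale_lr(4, 2))   # 0.01 for 4 GPUs x 2 samples/gpu
print(scale_lr(16, 4))  # 0.08 for 16 GPUs x 4 samples/gpu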
# train with a single GPU
python tools/train.py ${CONFIG_FILE} [optional arguments]
# train with multiple GPUs
./tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM} [optional arguments]
Optional arguments are:
- `--validate` (strongly recommended): Perform evaluation every k epochs (the default value is 5) during training.
- `--work-dir ${WORK_DIR}`: Override the working directory specified in the config file.
- `--resume-from ${CHECKPOINT_FILE}`: Resume from a previous checkpoint file.
- `--gpus ${GPU_NUM}`: Number of GPUs to use, which is only applicable to non-distributed training.
- `--seed ${SEED}`: Seed for the random state in Python, NumPy and PyTorch to generate random numbers.
- `--deterministic`: If specified, it will set deterministic options for the cuDNN backend.
- `JOB_LAUNCHER`: Items for the distributed job initialization launcher. Allowed choices are `none`, `pytorch`, `slurm`, `mpi`. Especially, if set to `none`, it will run in a non-distributed mode.
- `LOCAL_RANK`: ID for the local rank. If not specified, it will be set to 0.
- `--autoscale-lr`: If specified, it will automatically scale the lr with the number of GPUs by the Linear Scaling Rule.
Difference between `resume-from` and `load-from`:
`resume-from` loads both the model weights and optimizer status, and the epoch is also inherited from the specified checkpoint. It is usually used for resuming a training process that was interrupted accidentally.
`load-from` only loads the model weights, and the training epoch starts from 0. It is usually used for finetuning.
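For illustration, MMCV-style configs typically also expose these as config fields, so the same behaviour can be requested from the config file itself (a hedged sketch; the checkpoint path is a placeholder, and usually only one of the two fields is set):
# Resume an interrupted run: restores weights, optimizer state and epoch.
resume_from = 'work_dirs/sxhrnet_18_coco_256x192/latest.pth'  # placeholder path

# For finetuning instead: load only the weights and start from epoch 0.
load_from = None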
Examples:
./tools/dist_train.sh configs/xhrnet/sxhrnet_18_coco_256x192.py 8
You can use the following commands to test a dataset.
# single-gpu testing
python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--out ${RESULT_FILE}] [--eval ${EVAL_METRIC}] \
[--proc_per_gpu ${NUM_PROC_PER_GPU}] [--gpu_collect] [--tmpdir ${TMPDIR}] [--average_clips ${AVG_TYPE}] \
[--launcher ${JOB_LAUNCHER}] [--local_rank ${LOCAL_RANK}]
# multiple-gpu testing
./tools/dist_test.sh ${CONFIG_FILE} ${CHECKPOINT_FILE} ${GPU_NUM} [--out ${RESULT_FILE}] [--eval ${EVAL_METRIC}] \
[--proc_per_gpu ${NUM_PROC_PER_GPU}] [--gpu_collect] [--tmpdir ${TMPDIR}] [--average_clips ${AVG_TYPE}] \
[--launcher ${JOB_LAUNCHER}] [--local_rank ${LOCAL_RANK}]
Optional arguments:
- `RESULT_FILE`: Filename of the output results. If not specified, the results will not be saved to a file.
- `EVAL_METRIC`: Items to be evaluated on the results. Allowed values depend on the dataset.
- `NUM_PROC_PER_GPU`: Number of processes per GPU. If not specified, only one process will be assigned for a single GPU.
- `--gpu_collect`: If specified, results will be collected using GPU communication. Otherwise, it will save the results on different GPUs to `TMPDIR` and collect them by the rank 0 worker.
- `TMPDIR`: Temporary directory used for collecting results from multiple workers, available when `--gpu_collect` is not specified.
- `AVG_TYPE`: Items to average the test clips. If set to `prob`, it will apply softmax before averaging the clip scores. Otherwise, it will directly average the clip scores.
- `JOB_LAUNCHER`: Items for the distributed job initialization launcher. Allowed choices are `none`, `pytorch`, `slurm`, `mpi`. Especially, if set to `none`, it will test in a non-distributed mode.
- `LOCAL_RANK`: ID for the local rank. If not specified, it will be set to 0.
Examples:
./tools/dist_test.sh configs/xhrnet/sxhrnet_18_coco_256x192.py \
checkpoints/SOME_CHECKPOINT.pth 8 \
--eval mAP
You can use the following commands to compute the complexity of one model.
python tools/summary_network.py ${CONFIG_FILE} --shape ${SHAPE}
Arguments:
- `SHAPE`: Input size.
Examples:
python tools/summary_network.py configs/xhrnet/sxhrnet_18_coco_256x192.py --shape 256 256
Thanks to:
If you use our code or models in your research, please cite with:
@inproceedings{xuan2022xhrnet,
title={X-HRNet: Towards Lightweight Human Pose Estimation with Spatially Unidimensional Self-Attention},
author={Zhou, Yixuan and Wang, Xuanhan and Xu, Xing and Zhao, Lei and Song, Jingkuan},
booktitle={ICME},
year={2022}
}