Human-Art

This repository contains the implementation of the following paper:

Human-Art: A Versatile Human-Centric Dataset Bridging Natural and Artificial Scenes [Project Page] [Paper] [Code] [Data] [Video]
Xuan Ju∗ 1,2, Ailing Zeng∗ 1, Jianan Wang 1, Qiang Xu 2, Lei Zhang 1
∗ Equal contribution. 1 International Digital Economy Academy, 2 The Chinese University of Hong Kong

General Description

This paper proposes Human-Art, a large-scale dataset that targets multi-scenario human-centric tasks to bridge the gap between natural and artificial scenes. It covers twenty high-quality human scenarios, with natural and artificial humans in both 2D representation (yellow dashed boxes) and 3D representation (blue solid boxes).

Contents of Human-Art:

  • 50,000 images containing human figures in 20 scenarios (5 natural scenarios, 3 3D artificial scenarios, and 12 2D artificial scenarios)
  • Human-centric annotations, including human bounding boxes, 21 2D human keypoints, human self-contact keypoints, and description text
  • A baseline human detector and human pose estimator trained jointly on MSCOCO and Human-Art

Tasks that Human-Art targets:

  • multi-scenario human detection, 2D human pose estimation, and 3D human mesh recovery
    • Notably, after training with ED-Pose, results on MSCOCO rise by 0.8, indicating that multi-scenario images may benefit feature extraction and human understanding of real scenes.
  • multi-scenario human image generation (especially controllable human image generation, e.g. with conditions such as pose and text)
  • out-of-domain human detection and human pose estimation

Dataset Download

Human-Art is available for download under a CC license. Fill out this form to request authorization to use Human-Art for non-commercial purposes. After you submit the form, an email containing the dataset will be sent to you immediately. Please do not share or transfer the data privately.

For ease of use, Human-Art follows the same format as MSCOCO. After downloading, please save the dataset with the following file structure (the COCO file structure is included as well, because COCO is used for joint training with Human-Art):

|-- data
    |-- HumanArt
        |-- annotations 
            |-- training_coco.json
            |-- training_humanart.json
            |-- training_humanart_coco.json
            |-- training_humanart_cartoon.json
            |-- ...
            |-- validation_coco.json
            |-- validation_humanart.json
            |-- validation_humanart_coco.json
            |-- validation_humanart_cartoon.json
            |-- ...
        |-- images
            |-- 2D_virtual_human
                |-- ...
            |-- 3D_virtual_human
                |-- ...
            |-- real_human
                |-- ...
    |-- coco
        |-- annotations 
        |-- train2017 
        |-- val2017 

Note that there are several different JSON settings:

  • files ending in _coco (e.g. training_coco.json) are reprocessed COCO annotation JSON files (e.g. person_keypoints_train2017.json), which can be used in the same format as Human-Art

  • files ending in _humanart (e.g. training_humanart.json) are the annotation JSON files of Human-Art

  • files ending in _humanart_coco (e.g. training_humanart_coco.json) are the annotation JSON files of COCO and Human-Art combined

  • files ending in _humanart_[scenario] (e.g. training_humanart_cartoon.json) are the annotation JSON files of one specific Human-Art scenario

  • HumanArt_validation_detections_AP_H_56_person.json contains detection results with an AP of 56, used to evaluate top-down pose estimation models (similar to COCO_val2017_detections_AP_H_56_person.json in MSCOCO)
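The naming scheme above can be captured in a small helper. This is only a sketch: the function name `annotation_path` and its parameters are hypothetical, and it assumes the dataset was saved under the `data/HumanArt/annotations` layout shown earlier.

```python
from pathlib import Path

# Split prefixes used by the annotation file names listed above.
SPLITS = ("training", "validation")


def annotation_path(root, split, subset="humanart"):
    """Build the path of one annotation file (hypothetical helper).

    subset is the part after the split prefix, e.g. "coco", "humanart",
    "humanart_coco", or "humanart_cartoon" for a single scenario.
    """
    if split not in SPLITS:
        raise ValueError(f"split must be one of {SPLITS}, got {split!r}")
    return Path(root) / "HumanArt" / "annotations" / f"{split}_{subset}.json"


# Example: the cartoon-only training annotations.
print(annotation_path("data", "training", "humanart_cartoon").as_posix())
```

The helper only builds a path; it does not check that the file exists, so scenario names other than those shipped with the dataset will produce paths that point nowhere.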

The annotation JSON files of Human-Art are structured as follows:

{
    "info":{xxx}, # some basic information about Human-Art
    "images":[
        {
            "file_name": "xxx", # the path of the image (same definition as COCO)
            "height": xxx, # the image height (same definition as COCO)
            "width": xxx, # the image width (same definition as COCO)
            "id": xxx, # the image id (same definition as COCO)
            "page_url": "xxx", # the web link of the page containing the image
            "image_url": "xxx", # the web link of the image
            "picture_name": "xxx", # the name of the image
            "author": "xxx", # the author of the image
            "description": "xxx", # the text description of the image
            "category": "xxx"  # the scenario of the image (e.g. cartoon)
        },
        ...
    ],
    "annotations":[
        {
            "keypoints":[xxx], # positions of the 17 COCO keypoints (same definition as COCO)
            "keypoints_21":[xxx], # positions of the 21 Human-Art keypoints
            "self_contact": [xxx], # self-contact keypoints, x1,y1,x2,y2...
            "num_keypoints": xxx, # number of annotated (not invisible) keypoints among the 17 COCO-format keypoints (same definition as COCO)
            "num_keypoints_21": xxx, # number of annotated (not invisible) keypoints among the 21 Human-Art-format keypoints
            "iscrowd": xxx, # whether the annotation marks a crowd region (same definition as COCO)
            "image_id": xxx, # the image id (same definition as COCO)
            "area": xxx, # the human area (same definition as COCO)
            "bbox": [xxx], # the human bounding box (same definition as COCO)
            "category_id": 1, # category id=1 means the person category (same definition as COCO)
            "id": xxx, # annotation id (same definition as COCO)
            "annotator": xxx # annotator id
        }
    ],
    "categories":[] # category information (same definition as COCO)
}
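As a rough illustration of how these fields might be consumed, the sketch below reads an annotation file with Python's standard `json` module and regroups the flat coordinate lists. The helper names (`load_annotations`, `chunk`, `people_per_scenario`) and the `sample` record are made up for this example. It also assumes that `keypoints_21` stores flat (x, y, visibility) triplets like the COCO `keypoints` field, which the description above does not state explicitly.

```python
import json
from collections import defaultdict


def load_annotations(path):
    """Read one Human-Art annotation file (COCO-style JSON) into a dict."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def chunk(values, size):
    """Split a flat list into consecutive tuples of length `size`."""
    return [tuple(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]


def people_per_scenario(dataset):
    """Count person annotations per scenario (the image-level "category" field)."""
    scenario_of = {img["id"]: img["category"] for img in dataset["images"]}
    counts = defaultdict(int)
    for ann in dataset["annotations"]:
        counts[scenario_of[ann["image_id"]]] += 1
    return dict(counts)


# Tiny made-up record for illustration only; real files hold many images.
sample = {
    "images": [{"id": 1, "file_name": "example.jpg", "category": "cartoon"}],
    "annotations": [{
        "image_id": 1,
        "keypoints_21": [10, 20, 2] * 21,  # ASSUMPTION: flat (x, y, v) triplets
        "self_contact": [15, 25, 40, 60],  # x1, y1, x2, y2 as described above
    }],
}

ann = sample["annotations"][0]
print(len(chunk(ann["keypoints_21"], 3)))  # number of keypoint triplets: 21
print(chunk(ann["self_contact"], 2))       # [(15, 25), (40, 60)]
print(people_per_scenario(sample))         # {'cartoon': 1}
```

On a real download, `load_annotations` would be called on one of the files under `data/HumanArt/annotations` and its result passed to `people_per_scenario` in place of `sample`.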

Human Pose Estimation

Human pose estimators trained on Human-Art are now supported in MMPose (see this PR). Detailed usage and the Model Zoo can be found in MMPose's documentation: (1) ViTPose, (2) HRNet, and (3) RTMPose.

To train and evaluate human pose estimators, please refer to MMPose. Because MMPose is updated frequently, we do not maintain a codebase in this repo. Since Human-Art is compatible with MSCOCO, you can train and evaluate any model in MMPose using its dataloader.

The supported models include the following (xx-coco means trained on MSCOCO only, and xx-humanart-coco means trained on Human-Art and MSCOCO):
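For orientation, the dataset part of an MMPose config might look roughly like the sketch below. It is written from memory of MMPose 1.x conventions and should be checked against the MMPose documentation: the `HumanArtDataset` type name, the `data_mode` and `data_prefix` keys, and the variable names `train_dataset` and `val_dataset` are assumptions, and the empty `pipeline` lists are placeholders for the transforms a real config would define. Only the annotation file names come from the file structure above.

```python
# Sketch of the dataset-related part of an MMPose 1.x style config (plain Python).
dataset_type = "HumanArtDataset"  # ASSUMPTION: dataset type registered in MMPose 1.x
data_mode = "topdown"             # ASSUMPTION: top-down estimators such as ViTPose/HRNet
data_root = "data/"               # root folder holding HumanArt/ and coco/

# Joint training on Human-Art and MSCOCO uses the combined annotation file.
train_dataset = dict(
    type=dataset_type,
    data_root=data_root,
    data_mode=data_mode,
    ann_file="HumanArt/annotations/training_humanart_coco.json",
    data_prefix=dict(img=""),  # ASSUMPTION: image paths in the JSON are relative to data_root
    pipeline=[],               # placeholder: fill in the training transforms
)

# Evaluation on the Human-Art validation split.
val_dataset = dict(
    type=dataset_type,
    data_root=data_root,
    data_mode=data_mode,
    ann_file="HumanArt/annotations/validation_humanart.json",
    data_prefix=dict(img=""),
    test_mode=True,
    pipeline=[],               # placeholder: fill in the validation transforms
)
```

Swapping `ann_file` for a scenario-specific file such as `validation_humanart_cartoon.json` would restrict evaluation to a single scenario.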

Results of ViTPose on Human-Art validation dataset with ground-truth bounding-box

With classic decoder

| Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ViTPose-S-coco | 256x192 | 0.507 | 0.758 | 0.531 | 0.551 | 0.780 | ckpt | log |
| ViTPose-S-humanart-coco | 256x192 | 0.738 | 0.905 | 0.802 | 0.768 | 0.911 | ckpt | log |
| ViTPose-B-coco | 256x192 | 0.555 | 0.782 | 0.590 | 0.599 | 0.809 | ckpt | log |
| ViTPose-B-humanart-coco | 256x192 | 0.759 | 0.905 | 0.823 | 0.790 | 0.917 | ckpt | log |
| ViTPose-L-coco | 256x192 | 0.637 | 0.838 | 0.689 | 0.677 | 0.859 | ckpt | log |
| ViTPose-L-humanart-coco | 256x192 | 0.789 | 0.916 | 0.845 | 0.819 | 0.929 | ckpt | log |
| ViTPose-H-coco | 256x192 | 0.665 | 0.860 | 0.715 | 0.701 | 0.871 | ckpt | log |
| ViTPose-H-humanart-coco | 256x192 | 0.800 | 0.926 | 0.855 | 0.828 | 0.933 | ckpt | log |
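To make the effect of joint training easier to read off, the AP gain of each ViTPose variant can be computed from the rows above. The snippet is a small arithmetic sketch; the dictionary and function names are made up, and the numbers are copied from the AP column of the ViTPose results.

```python
# (COCO-only AP, Human-Art + COCO AP) copied from the ViTPose results above.
vitpose_ap = {
    "ViTPose-S": (0.507, 0.738),
    "ViTPose-B": (0.555, 0.759),
    "ViTPose-L": (0.637, 0.789),
    "ViTPose-H": (0.665, 0.800),
}


def ap_gain(coco_only, joint):
    """AP improvement on Human-Art validation from adding Human-Art to training."""
    return round(joint - coco_only, 3)


gains = {name: ap_gain(*pair) for name, pair in vitpose_ap.items()}
for name, gain in gains.items():
    print(f"{name}: +{gain:.3f} AP")
# ViTPose-S: +0.231 AP
# ViTPose-B: +0.204 AP
# ViTPose-L: +0.152 AP
# ViTPose-H: +0.135 AP
```

The gain shrinks as the backbone grows, but even the largest model improves by more than 0.13 AP on the Human-Art validation set.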

Results of HRNet on Human-Art validation dataset with ground-truth bounding-box

With classic decoder

| Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| pose_hrnet_w32-coco | 256x192 | 0.533 | 0.771 | 0.562 | 0.574 | 0.792 | ckpt | log |
| pose_hrnet_w32-humanart-coco | 256x192 | 0.754 | 0.906 | 0.812 | 0.783 | 0.916 | ckpt | log |
| pose_hrnet_w48-coco | 256x192 | 0.557 | 0.782 | 0.593 | 0.595 | 0.804 | ckpt | log |
| pose_hrnet_w48-humanart-coco | 256x192 | 0.769 | 0.906 | 0.825 | 0.796 | 0.919 | ckpt | log |

Results of RTMPose on Human-Art validation dataset with ground-truth bounding-box

| Arch | Input Size | AP | AP50 | AP75 | AR | AR50 | ckpt | log |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| rtmpose-t-coco | 256x192 | 0.444 | 0.725 | 0.453 | 0.488 | 0.750 | ckpt | log |
| rtmpose-t-humanart-coco | 256x192 | 0.655 | 0.872 | 0.720 | 0.693 | 0.890 | ckpt | log |
| rtmpose-s-coco | 256x192 | 0.480 | 0.739 | 0.498 | 0.521 | 0.763 | ckpt | log |
| rtmpose-s-humanart-coco | 256x192 | 0.698 | 0.893 | 0.768 | 0.732 | 0.903 | ckpt | log |
| rtmpose-m-coco | 256x192 | 0.532 | 0.765 | 0.563 | 0.571 | 0.789 | ckpt | log |
| rtmpose-m-humanart-coco | 256x192 | 0.728 | 0.895 | 0.791 | 0.759 | 0.906 | ckpt | log |
| rtmpose-l-coco | 256x192 | 0.564 | 0.789 | 0.602 | 0.599 | 0.808 | ckpt | log |
| rtmpose-l-humanart-coco | 256x192 | 0.753 | 0.905 | 0.812 | 0.783 | 0.915 | ckpt | log |

Human Detection

Human detectors trained on Human-Art are now supported in MMPose (see this PR). Detailed usage and the Model Zoo can be found here.

To train and evaluate human detectors, please refer to MMDetection, an open-source object detection toolbox based on PyTorch that supports diverse detection frameworks with higher efficiency and higher accuracy. Because MMDetection is updated frequently, we do not maintain a codebase in this repo. Since Human-Art is compatible with MSCOCO, you can train and evaluate any model in MMDetection using its dataloader.

The supported models include:

| Detection Config | Model AP | Download |
| --- | --- | --- |
| RTMDet-tiny | 46.6 | Det Model |
| RTMDet-s | 50.6 | Det Model |
| YOLOX-nano | 38.9 | Det Model |
| YOLOX-tiny | 47.7 | Det Model |
| YOLOX-s | 54.6 | Det Model |
| YOLOX-m | 59.1 | Det Model |
| YOLOX-l | 60.2 | Det Model |
| YOLOX-x | 61.3 | Det Model |

Citing Human-Art

If you find this repository useful for your work, please consider citing it as follows:

@inproceedings{ju2023human,
    title={Human-Art: A Versatile Human-Centric Dataset Bridging Natural and Artificial Scenes},
    author={Ju, Xuan and Zeng, Ailing and Wang, Jianan and Xu, Qiang and Zhang, Lei},
    booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
    year={2023},
}