Another Contributor: Rashid Ali
This is part of the repository code for Transform-based One-Shot Learning with Variable Camera Angle for Human Behavior Recognition Inside the Elevator.
It is also part of my master's thesis (Design and Implementation of Transform-based Efficient Learning with Variable Camera Angle for Human Behavior Recognition inside the Elevator).
For our paper (not yet published), click here.
The dataset is licensed by Human3.6M; download it here.
This code predicts 3D poses from 2D Alpha Pose skeletons using 3D Pose Baseline. All of the neural networks here are built with PyTorch. For the full code, including classification, click here.
I developed this code based on the related works below. Thanks to their authors for providing code for research purposes:
Our contributions in this repository are:
- Use Alpha Pose output as the input for 3D pose estimation
- Improve 3D Pose Baseline accuracy by tuning the hyperparameters with Optuna hyperparameter search
- Provide a function to map the COCO skeleton format to the Human3.6M format
- Provide a function to transform a 3D skeleton to another reference skeleton (to match the angle and position between two skeletons)
- Provide a Jupyter notebook for visualizing all the skeleton results (you can use it for research purposes)
- Obtain a license for Human3.6M here and download the dataset.
- Obtain the Alpha Pose skeleton from your images or video as a .json file using this repository.
- Open "src/Full Experiment.ipynb" to examine all the code and run experiments with it (this notebook contains all the experiments, including visualization).
- Open "src/Full Pipeline 2D to 3D npy from Alpha Pose.ipynb" to predict 3D skeletons from an Alpha Pose JSON file. You will receive the 3D skeleton as NPY and JSON files.
- Feel free to explore these two notebooks and use them for your needs.
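The COCO-to-Human3.6M mapping listed among the contributions can be illustrated with a minimal sketch. This is not the repository's function: the helper names (`alphapose_entry_to_xy`, `coco_to_h36m`), the 17-joint Human3.6M ordering, and the way the joints missing from COCO (pelvis, spine, thorax, head) are interpolated are all assumptions. It also assumes each Alpha Pose detection stores its keypoints as a flat `[x, y, score, ...]` list.

```python
import numpy as np

# Standard COCO 17-keypoint order.
COCO_NAMES = [
    "nose", "l_eye", "r_eye", "l_ear", "r_ear",
    "l_shoulder", "r_shoulder", "l_elbow", "r_elbow",
    "l_wrist", "r_wrist", "l_hip", "r_hip",
    "l_knee", "r_knee", "l_ankle", "r_ankle",
]
COCO_INDEX = {name: i for i, name in enumerate(COCO_NAMES)}

# Assumed 17-joint Human3.6M order; the repository may use a different one.
H36M_NAMES = [
    "pelvis", "r_hip", "r_knee", "r_ankle", "l_hip", "l_knee", "l_ankle",
    "spine", "thorax", "neck_nose", "head",
    "l_shoulder", "l_elbow", "l_wrist", "r_shoulder", "r_elbow", "r_wrist",
]


def alphapose_entry_to_xy(entry):
    """Reshape one detection's flat [x, y, score, ...] list into (17, 2)."""
    kpts = np.asarray(entry["keypoints"], dtype=float).reshape(-1, 3)
    return kpts[:, :2]


def coco_to_h36m(coco_xy):
    """Map a (17, 2) COCO skeleton to an assumed (17, 2) Human3.6M layout."""
    coco_xy = np.asarray(coco_xy, dtype=float)
    if coco_xy.shape != (17, 2):
        raise ValueError("expected a (17, 2) array of COCO keypoints")
    joints = {name: coco_xy[COCO_INDEX[name]] for name in COCO_NAMES}
    # Joints that COCO does not provide are interpolated (assumption).
    joints["pelvis"] = (joints["l_hip"] + joints["r_hip"]) / 2
    joints["thorax"] = (joints["l_shoulder"] + joints["r_shoulder"]) / 2
    joints["spine"] = (joints["pelvis"] + joints["thorax"]) / 2
    joints["neck_nose"] = joints["nose"]
    joints["head"] = (joints["l_eye"] + joints["r_eye"]) / 2
    return np.stack([joints[name] for name in H36M_NAMES])
```

Working with named joints rather than raw index lists keeps the mapping readable and makes it easy to swap in a different interpolation for the missing joints.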
| | RGB Image | Alpha Pose | Human 3.6M 2D | Human 3.6M 3D |
|---|---|---|---|---|
| Falling (Scenario 1) | | | | |
| Falling (Scenario 2) | | | | |
| Walk Dog (Human 3.6M Dataset) | | | | |
| Direction (Human 3.6M Dataset) | | | | |
This test uses a multi-camera dataset in which the same action is recorded from 8 different camera angles at the same time. We transform all angles to the angle of camera 4. For example, if the classification module is trained on data from camera 4, this transformation removes the need to train on new data from other camera angles; instead, we transform the other angles to camera 4.
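One common way to bring a 3D skeleton into the frame of a reference skeleton is a rigid (rotation plus translation) alignment computed with the Kabsch algorithm. The sketch below shows that technique under stated assumptions: the function name `rigid_align` is hypothetical, both skeletons are assumed to be `(N, 3)` arrays with matching joint order, and the repository's own transformation function may work differently.

```python
import numpy as np


def rigid_align(source, reference):
    """Find R, t minimising ||R @ s_i + t - r_i|| over all joints (Kabsch).

    Returns the aligned copy of `source` together with R and t.
    """
    source = np.asarray(source, dtype=float)
    reference = np.asarray(reference, dtype=float)
    src_centroid = source.mean(axis=0)
    ref_centroid = reference.mean(axis=0)
    # Cross-covariance of the centred point sets.
    H = (source - src_centroid).T @ (reference - ref_centroid)
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection so that det(R) = +1.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T
    t = ref_centroid - R @ src_centroid
    aligned = source @ R.T + t
    return aligned, R, t
```

In the multi-camera setting described above, `reference` would be a skeleton seen from camera 4 and `source` the same pose seen from another camera.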
| | Test 1 - Standing | Test 2 - Falling |
|---|---|---|
| RGB Image | | |
| 2D Pose Estimation | | |
| 3D Inference Result | | |
| Transform another angle to CAM 4 | | |
| Map back to 2D Skeleton | | |
The original version refers to the original 3D Pose Baseline from this repository.
The PyTorch version refers to the PyTorch 3D Pose Baseline from this repository (retrained using its default settings).
We tuned the hyperparameters with Optuna and obtained a better (lower) MPJPE (Mean Per Joint Position Error). Our final hyperparameters are shown below:
Best trial:
Params:
lr: 0.0006495169217159662
lr_decay: 150000
gamma: 0.9
linear_size: 1024
p_dropout: 0.3
num_stage: 5
optimizer: Adam
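The `lr`, `lr_decay` and `gamma` values above suggest a decaying learning-rate schedule. As a rough illustration only, the sketch below assumes an exponential rule of the form `lr * gamma ** (step / lr_decay)`; the function name `decayed_lr` is hypothetical, and the README does not confirm that the training code uses exactly this formula.

```python
def decayed_lr(step, base_lr=0.0006495169217159662, lr_decay=150000, gamma=0.9):
    """Learning rate after `step` updates under an assumed exponential decay."""
    return base_lr * gamma ** (step / lr_decay)


# Under this assumption the rate shrinks by a factor of gamma
# every `lr_decay` steps.
for step in (0, 150000, 300000):
    print(step, decayed_lr(step))
```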
| | direct. | discuss. | eat. | greet. | phone | photo | pose | purch. | sit | sitd. | smoke | wait | walkd. | walk | walkT | avg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| original version | 37.7 | 44.4 | 40.3 | 42.1 | 48.2 | 54.9 | 44.4 | 42.1 | 54.6 | 58.0 | 45.1 | 46.4 | 47.6 | 36.4 | 40.4 | 45.5 |
| pytorch version | 35.9 | 42.9 | 37.3 | 40.2 | 43.1 | 51.2 | 44.0 | 37.9 | 51.9 | 53.0 | 41.4 | 42.1 | 43.6 | 32.8 | 35.2 | 42.2 |
| Ours | 35.0 | 40.8 | 36.0 | 39.1 | 42.8 | 48.7 | 41.8 | 37.0 | 47.8 | 47.9 | 40.2 | 41.1 | 41.7 | 31.4 | 33.0 | 40.3 |
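The error metric in the table above is the mean per joint position error: the Euclidean distance between each predicted joint and its ground-truth position, averaged over all joints and poses (lower is better). A minimal sketch, assuming predictions and ground truth are arrays of shape `(..., joints, 3)` in the same units; the function name `mpjpe` is illustrative, and any root-joint centring or alignment used by the evaluation protocol is not shown.

```python
import numpy as np


def mpjpe(pred, gt):
    """Mean Euclidean distance between predicted and ground-truth joints."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    if pred.shape != gt.shape:
        raise ValueError("pred and gt must have the same shape")
    # Distance per joint, then the mean over every joint and pose.
    return float(np.linalg.norm(pred - gt, axis=-1).mean())
```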
MIT