This project focuses on adapting a High-Resolution Network (HRNet) for pose estimation in an underwater swimming environment. Pose estimation underwater presents unique challenges, including light refraction, motion distortion, and occlusions caused by water turbulence. Leveraging HRNet's ability to maintain high-resolution representations throughout the process, this project aims to overcome these obstacles and achieve accurate keypoint detection.
Here is the link for the final presentation. I hope you will enjoy it.
The dataset for this project was collected from the University of South Carolina Swim and Dive Team. The athletes' performances were recorded during training sessions to generate a comprehensive dataset for underwater pose estimation.
You can learn more about the swim and dive team here.
The dataset used for training can be found in the data directory. It contains several dataset directories, each from a different video. Each dataset has a directory for the frames and a COCO-format JSON annotation file.
Each frame is annotated with 13 keypoints to represent swimmer biomechanics.
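As a rough illustration of how such an annotation file might be read, the sketch below groups each annotation's flat `keypoints` list into (x, y, visibility) triplets, following the standard COCO keypoint layout `[x1, y1, v1, x2, y2, v2, ...]`. The sample data, the file name `frame_0001.jpg`, and the helper `load_keypoints` are made up for illustration; the files in the data directory may differ in detail.

```python
import json

NUM_KEYPOINTS = 13  # each frame is annotated with 13 keypoints


def load_keypoints(coco):
    """Map each image file name to a list of (x, y, visibility) triplets.

    Assumes one annotation per frame; a later annotation for the same
    frame would overwrite an earlier one.
    """
    id_to_name = {img["id"]: img["file_name"] for img in coco["images"]}
    result = {}
    for ann in coco["annotations"]:
        flat = ann["keypoints"]
        if len(flat) != NUM_KEYPOINTS * 3:
            raise ValueError(f"expected {NUM_KEYPOINTS * 3} values, got {len(flat)}")
        # Group the flat list into consecutive (x, y, v) triplets.
        triplets = [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]
        result[id_to_name[ann["image_id"]]] = triplets
    return result


# Made-up sample standing in for a real annotation file.
sample = {
    "images": [{"id": 1, "file_name": "frame_0001.jpg"}],
    "annotations": [
        {
            "image_id": 1,
            "keypoints": [v for k in range(NUM_KEYPOINTS) for v in (10 * k, 5 * k, 2)],
        }
    ],
}

# For a real file, replace this line with json.load() on the opened file.
coco = json.loads(json.dumps(sample))
keypoints_by_frame = load_keypoints(coco)
print(len(keypoints_by_frame["frame_0001.jpg"]))
```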
This project was developed based on the research paper "Deep High-Resolution Representation Learning for Human Pose Estimation" by Ke Sun, Bin Xiao, Dong Liu, and Jingdong Wang. The paper introduces HRNet, which maintains high-resolution representations throughout the process, achieving superior accuracy in pose estimation tasks. This project's implementation draws inspiration from Bin Xiao's official HRNet repository, which is available on GitHub here.
- Clone this repository to your local machine:
git clone https://github.com/csce585-mlsystems/FancyBear.git
- Navigate into the project directory:
cd FancyBear/Swimming_Pose_Estimation
- Install the required packages:
pip install -r requirements.txt
First, open Tester.py and add the model and input video file paths on lines 182 and 188.
Next, run the Tester.py script:
python Tester.py
This script estimates the keypoints in each frame of the video and outputs a new video with the estimated keypoints.
Training the model requires an NVIDIA GPU. To change the learning rate, batch size, or number of epochs, edit lines 142 and 145 of the training script. Whenever a new minimum is found, the trainer saves the model as checkpoint.pth.
To start training, run hrnetTrainer.py:
python hrnetTrainer.py
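The save-on-new-minimum behaviour described above can be sketched roughly as follows. This is not the code in hrnetTrainer.py: it assumes the tracked quantity is a per-epoch loss, and `train`, `run_epoch`, and `save_checkpoint` are hypothetical names. In a PyTorch trainer, `save_checkpoint` would typically wrap something like `torch.save(model.state_dict(), "checkpoint.pth")`.

```python
def train(num_epochs, run_epoch, save_checkpoint):
    """Run training and checkpoint whenever the loss reaches a new minimum.

    run_epoch(epoch) -> float loss for that epoch (hypothetical callback).
    save_checkpoint(epoch, loss) persists the model (hypothetical callback).
    """
    best_loss = float("inf")
    saved_epochs = []
    for epoch in range(num_epochs):
        loss = run_epoch(epoch)
        if loss < best_loss:
            # New minimum found: remember it and save the model.
            best_loss = loss
            save_checkpoint(epoch, loss)
            saved_epochs.append(epoch)
    return best_loss, saved_epochs


# Made-up losses standing in for real training epochs.
fake_losses = [0.9, 0.7, 0.8, 0.5]
best_loss, saved_epochs = train(
    num_epochs=len(fake_losses),
    run_epoch=lambda epoch: fake_losses[epoch],
    save_checkpoint=lambda epoch, loss: print(f"epoch {epoch}: saved at loss {loss}"),
)
```

With the made-up losses, a checkpoint is written at epochs 0, 1, and 3, and skipped at epoch 2 because the loss went up.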
There are a few scripts to test various components of the system.
The hrnetTester.py script tests the accuracy of a model by loading a single frame, running the chosen model on it, and displaying the frame with its keypoints. For each keypoint it shows the ground-truth target (GT Keypoint n), the predicted location (Pred Keypoint n), the confidence level of the prediction, and the Euclidean distance from the target keypoint to the predicted keypoint.
First, open hrnetTester.py and add the desired model and frame for testing on lines 165 and 174. Next, run hrnetTester.py:
python hrnetTester.py
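For reference, the per-keypoint Euclidean distance can be computed as in the minimal sketch below. The coordinates are made up, `keypoint_errors` is a hypothetical helper rather than a function from hrnetTester.py, and numbering the keypoints from 1 is an assumption.

```python
import math


def keypoint_errors(gt, pred):
    """Return the Euclidean distance in pixels between matching keypoints."""
    if len(gt) != len(pred):
        raise ValueError("ground truth and prediction must have the same length")
    return [math.dist(g, p) for g, p in zip(gt, pred)]


# Made-up (x, y) pixel coordinates for two keypoints.
gt = [(100.0, 50.0), (120.0, 80.0)]
pred = [(103.0, 54.0), (120.0, 80.0)]

errors = keypoint_errors(gt, pred)
for n, err in enumerate(errors, start=1):
    print(f"Keypoint {n}: distance {err:.2f} px")
```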
The testerAugmentation.py script tests three data augmentation techniques: horizontal flip, rotation, and translation. It selects a random frame from the dataset and applies the augmentations, showing the before and after frames with their corresponding keypoints.
To run the augmentation tester:
python testerAugmentation.py
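When a frame is augmented, its keypoints have to be transformed in the same way. The sketch below shows one way the three transforms could be applied to (x, y) keypoint coordinates. It is not taken from testerAugmentation.py, and the function names are hypothetical. It also leaves out the left/right label swap that a horizontal flip usually needs, because the keypoint ordering is not given here.

```python
import math


def hflip(points, width):
    """Mirror keypoints horizontally in an image `width` pixels wide."""
    return [(width - 1 - x, y) for x, y in points]


def translate(points, dx, dy):
    """Shift keypoints by (dx, dy) pixels."""
    return [(x + dx, y + dy) for x, y in points]


def rotate(points, angle_deg, center):
    """Rotate keypoints about `center` using the standard 2D rotation matrix.

    With image coordinates (y pointing down), a positive angle appears
    clockwise on screen.
    """
    cx, cy = center
    rad = math.radians(angle_deg)
    c, s = math.cos(rad), math.sin(rad)
    return [
        (cx + (x - cx) * c - (y - cy) * s, cy + (x - cx) * s + (y - cy) * c)
        for x, y in points
    ]


# Made-up keypoints in a 10-pixel-wide image.
flipped = hflip([(0, 5), (9, 5)], width=10)
shifted = translate([(1, 2)], dx=3, dy=-1)
rotated = rotate([(2, 1)], angle_deg=90, center=(1, 1))
print(flipped, shifted, rotated)
```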
The testerDATALOADER.py script tests the dataloader by loading the dataset and displaying a batch of 4 frames with their names and corresponding keypoints.
To run the dataloader tester:
python testerDATALOADER.py