This repository serves as an example of building your own digital twin with nothing more than a cellphone camera. Every step of the pipeline either uses open-source software directly or code written by me. The pipeline is described in depth in the following sections.
```mermaid
graph TD;
    Calibrate_Your_Camera-->Take_Pictures_of_the_Target_Person;
    Take_Pictures_of_the_Target_Person-->Run_COLMAP_to_Localize_Cameras;
    Run_COLMAP_to_Localize_Cameras-->Train_NeRF;
    Train_NeRF-->Extract_the_Mesh_From_NeRF;
    Extract_the_Mesh_From_NeRF-->Import_and_Clean_the_Mesh_in_Blender;
    Import_and_Clean_the_Mesh_in_Blender-->Apply_Pose_Bone_to_the_Mesh;
    Fetch_a_Target_Video-->Run_Mediapipe_Pose_Estimation;
    Run_Mediapipe_Pose_Estimation-->Export_Pose_JSON;
    Export_Pose_JSON-->Run_Blender_Script;
    Apply_Pose_Bone_to_the_Mesh-->Run_Blender_Script;
```
- Python 3
- OpenCV
- Numpy
- tqdm
- COLMAP
- nerf_pl
- Blender
- mediapipe
The first step of the pipeline is to calibrate your camera. This step is optional, but it helps with scale recovery and improves localization stability in the COLMAP step.
To calibrate the camera, take ~50 images of a checkerboard pattern from different angles. The pattern can be printed out or displayed on any digital screen, e.g. an iPad. Place the images under `image_preprocessing/calibration_images`.
- Take ~50 images of a checkerboard pattern, and place those images under `image_preprocessing/calibration_images`
- Change line 8 (`CHECKERBOARD = (6,9)`) of `image_preprocessing/camera_calibration.py` if necessary
- `cd image_preprocessing`
- Run `python camera_calibration.py` (a minimal sketch of what this step computes is shown below)
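For readers who want to see the idea behind the calibration step, here is a minimal sketch of the standard OpenCV checkerboard routine. The image location, file extension, and the `calibration.npz` output are assumptions for illustration; the actual `camera_calibration.py` may differ in these details.

```python
import glob

import cv2
import numpy as np

CHECKERBOARD = (6, 9)  # inner-corner count, as on line 8 of camera_calibration.py

# 3D coordinates of the checkerboard corners in the board's own plane (z = 0)
objp = np.zeros((CHECKERBOARD[0] * CHECKERBOARD[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:CHECKERBOARD[0], 0:CHECKERBOARD[1]].T.reshape(-1, 2)

obj_points, img_points = [], []
gray = None
for path in sorted(glob.glob("calibration_images/*.jpg")):  # assumed location/extension
    gray = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)
    found, corners = cv2.findChessboardCorners(gray, CHECKERBOARD, None)
    if found:
        obj_points.append(objp)
        img_points.append(corners)

# Estimate the intrinsic matrix (fx, fy, cx, cy) and the lens distortion coefficients
ret, camera_matrix, dist_coeffs, _, _ = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
np.savez("calibration.npz", camera_matrix=camera_matrix, dist_coeffs=dist_coeffs)
```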
- Take 50 to 100 images of the target person with the same camera, and place those images under `image_preprocessing/data_orig/FOLDER_NAME`
  - Replace `FOLDER_NAME` with your own folder name
- Change line 20 (`w_h_ratio = 4/3`) of `image_preprocessing/undistort_new_images.py` if necessary
  - `w_h_ratio` is the image width-to-height ratio
- Assign `w_new` and `h_new` at lines 21 and 22 of `image_preprocessing/undistort_new_images.py`
  - Do this if the original image resolution is too high
  - This setting will affect both COLMAP and NeRF
- Change lines 46 and 102 of `image_preprocessing/undistort_new_images.py` if there are 10,000 or more images
- Run `python undistort_new_images.py` (a sketch of the undistortion idea is shown below)
- `image_preprocessing/data/FOLDER_NAME` will be created, under which the undistorted images and a `camera.txt` file will be stored
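As a rough illustration of what the undistortion step does, here is a sketch built on OpenCV. The `calibration.npz` file, directory names, and image extension are assumptions carried over from the sketch above; the actual `undistort_new_images.py` also handles resizing to `w_new`/`h_new` and writing `camera.txt`, which is omitted here.

```python
import glob
import os

import cv2
import numpy as np

data = np.load("calibration.npz")  # assumed output of the calibration sketch above
K, dist = data["camera_matrix"], data["dist_coeffs"]

src_dir, dst_dir = "data_orig/FOLDER_NAME", "data/FOLDER_NAME"
os.makedirs(dst_dir, exist_ok=True)

for path in sorted(glob.glob(os.path.join(src_dir, "*.jpg"))):
    img = cv2.imread(path)
    h, w = img.shape[:2]
    # Refine the intrinsics for the undistorted image, then remove lens distortion
    new_K, _ = cv2.getOptimalNewCameraMatrix(K, dist, (w, h), 0, (w, h))
    undistorted = cv2.undistort(img, K, dist, None, new_K)
    # (resizing to w_new x h_new would happen here before writing)
    cv2.imwrite(os.path.join(dst_dir, os.path.basename(path)), undistorted)
```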
- Keep the camera zoom (focal length) and orientation fixed
- Keep the body of the target person as static as possible
- Take more than 70 images of the target person with full 360-degree views in an indoor environment
- E.g., take images from at least 20 uniformly distributed angles, with 3 images from each angle at different camera heights
- An example is shown in Figure 3.
- Separate arms from the main body and legs from each other to reduce manual work in Blender
- Make the background as simple as possible
- The photography step is not automated at all and takes a couple of minutes, during which minor body shifts can occur and degrade the final mesh quality. In the future, I'm considering using a cheap programmable drone to take the pictures automatically, but that extra hardware would cost at least $100.
- Taking a quick video is another option. I haven't tried that yet.
- My pictures were taken outdoors, and I think that was a mistake which made it much harder for NeRF to converge. That's why I suggest an indoor photography environment.
- Some AI algorithms can generate masks for the human body, and those masks could be fed to NeRF, but I'm not sure how robustly that would work (a sketch of one possible masking approach is shown below).
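Since mediapipe is already a dependency, one possible (untested, as noted above) way to generate such masks is its selfie-segmentation model. This is only a sketch of the idea, not something the pipeline currently uses; the file paths are assumptions.

```python
import cv2
import mediapipe as mp
import numpy as np

# Segment the person in one captured image and save a binary mask
with mp.solutions.selfie_segmentation.SelfieSegmentation(model_selection=1) as seg:
    img = cv2.imread("data/FOLDER_NAME/example.jpg")  # assumed path
    result = seg.process(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
    mask = (result.segmentation_mask > 0.5).astype(np.uint8) * 255
    cv2.imwrite("example_mask.png", mask)
```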
- Move `image_preprocessing/data/FOLDER_NAME` (the undistorted images from the previous step) to `1_camera_localization/`
- Create a subfolder `images` under `1_camera_localization/FOLDER_NAME`, and move all images into `1_camera_localization/FOLDER_NAME/images`
- Create a chain of subfolders `sparse/0` under `1_camera_localization/FOLDER_NAME`
- Copy `1_camera_localization/sample_data_subfolder/sparse/0/project.ini` to `1_camera_localization/FOLDER_NAME/sparse/0/project.ini`
  - Remember to change lines 4 and 5 of `project.ini` accordingly:
    - `database_path=WORKSPACE_ABSOLUTE_PATH/1_camera_localization/FOLDER_NAME/database.db`
    - `image_path=WORKSPACE_ABSOLUTE_PATH/1_camera_localization/FOLDER_NAME/images`
  - And also line 14. The four values are `fx`, `fy`, `cx`, `cy`, i.e. `camera_matrix[0,0]`, `camera_matrix[1,1]`, `camera_matrix[0,2]`, `camera_matrix[1,2]` (see the sketch at the end of this section for deriving them from the calibration output):
    - `camera_params=771.0800010279605,772.711181640625,241.53466279090742,354.9183215136509`
- Run the COLMAP GUI via `colmap gui`
- Click `File -> Open project` and select `1_camera_localization/FOLDER_NAME/sparse/0/project.ini`
- Click `File -> Edit project`
  - Click `New` on the `Database` line, navigate to `1_camera_localization/FOLDER_NAME`, and create a file `database.db`
  - Click `Select` on the `Images` line, and select `1_camera_localization/FOLDER_NAME/images`
  - Click `Save`
- Click `Processing -> Feature extraction`
  - Make sure that
    - The camera model is `PINHOLE`
    - `Shared for all images` and `Shared per sub-folder` are selected
    - `Parameters from EXIF` is unchecked
    - The values under `Custom parameters` are correct
  - Click `Extract`
- Click `Processing -> Feature matching -> Run`
- Click `Reconstruction -> Automatic reconstruction`
  - Assign `Workspace folder` to be `1_camera_localization/FOLDER_NAME`
  - Assign `Image folder` to be `1_camera_localization/FOLDER_NAME/images`
  - (Optional) Select `Extreme` for `Quality`
  - Check `Shared intrinsics` and `Sparse model`
  - Uncheck `Dense model`
  - Click `Run`
- The sparse reconstruction result should be visualized; please verify the camera poses visually
- Quit COLMAP
- Change `scene_folder` at line 5 of `1_camera_localization/run_gen_poses.py` to `FOLDER_NAME`
- `cd 1_camera_localization` and `python run_gen_poses.py`
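For reference, the `camera_params` values mentioned above come straight from the intrinsic matrix estimated during calibration. A tiny sketch, assuming the intrinsics were saved to a `calibration.npz` as in the earlier sketch (the actual output format of `camera_calibration.py` may differ):

```python
import numpy as np

K = np.load("calibration.npz")["camera_matrix"]  # assumed calibration output
fx, fy = K[0, 0], K[1, 1]  # focal lengths in pixels
cx, cy = K[0, 2], K[1, 2]  # principal point
print("camera_params=" + ",".join(repr(float(v)) for v in (fx, fy, cx, cy)))
```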
- `cd 2_NeRF_training`
- `git clone https://github.com/kwea123/nerf_pl.git`
- `cd nerf_pl`
- Execute the following command to train NeRF
  ```
  python train.py --dataset_name llff \
      --root_dir ../../1_camera_localization/FOLDER_NAME \
      --N_samples 512 \
      --N_importance 128 --img_wh 480 720 \
      --num_epochs 2 --batch_size 1024 \
      --optimizer adam --lr 5e-4 \
      --lr_scheduler steplr --decay_step 10 20 --decay_gamma 0.5 \
      --exp_name exp \
      --spheric \
      --use_disp
  ```
  - Remember to change `--img_wh 480 720` to `--img_wh ${w_new} ${h_new}`. Replace `${w_new}` and `${h_new}` with their actual values.
  - Fine-tune `N_samples`, `N_importance`, `num_epochs`, and `batch_size` according to your GPU memory. The above example roughly consumes all the GPU memory of an NVIDIA T4 and takes about 1 day to converge at about 1.3 epochs.
- Use https://github.com/kwea123/nerf_pl/blob/master/extract_mesh.ipynb to extract a mesh within a certain 3D bounding box (a rough sketch of the underlying idea is shown below)
  - Note: I modified nerf_pl's `extract_color_mesh.py` slightly to reduce memory usage, and the modified file is under `2_NeRF_training`. Besides, I found that the 3D bounding box has to be a cube rather than a cuboid; I attempted to relax that constraint but didn't succeed.
- Feel free to try a faster version of NeRF, e.g. Instant-NGP.
- We need a better meshing policy specifically designed for NeRF or its variants to improve mesh quality and mesh-exporting speed, as well as to reduce the memory footprint. The current extraction method is essentially a brute-force one.
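To make the brute-force idea concrete, here is a rough sketch of density-grid marching cubes, roughly what `extract_color_mesh.py` does in chunks (minus the vertex colors). The `query_sigma` function is a hypothetical stand-in for evaluating the trained NeRF's density; the grid resolution, density threshold, and bounding box are illustrative values only.

```python
import mcubes  # PyMCubes
import numpy as np
import trimesh


def query_sigma(xyz):
    """Hypothetical stand-in for querying the trained NeRF's density at 3D points.
    A dummy sphere is used here so the sketch runs end to end."""
    return np.maximum(0.0, 1.0 - np.linalg.norm(xyz, axis=-1)) * 100.0


N = 256          # grid resolution (illustrative)
half_side = 1.0  # half side length of the cubic bounding box (must be a cube, see note above)
xs = np.linspace(-half_side, half_side, N)
grid = np.stack(np.meshgrid(xs, xs, xs, indexing="ij"), axis=-1).reshape(-1, 3)

sigma = query_sigma(grid).reshape(N, N, N)

# Marching cubes on the density volume, then map vertices back to world coordinates
vertices, triangles = mcubes.marching_cubes(sigma, 20.0)
vertices = vertices / (N - 1) * 2 * half_side - half_side
trimesh.Trimesh(vertices, triangles).export("mesh.ply")
```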
- Open Blender
- `File -> Import -> Stanford (.ply)` to read the ply file generated in step 2
- Move the mesh bottom center to the origin. Rotate and scale the mesh properly if necessary
- Select the mesh, and in `Material Properties`, set `Base Color` to `Vertex Color | Color` to see a colorful mesh
- In `Modifier Properties`, add two modifiers (a bpy sketch of this step is shown at the end of this section):
  - `Smooth`, with `Factor` = 1.000 and `Repeat` = 60
  - `Decimate -> Collapse`, with `Ratio` = 0.025
- Export the modified mesh to a new ply file
- Save, and create a new Blender project
- Import the modified mesh
- Create a proper armature and parent the mesh to the armature
  - Take a look at the example at `3_preprocess_mesh_in_blender/Yukun512_7.blend`
- Assign weights to each pose bone in the armature
  - Again, please refer to `3_preprocess_mesh_in_blender/Yukun512_7.blend`
- Before smoothing, the mesh from marching cubes has an unrealistically rough surface. A better NeRF variant, e.g. NeuS, would probably provide a higher-quality mesh. The `Decimate` modifier speeds up the subsequent processes.
- Blender is an amazingly capable piece of 3D software, and I still feel that I only know a little about it. Besides, it's a bit hard to describe its usage in words, so many substeps are only documented concisely here.
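For those who prefer scripting over clicking through the UI, the two modifiers above can also be added with Blender's Python API. This is only a sketch, assuming the imported mesh is the active object; applying the modifiers immediately is my own choice for illustration, not part of the original workflow.

```python
import bpy

obj = bpy.context.active_object  # assumes the imported mesh is selected/active

# Smooth modifier: Factor = 1.0, Repeat = 60 (the UI's "Repeat" is `iterations` in bpy)
smooth = obj.modifiers.new(name="Smooth", type='SMOOTH')
smooth.factor = 1.0
smooth.iterations = 60

# Decimate (Collapse) modifier: Ratio = 0.025
decimate = obj.modifiers.new(name="Decimate", type='DECIMATE')
decimate.decimate_type = 'COLLAPSE'
decimate.ratio = 0.025

# Apply both modifiers so the exported ply contains the simplified geometry
bpy.ops.object.modifier_apply(modifier=smooth.name)
bpy.ops.object.modifier_apply(modifier=decimate.name)
```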
- Fetch your target video, extract images from each frame, and store them under `4_mocap_and_export/images`
- `cd 4_mocap_and_export` and `python mocap.py`
- `blender_data.json` will be generated (a sketch of the frame extraction and pose estimation is shown after the notes below)
- There are some AI mocap websites, and I've tried two of them. The first one charged me $48 for a 5-minute video, and the exported motion was smooth but missed lots of gestures. The second one was free, but I failed to get it to work.
- Those websites can take 20 minutes to process a 5-minute video, while mediapipe runs at 6~7 FPS. More importantly, with mediapipe we could build real-time applications rather than offline animations.
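Below is a hedged sketch of the frame extraction plus mediapipe pose estimation that this step performs. The video file name and the JSON layout are assumptions for illustration; the actual `mocap.py` defines its own output schema for `blender_data.json`.

```python
import json

import cv2
import mediapipe as mp

cap = cv2.VideoCapture("target_video.mp4")  # assumed video file name
frames = []
with mp.solutions.pose.Pose(static_image_mode=False, model_complexity=1) as pose:
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        results = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.pose_world_landmarks is None:
            frames.append(None)  # keep frame indexing intact when detection fails
            continue
        frames.append([
            {"x": lm.x, "y": lm.y, "z": lm.z, "visibility": lm.visibility}
            for lm in results.pose_world_landmarks.landmark
        ])
cap.release()

with open("blender_data.json", "w") as f:
    json.dump(frames, f)
```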
- In `5_mocap_data_in_blender/script.py`, change `POSE_JSON_FILE_PATH` to `$ABSOLUTE_WORKSPACE_PATH/4_mocap_and_export/blender_data.json`
  - `POSE_JSON_FILE_PATH = NONE`
  - `POSE_OFFSET = 300`
- Run `script.py` inside Blender (a simplified bpy sketch of the keyframing it performs is shown below)
- Smooth the keyframes by switching the `Editor Type` to `Graph Editor`, and then executing `Key -> Smooth Keys` twice
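The following is a deliberately simplified sketch of the kind of keyframing `script.py` performs; it is not the actual script. The armature name, bone name, rotation values, and JSON layout are all assumptions for illustration (the JSON layout matches the earlier mocap sketch).

```python
import json

import bpy
from mathutils import Euler

POSE_JSON_FILE_PATH = "/ABSOLUTE_WORKSPACE_PATH/4_mocap_and_export/blender_data.json"
POSE_OFFSET = 300  # frame index of the first keyframe, as in script.py

with open(POSE_JSON_FILE_PATH) as f:
    frames = json.load(f)

armature = bpy.data.objects["Armature"]    # assumed armature object name
bone = armature.pose.bones["upper_arm.L"]  # assumed pose-bone name
bone.rotation_mode = 'XYZ'

for i, landmarks in enumerate(frames):
    if landmarks is None:
        continue
    # The real script derives bone rotations from the mediapipe landmarks;
    # a placeholder rotation is keyframed here just to show the bpy calls.
    bone.rotation_euler = Euler((0.0, 0.0, 0.01 * i), 'XYZ')
    bone.keyframe_insert(data_path="rotation_euler", frame=POSE_OFFSET + i)
```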
- There are a lot of comments in `script.py`. Please read them if you want to understand more about the "ugly" trade-offs and tricks in the code.
- There are plenty of ways to improve `script.py`, and in the demonstration video you may notice some spurious behaviors, e.g. sudden large body rotations.