FusionSense

Documentation for VLM reasoning and real-world experiments is still in progress. This README is being actively updated.

🆕 [2024-11-15] Installation for VLM Reasoning & Active Touch Selection Updated.

🆕 [2024-10-17] Installation for Hardware Integration/3D Printing Updated.

🆕 [2024-10-15] Installation for Robotics Software Updated.

🆕 [2024-10-11] Made Public

[Page] | [Paper] | [Video]

This is the official implementation of FusionSense: Bridging Common Sense, Vision, and Touch for Robust Sparse-View Reconstruction

Irving Fang, Kairui Shi, Xujin He, Siqi Tan, Yifan Wang, Hanwen Zhao, Hung-Jui Huang, Wenzhen Yuan, Chen Feng, Jing Zhang

FusionSense is a novel 3D reconstruction framework that enables robots to fuse priors from foundation models with highly sparse observations from vision and tactile sensors. It enables visually and geometrically accurate scene and object reconstruction, even for conventionally challenging objects.

Preparation

This repo has been tested on Ubuntu 20.04 and 22.04. The real-world experiments were conducted on 22.04 because ROS2 Humble requires it.

Step 0: Install Everything Robotics

We used a depth camera mounted on a robot arm powered by ROS2 to acquire pictures with accurate pose information. We also used a tactile sensor for Active Touch Selection.

If you have no need for this part, feel free to jump into Step 1 for the 3D Gaussian pipeline of Robust Global Shape Representation and Local Geometric Optimization.

For installing robotics software, please see Robotics Software Installation.
For hardware integration, please see 3D Printing Instructions.

Note: ROS2 doesn't play well with Conda in general. See official doc and this issue in the ROS2 repo. As a result, in this project, ROS2 uses the minimal system Python environment and has limited direct interaction with the Python perception modules.

Step 1: Install 3D Gaussian Dependencies

We will need two independent virtual environments because of compatibility issues.

Usage

1. Robust Global Shape Representation

a. Prepare Data

You can see here for an example dataset structure.

Note that many of the folders are generated during the pipeline. The data needed to start this project are: images, realsense_depth, and transforms.json.

The ROS2 packages we shared can be used to acquire this data, or you can manually format your own dataset this way.

The project assumes that all the folders in the HuggingFace repo are put under FusionSense/datasets/.
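Before starting the pipeline, it can help to confirm that a dataset folder contains the three required inputs listed above. The following is a minimal sketch and not part of the repo: the function name `missing_inputs` is hypothetical, and it only checks that the entries exist, not that their contents are valid.

```python
from pathlib import Path

# Inputs the README lists as required to start the pipeline.
REQUIRED_DIRS = ("images", "realsense_depth")
REQUIRED_FILES = ("transforms.json",)


def missing_inputs(dataset_dir):
    """Return the names of required entries that are absent from dataset_dir."""
    root = Path(dataset_dir)
    missing = [d for d in REQUIRED_DIRS if not (root / d).is_dir()]
    missing += [f for f in REQUIRED_FILES if not (root / f).is_file()]
    return missing
```

For example, `missing_inputs("datasets/transparent_bunny")` returns an empty list when all three inputs are present.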

b. Extract Mask

If you want to let VLM classify the object, click here. If you want to manually specify the name, please read ahead.

Inside our main conda env

conda activate fusionsense

Run this script.

python scripts/VLM.py --mode partname --data_name {DATASET_NAME}

data_name: Name of the specific dataset folder. Example: transparent_bunny

Whether you got the name from VLM or not, we can proceed.

Switch your conda env first

conda activate G-SAM-2

Inside the submodule of our Grounded-SAM2

cd Grounded-SAM2-for-masking

Run the script to extract masks by setting your dataset path and the object name as the text prompt. The text prompt must end with a '.'.

You can use a name you came up with, or one proposed by the VLM. In our experience, both work fine.

e.g. --path /home/irving/FusionSense/datasets/transparent_bunny --text 'transparent bunny statue.'

python grounded_sam2_hf_model_imgs_MaskExtract.py --path {ABSOLUTE_PATH} --text {TEXT_PROMPT_FOR_TARGET_OBJ}

You will see mask_imgs in the newly created /masks folder, and you can check the /annotated folder to see the results more directly.
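Because the text prompt has to end with a '.', a small helper can guard against forgetting it. This is a sketch under stated assumptions: `normalize_prompt` and `mask_extract_argv` are hypothetical names that do not exist in the repo, and the only flags used are the `--path` and `--text` flags shown in the command above.

```python
def normalize_prompt(text):
    """Strip surrounding whitespace and make sure the prompt ends with one '.'."""
    cleaned = text.strip().rstrip(".").rstrip()
    if not cleaned:
        raise ValueError("prompt text must not be empty")
    return cleaned + "."


def mask_extract_argv(abs_path, text):
    """Build the argument list for the mask extraction command (hypothetical helper)."""
    return [
        "python",
        "grounded_sam2_hf_model_imgs_MaskExtract.py",
        "--path", str(abs_path),
        "--text", normalize_prompt(text),
    ]
```

The resulting list can be passed to `subprocess.run(argv, check=True)` from inside the Grounded-SAM2-for-masking folder with the G-SAM-2 environment active.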

c. Select Frames

Fill train.txt with the ids of the images to train on. You can pick images with better masks for a better final result. In our experiments, we did not cherry-pick images; we only made sure the selected images were relatively evenly spread out.
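One way to get a relatively even spread is to pick ids at evenly spaced positions in the sorted list of available ids. The sketch below is illustrative only: both function names are hypothetical, and the one-id-per-line layout of train.txt is an assumption that should be checked against the example dataset before use.

```python
def evenly_spaced_ids(image_ids, k):
    """Pick k ids from image_ids, spread as evenly as possible over the sorted list."""
    ids = sorted(image_ids)
    if k <= 0 or not ids:
        return []
    if k >= len(ids):
        return ids
    if k == 1:
        return [ids[len(ids) // 2]]
    # Map k evenly spaced positions onto the index range [0, len(ids) - 1].
    step = (len(ids) - 1) / (k - 1)
    return [ids[round(i * step)] for i in range(k)]


def write_train_txt(path, ids):
    """ASSUMPTION: train.txt holds one image id per line."""
    with open(path, "w") as f:
        f.write("\n".join(str(i) for i in ids) + "\n")
```

With ten images numbered 1 to 10 and `k=4`, this selects ids 1, 4, 7, and 10. Swapping a selected id for a neighbor with a better mask keeps the spread roughly intact.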

d. Run Pipeline

This pipeline is mostly run in Nerfstudio. You can change the configs in configs/config.py. First, go back to our main conda environment and main folder:

conda activate fusionsense

cd ..

Then we run

python scripts/train.py --data_name {DATASET_NAME} --model_name {MODEL_NAME} --load_touches {True, False} --configs {CONFIG_PATH} --verbose {True, False} --vram_size {"large", "small"}

data_name: Name of the dataset folder
model_name: Name of the model you train. It will impact the output and eval folder name. You can technically name this whatever you want.
load_touches: Whether to load tactile data. Default=False
configs: Path to the Nerfstudio config file
verbose: False: Only show important logs. True: Show all logs. Default=False
vram_size: "large" or "small". Decides the foundation models variants used in the pipeline. Default="large"

An example using the provided data would be:

python scripts/train.py --data_name transparent_bunny --model_name 9view --configs configs/config.py --vram_size small
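When launching several runs, the command can be assembled programmatically. The following sketch is not part of the repo: `train_argv` is a hypothetical helper that only mirrors the flags and defaults documented above (`load_touches=False`, `verbose=False`, `vram_size="large"`) and passes booleans as the literal strings `True` and `False`, as in the command template.

```python
def train_argv(data_name, model_name, configs,
               load_touches=False, verbose=False, vram_size="large"):
    """Build the argument list for scripts/train.py from the documented flags."""
    if vram_size not in ("large", "small"):
        raise ValueError('vram_size must be "large" or "small"')
    return [
        "python", "scripts/train.py",
        "--data_name", data_name,
        "--model_name", model_name,
        "--load_touches", str(bool(load_touches)),
        "--configs", configs,
        "--verbose", str(bool(verbose)),
        "--vram_size", vram_size,
    ]
```

For instance, `train_argv("transparent_bunny", "9view", "configs/config.py", vram_size="small")` reproduces the example above with the default flags written out explicitly.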

Render outputs

To render jpeg or mp4 outputs using Nerfstudio, we recommend installing ffmpeg in the conda environment:

conda install -c conda-forge x264=='1!161.3030' ffmpeg=4.3.2

To render outputs of pretrained models:

python scripts/render_video.py camera-path --load_config your-model-config --camera_path_filename camera_path.json --rendered_output_names rgb depth normal

See the Nerfstudio ns-render documentation for more details.

Dataset Format

datasets/
    ds_name/
    │
    ├── transforms.json # needed for training
    │
    ├── train.txt
    │
    ├── images/
    │   ├── rgb_1.png
    │   └── rgb_2.png
    │ 
    ├── realsense_depth/
    │   ├── depth_1.png
    │   └── depth_2.png
    │
    ├── tactile/
    │   ├── image
    │   ├── mask
    │   ├── normal
    │   └── patch
    │
    ├── model.stl       # needed for evaluation
    │
    ├── normals_from_pretrain/ # generated
    │   ├── rgb_1.png
    │   └── rgb_2.png
    │
    ├── foreground_pcd.ply
    │
    └── merged_pcd.ply

Outputs Format

outputs/
    ds_name/
    │
    ├── MESH/
    │   └── mesh.ply
    │
    ├── nerfstudio_models/
    │   └── 30000.ckpt
    │   
    ├── cluster_centers.npy
    │
    ├── config.yml
    │
    ├── high_grad_pts.pcd
    │
    ├── high_grad_pts_ascii.pcd
    │
    └── dataparser_transforms.json

eval/
    ds_name/ *evaluation results files*
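Given the outputs layout above, a script that consumes a trained model may need to locate the most recent checkpoint. This is a minimal sketch, assuming checkpoint filenames contain the training step as digits (as in `30000.ckpt`); the function name `latest_checkpoint` is hypothetical and not part of the repo.

```python
import re
from pathlib import Path


def latest_checkpoint(output_dir):
    """Return the .ckpt file with the largest step number, or None if there is none."""
    ckpt_dir = Path(output_dir) / "nerfstudio_models"
    best, best_step = None, -1
    for ckpt in ckpt_dir.glob("*.ckpt"):
        # ASSUMPTION: the last run of digits in the filename is the training step.
        digits = re.findall(r"\d+", ckpt.stem)
        if not digits:
            continue
        step = int(digits[-1])
        if step > best_step:
            best, best_step = ckpt, step
    return best
```

Calling `latest_checkpoint("outputs/ds_name")` on the tree shown above would return the path to `30000.ckpt`.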
