Authors: Boying Li, Danping Zou, Yuan Huang, Xinghan Niu, Ling Pei and Wenxian Yu.
🏠 [Project] 📝 [Paper] ➡️ [Dataset] 🔧 [Extra Evaluation Tool]
Motivation:
⭐ TextSLAM is a novel visual Simultaneous Localization and Mapping (SLAM) system tightly coupled with semantic text objects.
💡 Humans can read texts and navigate complex environments using scene texts, such as road markings and room names. Why not robots?
⭐ TextSLAM explores scene texts as the basic feature both geometrically and semantically. It achieves superior performance even under challenging environments, such as image blurring, large viewpoint changes, and significant illumination variations (day and night).
This repository provides a C++ implementation of the TextSLAM system.
Our accompanying videos are now available on YouTube (click the images below to open) and Bilibili: 1-outdoor, 2-night, 3-rapid.
⭐ Please consider citing the following papers in your publications if the project helps your work.
@article{li2023textslam,
title={TextSLAM: Visual SLAM with Semantic Planar Text Features},
author={Li, Boying and Zou, Danping and Huang, Yuan and Niu, Xinghan and Pei, Ling and Yu, Wenxian},
journal={IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)},
year={2023}
}
@inproceedings{li2020textslam,
title={TextSLAM: Visual SLAM with Planar Text Features},
author={Li, Boying and Zou, Danping and Sartori, Daniele and Pei, Ling and Yu, Wenxian},
booktitle={IEEE International Conference on Robotics and Automation (ICRA)},
year={2020}
}
Download the dataset from TextSLAM Dataset.
TextSLAM runs on Ubuntu 16.04. It should be easy to compile on other Linux versions.
1.1 Ceres & Eigen3
Refer to Ceres for installation instructions on Linux.
Eigen3 is installed as part of that process.
1.2 OpenCV
We use OpenCV 3.3.1 for image processing.
You can use the OpenCV library provided by ROS. Remember to set OpenCV_DIR in CMakeLists.txt using set(OpenCV_DIR [ros_direction]/share/OpenCV-3.3.1-dev).
You can also refer to OpenCV to download and install the library.
1.3 EVO (Evaluation)
EVO is used for SLAM results evaluation. Refer to EVO to install this evaluation tool.
2.1 Clone the repository and build the project:
git clone https://github.com/SJTU-ViSYS/TextSLAM.git
cd TextSLAM
mkdir build
cd build
cmake ..
make -j
The above procedure creates an executable named TextSLAM.
2.2 Run TextSLAM with:
./TextSLAM [yaml_path]/[yaml_name].yaml
We provide yaml files (GeneralMotion.yaml, AIndoorLoop.yaml, LIndoorLoop.yaml, Outdoor.yaml) for our 4 kinds of experiments.
Set 'Exp read path:' in the yaml file to the path where your sequence is saved.
Refer to TextSLAM Dataset for the detailed yaml file structure.
2.3 Output:
keyframe_latest.txt is written while the system runs and records the pose estimate of each keyframe in the current state.
keyframe.txt is written when a sequence finishes.
Both keyframe_latest.txt and keyframe.txt are in TUM format: timestamp tx ty tz qx qy qz qw.
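To inspect the output files before passing them to EVO, a small reader can help. The following is a minimal sketch, assuming each pose line holds exactly the eight whitespace-separated TUM fields listed above. The names `Pose`, `parse_tum_lines`, and `load_tum` are hypothetical helpers, not part of TextSLAM or EVO, and skipping `#` comment lines is an assumption borrowed from the TUM convention.

```python
from typing import Iterable, List, NamedTuple


class Pose(NamedTuple):
    """One TUM-format pose: timestamp, translation, unit quaternion."""
    timestamp: float
    tx: float
    ty: float
    tz: float
    qx: float
    qy: float
    qz: float
    qw: float


def parse_tum_lines(lines: Iterable[str]) -> List[Pose]:
    """Parse TUM-format lines, skipping blank lines and '#' comments."""
    poses = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) != 8:
            raise ValueError(f"expected 8 fields, got {len(fields)}: {line!r}")
        poses.append(Pose(*map(float, fields)))
    return poses


def load_tum(path: str) -> List[Pose]:
    """Read a trajectory file such as keyframe.txt."""
    with open(path) as f:
        return parse_tum_lines(f)
```

For example, `load_tum("keyframe.txt")` would return one `Pose` per keyframe, and a malformed line raises a `ValueError` instead of being silently dropped.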
We use EVO to evaluate the SLAM performance.
For APE evaluation:
evo_ape tum gt.txt text.txt -va -s
For RPE evaluation at a unit distance of 1.0 m:
evo_rpe tum gt.txt text.txt -va -s --pose_relation trans_part -d 1.0 -u m
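The `-d 1.0 -u m` options ask for relative poses to be compared over pose pairs roughly 1.0 m apart along the trajectory. The sketch below illustrates that idea only; it is not EVO's implementation, and EVO's actual pair selection may differ. The function name `pairs_by_path_length` is hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def pairs_by_path_length(
    positions: Sequence[Tuple[float, float, float]], delta_m: float = 1.0
) -> List[Tuple[int, int]]:
    """Return index pairs (i, j) where j is the first pose whose
    accumulated path length from pose i reaches delta_m."""
    # Cumulative travelled distance at each pose.
    cum = [0.0]
    for a, b in zip(positions, positions[1:]):
        cum.append(cum[-1] + math.dist(a, b))

    pairs = []
    i = 0
    while i < len(positions):
        j = i + 1
        while j < len(positions) and cum[j] - cum[i] < delta_m:
            j += 1
        if j >= len(positions):
            break  # not enough path left for another pair
        pairs.append((i, j))
        i = j  # the next pair starts where this one ended
    return pairs
```

The relative motion between the two poses of each pair would then be compared between the estimate and the ground truth.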
For the loop tests in a large indoor scene, add --n_to_align XX to align only the first XX poses of the whole trajectory.
Because ground truth for this sequence is available only at the beginning and the end, aligning on the first poses gives more accurate results.
evo_ape tum gt.txt text.txt -va -s --n_to_align XX
evo_rpe tum gt.txt text.txt -va -s --pose_relation trans_part -d 1.0 -u m --n_to_align XX
❗ ATTENTION for RPE evaluation: ❗
EVO does not automatically rectify the misalignment between the SLAM body frame and the ground-truth body frame, which influences RPE results.
To solve this problem, we provide an extra Evaluation tool for the TextSLAM dataset, which also serves as a supplement to EVO.
Follow the instructions of the extra Evaluation tool to first obtain the updated pose ground truth file, and then use the updated GT file to evaluate the RPE results.
This step is necessary for all data except outdoor sequences. We use COLMAP to generate the ground truth of the outdoor sequences, which yields the same ground truth frame as the SLAM estimated body frame.
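A short derivation may help show why a body-frame offset affects RPE. The symbols here are assumptions for illustration and do not come from the tool: $Q_i$ is the ground-truth body pose at time $i$, $P_i$ is the estimated body pose, and $T_c$ is a fixed rigid transform between the two body frames, so that a perfect estimate satisfies $P_i = Q_i T_c$ once the world frames are aligned. Using the usual relative pose error definition:

```latex
\begin{aligned}
P_i^{-1} P_j &= (Q_i T_c)^{-1} (Q_j T_c) = T_c^{-1} \left( Q_i^{-1} Q_j \right) T_c, \\
E_{ij} &= \left( Q_i^{-1} Q_j \right)^{-1} \left( P_i^{-1} P_j \right)
        = \left( Q_i^{-1} Q_j \right)^{-1} T_c^{-1} \left( Q_i^{-1} Q_j \right) T_c .
\end{aligned}
```

Under these assumptions $E_{ij}$ equals the identity only when $T_c$ commutes with the relative motion $Q_i^{-1} Q_j$ (for example $T_c = I$), so even a perfect estimate can show a nonzero RPE until the ground truth is expressed in the estimated body frame.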
The authors thank ORB-SLAM, DSO, and AttentionOCR for their excellent work. The authors thank EVO for providing this convenient evaluation tool. The authors thank Ceres for providing this powerful optimization library.