Sizhuo Ma1, Jian Wang1, Wenzheng Chen3, Suman Banerjee2, Mohit Gupta2, Shree Nayar1
1Snap Inc., 2University of Wisconsin-Madison, 3University of Toronto
This is the official implementation of QfaR, a location-guided QR code scanner. The proposed pipeline works as follows: when the user wants to scan a code, the app takes a picture of the current scene. The captured image is sent to a QR code detector, which crops out the part of the image that contains the code. Simultaneously, the user's GPS location is sent to a database to look up a list of QR codes in the vicinity. The scanned code image is then matched against this list using the intensity-based matching described below. In this repo, we generate random QR codes as the candidate codes.
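As a toy illustration of the location-based pruning step, the sketch below filters a candidate list by GPS distance. The database layout, the `haversine_m`/`prune_candidates` helpers, and the 100 m radius are illustrative assumptions, not the repo's actual code.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two GPS coordinates."""
    r = 6371000.0  # mean Earth radius in meters
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def prune_candidates(user_lat, user_lon, database, radius_m=100.0):
    """Return the codes stored within radius_m meters of the user's location."""
    return [code for (lat, lon, code) in database
            if haversine_m(user_lat, user_lon, lat, lon) <= radius_m]

# Toy "database" of (latitude, longitude, code) entries.
db = [(43.0731, -89.4012, "code_A"), (40.7128, -74.0060, "code_B")]
print(prune_candidates(43.0732, -89.4010, db))  # -> ['code_A']
```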
We first use a YOLO network to detect a bounding box of the QR code in the scene. We then use a keypoint detection network to predict a heat map of potential keypoints (three corners). The fourth corner is predicted by assuming the four keypoints form a parallelogram (weak-perspective projection). A homography is computed to transform the parallelogram to a square (rectification), so that codes captured at different viewing angles can be matched directly. Both networks are trained on simulated data generated with physics-based imaging models.
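The geometric part of this step can be sketched with OpenCV as follows. The corner ordering (top-left, top-right, bottom-left) and the output size are assumptions for illustration; the actual aligner in ./aligner may use different conventions.

```python
import cv2
import numpy as np

def rectify(code_crop, corners3, out_size=330):
    """Complete the fourth corner under a parallelogram assumption and warp to a square."""
    tl, tr, bl = [np.asarray(c, dtype=np.float32) for c in corners3]
    # Under weak-perspective projection the code maps to a parallelogram,
    # so the fourth (bottom-right) corner is tr + bl - tl.
    br = tr + bl - tl
    src = np.stack([tl, tr, br, bl])
    dst = np.float32([[0, 0], [out_size - 1, 0],
                      [out_size - 1, out_size - 1], [0, out_size - 1]])
    H = cv2.getPerspectiveTransform(src, dst)  # homography to a square
    return cv2.warpPerspective(code_crop, H, (out_size, out_size))

# Toy usage with made-up corner coordinates:
img = np.zeros((240, 320), np.uint8)
square = rectify(img, [(50, 40), (200, 60), (60, 180)])  # -> (330, 330) array
```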
How do we find the correct code from the pruned list? Conventional decoders threshold the captured, degraded code image to obtain a binary code, which contains many error bits and therefore cannot be decoded. Although the captured code image is heavily degraded, it still contains visual features such as blobs of black/white bits. Therefore, we propose to treat these captured codes as "images" and match them based on their intensities. Specifically, we find the candidate code $D$ with the shortest L2 distance to the scanned code $I_m$ (template matching):

$$D_m = \operatorname*{argmin}_{D \in S} \, d_{L2}(I_m, D).$$

Please refer to the paper for more insights and the reasoning behind this design.
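A minimal NumPy sketch of this matching rule is shown below, assuming the scanned code and the candidates have already been rectified and resampled to the same module grid and normalized to [0, 1]; the code in this repo adds further preprocessing.

```python
import numpy as np

def match_code(scanned, candidates):
    """scanned: (H, W) float array; candidates: (N, H, W) float array.
    Returns the index of the candidate with the smallest L2 distance, and all distances."""
    scanned = scanned.astype(np.float32)
    dists = np.linalg.norm(
        candidates.astype(np.float32).reshape(len(candidates), -1)
        - scanned.reshape(1, -1), axis=1)
    return int(np.argmin(dists)), dists

# Toy usage: 3 random binary "codes" and a noisy observation of code 0.
rng = np.random.default_rng(0)
codes = rng.integers(0, 2, size=(3, 21, 21)).astype(np.float32)
observation = np.clip(codes[0] + 0.3 * rng.standard_normal((21, 21)), 0, 1)
idx, _ = match_code(observation, codes)
print(idx)  # expected: 0
```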
git clone https://github.com/snap-research/qfar.git
cd qfar
conda env create -f environment.yml
Note that this environment contains a CPU-only PyTorch build. Please modify environment.yml accordingly if you need GPU inference.
Tested on:
- python=3.8
- pytorch=2.0.0
- opencv=4.8
Download pretrained models at https://www.dropbox.com/scl/fo/3jfd836ax7evte48d1tl6/h?rlkey=q2g4by9zxddzgfgq9rasiyhnu&dl=1 and place the checkpoints under ./aligner and ./detector.
conda activate qfar
python example_pipeline.py
This will run the pipeline described above (detector, aligner, decoder) on test images in the data/ folder. Please take a look at the code for detailed usage.
The output will be stored in results.txt. Below is an example.
IMG ID: 2
Matched code: 0
Matched ratio: 0.448256
Time for processing image 2: 0.015902
- IMG ID: ID of the test image
- Matched code: Index of the matched code. In the test examples, we always assume the ground truth code has an index of 0.
- Matched ratio: Confidence value of this match
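If you want to post-process the results programmatically, a small parser along these lines can collect the fields listed above. The field names follow the example output; the exact file layout written by example_pipeline.py may differ slightly.

```python
import re

def parse_results(path="results.txt"):
    """Collect per-image entries from results.txt into a list of dicts."""
    results, current = [], {}
    with open(path) as f:
        for line in f:
            if m := re.match(r"IMG ID:\s*(\d+)", line):
                if current:
                    results.append(current)
                current = {"img_id": int(m.group(1))}
            elif m := re.match(r"Matched code:\s*(-?\d+)", line):
                current["matched_code"] = int(m.group(1))
            elif m := re.match(r"Matched ratio:\s*([\d.]+)", line):
                current["matched_ratio"] = float(m.group(1))
    if current:
        results.append(current)
    return results

print(parse_results())  # e.g. [{'img_id': 2, 'matched_code': 0, 'matched_ratio': 0.448256}, ...]
```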
Sizhuo Ma (sizhuoma@gmail.com)
@inproceedings{ma2023qfar,
author = {Ma, Sizhuo and Wang, Jian and Chen, Wenzheng and Banerjee, Suman and Gupta, Mohit and Nayar, Shree},
title = {QfaR: Location-Guided Scanning of Visual Codes from Long Distances},
year = {2023},
booktitle = {Proceedings of the 29th Annual International Conference on Mobile Computing and Networking},
articleno = {4},
numpages = {14},
series = {MobiCom '23}
}