Prototype multi-object detector and tracker suite for FRC
Provided you are on Ubuntu 22.04, just use

```bash
bash install.sh
```

`install.sh` installs everything it can from apt, creates a venv that uses system packages, then adds the few Python packages that aren't in the apt repos.
If you get an error about `GLIBCXX_3.4.30`, try

```bash
conda install -c conda-forge gcc=12.1.0
```
A typical invocation looks like:

```bash
python lumine.py --weights /path/to/openvino/weights/folder --source 0 --table --stream 5800 --draw
```

- `weights`: Path to a folder containing OpenVINO `.bin`, `.xml`, and `.mapping` files.
- `source`: A name that can be passed to `cv2.VideoCapture`. Numbers refer to `/dev/videoX` devices.
- `table`: Draw an ASCII table of the detection results.
Inference is supported on:

- CPU, with PyTorch and OpenVINO backends
- Intel iGPUs, with OpenVINO
All detectors should support a `detect(img) -> list(dict)` function.

`detect` is passed a 3-channel RGB image as a numpy array of shape `(H, W, 3)`. Detectors should not modify the image they receive; if you must modify it, create a copy and modify that.

Detectors should produce a dictionary with at least the following for each detection instance:

- `type`: Which detector this detection originated from.
- `id`: Could be a tag family or an object type; info about what's in the bounding box.
- `color`: RGB triple for the box color. Used when drawing the feed.
- `corners`: Corners of the bounding area of the detection, as a numpy array with 4 rows and 2 columns, shape `(4, 2)`. `corners` should be scaled to the input image: if the input is rescaled inside the `detect` function, corners must be scaled back to the original.
- `confidence`: A confidence value between 0 and 1 for the detection. 0 is worst, 1.0 is best. Scale the value if it's not between 0 and 1.0.

`detect` should always return a list of detections. If there is only one detection, it should still be in a list.
Several detector-specific or post-processing keys may also exist, but are not guaranteed for every object:

- `distance` (stereo depth estimator, apriltags)
- `pose` (apriltags)
- `confidence` (yolov5)
- `sort_id` (SORT)
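As a concrete illustration, here is a minimal sketch of a conforming detector. The class name, the fixed 640x640 model input size, and the `_run_model` helper are hypothetical; only the `detect` contract (keys, shapes, and corner rescaling) comes from the description above.

```python
import numpy as np

class ExampleDetector:
    """Hypothetical detector illustrating the detect(img) -> list(dict) contract."""

    MODEL_SIZE = 640  # assumed square model input; not part of the spec

    def detect(self, img):
        h, w, _ = img.shape  # img is RGB, shape (H, W, 3); never modified in place

        # Pretend the model found one box in model coordinates (x0, y0, x1, y1).
        x0, y0, x1, y1 = self._run_model(img)

        # Corners are 4 rows x 2 columns, ordered around the box.
        corners = np.array([[x0, y0], [x1, y0], [x1, y1], [x0, y1]], dtype=np.float32)

        # The model saw a rescaled image, so scale corners back to the original.
        corners[:, 0] *= w / self.MODEL_SIZE
        corners[:, 1] *= h / self.MODEL_SIZE

        return [  # always a list, even for a single detection
            {
                "type": "example",     # which detector this came from
                "id": "note",          # what's in the bounding box
                "color": (0, 255, 0),  # RGB triple used when drawing the feed
                "corners": corners,    # shape (4, 2), in original image coordinates
                "confidence": 0.9,     # already scaled into [0, 1]
            }
        ]

    def _run_model(self, img):
        # Placeholder for real inference; returns a fake box in model coordinates.
        return 100, 100, 300, 200

detections = ExampleDetector().detect(np.zeros((480, 640, 3), dtype=np.uint8))
```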
Most USB webcams have a few configurable parameters. To minimize motion blur, we want to operate at the shortest exposure time possible, even if that doesn't always translate to more frames delivered per second.
Here's an example command set for configuring the Logitech C310. `device` in this context is a device number corresponding to `/dev/videoX`. Pass `-C` instead of `-c` to read a property instead of setting it.
```bash
v4l2-ctl --list-devices                      # Find your /dev/videoX number
v4l2-ctl --all -d device                     # Show all configuration properties
v4l2-ctl -d 3 -c exposure_auto=1             # 1 = manual exposure
v4l2-ctl -d 3 -c gain=200                    # Fixed gain
v4l2-ctl -d 3 -c exposure_time_absolute=250  # Fixed exposure time
```
OpenCV can configure these parameters with the `cv2.CAP_PROP_*` fields, but from experimenting with it, this doesn't seem reliable. Setting the gain, for instance, changes the number reported by `v4l2-ctl` but doesn't affect the image.
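As a workaround, one option is to shell out to `v4l2-ctl` from Python instead of relying on `cv2.CAP_PROP_*`. A minimal sketch, assuming `v4l2-ctl` is installed and the camera is `/dev/video3` as in the examples above:

```python
import subprocess
import cv2

DEVICE = 3  # /dev/video3, as in the v4l2-ctl examples above

cap = cv2.VideoCapture(DEVICE)

# The OpenCV route; in our experience this may not actually change the image.
cap.set(cv2.CAP_PROP_GAIN, 200)

# The v4l2-ctl route: shell out and let the driver apply the control directly.
for ctl in ("exposure_auto=1", "gain=200", "exposure_time_absolute=250"):
    subprocess.run(["v4l2-ctl", "-d", str(DEVICE), "-c", ctl], check=True)

ok, frame = cap.read()
```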
If using lumine with NetworkTables, pass `--nt TE.AM` (your team number) on the command line, or pass a full IP like `127.0.0.1`.

Lumine creates the table `/lumine` and a subtable for each detector family, for instance `/lumine/yolo` and `/lumine/apriltags`.
Within each detector-specific table, data is stored in a "struct of arrays" consisting of an array of scalars for each property. For instance, `yolo` produces:

```
/lumine/yolo
  len: <number>
  ids: [0, 4, 62, 1, ...]
  confidence: [0.5, 0.42, 0.32, 0.12, ...]
```
To access all the properties for one particular detection, you must request the same index from each property array. Detections are always sorted in order of descending confidence.
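On the consuming side, reading one detection back means indexing every array at the same position. A minimal sketch, assuming the pynetworktables API and the `/lumine/yolo` layout above (entry names beyond `len`, `ids`, and `confidence` will vary by detector):

```python
from networktables import NetworkTables

# Connect to the NetworkTables server (address from the example above).
NetworkTables.initialize(server="127.0.0.1")
yolo = NetworkTables.getTable("/lumine/yolo")

n = int(yolo.getNumber("len", 0))
ids = yolo.getNumberArray("ids", [])
confidence = yolo.getNumberArray("confidence", [])

# The same index in every array refers to the same detection, and
# index 0 is the highest-confidence detection.
for i in range(n):
    print(f"id={ids[i]} confidence={confidence[i]:.2f}")
```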
This project uses OpenVINO for inference on Intel integrated graphics. The easiest way to get an Ultralytics YoloV5 model in OpenVINO format is to simply request it when calling export:

```bash
python export.py --weights yolov5s.pt --include openvino
```
Models exported this way appear to be the standard float32 version. The hardware may also take advantage of float16 for a performance improvement with minimal effort. To get the model into float16 format:

```bash
python export.py --weights /path/to/your/weights.pt --include onnx --opset 12
# mo is installed with the openvino-dev package.
mo --input_model /path/to/your/weights.onnx --compress_to_fp16 True
```
OpenVINO produces `.xml`, `.bin`, and `.mapping` files from your model, which should be put in their own folder. Then pass the path to that folder to `--weights` when calling lumine.
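For reference, loading such a folder with the OpenVINO runtime looks roughly like the following; the folder path is a placeholder, and the device string is an assumption (use `"CPU"` for the CPU plugin):

```python
from pathlib import Path
from openvino.runtime import Core

weights_dir = Path("/path/to/openvino/weights/folder")  # folder passed to --weights
xml_path = next(weights_dir.glob("*.xml"))  # OpenVINO finds the .bin next to the .xml

core = Core()
model = core.read_model(str(xml_path))
compiled = core.compile_model(model, "GPU")  # "GPU" targets the Intel iGPU
```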
YoloV8 can be exported to OpenVINO weights with the Ultralytics command-line utility:

```bash
yolo export model=yolov8n.pt format=openvino half=true
```
Lumine aims to offer some choice of object detection model, depending on the coprocessor in use. The YoloV5 and YoloV8 families are the first choice for powerful coprocessors that provide acceleration hardware, like an NVIDIA Jetson or Intel Iris Xe; more limited systems can benefit from a lighter model like NanoDet.
We recommend CVAT (cvat.ai) for image labeling. CVAT is a labeling server developed by Intel that serves a web interface where users connect and label images. CVAT is primarily self-hosted, but setup is relatively low-effort thanks to its docker-compose procedure. Cloud-based alternatives include Roboflow and Supervisely, but they may have limitations in their free tiers.
Retraining YOLO is quite easy thanks to how complete the Ultralytics solution is. Start by exporting to the YOLO 1.1 format in CVAT. TODO
Lumine provides a script (`utils/autolabel.py`) that uses an existing model to generate YOLO-format labels to be imported into CVAT. The workflow looks something like this (a sketch of the label format follows the list):
- Collect and label a small dataset. For FRC teams, this could be the field walkthrough videos that FIRST releases on day one.
- Train the largest instance of YOLO your hardware can handle on the small dataset.
- Collect additional datasets. Have autolabel analyze them and generate labels.
- Import labels into CVAT and manually correct errors. Add the finished labels to the cumulative dataset.
- Retrain the model with the cumulative dataset and repeat the process. As the dataset improves, autolabel should produce better results.
Ideally autolabeling can progress to a state where it's practical for teams to add data collected from practice matches on the first day of competition, and produce a model with higher performance in their particular environment as quickly as possible.
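For reference, a YOLO-format label file contains one line per object: a class index followed by the box center and size, normalized to the image dimensions. A sketch of converting a detector's `(4, 2)` `corners` array into one label line (the helper name is hypothetical, not part of `autolabel.py`):

```python
import numpy as np

def corners_to_yolo_line(class_id, corners, img_w, img_h):
    """Convert a (4, 2) corners array to one 'class cx cy w h' YOLO label line."""
    xs, ys = corners[:, 0], corners[:, 1]
    x0, x1 = xs.min(), xs.max()
    y0, y1 = ys.min(), ys.max()
    cx = (x0 + x1) / 2 / img_w   # box center, normalized to [0, 1]
    cy = (y0 + y1) / 2 / img_h
    w = (x1 - x0) / img_w        # box size, normalized to [0, 1]
    h = (y1 - y0) / img_h
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"

corners = np.array([[100, 50], [300, 50], [300, 200], [100, 200]], dtype=np.float32)
print(corners_to_yolo_line(0, corners, img_w=640, img_h=480))
# -> "0 0.312500 0.260417 0.312500 0.312500"
```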
TODO:

- Improve mapping SORT boxes back to detect boxes. The current method allows double assignment, and this seems bugged.
- We might want to run a single SORT instance per detector type.
- Add a blob detector module. Kind of questionable because it is being phased out.
- Support streaming the augmented camera feed to the driver station.
- Test the cscore module.
- Add an OpenVINO int8 quantization flow. It should accept the same directory hierarchy as the yolov5 training set, since we need representative images during calibration.
  - int8 is only for CPUs, so maybe this is not useful; the Xe iGPU cannot benefit from int8.
- Support fp16 and int8 calibration for the Jetson with TensorRT, and validate the pytorch backend.
- Figure out how to request a faster update rate from NT.
- Support multiple cameras and switching.
- Add a timestamp to the networktables API.
- Support NanoDet training and inference.
- (High effort) Implement a Lumine vendor library that provides an RPC interface for querying detections rather than using network tables.
- Replace cscore with a gstreamer solution for HW-accelerated temporal codecs.
  - Also need to provide some sort of driver station client or dashboard plugin.
This project uses `black` for code formatting because it's easy.