This repository contains the code of a case study in which we combine multiple technologies to study how NVIDIA Omniverse™, ROS 2, PyTorch and the olixVision™ Camera can be leveraged to simulate and run a fruit detection system. There are multiple applications for a system like this, ranging from agriculture to industrial automation. This case study shows how simulation and synthetic data generation can be used to model a camera sensor, generate datasets for ML model training and evaluation, and integrate everything into a real, functional system.
The setup built for this system can be seen in the following picture:
Connected to the camera there is a Lenovo Legion Pro 7i 16 (Intel® Core™ i9-14900HX, 32GB of RAM, 2TB of disk, NVIDIA GeForce RTX 4090 GPU), which was used for development. The requirements for running the system are lower than those for development and training. See the following sections for more information.
- Docker
- Ubuntu 20.04 / 22.04
- NVIDIA GPU GeForce RTX 3070 or higher.
- NVIDIA GPU Driver
- NVIDIA Container Toolkit
- NVIDIA Omniverse™ Launcher
- NVIDIA Omniverse™ Nucleus
We recommend reading this article from NVIDIA Omniverse™ which explains the basic configuration.
NOTE: this project is disk hungry; make sure to have tens of GB (~50GB) of free disk space available.
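You can check the space available on the current filesystem with:
df -h .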
This project uses pre-commit hooks for linting. To install them and make sure they run when committing:
python3 -m pip install -U pre-commit
pre-commit install
If you want to run the linters but are not yet ready to commit, you can run:
pre-commit run --all-files
Please consider visiting the Olive Robotics web page to learn more about their portfolio. In particular, you can refer to the olixVision™ Camera page for further details and the camera specifications.
The olixVision™ Camera provides a ROS 2 native interface which allows you to quickly integrate it into any ROS 2 enabled application. The necessary steps to connect to it are described in the documentation page. Navigate through the Hardware > Camera menu entries to access all the details about it. For a quick setup, we recommend following these steps:
- Connect the USB cable to the camera
- Connect the USB cable to your computer
- Configure the network interface
The camera exposes an Ethernet-over-USB interface. It comes with a static IP configured, which is 192.168.7.2 in our case. Verify that the new network interface appears in the network manager and configure your computer's interface accordingly. For us, that is:
- IP: 192.168.7.100
- Netmask: 255.255.255.0
- Gateway: N/A
- Static IP configuration
- Verify you can access the web dashboard
Head to your favourite browser and type the IP of the camera in the URL bar. You will know the camera is up when it displays a dashboard similar to the following:
Please look into the official documentation to learn more about other ways in which you can use the camera.
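If you also want to double check the connection from a terminal (assuming a sourced ROS 2 environment on the host; the topic name below matches the one used later in the rosbag section and may differ for your camera id), you can ping the camera and list its topics:
ping -c 3 192.168.7.2
ros2 topic list | grep olive
ros2 topic hz /olive/camera/id01/image/compressed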
Within the docker directory you will find a docker compose file, the dockerfiles for each image and some custom configuration files. The system relies on profiles to select which set of services to build and run depending on the workflow. The following sections explain how to work with them.
The available profiles are:
training
: trains a fasterrcnn_resnet50_fpn model based on a synthetic dataset.

detection
: loads the detection stack.

visualization
: loads RQt to visualize the input and output image processing.

webcam
: loads the usb_cam driver that makes a connected webcam publish. Useful when the olixVision™ Camera is not available.

simulation
: loads the NVIDIA Omniverse™ simulation.

dataset_gen
: generates a training dataset using NVIDIA Omniverse™.
Compound profiles are:
olive_pipeline
: loads visualization and detection; expects the olixVision™ Camera to be connected.
graph TD
A[olixVision™ Camera] --> B[Fruit Detection Node]
B[Fruit Detection Node] --> C[RQt Visualization]
A[olixVision™ Camera] --> C[RQt Visualization]
webcam_pipeline
: loads webcam, visualization and detection.
graph TD
A[Webcam] --> B[Fruit Detection Node]
B[Fruit Detection Node] --> C[RQt Visualization]
A[Webcam] --> C[RQt Visualization]
simulated_pipeline
: loads simulation, visualization and detection.
graph TD
A[NVIDIA Omniverse™] --> B[Fruit Detection Node]
B[Fruit Detection Node] --> C[RQt Visualization]
A[NVIDIA Omniverse™] --> C[RQt Visualization]
Testing profiles are:
training_test
: runs the tests for the training stack.

detection_test
: runs the tests for the detection stack.
To build all the docker images:
docker compose -f docker/docker-compose.yml --profile "*" build
NOTE: a warning like the following will appear; we will take care of that environment variable later on.
WARN[0000] The "DATASET_NAME" variable is not set. Defaulting to a blank string.
WARN[0000] The "DATASET_NAME" variable is not set. Defaulting to a blank string.
The dataset generation step produces 300 annotated pictures where many scene conditions, such as lighting and object pose, are randomized.
To generate a new dataset:
docker compose -f docker/docker-compose.yml --profile dataset_gen up
The following .gif shows pictures in which the ground plane color is randomized, yielding a better dataset.
Once it finishes (note that the scene no longer evolves), check the generated folder under isaac_ws/datasets/YYYYMMDDHHMMSS_out_fruit_sdg, where YYYYMMDDHHMMSS is the timestamp of the dataset creation.
Typically, 300 images are not enough. For quick iteration, it is recommended to stick with 300 images to validate that everything works as expected. When training a model more seriously, you can go up to five thousand images, which takes between 15 and 20 minutes to generate. You can change the number of images to generate by editing ./isaac_ws/simulation_ws/conf/config.yml and looking for the NUM_FRAMES item.
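For example, assuming config.yml stores the value as a plain NUM_FRAMES: <value> entry (this exact layout is an assumption), you can inspect and update it from the command line:
grep -n NUM_FRAMES ./isaac_ws/simulation_ws/conf/config.yml
sed -i 's/NUM_FRAMES:.*/NUM_FRAMES: 5000/' ./isaac_ws/simulation_ws/conf/config.yml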
To train a model you need an NVIDIA Omniverse™ synthetic dataset built in the previous step. You first need to set the following environment variable:
export DATASET_NAME=YYYYMMDDHHMMSS_out_fruit_sdg
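If you prefer not to type the folder name by hand, the following should pick the most recently generated dataset (assuming the datasets live under isaac_ws/datasets as described above):
export DATASET_NAME=$(ls -t isaac_ws/datasets | head -n 1)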
Then you can run the training using the training profile:
docker compose -f docker/docker-compose.yml --profile training up
After the training ends, a model.pth file will be available inside the model folder. Additionally, you will notice that the dataset files were organized into different folders based on their extension. To test the model you can run:
docker compose -f docker/docker-compose.yml --profile training_test up
This will evaluate every image in DATASET_NAME and generate annotated images in the model/test_output folder.
The logs generated when training a model are stored in the model/runs folder and can be visualized using the training_vis profile. This profile runs Tensorboard on localhost:6006, which can be accessed via a web browser. To run the Tensorboard server, execute:
docker compose -f docker/docker-compose.yml --profile training_vis up
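Once the container is up, you can open the dashboard from a terminal (or simply type the address in your browser):
xdg-open http://localhost:6006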
To run the system you need to define which profile(s) to run. You can pile up profiles by adding them one after the other for a custom bring-up of the system (e.g. --profile detection --profile visualization).
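For example, that combination expands to the following full command:
docker compose -f docker/docker-compose.yml --profile detection --profile visualization up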
To load the system with the olixVision™ Camera, detection and the visualization in RQt, you can do the following:
- Connect the camera to the USB port of your computer.
- Assuming you have already built a detection model, run the following command:
docker compose -f docker/docker-compose.yml --profile olive_pipeline up
- Verify you can see the camera input and the processed images in RQt.
- To stop the system, you can Ctrl-C or, from another terminal, call:
docker compose -f docker/docker-compose.yml --profile olive_pipeline down
To load the system with a webcam, detection and the visualization in RQt, you can do the following:
- Connect the camera to your computer or make sure the integrated webcam is working on your laptop.
- Assuming you have already built a detection model, run the following command:
docker compose -f docker/docker-compose.yml --profile webcam_pipeline up
- Verify you can see the camera input and the processed images in RQt.
- To stop the system, you can Ctrl-C or, from another terminal, call:
docker compose -f docker/docker-compose.yml --profile webcam_pipeline down
NOTE: In case the default configuration is not appropriate for your hardware setup, please review the documentation of the usb_cam package. In particular, it is common to find setups with multiple video devices; to pick the one you want, you need to explicitly configure it in camera_params.yml. Here you can find a reference configuration file. To make sure your changes take effect, build the webcam profile before running it again:
docker compose -f docker/docker-compose.yml --profile webcam build
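If you are unsure which video devices are present before editing camera_params.yml, the following can help (v4l2-ctl is provided by the v4l-utils package):
v4l2-ctl --list-devices
ls /dev/video*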
To load the system with the simulation, detection and the visualization in RQt, you can do the following:
- Assuming you have already built a detection model, run the following command:
docker compose -f docker/docker-compose.yml --profile simulated_pipeline up
- Verify you can see the camera input and the processed images in RQt, as well as the simulator window being up.
- To stop the system, you can Ctrl-C or, from another terminal, call:
docker compose -f docker/docker-compose.yml --profile simulated_pipeline down
The following detection node parameters are exposed and can be modified via rqt_gui when running any pipeline. Please note that values set outside of their range will simply be discarded.
The parameters bbox_min_x, bbox_min_y, bbox_max_x and bbox_max_y are measured in pixels. bbox_min_x and bbox_max_x can take values between 0 and 640; bbox_min_y and bbox_max_y can take values between 0 and 480. They can be used to filter the inferences based on the size of the generated bounding boxes.
The parameter score_threshold takes values between 0.0 and 1.0. It can be used to filter inferences based on their confidence values.
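Besides rqt_gui, the same parameters can be changed from the command line with ros2 param, from a terminal with a sourced ROS 2 environment (e.g. attached to one of the running containers). The node name below is hypothetical; check ros2 node list for the actual one:
ros2 param set /fruit_detection score_threshold 0.7
ros2 param set /fruit_detection bbox_min_x 50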
It's always useful to record rosbags to try the system outside the laboratory or to share data among teammates.
- Build the rosbag profile:
docker compose -f docker/docker-compose.yml --profile rosbag build
- Run the service:
docker compose -f docker/docker-compose.yml --profile rosbag up
- Attach a console to it and source the ROS 2 installation:
docker exec -it rosbag bash
source /opt/ros/humble/setup.bash
- With the system running (see any of the pipelines above), record the olixVision™ Camera topic:
ros2 bag record -s mcap -o my_rosbag /olive/camera/id01/image/compressed
Press Ctrl-C to stop recording.
- You can try to play it back now:
ros2 bag play my_rosbag
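- To inspect the recording before sharing it:
ros2 bag info my_rosbag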
docker compose -f docker/docker-compose.yml --profile detection_test build
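Once built, the detection tests can be run with the same profile, mirroring the training_test usage above:
docker compose -f docker/docker-compose.yml --profile detection_test up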
Issues or PRs are always welcome! Please refer to the CONTRIBUTING document.
The free software code of conduct fosters an inclusive, respectful community for contributors by promoting collaboration and mutual respect. For more details, refer to the full document Code of Conduct.
- How do I clean up all the docker resources?
Your good old friends docker system prune and the more aggressive docker system prune --all. Caution: it will likely erase things you did not want to erase, as it is a blanket prune. Read the documentation for more information.
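To see how much space Docker resources are using before pruning, you can run:
docker system df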
- Do you have problems with XWindows?
xhost +si:localuser:root
- Running the NVIDIA Omniverse™ together with Google Chrome yields errors when opening the simulator. What do I do?
We faced some situations in which contention for GPU access leads to race conditions between these two programs. One possible solution is to do the following:
- Close all Google Chrome related processes.
- Try to open the simulator using one of the provided instructions in the readme.
- Verify that you can open the simulator; otherwise, you may need to reboot your system :/ and try again.
- Detection calibration
You may need to further calibrate the detection node after training. This is usually done to account for the ambient conditions. We offer two parameters via dynamic reconfigure for this purpose. Refer to the "Parameter tuning" section for further details.