Create a rosbag anonymizer tool #4551

Closed
4 of 9 tasks
xmfcx opened this issue Mar 19, 2024 · 18 comments
Assignees
Labels
component:perception Advanced sensor data processing and environment understanding. type:new-feature New functionalities or additions, feature requests.

Comments

@xmfcx
Contributor

xmfcx commented Mar 19, 2024

Checklist

  • I've read the contribution guidelines.
  • I've searched other issues and no duplicate issues were found.
  • I've agreed with the maintainers that I can plan this task.

Description

The Autoware Foundation seeks to develop a tool that anonymizes camera data within rosbags, specifically targeting the blurring of faces and license plates to maintain privacy. This initiative aims to enable the secure sharing of rosbags containing camera data amongst member companies and the wider community.

Purpose

The primary goal is to ensure the privacy of individuals captured in camera data shared within the Autoware ecosystem. By creating a tool that can anonymize sensitive information in rosbags, we facilitate a safer, privacy-compliant exchange of data that can be used for research, development, and testing of autonomous vehicle technologies.

Possible approaches

  • Fork the autodistill project to the Autoware Foundation's repository, as it is under the Apache 2.0 license and could serve as a starting point.
  • Manually test the SAM model for its effectiveness in anonymizing faces and license plates within camera data.
  • Based on the results, develop a tool (either as a standalone application or integrated within the autodistill project) that inputs a rosbag and outputs an anonymized version of the same data, with faces and license plates blurred.

Definition of done

  • The autodistill project is successfully forked to the Autoware Foundation repository.
  • Some test data from people with consent are uploaded for testing.
  • The SAM model's effectiveness in blurring faces and license plates is manually tested and confirmed.
  • A decision is made on whether the anonymization tool will be standalone or integrated into autodistill.
  • The tool is developed and tested to ensure it accurately anonymizes faces and license plates in rosbag camera data.
  • Documentation is created to guide users on how to use the tool.
@xmfcx xmfcx added the component:perception Advanced sensor data processing and environment understanding. label Mar 19, 2024
@xmfcx xmfcx added the type:new-feature New functionalities or additions, feature requests. label Mar 19, 2024
@xmfcx
Contributor Author

xmfcx commented Apr 17, 2024

@StepTurtle

In the end, we decided not to use autodistill, because it adds nothing we need beyond the original Grounding DINO and SAM.

Instead, we used Grounding DINO and SAM from their original repositories and added an image classification model, OpenCLIP, to validate the Grounding DINO results. The pipeline works as follows:

Project Link: https://github.com/leo-drive/rosbag2_anonymizer

system

  • The tool reads the images from a ROS 2 bag file and feeds the original images to Grounding DINO. Grounding DINO takes an image and a list of prompts as input and finds the objects described by the prompts in the image.
  • Grounding DINO does not detect every object correctly, so we validate its detections with the image classification model OpenCLIP. OpenCLIP takes the detected objects and a list of prompts as input and returns a matching score for each object and prompt.
  • After the bounding boxes are validated, SAM segments each object and produces masks. The segmented regions are then blurred and written to a new bag file.
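The last step (blurring only the segmented pixels) can be illustrated with a minimal pure-Python sketch. It assumes a grayscale image stored as a list of rows and a boolean mask of the same shape. The helper `blur_masked` and the simple box blur are illustrative assumptions, not the tool's actual implementation.

```python
def blur_masked(image, mask, radius=1):
    """Return a copy of `image` in which pixels with mask=True are box-blurred.

    image: list of rows of ints (grayscale); mask: same shape, bools.
    Hypothetical helper for illustration only.
    """
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(height):
        for x in range(width):
            if not mask[y][x]:
                continue  # pixels outside the segmentation stay untouched
            # Average over the (2*radius+1)^2 neighbourhood, clipped at borders.
            ys = range(max(0, y - radius), min(height, y + radius + 1))
            xs = range(max(0, x - radius), min(width, x + radius + 1))
            window = [image[j][i] for j in ys for i in xs]
            out[y][x] = sum(window) // len(window)
    return out


image = [
    [0, 0, 0, 0],
    [0, 90, 0, 0],
    [0, 0, 0, 0],
]
mask = [
    [False, False, False, False],
    [False, True, False, False],
    [False, False, False, False],
]
blurred = blur_masked(image, mask)
```

Only the masked pixel changes; a larger `radius` gives a stronger blur, which matters when text such as a plate number must become unreadable.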

In addition, we want to add one more validation step. It will check whether an object lies inside the kind of object it should belong to. For example, a license plate should be inside a car but should not be inside a person.
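A containment check of this kind could look like the sketch below. It assumes boxes in `(x_min, y_min, x_max, y_max)` pixel form; the function names and the `min_fraction` tolerance are assumptions made for illustration, not values taken from the tool.

```python
def inside_fraction(child, parent):
    """Fraction of the child box's area that lies inside the parent box.

    Boxes are (x_min, y_min, x_max, y_max) in pixels.
    """
    cx1, cy1, cx2, cy2 = child
    px1, py1, px2, py2 = parent
    overlap_w = max(0, min(cx2, px2) - max(cx1, px1))
    overlap_h = max(0, min(cy2, py2) - max(cy1, py1))
    child_area = (cx2 - cx1) * (cy2 - cy1)
    if child_area <= 0:
        return 0.0
    return (overlap_w * overlap_h) / child_area


def is_inside(child, parent, min_fraction=0.9):
    # min_fraction is an assumed tolerance for slightly loose boxes.
    return inside_fraction(child, parent) >= min_fraction


car = (100, 100, 500, 400)
person = (600, 100, 700, 400)
plate = (250, 330, 350, 360)

plate_in_car = is_inside(plate, car)
plate_in_person = is_inside(plate, person)
```

Using an area fraction rather than strict containment tolerates detector boxes that extend a few pixels past the parent's edge.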

I will also add detailed outputs. For now, you can check this bag file, which was anonymized with our tool:

Definition of done

  • Add new validation part which checks the object positions
  • Add detailed outputs

@xmfcx
Contributor Author

xmfcx commented Apr 24, 2024

@StepTurtle could you test https://github.com/knzo25/rosbag2_language_sam with the same data and compare them?

cc. @knzo25

I'm expecting the comparison to be a playback of the anonymized rosbag camera images, shared here as a video.

@StepTurtle

StepTurtle commented Apr 24, 2024

@StepTurtle could you test https://github.com/knzo25/rosbag2_language_sam with the same data and compare them?

cc. @knzo25

I'm expecting the comparison to be a playback of the anonymized rosbag camera image which as a video shared here.

Here are the results:

In the video:

Left rqt window shows this tool: https://github.com/leo-drive/rosbag2_anonymizer

  • 🔴 red boxes represent license plates
  • 🟢 green boxes represent human faces

Right rqt window shows this tool: https://github.com/knzo25/rosbag2_language_sam

  • 🟣 purple boxes represent license plates
  • 🟢 green boxes represent cars

@StepTurtle

StepTurtle commented Apr 26, 2024

Additionally, a validation component has been added to https://github.com/leo-drive/rosbag2_anonymizer to verify the object positions. You can view the results here:

Do you have any ideas or suggestions on what we can do in the upcoming stages?

@xmfcx
Contributor Author

xmfcx commented Apr 26, 2024

I can read the text; the blur is not strong enough.

There are so many places where the plates are not blurred well enough.

What happens if you look for license plates with a low score threshold and validate them by checking that the plate is inside a vehicle?

Blurring classes

Classes for the license plate detection

Parent classes

car
bus
truck
minibus
motorcycle
trailer
utility vehicle
tractor
golf cart
semi-truck
moped
scooter

Child class

license plate

Classes for the pedestrian face detection

Parent classes

person
child

Child class

human face
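The class lists above can be written down as a small configuration. The sketch below assumes a plain dictionary keyed by child class; the names `PARENTS_BY_CHILD`, `build_prompts`, and `is_valid_parent` are hypothetical and do not come from the tool's configuration format.

```python
# Child class -> parent classes it is expected to appear inside.
PARENTS_BY_CHILD = {
    "license plate": [
        "car", "bus", "truck", "minibus", "motorcycle", "trailer",
        "utility vehicle", "tractor", "golf cart", "semi-truck",
        "moped", "scooter",
    ],
    "human face": ["person", "child"],
}


def build_prompts(hierarchy):
    """All labels to search for: children first, then parents, no duplicates."""
    prompts = list(hierarchy)
    for parents in hierarchy.values():
        for parent in parents:
            if parent not in prompts:
                prompts.append(parent)
    return prompts


def is_valid_parent(child_label, parent_label, hierarchy=PARENTS_BY_CHILD):
    """True if `parent_label` is an allowed parent class for `child_label`."""
    return parent_label in hierarchy.get(child_label, [])


prompts = build_prompts(PARENTS_BY_CHILD)
```

Keeping the hierarchy as data makes it easy to add a parent class without touching the validation logic.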

@StepTurtle

StepTurtle commented Apr 26, 2024

@xmfcx

I can read the text, blur is not enough.

There are so many places where the plates are not blurred well enough.

I changed the blur parameters; I think it is okay now.


For this question, the following diagram may be helpful:

system

The first validation step runs OpenCLIP. OpenCLIP returns results similar to the following:

  • Assuming you have input prompts such as: ["license plate", "car", "face"]
  • The output will look like this: [0.95, 0.4, 0.1]

If the score for the corresponding label is greater than 0.9, the detection is accepted as valid.

In the second validation step, we check whether the detected object lies inside one of its parent classes. If it does, it must satisfy at least one of the following conditions:

  • Is the score for the corresponding label the highest among the scores?
  • Is the score greater than 0.3?

What happens if you look for license plates with low score threshold and if the plate is inside the vehicle for validation?

For your example, a license plate inside a vehicle must either have a score greater than 0.3 or have the highest score among all the prompts.
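The two validation steps described in this comment can be sketched as a single function. The thresholds 0.9 and 0.3 come from the comment; the function name, argument layout, and the boolean `inside_parent` flag are assumptions made for illustration.

```python
def validate_detection(label, prompts, scores, inside_parent,
                       high_threshold=0.9, low_threshold=0.3):
    """Decide whether a detection labelled `label` should be kept.

    prompts / scores: parallel lists as returned by the classifier.
    inside_parent: True if the box lies inside an allowed parent object.
    """
    score = scores[prompts.index(label)]
    # Step 1: a very confident classification is accepted on its own.
    if score > high_threshold:
        return True
    # Step 2: inside a parent, a weaker score is enough if it is either
    # the best score among the prompts or above the low threshold.
    if inside_parent:
        return score == max(scores) or score > low_threshold
    return False


prompts = ["license plate", "car", "face"]

confident = validate_detection("license plate", prompts, [0.95, 0.4, 0.1], False)
weak_inside = validate_detection("license plate", prompts, [0.35, 0.5, 0.1], True)
weak_outside = validate_detection("license plate", prompts, [0.35, 0.5, 0.1], False)
top_inside = validate_detection("license plate", prompts, [0.25, 0.2, 0.1], True)
rejected = validate_detection("license plate", prompts, [0.25, 0.5, 0.1], True)
```

Here `confident`, `weak_inside`, and `top_inside` are accepted, while `weak_outside` and `rejected` are discarded.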

@xmfcx
Contributor Author

xmfcx commented Apr 26, 2024

binary_classification

photo attribution from unsplash

My problem is with the false negatives, also known as, missed detections.

Does your proposal reduce FNs?

@StepTurtle

StepTurtle commented Apr 27, 2024

Implementing this proposal did not reduce FNs directly, but it allowed us to lower the DINO threshold.

With a lower DINO threshold, we detect more objects, including some that were previously missed (FNs). A lower threshold also produces many more FPs, and we aim to filter out these FPs with the proposed validation.
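The trade-off can be made concrete with a toy example. The scores and ground-truth flags below are invented purely for illustration and are not measurements from the tool; `count_errors` is a hypothetical helper.

```python
def count_errors(candidates, threshold):
    """Count missed real objects and accepted non-objects at a threshold.

    candidates: list of (detector_score, is_real_object) pairs.
    """
    false_negatives = sum(
        1 for score, real in candidates if real and score < threshold
    )
    false_positives = sum(
        1 for score, real in candidates if not real and score >= threshold
    )
    return false_negatives, false_positives


# Illustrative data only.
candidates = [
    (0.80, True), (0.60, True), (0.35, True), (0.30, True),
    (0.40, False), (0.32, False), (0.25, False),
]

strict = count_errors(candidates, threshold=0.5)   # (FN, FP)
relaxed = count_errors(candidates, threshold=0.3)  # (FN, FP)
```

In this toy data the strict threshold misses two real objects with no false positives, while the relaxed one misses none but lets two false positives through, which is what the second validation stage is meant to remove.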

@mitsudome-r
Member

@StepTurtle
We can put the repository under AWF GitHub organization.
Please make sure that you are not violating the license terms of any of the code or models that you used.

@StepTurtle

@StepTurtle We can put the repository under AWF GitHub organization. Please make sure that you are not violating the license term of all the codes/models that you used.

@mitsudome-r @xmfcx we forked the repository a while ago.

However, I currently don't have write access. Could you give me access to this repository? I believe I can create PRs, but I would prefer to push directly to the main branch, as there might not be anyone available to review for now. If this isn't acceptable, I'll create a PR whenever I need to update the code.

@StepTurtle

@StepTurtle

I am sharing videos that show the current results:

After labeling data and training YOLOv8, we combined YOLOv8 and DINO to find bounding boxes, and the results improved.
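The thread does not say how the two detectors' outputs are combined. One plausible approach, shown here only as an assumption, is to take the union of both box sets and drop near-duplicates by intersection over union (IoU). The names `iou` and `merge_boxes` and the 0.5 threshold are illustrative.

```python
def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def merge_boxes(primary, secondary, iou_threshold=0.5):
    """Keep all primary boxes; add secondary boxes that do not duplicate them."""
    merged = list(primary)
    for box in secondary:
        if all(iou(box, kept) < iou_threshold for kept in merged):
            merged.append(box)
    return merged


yolo_boxes = [(10, 10, 50, 30)]
dino_boxes = [(12, 11, 52, 31), (200, 80, 240, 100)]
merged = merge_boxes(yolo_boxes, dino_boxes)
```

The first DINO box overlaps the YOLOv8 box heavily and is dropped as a duplicate; the second is new and is kept, so an object found by either detector ends up blurred.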

@StepTurtle

Hi @xmfcx,

The tool has usage instructions in the project README. Should we also add a user guide for the tool to the Autoware documentation, along with instructions on how to publish a new public dataset with the Autoware community?

@xmfcx
Contributor Author

xmfcx commented May 24, 2024

@StepTurtle under here:
https://autowarefoundation.github.io/autoware-documentation/main/datasets/

it would be nice to have a separate page, dedicated to data anonymization.

@mitsudome-r
Member

@mitsudome-r will find someone to test this tool.

@NilaySener

Hi @StepTurtle, first of all, thank you for the tool you prepared. I had the opportunity to test it and would like to give some feedback. The tool version I used:


I used the tool to anonymize the data I collected in our autonomous test vehicle. You can find detailed information about the vehicle and system here:

ECU of Test Vehicle

The ECU specs of our test vehicle are as follows:

| Component | Product |
|-----------|---------|
| CPU | AMD Ryzen Threadripper PRO 3975WX 32-core, 64-thread |
| Memory | 256 GB RAM |
| GPU | 3x NVIDIA RTX A4000 (operations are performed on a single GPU) |

Anonymizing The Data

I anonymized the images in the bag file on the system described above. You can find information about the bag file below:

| Property | Value |
|----------|-------|
| Bag size | 3.3 GiB |
| Storage id | sqlite3 |
| Duration | 116.724427755s |
| Total Messages | 939565 |
| Total Number of Topics | 314 |
| Image Message Rate | ~10 Hz |
| Image Message Type | sensor_msgs/msg/CompressedImage |
| Image Message Count | 1101 |
| Image Resolution (height x width) | 1860 x 2880 |

While anonymizing the data I provided above with the tool, the whole process took approximately 1 hour and 55 minutes. When I observed the approximate GPU usage with the nvidia-smi command throughout the process, I got the following result:

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      4498      G   /usr/lib/xorg/Xorg                          300MiB |
|    0   N/A  N/A     31173      G   ...seed-version=20240904-180241.692000      186MiB |
|    0   N/A  N/A    313763      C   python3                                    6830MiB |
|    1   N/A  N/A      4498      G   /usr/lib/xorg/Xorg                            4MiB |
|    2   N/A  N/A      4498      G   /usr/lib/xorg/Xorg                            4MiB |
+---------------------------------------------------------------------------------------+
Fri Sep  6 15:17:27 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.23.08              Driver Version: 545.23.08    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA RTX A4000               On  | 00000000:2E:00.0  On |                  Off |
| 71%   89C    P2              68W / 140W |   7416MiB / 16376MiB |    100%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA RTX A4000               On  | 00000000:41:00.0 Off |                  Off |
| 47%   66C    P8              17W / 140W |     13MiB / 16376MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   2  NVIDIA RTX A4000               On  | 00000000:61:00.0 Off |                  Off |
| 49%   67C    P8              20W / 140W |     13MiB / 16376MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+


Results and Observations

You can find images of anonymized data in this video:

In conclusion, as the video shows, the anonymization results are good enough. However, the anonymization process took about 1 hour and 55 minutes. For a bag with a total duration of 116 s, that is a long processing time, and GPU usage was quite high throughout.
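To put the reported numbers in perspective, the short calculation below derives the per-image cost and the slowdown relative to real time. It uses only the figures given in this comment; the processing time is the approximate value reported, so the results are rough.

```python
# Figures reported in the comment above.
processing_s = 1 * 3600 + 55 * 60   # about 1 h 55 min
bag_duration_s = 116.724427755
image_count = 1101

seconds_per_image = processing_s / image_count
slowdown = processing_s / bag_duration_s

print(f"{seconds_per_image:.2f} s per image, "
      f"{slowdown:.0f}x slower than real time")
# prints: 6.27 s per image, 59x slower than real time
```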

@mitsudome-r
Member

@StepTurtle @xmfcx I have approved and merged the PR to Autoware Documentation. autowarefoundation/autoware-documentation#557

Should we close this issue?
If we want to do additional work based on Nilay's feedback, we could create a follow-up issue (something like "make rosbag anonymizer tool faster").

@xmfcx xmfcx closed this as completed Sep 10, 2024
Projects
Status: Done
Development

No branches or pull requests

4 participants