sv changes bbox shape for object detection with YOLOv8?? #1362

Open
1 task done
abelBEDOYA opened this issue Jul 15, 2024 · 10 comments
Labels
question Further information is requested

Comments

@abelBEDOYA

Search before asking

  • I have searched the Supervision issues and found no similar feature requests.

Question

I've been using supervision, its tracker, annotators, ... Nice work!! However, I've noticed that when doing object detection with YOLOv8, the bbox shapes from ultralytics are changed by supervision even though they refer to the same detection. The following screenshot shows a detected object provided by YOLO (ultralytics.Result) before calling supervision_tracker.update(results[0]) and after passing it to supervision_tracker.

Screenshot from 2024-07-15 12-16-53
The bboxes are different. I expected them to be the same...

Can this bbox shape change be removed? I would like to keep original bbox shape.

Thanks!!

Additional

No response

@abelBEDOYA abelBEDOYA added the question Further information is requested label Jul 15, 2024
@LinasKo
Contributor

LinasKo commented Jul 15, 2024

Hi @abelBEDOYA 👋

Could you share a short snippet of the code, with the print statements?

Also, to clarify, which of these are you measuring the difference between?

  1. ultralytics result (result.boxes.xyxy)
  2. Detections, as created by from_ultralytics
  3. Detections, as updated by tracker.update_with_detections

@abelBEDOYA
Author

abelBEDOYA commented Jul 15, 2024

Here is the code. It just opens the webcam with cv2 and runs callback() on the latest frame, which runs inference and tracking:

import numpy as np
import supervision as sv
from ultralytics import YOLO
import torch

model = YOLO("yolov8n.pt")
tracker = sv.ByteTrack()
box_annotator = sv.BoundingBoxAnnotator()
label_annotator = sv.LabelAnnotator()

def callback(frame: np.ndarray, _: int) -> np.ndarray:
    results = model(frame)[0]
    print('YOLO bbox: ', results.boxes.cpu().xyxy[0] if len(results.boxes.cpu().xyxy)>0 else [])
    detections = sv.Detections.from_ultralytics(results)
    detections = tracker.update_with_detections(detections)
    print('bbox from tracker sv: ', torch.tensor(tracker.tracked_tracks[0].tlbr).cpu())
    print('\n \n ')
    labels = [
        f"#{tracker_id} {results.names[class_id]}"
        for class_id, tracker_id
        in zip(detections.class_id, detections.tracker_id)
    ]

    annotated_frame = box_annotator.annotate(
        frame.copy(), detections=detections)
    return label_annotator.annotate(
        annotated_frame, detections=detections, labels=labels)

import cv2

# Open the webcam (0 is the default camera index)
cap = cv2.VideoCapture(0)

# Check that the camera opened correctly
if not cap.isOpened():
    print("Error: cannot open the camera")
    exit()

while True:
    # Capture frame by frame
    ret, frame = cap.read()

    # If the frame was not received correctly, exit the loop
    if not ret:
        print("Error: cannot receive frame (stream end?). Exiting ...")
        break

    img = callback(frame, 0)
    # Show the resulting frame
    cv2.imshow('Webcam', img)

    # Press 'q' to exit the loop
    if cv2.waitKey(1) == ord('q'):
        break

# When everything is done, release the capture
cap.release()
cv2.destroyAllWindows()

These are the "key" lines:
Screenshot from 2024-07-15 13-12-25

The output bbox has changed (YOLO vs SV):
Screenshot from 2024-07-15 13-11-54

@LinasKo
Contributor

LinasKo commented Jul 15, 2024

Curious. Thanks for letting us know - we'll test it.

@rolson24
Contributor

@abelBEDOYA,
This is interesting. What version of supervision are you using? I seem to remember this was an issue we fixed a few months ago, but the fix may not be working correctly.

@abelBEDOYA
Author

$ pip show supervision
Name: supervision
Version: 0.21.0
Summary: A set of easy-to-use utils that will come in handy in any Computer Vision project
Home-page: https://github.com/roboflow/supervision
Author: Piotr Skalski
Author-email: piotr.skalski92@gmail.com
License: MIT
Location: /home/faraujo/anaconda3/lib/python3.9/site-packages
Requires: defusedxml, matplotlib, numpy, opencv-python-headless, pillow, pyyaml, scipy
Required-by: 

@rolson24
Contributor

Hmm, the latest release is 0.22.0, please try the latest one and see if it helps. In the meantime I will test your code.

@rolson24
Contributor

rolson24 commented Jul 16, 2024

Hi @abelBEDOYA,

I think I know what your problem is. It looks like you are printing the bounding box stored in the tracked object in this line

print('bbox from tracker sv: ', torch.tensor(tracker.tracked_tracks[0].tlbr).cpu())

This prints the tracker's internal bounding box, which is tied to the location and size velocities the tracker maintains, so it may differ from the actual bounding box in the most recent frame. If you want the precise detector bounding box associated with that track, take it from the Detections object returned by tracker.update_with_detections(). That object contains the original bounding boxes from the detector, each associated with a tracker ID.
So to print those bounding boxes, change the line to:

print('bbox from tracker sv: ', detections.xyxy[0])
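
To illustrate why a tracker's internal box can differ from the raw detection, here is a minimal sketch of a single scalar Kalman-style update. It assumes the internal state is a filtered estimate that blends a motion prediction with the new measurement, as ByteTrack-style trackers typically do. The function name and all the numbers are hypothetical, and this is not supervision's actual implementation.

```python
def kalman_update(predicted, variance, measured, measurement_variance):
    """One scalar Kalman update: blend the prediction with the measurement."""
    # The gain weighs how much the measurement is trusted over the prediction.
    gain = variance / (variance + measurement_variance)
    estimate = predicted + gain * (measured - predicted)
    new_variance = (1.0 - gain) * variance
    return estimate, new_variance


# Hypothetical numbers: the filter predicted the box's left edge at x=100,
# while the detector measured it at x=110 on the current frame.
estimate, _ = kalman_update(
    predicted=100.0, variance=4.0, measured=110.0, measurement_variance=4.0
)
print(estimate)  # 105.0, between the prediction and the raw detection
```

With equal uncertainty on both sides, the estimate lands halfway between the prediction and the detector output, which is the kind of offset seen when comparing an internal track box against the detector's box.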

@LinasKo
Contributor

LinasKo commented Jul 16, 2024

I just wanted to take some time to say thanks, @rolson24. The tracker issues have been plaguing us for a while, and we've not had much time to look at it. We really appreciate you helping out!

@abelBEDOYA
Author

Okay! Thanks @rolson24! I'd also like to take this opportunity to ask about associating detections with tracks.

My point is this: I start with an ultralytics Results object, which contains detections. I convert them with detections = sv.Detections.from_ultralytics(results) and then call detections = tracker.update_with_detections(detections). Ultralytics Results can carry additional attributes such as keypoints and segmentation. I would like to associate those YOLO detections with the sv tracks in order to give them a tracking ID. That is the reason I was comparing bboxes between the YOLO detections and the supervision detections. The association is not one-to-one because, for example, the number of YOLO detections is not always the same as the number of sv ones.

How can this association be done?

Thanks again!

@rolson24
Contributor

If you use the detections returned from tracker.update_with_detections(detections) and the Detections object has segmentation masks, then the segmentation masks from the model will be retained and have a tracker_id assigned to them.

Unfortunately, the tracker does not support Keypoints right now. From what you are describing, it sounds like you would want to use a yolo-pose model which returns bboxes and keypoints, and you would want to track the objects. This may be something we add, but for now I have a somewhat hacky idea of how you may be able to do this:

# byte_tracker is assumed to be an sv.ByteTrack() instance created earlier
results = model(frame, imgsz=1280, verbose=False)[0]
pre_track_detections = sv.Detections.from_ultralytics(results)
keypoints = sv.KeyPoints.from_ultralytics(results)
post_track_detections = byte_tracker.update_with_detections(pre_track_detections)

pre_track_bounding_boxes = pre_track_detections.xyxy
post_track_bounding_boxes = post_track_detections.xyxy

# Match pre-track boxes to post-track boxes by IoU (cost = 1 - IoU)
ious = sv.tracker.byte_tracker.matching.box_iou_batch(pre_track_bounding_boxes, post_track_bounding_boxes)
iou_costs = 1 - ious
matches, _, _ = sv.tracker.byte_tracker.matching.linear_assignment(iou_costs, 0.5)

# Allocate keypoint arrays with one row per tracked detection
post_track_keypoints = sv.KeyPoints.empty()

post_track_keypoints.xy = np.empty((len(post_track_detections), keypoints.xy.shape[1], 2), dtype=np.float32)
post_track_keypoints.class_id = np.empty((len(post_track_detections), keypoints.xy.shape[1]), dtype=np.float32)
post_track_keypoints.confidence = np.empty((len(post_track_detections), keypoints.xy.shape[1]), dtype=np.float32)
post_track_keypoints.data = keypoints.data

# Copy each matched detection's keypoints to the index of its track
for i_detection, i_track in matches:
    post_track_keypoints.xy[i_track] = keypoints.xy[i_detection]
    post_track_keypoints.class_id[i_track] = keypoints.class_id[i_detection]
    post_track_keypoints.confidence[i_track] = keypoints.confidence[i_detection]

This ensures that the keypoints in post_track_keypoints have the same index as their corresponding bounding box in post_track_detections. It's kind of hacky, but it should work.
I also have a colab notebook that demonstrates it here
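
For readers who want to see the index-matching idea in isolation, here is a minimal pure-Python sketch with no supervision dependency. It picks, for each post-track box, the pre-track box with the highest IoU. That is a simple per-box argmax rather than the linear assignment used above, so in crowded scenes one detection could in principle be picked twice. The function names and sample boxes are hypothetical.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_tracks_to_detections(pre_boxes, post_boxes, min_iou=0.5):
    """For each post-track box, return the best pre-track index, or None."""
    matches = []
    for post in post_boxes:
        ious = [box_iou(pre, post) for pre in pre_boxes]
        best = max(range(len(ious)), key=ious.__getitem__, default=None)
        if best is not None and ious[best] >= min_iou:
            matches.append(best)
        else:
            matches.append(None)  # no pre-track box overlaps enough
    return matches


# Hypothetical boxes: the tracker dropped one detection and reordered the rest.
pre = [(0, 0, 10, 10), (20, 20, 40, 40), (50, 50, 60, 60)]
post = [(20, 20, 40, 40), (0, 0, 10, 10)]
print(match_tracks_to_detections(pre, post))  # [1, 0]
```

The returned list can then be used to pull keypoints (or any other per-detection attribute) into track order, skipping entries that are None.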
