
Support for transformers v5 post_process functions #1386

Merged: 23 commits into roboflow:develop on Aug 5, 2024

Conversation

@shaddu (Contributor) commented Jul 20, 2024

Description

post_process_segmentation is being deprecated in transformers version 5. These code changes add support for post_process_semantic_segmentation.
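
For context, a rough sketch of the user-facing v5 call (the checkpoint and image path below are placeholders, not part of this PR):

import torch
from PIL import Image
from transformers import SegformerForSemanticSegmentation, SegformerImageProcessor

# placeholder checkpoint and image; any semantic segmentation model works the same way
checkpoint = "nvidia/segformer-b0-finetuned-ade-512-512"
processor = SegformerImageProcessor.from_pretrained(checkpoint)
model = SegformerForSemanticSegmentation.from_pretrained(checkpoint)

image = Image.open("example.jpg")
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# post_process_semantic_segmentation is the post-processing call supported going
# forward; it returns a list with one (H, W) segmentation map tensor per image
segmentation = processor.post_process_semantic_segmentation(
    outputs, target_sizes=[image.size[::-1]]
)[0]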

Type of change


  • New feature (non-breaking change which adds functionality)

How has this change been tested? Please provide a test case or example of how you tested the change.

Detection test Colab

Any specific deployment considerations

No

@CLAassistant commented Jul 20, 2024

CLA assistant check
All committers have signed the CLA.

@SkalskiP (Collaborator)

Hi @shaddu 👋🏻 Thank you for submitting the PR. I have a few questions regarding the scope of the changes that have been made, and still need to be made, in from_transformers to fully migrate us. In #1113 I see that we need to migrate post_process, post_process_panoptic, post_process_segmentation and post_process_instance. Do I understand correctly that this PR only migrates post_process_segmentation for now, and the remaining outputs still need to be migrated?

@shaddu (Contributor, Author) commented Jul 22, 2024

Hi @SkalskiP, thank you for reviewing the PR and providing your comments. The scope was to migrate post_process, post_process_panoptic, post_process_segmentation and post_process_instance, but I have only recently started working on computer vision, so before making all the changes I wanted to make sure I am heading in the right direction. If this code is fine, I can migrate the rest of the methods either in this PR or in another one, as you prefer.

@SkalskiP (Collaborator)

@shaddu I like what you have done so far! Both code quality and functionality look good to me. From my perspective, the most important thing is that from_transformers supports both v4 and v5.

I think you can continue the migration in this PR.
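
For illustration only, one shape this could take, with a hypothetical helper that accepts both the v4 dict output and the v5 Tensor output (a sketch, not the PR's actual code):

import numpy as np


def _masks_from_result(transformers_results) -> np.ndarray:
    # v5: post_process_semantic_segmentation returns a single (H, W) Tensor per image
    if transformers_results.__class__.__name__ == "Tensor":
        segmentation = transformers_results.cpu().detach().numpy()
        class_ids = np.unique(segmentation)
        return np.stack([segmentation == class_id for class_id in class_ids])
    # v4-style dict output containing the scores, labels, boxes and masks keys
    return transformers_results["masks"].cpu().detach().numpy().astype(bool)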

@shaddu (Contributor, Author) commented Jul 27, 2024

Hi @SkalskiP,

I have finished adding support for the remaining functions. Could you please review them and offer feedback?

Thank you

@LinasKo (Contributor) commented Jul 29, 2024

Hi @shaddu

Thank you for your contribution! I'll check this today; hopefully we can merge soon.

@@ -483,11 +484,40 @@ class names. If provided, the resulting Detections object will contain
Class names values can be accessed using `detections["class_name"]`.
Collaborator:

We need to update the from_transformers docstring.

Creates a Detections instance from object detection or segmentation Transformer inference result.

I'd mention we support object detection as well as panoptic, semantic and instance segmentation results.

transformers_results (dict): The output of Transformers model inference. A dictionary containing the scores, labels, boxes and masks keys.

Now it can also be a Tensor.
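
For illustration, the updated lines could read roughly like this (a sketch only, not final wording):

"""
Creates a Detections instance from object detection, panoptic segmentation,
semantic segmentation, or instance segmentation Transformer inference results.

Args:
    transformers_results (Union[dict, torch.Tensor]): The output of a
        Transformers post-processing function: either a dictionary containing
        the scores, labels, boxes and masks keys, or the segmentation map
        Tensor returned by post_process_semantic_segmentation.
"""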

Contributor Author (@shaddu):

Hi @SkalskiP, one quick query: ideally it should now be transformers_results (Union[dict, torch.Tensor]), but to support this we need to import the torch package. Should we add the import, or leave it in the docstrings for now?
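
For reference, one common pattern would be a typing-only import (just an illustration with a simplified signature, not a decision):

from typing import TYPE_CHECKING, Dict, Optional, Union

if TYPE_CHECKING:
    import torch  # imported only for static type checking, not at runtime


# simplified stand-in for the real classmethod, shown only for the annotation
def from_transformers(
    transformers_results: "Union[dict, torch.Tensor]",
    id2label: Optional[Dict[int, str]] = None,
):
    ...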

@@ -483,11 +484,40 @@ class names. If provided, the resulting Detections object will contain
Class names values can be accessed using `detections["class_name"]`.
""" # noqa: E501 // docs

class_ids = transformers_results["labels"].cpu().detach().numpy().astype(int)
data = {}
Collaborator:

In from_transformers, there are four places where we do:

if id2label is not None:
    class_names = np.array([id2label[class_id] for class_id in class_ids])
    data[CLASS_NAME_DATA_FIELD] = class_names

I'd wrap it in a local helper function. It can be defined inside from_transformers.

def get_data(class_ids: np.ndarray, id2label: Optional[Dict[int, str]]) -> dict:
    data = {}
    if id2label is not None:
        class_names = np.array([id2label[class_id] for class_id in class_ids])
        data[CLASS_NAME_DATA_FIELD] = class_names
    return data
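
Each of the four call sites could then become a single line, for example:

data = get_data(class_ids, id2label)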

mask=masks,
class_id=class_ids,
data=data,
)
else:
Collaborator:

I think we can drop this else statement.


if transformers_results.__class__.__name__ == "Tensor":
    segmentation_array = transformers_results.cpu().detach().numpy()

Collaborator:

Let's make it more compact and drop those extra new lines.

Convert a PNG byte string to a binary mask array.

Args:
- png_string (bytes): A byte string representing the PNG image.
Collaborator:

This docstring format is incompatible with the rest of the docstrings in the project.
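
For example, in the Args/Returns layout used elsewhere in the project it could look like this (the helper name is illustrative):

import io

import numpy as np
from PIL import Image


def _mask_from_png_string(png_string: bytes) -> np.ndarray:
    """
    Convert a PNG byte string to a binary mask array.

    Args:
        png_string (bytes): A byte string representing the PNG image.

    Returns:
        np.ndarray: The decoded mask as a uint8 array.
    """
    image = Image.open(io.BytesIO(png_string))
    return np.array(image, dtype=np.uint8)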

"""
image = Image.open(io.BytesIO(png_string))
mask = np.array(image, dtype=np.uint8)

Collaborator:

Let's drop this extra new line.

@@ -504,6 +534,60 @@ class names. If provided, the resulting Detections object will contain
class_id=class_ids,
data=data,
)
elif "segments_info" in transformers_results:
Collaborator:

Personally, I would move the segments_info = transformers_results["segments_info"] extraction into the if "segmentation" in transformers_results: and elif "png_string" in transformers_results: blocks. We will get one level of indentation less.
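
Roughly like this (a sketch of the suggested shape, with an illustrative function name):

def _from_segmentation_result(transformers_results: dict):
    if "segmentation" in transformers_results:
        segments_info = transformers_results["segments_info"]
        # ... build per-instance masks from the segmentation map here
    elif "png_string" in transformers_results:
        segments_info = transformers_results["segments_info"]
        # ... decode the PNG mask first, then use segments_info here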

@SkalskiP (Collaborator)

Hi @shaddu 👋🏻 Awesome work here. I left some comments regarding the current version of the code, but here are some high-level ones:

  • Could you update your Colab? The current version tests only a limited subset of supported capabilities. It would be awesome to test all functions mentioned in Update to support new from_transformers methods #1113 in a single Colab.
  • In my opinion, having all that logic in a single function is hard to follow. I would create a new file, supervision/detection/transformers.py, and inside define process_transformers_v5_result, process_transformers_v4_detection_result, process_transformers_v4_segmentation_result, etc., and just call those inside from_transformers in supervision/detection/core.py (see the sketch after this list). This will:
    • Make supervision/detection/core.py shorter. It is already too long.
    • Make transformers' logic organized and easier to maintain.
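
A rough sketch of that layout (bodies and exact signatures are placeholders, not a final design):

# supervision/detection/transformers.py (proposed layout, sketch only)


def process_transformers_v5_result(transformers_results):
    """Handle outputs produced by transformers v5 post-processing functions."""
    ...


def process_transformers_v4_detection_result(transformers_results):
    """Handle the v4-style detection dict with scores, labels and boxes."""
    ...


def process_transformers_v4_segmentation_result(transformers_results):
    """Handle the v4-style segmentation output."""
    ...


# Detections.from_transformers in supervision/detection/core.py would then
# only dispatch to these helpers.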

In general this looks really good already. Thanks a lot for all the help! 🙏🏻

@LinasKo (Contributor) commented Jul 29, 2024

Hi @shaddu,

Here's a Colab that may help as a starting point. I've got examples of every segmentation type, except for instance segmentation.

It would be great to also test with a few other models - not just a panoptic segmentation one.

Lastly, I'm curious: does panoptic segmentation add anything to the detections.data field?

@LinasKo changed the title from "Support for post_process_semantic_segmentation added" to "Support for transformers v5 post_process functions" on Jul 29, 2024
@shaddu (Contributor, Author) commented Jul 30, 2024

Hey @SkalskiP,

Thank you for the detailed feedback. The suggestion of a separate file is a good idea. I'll work on this approach.

Hey @LinasKo,

Thank you for sharing the new Colab. I learned something new today about how to use your repos in a Colab notebook 👍

@LinasKo mentioned this pull request on Aug 1, 2024
@shaddu (Contributor, Author) commented Aug 3, 2024

Hi @SkalskiP / @LinasKo ,

I have made the changes based on the feedback; could you please review the PR again?

@LinasKo post_process_panoptic_segmentation adds class names in detections.data.

I will be adding more models and test cases to this Colab for review.
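
For example (here panoptic_result is one entry of the post_process_panoptic_segmentation output and model is the corresponding transformers model):

import supervision as sv

detections = sv.Detections.from_transformers(
    transformers_results=panoptic_result,
    id2label=model.config.id2label,
)
print(detections["class_name"])  # class names are filled in via detections.data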

@SkalskiP (Collaborator) commented Aug 5, 2024

Hi @shaddu 👋🏻! Awesome work! I added some small changes: renamed functions and improved docstrings. We are ready to merge! Thanks a lot for the contribution! 🙏🏻 Supporting transformers is one of our goals, and your contribution is a big step in that direction.

@SkalskiP merged commit aa6c673 into roboflow:develop on Aug 5, 2024. 9 checks passed.
@shaddu (Contributor, Author) commented Aug 5, 2024

@SkalskiP Thank you for merging the PR. It was a great learning experience. I'll look for more issues to work on in Supervision, and if you come across any related ones, I'd be happy to help with them.

@SkalskiP (Collaborator) commented Aug 6, 2024

@shaddu, working with you was an awesome experience! 🔥
