Dataset class improperly handles images with multiple objects #60
Comments
I just realized this problem too. How much of the source code has to be rewritten to fix it?
Thanks for bringing this up. @TasinIshmam if you have the code for this, definitely feel free to submit a PR :)
@langejoh this might not be comprehensive, but from my understanding the following changes need to be made.
detecto.utils
detecto.core
There might be other changes required in detecto.visualize as well. I will have to look further into the code to find out.
@alankbi alright, thanks. Please give me a few days to document the changes and clean up my code. I'll submit a PR afterwards.
Hi @alankbi, I wanted to verify my changes are okay with all existing tests. However, I'm not able to run I could generate a model of my own but I wanted to keep the tests as consistent as possible. Could you please provide me that file in some way?
Sure - here is a link to the model file (this will automatically download the file when you click on it):
A column called "image_id" is added to the csv files generated using utils.xml_to_csv(). This image_id field is used by core.Dataset's __getitem__(self, idx) function to identify the index of any specific image. Previously, the index of an entry inside the csv was used for indexing objects in the __getitem__(self, idx) function. However, this does not work when an image has more than one object. (See alankbi#60)
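To illustrate the idea in the commit message above, here is a minimal sketch in plain Python. It is not the actual detecto code: `add_image_ids`, `GroupedDataset` and the sample rows are hypothetical names and data made up for illustration, and the real change works on the csv/dataframe produced by utils.xml_to_csv().

```python
def add_image_ids(rows):
    """Return copies of rows with an image_id shared by all rows of the same file."""
    ids = {}
    out = []
    for row in rows:
        # A filename seen for the first time gets the next free integer id.
        image_id = ids.setdefault(row["filename"], len(ids))
        out.append({**row, "image_id": image_id})
    return out


class GroupedDataset:
    """Toy dataset (hypothetical) whose items are images, not annotation rows."""

    def __init__(self, rows):
        self.by_image = {}
        for row in rows:
            self.by_image.setdefault(row["image_id"], []).append(row)

    def __len__(self):
        return len(self.by_image)

    def __getitem__(self, idx):
        if idx not in self.by_image:
            raise IndexError(idx)
        objects = self.by_image[idx]
        # Collect every box and label that belongs to this one image.
        boxes = [[o["xmin"], o["ymin"], o["xmax"], o["ymax"]] for o in objects]
        labels = [o["class"] for o in objects]
        return objects[0]["filename"], {"boxes": boxes, "labels": labels}


# Made-up annotation rows: a.jpg has two objects, b.jpg has one.
rows = add_image_ids([
    {"filename": "a.jpg", "class": "car", "xmin": 0, "ymin": 0, "xmax": 10, "ymax": 10},
    {"filename": "a.jpg", "class": "bus", "xmin": 5, "ymin": 5, "xmax": 20, "ymax": 20},
    {"filename": "b.jpg", "class": "truck", "xmin": 1, "ymin": 2, "xmax": 3, "ymax": 4},
])
dataset = GroupedDataset(rows)
filename, targets = dataset[0]
```

With this grouping, `len(dataset)` counts images rather than rows, and `dataset[0]` carries both objects of a.jpg in a single item.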
fix #60 - Support for images with multiple objects in Dataset class
Describe the bug
If a single image contains more than one object, then the dataset class treats each object as a separate item in the dataset. As a result, if there are 'n' objects for any image 'i', then the same image 'i' is repeated in the dataset 'n' times, one time for each object.
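The behaviour described above can be reproduced with a toy example. This is only a sketch, not the detecto source: `RowIndexedDataset` and the sample rows are hypothetical stand-ins for a dataset that indexes one annotation row per item.

```python
from collections import Counter

# Made-up annotation rows, one per labelled object.
sample_rows = [
    {"filename": "a.jpg", "class": "car"},
    {"filename": "a.jpg", "class": "bus"},
    {"filename": "a.jpg", "class": "car"},
    {"filename": "b.jpg", "class": "truck"},
]


class RowIndexedDataset:
    """Toy dataset (hypothetical) that treats every annotation row as one item."""

    def __init__(self, rows):
        self.rows = rows

    def __len__(self):
        return len(self.rows)

    def __getitem__(self, idx):
        row = self.rows[idx]
        return row["filename"], row["class"]


row_dataset = RowIndexedDataset(sample_rows)
# Count how often each image is returned when walking the whole dataset.
repeats = Counter(row_dataset[i][0] for i in range(len(row_dataset)))
```

Here the dataset reports four items for two images, and a.jpg comes back three times, once per object.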
This happens for both instantiation methods (with csv and without csv file). Looking at the detecto source code, this happens because the __getitem__ method of the Dataset class treats each entry in the csv or dataframe as a separate image instance (even though the same image may be in the csv/dataframe multiple times if there are multiple objects in the image).
Code and Data
I am using the Dhaka-AI traffic dataset. The dataset is labelled in PASCAL VOC format, with labels and annotations both residing inside the train folder.
I ran the following code to verify the bug
Environment:
I'd be happy to work on this issue. I have already modified the detecto source code locally to handle this exact problem so I can make a PR to fix it.
Additional context
I'm attaching a sample annotation file from the dataset for reference.