[SSD] Small object detection #3196
Did you try taking 300x300 crops from the images? You could train on smaller images and feed in overlapping 300x300 crops that tile the original image, which can be bigger. I was able to train on 1000x600 images, and it worked on my test set, which was also 1000x600. This might be slightly hard since your original set is not 300x300, but you could instead form a dataset out of random 300x300 crops from your original set. The images I am actually working with are around 12 MP, and I am feeding in crops of size 1000x600. However, at 1000x600, SSD is struggling to learn the classes, even though the localization error is very low.
Hi @oneTimePad, thanks for your reply. I have thought about this approach too. Did you first annotate all the images and then convert the annotations to the corresponding cropped images (with some Python script, I assume)? Or did you first crop them and then annotate manually on those 300x300 images?
@Tsuihao you can crop already-annotated images. SSD has issues with detecting small objects, but Faster R-CNN is much better at this.
Hi @Luonic, yes, I have successfully trained Faster R-CNN and obtained accurate results. However, it is too slow for my use case. I have some concerns regarding the annotation data: when you crop into 300x300 tiles, the annotated coordinates need to be converted into each crop's coordinate system. Regards,
Ah, yes. Completely forgot about the annotations. In my case I have a program that generates all of my training data, so I can easily change the training image size (which then changes the annotations). You could write a program that converts the bounding box coordinates as you mentioned, but as I said, I am still struggling to get the classification accuracy up. One idea I had was to first train the MobileNet base network, fine-tuning from the checkpoint trained on the COCO dataset or from a classification checkpoint, to just classify small crops of the objects of interest. In your case, crops of traffic lights, classifying their color. Then go back to SSD and fine-tune the detection model from these classification weights. I haven't tried this yet, but it might help mostly with the classification accuracy. You mentioned MobileNet(s); have you tried a different base network?
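For anyone writing that conversion program: a minimal sketch of the crop-coordinate math (my own helper, not a script posted in this thread) could look like this.

```python
def box_to_crop(box, crop_x, crop_y, crop_w, crop_h):
    """Convert a (xmin, ymin, xmax, ymax) box from full-image pixel
    coordinates into the coordinate system of a crop whose top-left
    corner is (crop_x, crop_y) and whose size is crop_w x crop_h.

    Returns None if the box does not overlap the crop at all."""
    xmin, ymin, xmax, ymax = box
    # Shift into crop coordinates.
    xmin, xmax = xmin - crop_x, xmax - crop_x
    ymin, ymax = ymin - crop_y, ymax - crop_y
    # Clip to the crop boundaries.
    xmin, ymin = max(0, xmin), max(0, ymin)
    xmax, ymax = min(crop_w, xmax), min(crop_h, ymax)
    if xmin >= xmax or ymin >= ymax:
        return None  # box falls completely outside this crop
    return (xmin, ymin, xmax, ymax)
```

For example, a box at (400, 150, 460, 210) in the full image becomes (100, 50, 160, 110) inside a 300x300 crop taken at (300, 100). Boxes only partially inside a crop get clipped, so you may also want to drop boxes whose visible area falls below some threshold.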
Hi @oneTimePad, thanks for the reply. In my case, I also used the SSD MobileNet pre-trained on the COCO dataset and fine-tuned it with the traffic light dataset. There are two assumptions I made (please correct me if I am wrong):
Maybe the last way really is what you say: crop and re-annotate everything. That will be a lot of overhead.
If you want to train an SSD512 model, you need to start from scratch. The pre-trained model can only be fine-tuned as an SSD300 model.
Hi @Tsuihao, did you successfully train the SSD model on small objects? If so, how did you get around it? My original images are 512x512; I am thinking about cropping them to 300x300 around the areas of interest and creating the TFRecord file from the cropped ones. Would this be OK?
Hi @augre, I have not tried it yet. I am not sure how the performance of training on cropped images will be.
Hi @Tsuihao, could you share your trained model (Faster R-CNN)? And what framework did you use for training, Caffe or TensorFlow? Thanks
@Tsuihao Any progress on this method?
Try this paper
Hi, I'm interested in training SSD500 MobileNet from scratch; can someone give me some hints?
Hi @Tsuihao, I have a problem with ssd_mobilenet_v2_coco. My images are 600x600, but the config file resizes them to 300x300. Is there any possibility to work at 600x600 in this case? Do my training images have to be 300x300? How did you solve the small object problem?
@sapjunior: Have you used the implementation on some application other than faces?
@jungchan1 Sorry, I cannot provide my trained work. I was using TensorFlow. @cyberjoac Nope, I did not go further on this topic; however, I am still looking forward to seeing if anyone can share their experience with this community :) @elifbykl 600x600 sounds acceptable to me to resize into 300x300; however, it also depends on the relative object size you are working with. Based on the above discussion, your training images will be resized into 300x300 due to the fixed architecture SSD provides in TensorFlow. I have still not solved small object detection with SSD.
I trained a model capable of recognizing 78 German traffic signs. I used TensorFlow's Object Detection API for the training. The model can recognize signs at a distance of about 15 meters. Here you can download the model and try it out. Model: http://eugen-lange.de/download/ssd-4-traffic-sign-detection-frozen_inpherence_graph-pb/
@Tsuihao I had a similar problem and needed to slice the image into smaller tiles/crops. However, I had already labelled my dataset and was not sure what size of tiles was suitable for training. Here is the code; it's far from perfect, but I needed a quick solution. It uses OpenCV rather than PIL because I tested both and OpenCV was much quicker at slicing and saving the images. The function creates 2 rows and 2 columns, so if you have an image that is 1000x1000 and you need 500x500 tiles, you just pass size=(2,2): 1000 / 2 = 500.
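Since the snippet itself isn't reproduced here, a minimal NumPy version of such a tiling helper (my own sketch, not the poster's OpenCV code; the `size=(rows, cols)` convention matches the description above) could look like this.

```python
import numpy as np

def tile_image(img, size=(2, 2)):
    """Split an HxWxC image array into size[0] * size[1] equally sized
    tiles, row-major. Trailing pixels are dropped when the height or
    width is not evenly divisible."""
    rows, cols = size
    h, w = img.shape[0] // rows, img.shape[1] // cols
    return [img[r * h:(r + 1) * h, c * w:(c + 1) * w]
            for r in range(rows) for c in range(cols)]
```

With a 1000x1000 input and size=(2, 2) this yields four 500x500 tiles; the same array slicing works whether the image was loaded with OpenCV or PIL, since both can be converted to NumPy arrays.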
Hello @Tsuihao, have you tried the stock SSD_mobilenet_v1_coco_2017_11_17 without training and looked at the result visually? In my situation, the performance of the stock SSD_inception_v2_coco_2017_11_17 is better than my trained-with-KITTI model on car detection. I am still working on this and hopefully can get back to you ASAP. Best,
Hi @willSapgreen, yes, I have tried using the plain SSD_mobilenet_v1_coco_2017_11_17 to do the traffic light detection, and the result is better than my SSD trained on the traffic light dataset. However, this result can be foreseen, since SSD_mobilenet_v1_coco_2017_11_17 was trained on the COCO dataset. In my case, I need more detail about the detected traffic lights, e.g. red, green, yellow, red left, etc. In your case, you wanted to detect cars; I believe a car in the image is much bigger than a traffic light, so you should not have the same issue (the traffic light being too small) as mine.
Regards,
Hey, I read that you struggled with resizing/cropping and then labeling again. I had the same problem, so I made some scripts that I am trying to turn into a library. Why don't you check them out: https://github.com/lozuwa/impy There is a method called reduceDatasetByRois() that takes an offset and produces images of size (offset)x(offset) which contain the annotations of the original image.
I'm finding several problems in obtaining good detection on small objects. I'm using the typical ssd_mobilenet config file, and I train from the ssd_mobilenet_v2 pretrained model. I'm interested in good accuracy at high speed, so I need the SSD architecture. Maybe it is better to move to SSD Inception v2? Or can I change some parameters, like the anchors and fixed_shape_resizer (but... how?). Thank you for any advice. @eumicro how did you edit the config file to obtain that good detection?
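For reference, the anchor scales can be tuned in the `ssd_anchor_generator` block of the pipeline config (the field names are from the TF Object Detection API anchor generator proto; the values below are illustrative, not recommendations). Lowering `min_scale` shrinks the smallest prior boxes, which is one commonly tried change for small objects:

```
anchor_generator {
  ssd_anchor_generator {
    num_layers: 6
    # The stock SSD configs use min_scale: 0.2; a smaller value makes
    # the smallest default boxes better match small objects.
    min_scale: 0.1
    max_scale: 0.95
    aspect_ratios: 1.0
    aspect_ratios: 2.0
    aspect_ratios: 0.5
  }
}
```

Whether this actually improves recall on small objects depends on the feature map resolution as well, so it is worth validating on a held-out set rather than assuming.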
Hi, I have a problem related to this, but it's a little different. I want to train a model to detect my hand (yes, only one class) and run the model on my phone. But the speed is a little slow, about 400 ms. If I resize the image to a smaller size like 100x100, the speed is much faster, but the precision is very bad. I guess I need to train the SSD from scratch; is that right? @izzrak
You have to go with MobileNet v2. On a modern device you would get around 200 ms per image. It operates on 224x224 images; 100x100 is too small for robust detection. If you want a smooth UI, you can track feature points with a classic CV tracker and, while new predictions are being calculated, animate the UI with the tracked movement.
OK, I will try 224x224
I have the same problem with detecting small objects: my input is 660x420 and the objects are about 25x35. I consider my objects medium-sized, but SSD MobileNet v1 gives low accuracy and the training time is long. I did try making my input 660x660 (width:height = 1:1), as recommended by @oneTimePad, to see whether the resizing step to 300x300 in SSD makes any improvement; the answer is yes, but not much.
@AliceDinh, for the long training time, what do you mean? How many steps? Which learning rate? Did you change the anchor values?
@Deep-Sek I had some experience classifying similar classes before, e.g. different types of cars (different brand, year, etc.) and different birds. It is indeed a hard problem, and I think you can have a look at papers in this domain, such as:
@Ekko1992 I skipped OCR techniques altogether because I thought that since this is "OCR in the wild", where we don't control the environment, the performance would not be good. I'll give it a try ASAP and keep everyone updated on how it works out. Maybe I can do some affine transformations and control the text density and structure a bit. Also, I will take a look at the paper and try that too. Thanks a lot for the resources. I'll provide an update as soon as I can. I did try this: http://vis-www.cs.umass.edu/bcnn/docs/bcnn_iccv15.pdf but I am not sure I had enough data to substantiate training this huge network with double the parameters. It just took way too long to converge. I'll probably re-attempt it at a later time after trying out your suggestions.
Even if the images are cropped and re-annotated for training, the images at detection time are still so large that cropping seems to be of little use.
Hello, I am also facing the problem of recognizing small objects in an image. The real size of a billboard is pretty big, but we need to detect its numbers from a distance, so the numbers become small, although you can still easily recognize them on the phone screen. I will create a dataset consisting of the individual numbers, logos and the whole billboard, so we would actually run the detector twice on the same image. I assume this would still be faster than running ResNet or Faster R-CNN on a mobile device. Does anyone know if that would improve the detection process with SSD MobileNet?
TensorFlow is a below-par library written for the benefit of Google Cloud. Thank you.
For those who are visiting... let me break down the entire story for you. #data_augmentation_options { |
@dexception Which version of TensorFlow are you referring to as the old version? And since which version is this bug fixed? Thanks.
Here is something I tried that I haven't seen anyone else try here. 300 * 300 = 90e3 pixels. Rounding X and Y to integers while keeping X * Y < 90e3 with minimal wasted bytes finds the optimal new size to be 346x260, with 40 * 3 wasted bytes: img.shape = (260, 346, 3).
Retraining an SSD with Inception v2, I should keep the meat of what the model has learned with minimal trouble.
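The size search described above can be reproduced with a small brute-force script (my own sketch; `best_size` and its defaults are mine, assuming a 4:3 source aspect ratio as in the comment):

```python
def best_size(budget=300 * 300, aspect=4 / 3):
    """Find the integer (width, height) closest to the given aspect
    ratio whose pixel count is as large as possible without exceeding
    the pixel budget of the original input resolution."""
    best = (0, 0)
    h = 1
    # Widths are floored onto the aspect line, so pixel count grows
    # monotonically with h; stop at the first height over budget.
    while h * int(h * aspect) <= budget:
        w = int(h * aspect)
        if w * h > best[0] * best[1]:
            best = (w, h)
        h += 1
    return best
```

With the defaults this recovers the 346x260 figure from the comment, wasting 40 pixels (120 bytes at 3 channels) of the 90,000-pixel budget.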
@tcrockett Preserving aspect ratio should not really affect your training in any way. If your camera input is 4:3 (1280x960) and you resize your input image to 1:1 (300x300), and you're always consistent with this, then it shouldn't matter. For example, after you train your network by resizing your pics from 4:3 to 1:1, as long as you do the same at inference time (post-training) and convert your camera input from 4:3 to 1:1, the distortion you apply to the image is consistent, and the neural network doesn't care much about that. I can see the network having trouble with detections if you used a different aspect ratio to capture raw data (before resizing) and then resized that to 1:1. But preserving aspect ratio doesn't really do anything. In SSD, the prior boxes have different aspect ratios, which is why the aspect ratio of the input image doesn't really matter: the prior boxes will pick up the aspect ratio variation of the objects.
Hello Tsuihao, is the loss in your graph for the traffic light detection in percent? Or must I multiply the values by 100? Thanks for your answer
Can anyone suggest something about retraining an object detection model? i.e. -
Hi guys, here are my 2 cents: in my scenario I want to detect UI elements (buttons, checkboxes, etc.) from 800x800 screenshots using ssd_mobilenet_v2. The dimensions of the objects range from 80px to 400px.
Lastly, in my case I also need an augmentation that creates a zoom-in/zoom-out effect, to simulate objects at different scales and positions. For this I modified the preprocessor as in pull request #8043 and used the configuration
On Stack Overflow someone explained how to test the augmentation. This is the adapted script to visualize the effect of the above operation
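Since the adapted script isn't reproduced here, a rough NumPy-only stand-in for the zoom-out half of the effect (shrink the image, paste it at a random position, shift the boxes; all names are mine, and this is not the TF OD API preprocessor op) might look like this.

```python
import numpy as np

def random_zoom_out(img, boxes, scale, rng=None):
    """Simulate a zoom-out: shrink the image by `scale` (0 < scale <= 1),
    paste it at a random position on a black canvas of the original size,
    and shift the (xmin, ymin, xmax, ymax) pixel boxes accordingly."""
    rng = rng or np.random.default_rng()
    h, w = img.shape[:2]
    new_h, new_w = int(h * scale), int(w * scale)
    # Nearest-neighbour resize via index sampling (avoids a cv2 dependency).
    ys = (np.arange(new_h) / scale).astype(int)
    xs = (np.arange(new_w) / scale).astype(int)
    small = img[ys][:, xs]
    top = rng.integers(0, h - new_h + 1)
    left = rng.integers(0, w - new_w + 1)
    canvas = np.zeros_like(img)
    canvas[top:top + new_h, left:left + new_w] = small
    boxes = np.asarray(boxes, dtype=float) * scale + [left, top, left, top]
    return canvas, boxes
```

Plotting the returned canvas with the shifted boxes (e.g. via matplotlib) gives a quick visual check that the augmentation and the annotations stay in sync, which is the point of the visualization script mentioned above.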
From which file did you remove the first two layers?
@sky5media Have you been able to solve your issue? If yes, how? I also try to use object detection for OCR, but I have 14 classes and can only detect 9 of them with model_main. The train.py loss does something weird: it does great for the first epoch and then grows exponentially to billions.
I am facing quite the same issue with
@sainisanjay Your learning rate (LR) is too high, I guess. Try setting a scheduled decay of the LR. Check whether your objects are correctly annotated and easy to distinguish from the background. Check the EXIF orientation of your pictures as well.
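For the scheduled decay, a typical pipeline.config fragment looks like the following (field names are from the TF Object Detection API optimizer proto; the concrete values are illustrative only):

```
optimizer {
  rms_prop_optimizer {
    learning_rate {
      exponential_decay_learning_rate {
        initial_learning_rate: 0.004
        decay_steps: 10000
        decay_factor: 0.9
      }
    }
  }
}
```

With this schedule the LR is multiplied by `decay_factor` every `decay_steps` training steps, so a run that diverges after the first epoch gets progressively smaller updates instead of a constant, too-aggressive rate.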
@synergy178, I have the following parameters:
I am not really sure how to check the EXIF orientation of my pictures, but I have visualised my TFRecords with tfrecord-viewer. This tool gives me the same results as the original annotations, as can be seen in the attached image.
Further, I have checked the image orientation with the following two options. Both gave me the same orientation:
Option 2: Plain matplotlib.
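One way to read the EXIF orientation directly, assuming Pillow is available (the helper name is mine; tag 274 is the standard EXIF Orientation field):

```python
import io
from PIL import Image

def exif_orientation(path_or_file):
    """Return the EXIF orientation value (1-8), or None if the image
    carries no orientation tag. Values other than 1 mean the viewer is
    expected to rotate or flip the image before display."""
    with Image.open(path_or_file) as img:
        return img.getexif().get(274)  # 274 == EXIF Orientation tag
```

If this returns something other than 1 or None, your annotation tool and your training pipeline may be interpreting the pixels differently, which silently misaligns the boxes.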
@synergy178 Unfortunately no, I couldn't solve it.
It is not a good idea to have different height and width for the image resizer if you want to convert the model to UFF to run on edge devices, because you need to manually put the ratios in the UFF config file, and the function used to calculate the ratios takes only one variable as input. So for 300x300, the ratios are calculated for 300; but in your case, 260x346, whichever of 260 or 346 you input, the bounding boxes generated by the TensorRT model on the edge device will differ from the ones generated by the TensorFlow model on your PC.
@preronamajumder Did you use transfer learning, or did you train the model from scratch? I believe that if you change the height and width, you cannot use the pre-trained (300x300) model for weight initialization.
https://github.com/DetectionTeamUCAS/FPN_Tensorflow |
Hi all. |
I used transfer learning with ssd_mobilenet_v2_coco. The fixed image resizer can be changed, but I started by setting it to 300x300.
Why don't you try padding the images? It will maintain the aspect ratio of the ground truth boxes and will also give the appropriate size required by the detection model.
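A minimal NumPy sketch of such padding (my own helper; it pads on the bottom/right only, so existing pixel box coordinates stay valid without any conversion):

```python
import numpy as np

def pad_to_square(img, value=0):
    """Pad an HxWxC image with `value` on the right and bottom so that
    height == width. Ground-truth pixel boxes need no adjustment because
    the original content keeps its position at the top-left."""
    h, w = img.shape[:2]
    side = max(h, w)
    out = np.full((side, side) + img.shape[2:], value, dtype=img.dtype)
    out[:h, :w] = img
    return out
```

The padded square can then be resized to the model's fixed input (e.g. 300x300) without distorting object aspect ratios, at the cost of some resolution lost to the padding band.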
Hey @Tsuihao, could you share the repo you use for the Faster R-CNN, please? Thanks in advance!
@darkdrake88 @sainisanjay He removed the first two layers of the architecture, in my opinion. Scientific paper (https://arxiv.org/abs/1801.04381); TF OD API (https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/mobilenet_v2.py): op(slim.conv2d, stride=2, num_outputs=32, kernel_size=[3, 3])
op(ops.expanded_conv, expansion_size=expand_input(1, divisible_by=1), num_outputs=16) Questions about it:
Hi all,
I have a question regarding the configuration of SSD.
An interesting task for me is fine-tuning SSD_mobilenet_v1_coco_2017_11_17 on the Bosch small traffic light dataset.
However, the default setting is to resize the image into 300 x 300 (image_resizer).
Here is the total loss during training.
The loss stays around 6. (Please ignore the overlap at 5000 steps, due to re-launching the training process.)
I think the trend of the total loss is okay.
However, when I stop at around 12k steps and feed in the test dataset (around 90 images for a short try), nothing is detected.
Personally, I have some doubts about this issue:
Can I simply change the image size in the config to 512 x 512 or an even larger value (1000 x 1000)?
Will this work correctly as well?
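For reference, changing the input size means editing the `image_resizer` block in the pipeline config, e.g. (field names are from the TF Object Detection API image resizer proto):

```
image_resizer {
  fixed_shape_resizer {
    height: 512
    width: 512
  }
}
```

Note that whether weights pre-trained at 300x300 still transfer cleanly after such a change is exactly the open question here.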
Regards,
Hao