
Worse performance in this repo than in a yolov3 pytorch implementation #2914

Open
drapado opened this issue Apr 12, 2019 · 38 comments

Labels
question, want enhancement (Want to improve accuracy, speed or functionality)

Comments

@drapado

drapado commented Apr 12, 2019

I've tested this yolov3 implementation in pytorch: https://github.com/ultralytics/yolov3. I used the same dataset and the same yolov3-spp.cfg file (same everything) for the tests. I achieved these results (consistent over several attempts):

  • This repo: 58.2% mAP
  • Pytorch repo: 64.4% mAP
@AlexeyAB
Owner

  • Did you train your own model or the default model https://pjreddie.com/media/files/yolov3-spp.weights ?

  • What dataset did you use to check the mAP, is it MS COCO 2014 test-dev, or val, or MS COCO 2017, or some custom dataset?

  • What script and commands did you use in both cases to check mAP?

  • Did you check mAP@0.5 or mAP@[.5, .95]?

  • Can you show screenshots of the results (mAP)?

@drapado
Author

drapado commented Apr 12, 2019

  • I trained yolov3-spp on my custom dataset (4 classes)

  • I used:
    ./darknet detector map data/obj.data cfg/yolov3-spp-obj.cfg backup/best.weights
    python test.py --data-cfg data/obj.data --cfg cfg/yolov3-spp-obj.cfg --weights weights/best.pt --img-size 256 --batch-size 32 --conf-thres 0.25

  • I checked mAP@0.5

  • I don't have access to the computer right now to run the commands again, but I stored the results in a spreadsheet:
    [screenshot: results spreadsheet]

@AlexeyAB
Owner

AlexeyAB commented Apr 12, 2019

Did you train 2 different models, one by using https://github.com/AlexeyAB/darknet and another by using https://github.com/ultralytics/yolov3 ?

How many Training and Validation images did you use?

What GPU did you use for training using Darknet?

Did you set CUDNN_HALF=1?

As claimed, ultralytics/yolov3 can achieve 0.1% - 0.3% higher mAP than Darknet: https://github.com/ultralytics/yolov3#map


There are several assumptions here:

@drapado
Author

drapado commented Apr 12, 2019

Yes, I trained two different models starting from darknet53.conv.74 and yolov3-spp.cfg adapted for 4 classes. I kept the rest of the hyperparameters equal in both frameworks. I have 4000 images in the train set and 1000 in the valid set.

I used an RTX 2060, so I trained with CUDNN_HALF=1, but I also added mixed precision training to the pytorch version through nvidia apex.amp.

I believe it's related to the presence of very small objects (smaller than 10x10 when img size=256). These objects belong to class 1, the class whose mAP improves the most in the pytorch implementation and also the class with the most examples.

@AlexeyAB
Owner

You can try to convert your Darknet weights-file to a PyTorch pt-file and check mAP with ultralytics/yolov3; that way you can see how much the accuracy calculation algorithms differ.


I used an RTX 2060, so I trained with CUDNN_HALF=1, but I also added mixed precision training to the pytorch version through nvidia apex.amp.

Also, if you retrain the model with CUDNN_HALF=0 and get better accuracy, I will try to find the bug in Darknet, if there is one.

I just don't use Loss-scale, because I don't apply FP16 to activations, but maybe it is required in some cases: https://docs.nvidia.com/deeplearning/sdk/mixed-precision-training/index.html
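
For reference, here is a minimal PyTorch sketch of the static loss-scaling technique that NVIDIA guide describes (the tiny model and the scale value are illustrative only, not anything from this thread or from apex.amp):

import torch

model = torch.nn.Linear(8, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
scale = 128.0  # fixed loss scale; apex.amp chooses and adjusts this dynamically

x, y = torch.randn(4, 8), torch.randn(4, 1)
loss = torch.nn.functional.mse_loss(model(x), y)
(loss * scale).backward()  # scaled backward pass keeps tiny FP16 gradients from flushing to zero
for p in model.parameters():
    p.grad /= scale  # unscale before the optimizer step so weight updates are unchanged
optimizer.step()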

@drapado
Author

drapado commented Apr 12, 2019

It seems you can run the pytorch implementation with the darknet weights. I got these results by using the same weights, cfg file and validation set:

  • darknet mAP@0.5 = 58.01%
  • pytorch mAP@0.5 = 51.5%

@drapado
Author

drapado commented Apr 12, 2019

Also, if you retrain the model with CUDNN_HALF=0 and get better accuracy, I will try to find the bug in Darknet, if there is one.

I already tried that some weeks ago and there was no performance difference; it was even slightly better with CUDNN_HALF=1.

@AlexeyAB
Owner

In addition to: #2914 (comment)
There is also a better NMS algorithm in ultralytics/yolov3: ultralytics/yolov3#72 (comment)

@AlexeyAB AlexeyAB added the "want enhancement (Want to improve accuracy, speed or functionality)" label Apr 12, 2019
@glenn-jocher

glenn-jocher commented Apr 15, 2019

@AlexeyAB @drapado yes, https://github.com/ultralytics/yolov3 accepts weights in either darknet or pytorch format in train.py, test.py and detect.py, and also computes mAP locally, which is validated against pycocotools to about 1% (i.e. we get 0.611 mAP using our repo's mAP calculation vs 0.608 with pycocotools, via python3 test.py --weights weights/yolov3-spp.weights).

A couple points on training though:

  • darknet and this repo likely train COCO better than ultralytics/yolov3 at the moment, as we still need to optimize our hyperparameters (strangely the original darknet loss function performs poorly for us, or we may have implemented it suboptimally).
  • We see excellent training results early on, which plateau after about 50 epochs when training COCO from scratch: see the guideline in ultralytics/yolov3#205 (comment)
  • We report a slightly higher COCO mAP than darknet mainly due to a modified NMS, which bumps our test mAP by only a few fractions of a percent; with 'traditional' NMS we report mAPs slightly lower than darknet. Our technique merges lower-confidence boxes into the main box using a weighted mean rather than simply deleting them (see the sketch after this list). This technique is applicable to any object detection network and is independent of the network itself (i.e. it requires no architecture changes to yolo).
  • mAP in general is a terrible metric for real-world usability, as it is optimized at extremely low confidence thresholds, creating a mess of FPs. F1 is more suitable, I believe. See 'Unable to reproduce mAP with yolov3-tiny.weights' ultralytics/yolov3#188
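
For reference, a minimal numpy sketch of the weighted-mean merging idea described above (not the actual ultralytics implementation; boxes is an (N, 4) array of x1y1x2y2 boxes and scores their confidences):

import numpy as np

def iou_one_to_many(box, boxes):
    # IoU of a single box against an array of boxes
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area1 = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area1 + areas - inter + 1e-9)

def merge_nms(boxes, scores, iou_thres=0.5):
    order = scores.argsort()[::-1]  # indices sorted by descending confidence
    out_boxes, out_scores = [], []
    while order.size:
        overlap = iou_one_to_many(boxes[order[0]], boxes[order])
        cluster = order[overlap > iou_thres]  # the best box plus its overlapping neighbors
        w = scores[cluster]
        # Merge the cluster with a confidence-weighted mean instead of deleting the weaker boxes
        out_boxes.append((boxes[cluster] * w[:, None]).sum(0) / w.sum())
        out_scores.append(scores[order[0]])
        order = order[overlap <= iou_thres]
    return np.array(out_boxes), np.array(out_scores)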

@AlexeyAB
Owner

AlexeyAB commented Apr 16, 2019

@glenn-jocher Hi,

Can you provide a short manual on how to convert cfg/weights -> pt and back from pt -> cfg/weights, to make it clear to most users? I want to link to it.

mAP in general is a terrible metric for real-world usability, as it is optimized at extremely low confidence thresholds, creating a mess of FPs. F1 is more suitable, I believe. See ultralytics/yolov3#188

Do you mean it is better to calculate an F1-score for each threshold from 0.0 to 1.0 (for example) with step=0.01 and take the highest value?
Since the optimal threshold for Yolo is ~0.25 while the optimal threshold for SSD/DSSD is ~0.5 - 0.8, we can't compare these two models at the same confidence threshold.
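
For illustration, a minimal sketch of that threshold scan (the inputs are hypothetical: per-detection confidences, TP/FP flags from matching against ground truth, and the total ground-truth count):

import numpy as np

def best_f1(confidences, is_tp, n_gt, step=0.01):
    confidences = np.asarray(confidences)
    is_tp = np.asarray(is_tp, dtype=bool)
    best = (0.0, 0.0)  # (f1, threshold)
    for t in np.arange(0.0, 1.0 + step, step):
        kept = confidences >= t
        tp = int((is_tp & kept).sum())
        fp = int((~is_tp & kept).sum())
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / n_gt if n_gt else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        best = max(best, (f1, t))
    return best  # the highest F1 and the threshold that achieves it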

@glenn-jocher

glenn-jocher commented Apr 16, 2019

@AlexeyAB yes, I will create a short conversion function. There is already a nice pathway to convert weights/cfg to *.pt, but we don't have an easy way to convert back to *.weights/cfg yet!

About the mAP: it seems I get the best mAP@0.5 on COCO by testing at extremely low confidence thresholds, around conf_thres=0.001. But if you actually look at those pictures, the result is terrible: there are about 10 FPs for every 1 TP (about 0.10 precision). So in the ultralytics/yolov3 repo we test at conf_thres=0.001 but we detect at conf_thres=0.5.

This is an example using yolov3-tiny, from ultralytics/yolov3#188. The top pictures, run at --conf-thres 0.001, produce much higher mAP using pycocotools. So I feel they've set a terrible metric, since now everyone is chasing mAP as some sort of end-all be-all measure of how well their object detector trained, when they are actually optimizing their systems to produce junk like in the first examples:

[Image grid: ultralytics/yolov3 vs darknet yolov3-tiny.weights predictions on person and zidane images, at --conf-thres 0.001 (top row) and --conf-thres 0.50 (bottom row)]

@AlexeyAB
Owner

AlexeyAB commented Apr 16, 2019

@glenn-jocher

About the mAP: it seems I get the best mAP@0.5 on COCO by testing at extremely low confidence thresholds, around conf_thres=0.001. But if you actually look at those pictures, the result is terrible: there are about 10 FPs for every 1 TP (about 0.10 precision). So in the ultralytics/yolov3 repo we test at conf_thres=0.001 but we detect at conf_thres=0.5.

This is an example using yolov3-tiny, from ultralytics/yolov3#188. The top pictures, run at --conf-thres 0.001, produce much higher mAP using pycocotools. So I feel they've set a terrible metric, since now everyone is chasing mAP as some sort of end-all be-all measure of how well their object detector trained, when they are actually optimizing their systems to produce junk like in the first examples:

mAP is calculated over all possible thresholds.

So when you set conf_thres=0.001 you just set the lowest threshold; mAP will then be calculated from threshold=0.001 to 1.0 with some step.

Why we should take into account detections with very low and very high thresholds:

  • There are many tasks where a very high Recall is required (a very low threshold), where we must detect all the objects, even if there are a lot of FPs that we will reject later: disease detection, obstacle detection, detection of attacks and incidents ...

  • Vice versa, there are many tasks where a very high Precision is required (a very high threshold), where we must detect only the necessary objects and should not detect objects unless we are 99% sure: weapon guidance systems, detection of road bends for self-driving cars...

  • Different models and frameworks have different optimal thresholds: (SSD ~0.5-0.8), (Yolo ~0.25), ...

So to create a single rating of models we should use mAP, which includes Precision and Recall over all possible thresholds.

That is why mAP is used in most detection ratings/competitions: Pascal VOC, MS COCO, ImageNet...


Actually for MS COCO the mAP is calculated for 101 different thresholds, URLs at the bottom of the first message: #2746

We get 101 points on the Precision-Recall curve, for Recall = 0.0 - 1.0 with step 0.01. For each of these points there will be a different threshold:

darknet/src/detector.c

Lines 982 to 1002 in 099b71d

// MSCOCO - 101 Recall-points, PascalVOC - 11 Recall-points
else
{
    int point;
    for (point = 0; point < map_points; ++point) {
        double cur_recall = point * 1.0 / (map_points - 1);
        double cur_precision = 0;
        for (rank = 0; rank < detections_count; ++rank)
        {
            if (pr[i][rank].recall >= cur_recall) {    // > or >=
                if (pr[i][rank].precision > cur_precision) {
                    cur_precision = pr[i][rank].precision;
                }
            }
        }
        //printf("class_id = %d, point = %d, cur_recall = %.4f, cur_precision = %.4f \n", i, point, cur_recall, cur_precision);
        avg_precision += cur_precision;
    }
    avg_precision = avg_precision / map_points;
}
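
For readers more comfortable with Python, the same N-point interpolated AP can be sketched as follows (assuming precision and recall are numpy arrays indexed by detection rank, as pr[i][rank] is in the C code above; map_points=101 for MS COCO, 11 for PascalVOC):

import numpy as np

def interpolated_ap(precision, recall, map_points=101):
    avg_precision = 0.0
    for point in range(map_points):
        cur_recall = point / (map_points - 1)
        # Highest precision among all ranks whose recall reaches cur_recall
        reached = recall >= cur_recall
        avg_precision += precision[reached].max() if reached.any() else 0.0
    return avg_precision / map_points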

@glenn-jocher

glenn-jocher commented Apr 23, 2019

@AlexeyAB thanks for the excellent summary of mAP and why it's important across different applications! Very educational for everyone. It is true that you can tune your P/R ratio to suit your needs as you move up the conf_thres, and yes, I see how the current mAP metric tests against all the various thresholds above the set value.

I've added a simple conversion function to export from both pytorch to darknet format and vice versa now. The process is very simple:

git clone https://github.com/ultralytics/yolov3 && cd yolov3

# darknet to pytorch
python3  -c "from models import *; convert('cfg/yolov3-spp.cfg', 'weights/yolov3-spp.weights')"
Success: converted 'weights/yolov3-spp.weights' to 'converted.pt'

# pytorch to darknet
python3  -c "from models import *; convert('cfg/yolov3-spp.cfg', 'weights/yolov3-spp.pt')"
Success: converted 'weights/yolov3-spp.pt' to 'converted.weights'

@AlexeyAB
Owner

@glenn-jocher Thank you! I will add the URL to the Readme.

@glenn-jocher

glenn-jocher commented Apr 28, 2019

@AlexeyAB Great! You could link to our iDetection iOS app also if you want, it runs YOLOv3-SPP 320 realtime (about 15-20 FPS) on devices with the newest Apple A12 processor (iPhone Xs, Xr, etc.)

It has a 5 star rating and over 700 downloads in the last two months. The screenshots below are from a previous release at 416 inference, which reduces the framerate to about 11 FPS. We are working on introducing rectangular inference as well, which could theoretically boost the FPS by 40% on HD (16:9) aspect ratios vs square inference, adding pinch to zoom functionality like the native camera app, and a few other updates.

Older devices can run the app as well, but will suffer as the model year goes back. An iPhone 6s for example will run about 0.3 FPS. Apple has really been making leaps with their Neural Engine, which is at 5 TOPS now.

@aditbhrgv

aditbhrgv commented Apr 29, 2019

Hello @glenn-jocher & @AlexeyAB ,

I was trying to reproduce my training & evaluation results from the Darknet C implementation (AlexeyAB repo) on my custom dataset. I get worse performance with the ultralytics/yolov3 implementation.
Could you please let me know how to reproduce my Darknet C results in Pytorch?

Training dataset: ~7800 images
Test dataset: ~2560 images

Command which I ran to compute the metrics:

Darknet C implementation:
./build/darknet detector map cfg/hld.data cfg/yolov3-tiny_3l.cfg weights/yolov3-tiny_3l_20000.weights

Converted the .weights file from Darknet to .pt :
python3 -c "from models import *; convert('cfg/yolov3-tiny_3l.cfg', 'weights/yolov3-tiny_3l_20000.weights')"
to get converted.pt and then ran
python test.py --cfg=cfg/yolov3-tiny_3l.cfg --data-cfg=cfg/obj.data --weights=converted.pt --img-size=608 --conf-thres=0.25 --batch-size=64

ultralytics/yolov3 Pytorch trained model implementation:
python test.py --cfg=cfg/yolov3-tiny_3l.cfg --data-cfg=cfg/obj.data --weights=weights/best.pt --img-size=608 --conf-thres=0.25 --batch-size=64

Metrics (@0.25 conf-thresh):

            Darknet trained   Converted (.weights -> .pt)   ultralytics/yolov3 trained
Precision   0.78              0.595                         0.45
Recall      0.72              0.577                         0.643
F1 score    0.75              0.586                         0.53
mAP@0.5     0.7435            0.563                         0.553

Thanks

@aditbhrgv

Also, when I tried to convert the Pytorch model to darknet .weights format, I got no detections in Darknet.

python3 -c "from models import *; convert('cfg/yolov3-tiny_3l.cfg', 'weights/best.pt')"

calculation mAP (mean average precision)...
2560
detections_count = 0, unique_truth_count = 5009
class_id = 0, name = tl_pair, ap = 0.00% (TP = 0, FP = 0)
class_id = 1, name = hl_pair, ap = 0.00% (TP = 0, FP = 0)

for thresh = 0.25, precision = -nan, recall = 0.00, F1-score = -nan
for thresh = 0.25, TP = 0, FP = 0, FN = 5009, average IoU = 0.00 %

IoU threshold = 50 %, used Area-Under-Curve for each unique Recall
mean average precision (mAP@0.50) = 0.000000, or 0.00 %
Total Detection Time: 84.000000 Seconds

Set -points flag:
-points 101 for MS COCO
-points 11 for PascalVOC 2007 (uncomment difficult in voc.data)
-points 0 (AUC) for ImageNet, PascalVOC 2010-2012, your custom dataset

@glenn-jocher

@aditbhrgv generally testing for mAP computation (to get the results shown in https://github.com/ultralytics/yolov3#map) should be done at extremely low conf_thres, such as the default value in test.py of 0.001.

We've not actually tried using converted models, so this is an interesting finding. What happens if you convert the official yolov3.pt model from https://drive.google.com/drive/folders/1uxgUBemJVw9wZsdpboYbzUN4bcRhsuAI to .weights format and test that?

@aditbhrgv

@aditbhrgv generally testing for mAP computation (to get the results shown in https://github.com/ultralytics/yolov3#map) should be done at extremely low conf_thres, such as the default value in test.py of 0.001.

Actually, I am not interested in mAP; I just care about getting comparable P, R and F1 scores in both implementations at a particular threshold (0.25 in the above example). I wonder what implementation differences on my custom dataset lead to the above results.

@aditbhrgv

We've not actually tried using converted models, so this is an interesting finding. What happens if you convert the official yolov3.pt model from https://drive.google.com/drive/folders/1uxgUBemJVw9wZsdpboYbzUN4bcRhsuAI to .weights format and test that?

Hello @AlexeyAB ,
I tried to test this with the official yolov3.pt converted back to .weights on a single image (dog.jpg) and still can't get any detections.
Command:
./build/darknet detector test cfg/coco.data cfg/yolov3.cfg /home/Darknet2Pytorch/yolov3/converted.weights data/dog.jpg

There's definitely some problem in converting from Pytorch to Darknet weights.

Thanks

@glenn-jocher

glenn-jocher commented Apr 29, 2019

@aditbhrgv ah buddy I think you are confusing the extensions and repositories a bit:

Here are the ultralytics/yolov3 commands to detect the default images (using rectangular inference at 416 pixels :) with 1) the original darknet yolov3-spp.weights, 2) darknet converted to pytorch as converted.pt, and 3) pytorch converted back to darknet as converted.weights. This round trip should fully verify the conversion functionality, I believe:

# 1) original darknet weights ------------------------------------------------------------------
python3 detect.py --weights weights/yolov3-spp.weights  # original darknet weights
Namespace(cfg='cfg/yolov3-spp.cfg', conf_thres=0.5, data_cfg='data/coco.data', images='data/samples', img_size=416, nms_thres=0.5, weights='weights/yolov3-spp.weights')
Using CPU
image 1/2 data/samples/bus.jpg: 416x320 1 handbags, 3 persons, 1 buss, Done. (0.755s)
image 2/2 data/samples/zidane.jpg: 256x416 1 ties, 2 persons, Done. (0.607s)

# 2) converted to pytorch ---------------------------------------------------------------------
python3  -c "from models import *; convert('cfg/yolov3-spp.cfg', 'weights/yolov3-spp.weights')"
Success: converted 'weights/yolov3-spp.weights' to 'converted.pt'

python3 detect.py --weights converted.pt  # converted to pytorch
Namespace(cfg='cfg/yolov3-spp.cfg', conf_thres=0.5, data_cfg='data/coco.data', images='data/samples', img_size=416, nms_thres=0.5, weights='converted.pt')
Using CPU
image 1/2 data/samples/bus.jpg: 416x320 1 handbags, 3 persons, 1 buss, Done. (0.749s)
image 2/2 data/samples/zidane.jpg: 256x416 1 ties, 2 persons, Done. (0.588s)

# 3) converted back to darknet ---------------------------------------------------------------
python3  -c "from models import *; convert('cfg/yolov3-spp.cfg', 'converted.pt')"
Success: converted 'converted.pt' to 'converted.weights'

python3 detect.py --weights converted.weights  # converted back to darknet
Namespace(cfg='cfg/yolov3-spp.cfg', conf_thres=0.5, data_cfg='data/coco.data', images='data/samples', img_size=416, nms_thres=0.5, weights='converted.weights')
Using CPU
image 1/2 data/samples/bus.jpg: 416x320 1 handbags, 3 persons, 1 buss, Done. (0.749s)
image 2/2 data/samples/zidane.jpg: 256x416 1 ties, 2 persons, Done. (0.594s)
[output images: bus.jpg, zidane.jpg]

@aditbhrgv

aditbhrgv commented Apr 29, 2019

@glenn-jocher Thanks for the clarification. I was thinking I could use the "converted.weights" from Pytorch in the Darknet C implementation.

Just one last quick question: how can I reproduce the results of the Darknet C implementation in the Pytorch implementation on my custom dataset? (see the table here: #2914 (comment))

I didn't use multi-scale training in either the Darknet C or the Pytorch implementation.

Metrics (@0.25 conf-thresh):

            Darknet trained   Converted (.weights -> .pt)   ultralytics/yolov3 trained
Precision   0.78              0.595                         0.45
Recall      0.72              0.577                         0.643
F1 score    0.75              0.586                         0.53
mAP@0.5     0.7435            0.563                         0.553

@AlexeyAB
Owner

@aditbhrgv Hi,

Hello @glenn-jocher & @AlexeyAB ,

I was trying to reproduce my training & evaluation results from the Darknet C implementation (AlexeyAB repo) on my custom dataset. I get worse performance with the ultralytics/yolov3 implementation.
Could you please let me know how to reproduce my Darknet C results in Pytorch?

Training dataset: ~7800 images
Test dataset: ~2560 images

Command which I ran to compute the metrics:

Darknet C implementation:
./build/darknet detector map cfg/hld.data cfg/yolov3-tiny_3l.cfg weights/yolov3-tiny_3l_20000.weights

Can you attach the yolov3-tiny_3l.cfg file? (rename it to a txt-file and attach)


Try to test the weights converted from the official yolov3.pt on https://github.com/pjreddie/darknet instead of https://github.com/AlexeyAB/darknet - does it work?

Hello @AlexeyAB ,
I tried to test this with the official yolov3.pt converted back to .weights on a single image (dog.jpg) and still can't get any detections.
Command:
./build/darknet detector test cfg/coco.data cfg/yolov3.cfg /home/Darknet2Pytorch/yolov3/converted.weights data/dog.jpg

There's definitely some problem in converting from Pytorch to Darknet weights.

Thanks

@AlexeyAB
Owner

@AlexeyAB Great! You could link to our iDetection iOS app also if you want, it runs YOLOv3-SPP 320 realtime (about 15-20 FPS) on devices with the newest Apple A12 processor (iPhone Xs, Xr, etc.)

It has a 5 star rating and over 700 downloads in the last two months. The screenshots below are from a previous release at 416 inference, which reduces the framerate to about 11 FPS. We are working on introducing rectangular inference as well, which could theoretically boost the FPS by 40% on HD (16:9) aspect ratios vs square inference, adding pinch to zoom functionality like the native camera app, and a few other updates.

Older devices can run the app as well, but will suffer as the model year goes back. An iPhone 6s for example will run about 0.3 FPS. Apple has really been making leaps with their Neural Engine, which is at 5 TOPS now.

@glenn-jocher Hi,

That's great! I will add URL.

We are working on introducing rectangular inference as well, which could theoretically boost the FPS by 40% on HD (16:9) aspect ratios vs square inference, adding pinch to zoom functionality like the native camera app, and a few other updates.

Do you mean that you currently use a square network size (320x320) and use letter_box resizing with padding? #232 (comment)

And you will add the ability to use a rectangular 16:9 network size (320x192 or 576x320) with a simple resize, without padding?


Did you try to implement XNOR-net on ARM / Apple A12 processors? #2365 (comment)

@glenn-jocher

glenn-jocher commented Apr 29, 2019

@AlexeyAB we just got it done today!!! See ultralytics/yolov3#232 (comment).

To answer your question: yes, previously our app was running at 416x416, letterboxing vertical 4k iPhone Xs video (the 4k video was resized to 234x416 and then padded/letterboxed to 416x416). This ran at about 11 FPS. We reduced this to 320x320 to improve performance, and that ran at about 18 FPS. This is the current v4 app available for download today on the App Store.

After our rectangular inference builds, the app can now run YOLOv3-SPP at 30 FPS at 192x320, or 20 FPS at 256x416. We still letterbox/pad the short dimension to the nearest multiple of 32, though. So for example the 4k video is resized to 234x416 (width x height) and then padded with 11 pixels on the left + 11 on the right to round out to a multiple of 32: 256x416.
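
For illustration, a minimal sketch of that shape math (a hypothetical helper, not the app code): scale the long side down to the inference size, then pad each dimension up to the nearest multiple of the 32-pixel stride:

def rect_shape(width, height, long_side=416, stride=32):
    scale = long_side / max(width, height)
    w, h = round(width * scale), round(height * scale)
    # Pad each dimension up to the next multiple of the network stride
    w += (stride - w % stride) % stride
    h += (stride - h % stride) % stride
    return w, h

print(rect_shape(2160, 3840))  # vertical 4k frame -> (256, 416)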

I don't know what XNOR-net is, though. Here is an actual screenshot from today in Madrid, with a 1.15X zoom factor (we enabled pinch-to-zoom functionality as well!! :)

@aditbhrgv

aditbhrgv commented Apr 30, 2019

Can you attach yolov3-tiny_3l.cfg file? (rename it to cfg-file and attach).

Hello @AlexeyAB
Please find the attached cfg file.
yolov3-tiny_3l.cfg.txt

@gwestner94

gwestner94 commented May 25, 2019

Hi @glenn-jocher,
I am having the same issue as @aditbhrgv when making the round trip from:
alexey/darknet -> pytorch -> alexey/darknet
using the supplied pytorch yolov3 model as well as custom trained pytorch yolov3 models.

After the conversion nothing is detected.

I figured out that after changing the header information in the weights file with a tool like vbindiff (https://linux.die.net/man/1/vbindiff), alexey/darknet reproduces the correct output on dog.jpg for the default tiny-yolov3 (https://pjreddie.com/media/files/yolov3-tiny.weights).

But when converting a custom model I suffer a big accuracy loss (scores drop by almost 0.5).

The weird thing is that, when using my custom model or the default tiny yolov3 after conversion on the pjreddie version of darknet, the network produces the right output after the vbindiff change.

@AlexeyAB is there a difference that you are aware of between the pjreddie repository and yours that could cause such a mismatch?

@glenn-jocher Is there some reason why you don't preserve the header information after conversion?
Is your conversion tested on the AlexeyAB/darknet version?

Thank you very much, this would clear up a lot for me

@glenn-jocher

glenn-jocher commented May 25, 2019

@gwestner94 we can test out the conversion mAPs. The commands (and saved outputs) are here. All 3 results are identical, performing the mAP calculation using ultralytics/yolov3. The original yolov3-spp.weights was downloaded from https://pjreddie.com/media/files/yolov3-spp.weights.

This mAP round-trip should be reproducible in our Google Colab Notebook.

If the headers are different, perhaps the header may play a role when using this repo. Feel free to submit a PR for header inclusion over at ultralytics/yolov3 if you'd like.

git clone https://github.com/ultralytics/yolov3
cd yolov3

# 1) original darknet weights ------------------------------------------------------------------
python3 test.py --weights weights/yolov3-spp.weights --save-json
#  Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.566

# 2) converted to pytorch ---------------------------------------------------------------------
python3  -c "from models import *; convert('cfg/yolov3-spp.cfg', 'weights/yolov3-spp.weights')"
# Success: converted 'weights/yolov3-spp.weights' to 'converted.pt'
python3 test.py --weights converted.pt --save-json 
#  Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.566

# 3) converted back to darknet ---------------------------------------------------------------
python3  -c "from models import *; convert('cfg/yolov3-spp.cfg', 'converted.pt')"
# Success: converted 'converted.pt' to 'converted.weights'
python3 test.py --weights converted.weights --save-json 
#  Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.566

EDIT 1: @gwestner94 after re-reading your question, a second test would be to perform the same round trip using an AlexeyAB/darknet-trained network. I just so happen to have one of these. I can try the round trip again with it later.

EDIT 2: Round trip successfully performed on custom dataset trained on AlexeyAB/darknet.

# 1) original darknet weights ------------------------------------------------------------------
python3 test.py --weights ../darknet/backup/yolov3-spp-sm2-1cls_5000.weights --cfg cfg/yolov3-spp-sm2-1cls.cfg --data ../supermarket2/supermarket2.data
#               Class    Images   Targets         P         R       mAP        F1
# Computing mAP: 100%|██████████████████████████████████| 2/2 [00:02<00:00,  1.72s/it]
#                 all        25       479     0.486     0.971     0.868     0.648

# 2) converted to pytorch ---------------------------------------------------------------------
python3  -c "from models import *; convert('cfg/yolov3-spp-sm2-1cls.cfg', '../darknet/backup/yolov3-spp-sm2-1cls_5000.weights')"
# Success: converted '../darknet/backup/yolov3-spp-sm2-1cls_5000.weights' to 'converted.pt'
python3 test.py --weights converted.pt --cfg cfg/yolov3-spp-sm2-1cls.cfg --data ../supermarket2/supermarket2.data
#               Class    Images   Targets         P         R       mAP        F1
# Computing mAP: 100%|██████████████████████████████████| 2/2 [00:02<00:00,  1.72s/it]
#                 all        25       479     0.486     0.971     0.868     0.648

# 3) converted back to darknet ---------------------------------------------------------------
python3  -c "from models import *; convert('cfg/yolov3-spp-sm2-1cls.cfg', 'converted.pt')"
# Success: converted 'converted.pt' to 'converted.weights'
python3 test.py --weights converted.weights --cfg cfg/yolov3-spp-sm2-1cls.cfg --data ../supermarket2/supermarket2.data
#               Class    Images   Targets         P         R       mAP        F1
# Computing mAP: 100%|██████████████████████████████████| 2/2 [00:02<00:00,  1.65s/it]
#                 all        25       479     0.486     0.971     0.868     0.648

@gwestner94

Thank you for your feedback!
I will look into the header specifics and give you an update on your repository when I find a solution.
It looks like the problem can be solved by correct header information.

@Sudhakar17

Sudhakar17 commented May 28, 2019

I tested the yolo-v3 model using COCO val data in both darknet and pytorch (ultralytics).

In the Pytorch framework:

  1. Yolo-v3.weights (original from darknet) -- 54.2% mAP
  2. Yolo-v3_converted.pt (converted using ultralytics code) -- 54.2%
  3. Yolo-v3_converted.weights (converted back to original weights) -- 54.2%

In the Darknet framework:

  Yolo-v3.weights (original) -- 54.37%
  Yolo_v3_converted.weights (darknet --> pytorch --> darknet weights) -- 0%

The save_weights method from the ultralytics code:

def save_weights(self, path='model.weights', cutoff=-1):
    # Converts a PyTorch model to Darknet format (*.pt to *.weights)
    # Note: Does not work if model.fuse() is applied
    with open(path, 'wb') as f:
        self.header_info[3] = self.seen  # number of images seen during training
        self.header_info.tofile(f)

        # Iterate through layers
        for i, (module_def, module) in enumerate(zip(self.module_defs[:cutoff], self.module_list[:cutoff])):
            if module_def['type'] == 'convolutional':
                conv_layer = module[0]
                # If batch norm, write bn parameters first
                if module_def['batch_normalize']:
                    bn_layer = module[1]
                    bn_layer.bias.data.cpu().numpy().tofile(f)
                    bn_layer.weight.data.cpu().numpy().tofile(f)
                    bn_layer.running_mean.data.cpu().numpy().tofile(f)
                    bn_layer.running_var.data.cpu().numpy().tofile(f)
                # Otherwise write conv bias
                else:
                    conv_layer.bias.data.cpu().numpy().tofile(f)
                # Write conv weights
                conv_layer.weight.data.cpu().numpy().tofile(f)

It uses the header info and stores it in the weights file. Can you please tell us what went wrong in the .weights file conversion? @glenn-jocher

What do you mean by "correct header information"? @gwestner94

@glenn-jocher

@Sudhakar17 I don't believe anything went wrong with the weights conversion, as you can see from your own pytorch framework experiment. I myself don't know how the headers are used in this AlexeyAB/darknet repository; that would be a question for @AlexeyAB. We do not use them at all in https://github.com/ultralytics/yolov3.

@AlexeyAB
Owner

@Sudhakar17 Hi,

Can you share (e.g. via Google Drive) 4 files? I will check the difference:

  1. yolo-v3.weights (original from darknet)
  2. yolo-v3_converted.pt (converted using ultralytics code)
  3. yolo-v3_converted.weights
  4. yolov3.cfg (to be sure that we use exactly the same model)

@glenn-jocher

@AlexeyAB thanks! We presently create a header of 5 int32 values, write the number of images seen at index 3, and leave everything else as zeros. Are there any other variables that should be written to this header when saving to a *.weights file? Is each value 32 bits?

# Needed to write header when saving *.weights
self.header_info = np.zeros(5, dtype=np.int32)  # First five are header values
self.header_info[3] = seen  # number of images seen during training

@AlexeyAB
Owner

@glenn-jocher

  • The first 3 values are int32_t (32-bit) version numbers - major=0, minor=2, revision=5, which you should set:

    darknet/src/parser.c

    Lines 1094 to 1096 in 55dcd1b

    fwrite(&major, sizeof(int), 1, fp);
    fwrite(&minor, sizeof(int), 1, fp);
    fwrite(&revision, sizeof(int), 1, fp);

  • Then there is 1 uint64_t (64-bit) value - the number of images seen during training:

    fwrite(net.seen, sizeof(uint64_t), 1, fp);


You should use the values major=0, minor=2, revision=5, because the old version 0.1.0 used uint32_t for seen instead of uint64_t, so its header was shorter by 4 bytes.
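
For illustration, a minimal numpy sketch of writing a header with this exact layout (three int32 version fields followed by one uint64 seen counter; the file name is just an example):

import numpy as np

def write_darknet_header(f, seen):
    np.array([0, 2, 5], dtype=np.int32).tofile(f)  # major=0, minor=2, revision=5
    np.array([seen], dtype=np.uint64).tofile(f)    # images seen during training, 64-bit

with open('converted.weights', 'wb') as f:
    write_darknet_header(f, seen=0)
    # ... the conv/bn parameter tensors follow the header, as in save_weights() above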

@Sudhakar17

@AlexeyAB I am traveling at the moment. I used the original yolo-v3 model and the same cfg file. The mAP values may differ since I didn't update my local darknet repository. I will update my repository and rerun the converted model later. Is this header info used anywhere in calculating mAP? @glenn-jocher

@Sudhakar17

I updated the darknet repository and ran yolo-v3_converted.weights. It's not working. Any new updates?
@AlexeyAB @glenn-jocher

@AlexeyAB
Owner

AlexeyAB commented Jun 5, 2019

@glenn-jocher Hi, did you fix the header (version) in your conversion script? #2914 (comment)

@glenn-jocher

@AlexeyAB @Sudhakar17 I just fixed this now in ultralytics/yolov3@d7a28bd

@Sudhakar17 you should now be able to run pytorch exported models in darknet.
