Training not working anymore #3

chris-doe · 2019-09-03T11:57:30Z

Hi Tom,

first of all, thanks for updating the repo and providing the inference script.
However, there now seems to be an issue with the heatmap-based scores during training. I did a clean clone of the repo and launched training as explained in the readme. Looking at the results in TensorBoard after 600 epochs, the confidence maps don't show any local maxima (whereas for the previous version of the repo, the confidence maps correctly showed that the network resolved depth uncertainty with an increasing number of epochs and learned to localize objects). Hyperparameters were left at their defaults (I only set the batch size to 8).

The inference script - using the old model checkpoints - worked for me after adapting the NMS stage. Only one method (bbox_corners) in utils.py was missing.

Do you have any idea how to get the training running again? I would appreciate any help on that - thank you!

Best regards,
Chris

aloukkal · 2019-10-12T14:52:07Z

Hi Chris,

If you look at the compute_loss function in train.py, the loss function used before was binary cross-entropy, whereas the latest version uses the Huber loss. Note also that total_loss = score_loss in both versions. Maybe it is more suitable to first learn the score only and then fine-tune on the other tasks.
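To make the difference between the two losses concrete, here is a minimal sketch comparing them on a single heatmap cell. It assumes the predicted score is a probability in [0, 1] (the thread does not confirm this), and `bce` and `huber` are hypothetical helpers written for illustration, not the functions in train.py.

```python
import math

def bce(p, y, eps=1e-7):
    """Binary cross-entropy for one predicted probability p and target y."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))

def huber(p, y, delta=1.0):
    """Huber loss: quadratic for errors up to delta, linear beyond it."""
    err = abs(p - y)
    if err <= delta:
        return 0.5 * err * err
    return delta * (err - 0.5 * delta)

# A positive cell (y = 1) that the network scores with decreasing confidence.
for p in (0.5, 0.1, 0.01):
    print(f"p={p:<5} bce={bce(p, 1.0):.3f} huber={huber(p, 1.0):.3f}")
```

Under that assumption, the Huber loss on a positive cell can never exceed 0.5, while binary cross-entropy keeps growing as the prediction approaches 0. This might be one reason the sparse peaks in the confidence map are penalised less strongly after the change, but it is only a guess.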

yhkim8412 · 2019-10-16T06:16:58Z

Hi Tom,

Thanks again for updating the repo and providing the inference script.
Like Chris, I did a clean clone of the repo and launched training as explained in the readme (I only set the batch size to 8).

However, there seems to be an issue.

I got these values during training.
==> Training epoch complete
score : 1.9330e+02
position: 1.6398e+07
dimension: 3.2379e+06
angle : 9.0900e+04
total : 1.9727e+07
=== Beginning epoch 100 of 600 ===

This does not look like the model is training correctly.
Could the size of the input image be the issue?

I would appreciate any help on that - thanks again!

Best regards,
Younghyun
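For reference, the relative size of the loss terms in the log above can be checked with a few lines of Python. The numbers are copied from the log; the dictionary and variable names are only illustrative.

```python
# Loss terms as reported in the training log above.
losses = {
    "score": 1.9330e+02,
    "position": 1.6398e+07,
    "dimension": 3.2379e+06,
    "angle": 9.0900e+04,
}

total = sum(losses.values())
shares = {name: value / total for name, value in losses.items()}

for name, share in shares.items():
    print(f"{name:<10} {share:10.4%}")
print(f"sum of terms: {total:.4e}")
```

The sum of the four terms comes out at 1.9727e+07, which matches the reported total, so the log is at least numerically consistent with an unweighted sum. The score term accounts for far less than 0.01% of it, so if the total is what gets optimised, the position and dimension terms would dominate the gradient.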

chris-doe · 2019-10-16T12:37:49Z

Hi aloukkal,

Yes, I am aware of the changes affecting the loss function and confidence map representation.
The problem I was facing was this: using the new representation and loss computation, my network was not able to gain any certainty about depth at all. Even when I trained on a single example image, and even when I tried to learn only the confidence score map of that single example, the network was not able to learn that specific score map (which would have resulted in a correct detection for this single training example).
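The single-example check described above can be illustrated with a toy version. This is only a sketch of what a passing overfit test looks like: it fits free per-cell logits to one fixed 1D score map with binary cross-entropy, and does not use the repo's network or loss. All names here are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bce_loss(logits, target, eps=1e-7):
    """Mean binary cross-entropy between sigmoid(logits) and target."""
    total = 0.0
    for z, y in zip(logits, target):
        p = min(max(sigmoid(z), eps), 1.0 - eps)
        total -= y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return total / len(target)

def overfit_single_example(target, steps=500, lr=1.0):
    """Gradient descent on free logits for one fixed target map."""
    logits = [0.0] * len(target)
    history = []
    for _ in range(steps):
        history.append(bce_loss(logits, target))
        # sigmoid(z) - y is the gradient of each cell's BCE w.r.t. its logit.
        logits = [z - lr * (sigmoid(z) - y) for z, y in zip(logits, target)]
    return logits, history

# One training example: a score map with a single object at index 3.
target = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
logits, history = overfit_single_example(target)

peak = max(range(len(logits)), key=lambda i: logits[i])
print(f"loss: {history[0]:.4f} -> {history[-1]:.4f}, peak at index {peak}")
```

In a healthy setup the loss should fall steadily and the peak should land on the object cell. If a real network cannot reproduce this behaviour on one image, the problem is more likely in the loss or target construction than in the amount of data.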

jackkwok · 2019-11-28T00:15:47Z

Can someone share the last known working version in this repo?

IAMShashankk · 2022-05-13T08:38:52Z

@chris-doe @aloukkal @yhkim8412 @jackkwok Do you have any update on the issues you described here?

IAMShashankk · 2022-05-13T08:39:48Z

@yhkim8412 I am also getting the same losses on the current version of the repo. How did you fix it?
