Running the t code shows loss=nan when calculating the coco dataset. #68

Open
liaochuanlin opened this issue Aug 4, 2023 · 4 comments

Comments

@liaochuanlin

Running the t code shows loss=nan when calculating the coco dataset.

@sanitizer84

I encountered the same problem with the coco_stuff dataset.

2023-08-07 16:07:13,007 INFO [trainer.py, 229] Train Epoch: 0 Train Iteration: 30 Time 4.248s / 10iters, (0.425) Forward Time 2.557s / 10iters, (0.256) Backward Time 1.583s / 10iters, (0.158) Loss Time 0.075s / 10iters, (0.007) Data load 0.033s / 10iters, (0.003317)
Learning rate = [0.0009995649894856365, 0.009995649894856365, 0.009995649894856365] Loss = nan (ave = nan)

Is it because there aren't enough training epochs?

@SchuckLee

I encountered the same error. Have you solved this problem?

@kevinkevin556

kevinkevin556 commented Apr 29, 2024

I encountered the same problem when I changed the architecture to my own model. In my case, I found that some elements of exp_logits + neg_logits could be zero, so torch.log returned -inf for them and log_prob became inf, which then turned the loss into nan.
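To make that chain concrete, here is a minimal scalar sketch in plain Python rather than the repository's tensor code. The values are hypothetical, the helper `ieee_log` only mimics how torch.log treats zero (plain `math.log` would raise instead), and the final mask multiplication is an assumption about what happens to log_prob downstream.

```python
import math


def ieee_log(x: float) -> float:
    """Mimic torch.log on a scalar: log(0) gives -inf instead of raising."""
    return -math.inf if x == 0.0 else math.log(x)


# Hypothetical values: a very negative logit whose exponential underflows.
logit = -800.0
exp_logit = math.exp(logit)   # underflows to exactly 0.0
neg_logits_sum = 0.0          # assumed: every negative term underflowed as well

denominator = exp_logit + neg_logits_sum
log_prob = logit - ieee_log(denominator)   # -800 - (-inf) = +inf

# Assumed downstream step: a 0/1 mask is multiplied into log_prob.
masked = 0.0 * log_prob                    # 0 * inf = nan

print(denominator, log_prob, masked)
```

Once a single nan appears in the batch, any sum or mean over it is nan too, which would match a log line such as `Loss = nan (ave = nan)`.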

After making a small change from

log_prob = logits - torch.log(exp_logits + neg_logits)

to

log_prob = logits - torch.log(exp_logits + neg_logits + 1e-10)

everything went well.
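As a rough check of why the added constant helps, the scalar sketch below applies the same epsilon in plain Python. The function name `log_prob_with_eps` and the sample values are made up for illustration; only the `1e-10` constant and the shape of the expression come from the change above.

```python
import math

EPS = 1e-10  # the constant used in the change above


def log_prob_with_eps(logit: float, exp_logit: float,
                      neg_logits_sum: float, eps: float = EPS) -> float:
    """Scalar analogue of: logits - log(exp_logits + neg_logits + eps)."""
    return logit - math.log(exp_logit + neg_logits_sum + eps)


# Degenerate case: both terms are zero, so only eps keeps the log finite.
lp_zero = log_prob_with_eps(-800.0, 0.0, 0.0)

# Ordinary case: eps is negligible next to the real denominator.
lp_normal = log_prob_with_eps(0.5, math.exp(0.5), 3.0)
reference = 0.5 - math.log(math.exp(0.5) + 3.0)

print(math.isfinite(lp_zero), abs(lp_normal - reference) < 1e-9)
```

The epsilon bounds the log from below at log(1e-10), roughly -23, so the result stays finite without noticeably changing well-behaved entries. It does not explain why the denominator reached zero in the first place, so very negative logits may still be worth checking.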

@SchuckLee

That makes sense to me, thanks for your help!!
