The training process loss is nan #2

Xuagent · 2022-05-05T07:40:25Z

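Model:
import timm
import torch.nn as nn

# Identity and L2Norm are helper modules whose definitions are not shown in this post.
class MobileNetV3L_64(nn.Module):
    def __init__(self):
        super(MobileNetV3L_64, self).__init__()
        self.backbone = timm.create_model('mobilenetv3_large_100', pretrained=True, exportable=True)
        self.backbone.classifier = Identity()  # replace the classifier head with a pass-through
        self.fc = nn.Linear(1280, 64)
        self.l2_norm = L2Norm()

    def forward(self, x):
        x = self.backbone(x)
        print(self.fc.bias)  # debug: prints the fc bias on every forward pass
        x = self.fc(x)
        x = self.l2_norm(x)
        return x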

Freezing all params...
Unfreezing fc
Unfreezing l2_norm
Total parameters: 4284016
Trainable: 81984
Non-trainable: 4202032
Loss metric: semihard triplet loss.
Overall progrress: 0%| | 0/200 [00:00<?, ?it/s]
| 0/1157 [00:00<?, ?it/s]
Parameter containing:
tensor([-0.0278, 0.0100, -0.0074, 0.0099, -0.0233, 0.0184, 0.0254, 0.0190,
-0.0035, -0.0131, -0.0202, 0.0249, 0.0030, -0.0152, 0.0108, -0.0017,
0.0087, 0.0180, -0.0020, 0.0107, 0.0183, 0.0091, 0.0024, -0.0217,
0.0095, 0.0122, -0.0010, -0.0135, 0.0237, 0.0144, 0.0194, 0.0059,
-0.0019, -0.0021, 0.0274, -0.0133, 0.0193, -0.0204, -0.0190, 0.0040,
-0.0178, 0.0049, 0.0126, -0.0026, -0.0035, 0.0175, 0.0258, -0.0009,
0.0181, 0.0096, -0.0056, -0.0118, 0.0132, -0.0062, 0.0272, 0.0249,
-0.0076, -0.0042, 0.0186, 0.0279, 0.0120, 0.0230, -0.0012, 0.0220],
device='cuda:0', requires_grad=True)
| 1/1157 [00:02<57:10, 2.97s/it]
Parameter containing:
tensor([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
device='cuda:0', requires_grad=True)
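
The parameter counts in the log above can be checked by hand. The sketch below uses only the layer sizes from the model definition and the numbers printed in the log. It assumes l2_norm has no learnable parameters, which the post does not state but which the totals are consistent with.

```python
# Sizes taken from the model definition: fc = nn.Linear(1280, 64).
in_features, out_features = 1280, 64

# A linear layer has one weight per (input, output) pair plus one bias per output.
fc_params = in_features * out_features + out_features

total_params = 4284016     # "Total parameters" from the log
trainable_logged = 81984   # "Trainable" from the log
frozen_logged = 4202032    # "Non-trainable" from the log

# Assumption: l2_norm contributes no parameters, so fc accounts for all trainable ones.
print(fc_params == trainable_logged)              # True
print(total_params - fc_params == frozen_logged)  # True
```

So the only values the optimizer can change are the weights and bias of fc, which is where the nan values show up.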

By the second iteration, every element of the fc bias is nan, which makes the loss nan as well.
How can I modify the code to fix this?
Thanks
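
One way to narrow down a problem like the one described above is to check the loss and the gradients for non-finite values before each optimizer step, and to skip the update when any are found. The following is a minimal sketch, not a diagnosis of this issue. It assumes a stand-in head (`fc` plus `F.normalize`) instead of the real model, random features instead of backbone output, and a toy loss instead of the semihard triplet loss. The helper `first_nonfinite` is hypothetical and not part of any library.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def first_nonfinite(named_tensors):
    """Return the name of the first tensor containing NaN or Inf, else None."""
    for name, t in named_tensors:
        if t is not None and not torch.isfinite(t).all():
            return name
    return None


# Hypothetical stand-in for the trainable head (fc followed by L2 normalization).
fc = nn.Linear(1280, 64)
optimizer = torch.optim.SGD(fc.parameters(), lr=0.1)

features = torch.randn(8, 1280)  # stand-in for the backbone output
# eps keeps the division well defined even if a row of fc output is all zeros.
emb = F.normalize(fc(features), p=2, dim=1, eps=1e-12)

# Toy loss for illustration only; the issue uses a semihard triplet loss.
loss = (emb[0] - emb[1]).pow(2).sum()

optimizer.zero_grad()
loss.backward()

bad = first_nonfinite((n, p.grad) for n, p in fc.named_parameters())
if torch.isfinite(loss) and bad is None:
    # Optional safeguard: cap the gradient norm before stepping.
    torch.nn.utils.clip_grad_norm_(fc.parameters(), max_norm=1.0)
    optimizer.step()
else:
    print(f"skipping step: non-finite loss or gradient in {bad}")
```

If the check fires on the very first batch, the loss computation or the normalization is a more likely source than the learning rate. Wrapping the training step in `torch.autograd.set_detect_anomaly(True)` can then help identify which backward operation produced the nan, at the cost of slower training.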
