[BUG] loss will suddenly increase very high in the train process #277

Open
Oratacth opened this issue Jan 5, 2023 · 0 comments
Labels: bug (Something isn't working)

Comments

Oratacth commented Jan 5, 2023

Does anyone else run into this problem during training? It looks like this:
```
Epoch 2/50 |# | (25/500) | Total: 0:00:14 | ETA: 0:04:09 | loss: 2.4039 | loss_kp_2d: 1.47 | loss_kp_3d: 0.98 | e_m_disc_loss: 0.03 | d_m_disc_real: 0.04 | d_m_disc_fake: 0.27 | d_m_disc_loss: 0.31 | data:
Epoch 2/50 |# | (26/500) | Total: 0:00:14 | ETA: 0:04:08 | loss: 2.4214 | loss_kp_2d: 1.76 | loss_kp_3d: 0.76 | e_m_disc_loss: 0.03 | d_m_disc_real: 0.03 | d_m_disc_fake: 0.28 | d_m_disc_loss: 0.31 | data:
Epoch 2/50 |# | (27/500) | Total: 0:00:15 | ETA: 0:04:07 | loss: 2.4028 | loss_kp_2d: 0.75 | loss_kp_3d: 0.83 | e_m_disc_loss: 0.03 | d_m_disc_real: 0.03 | d_m_disc_fake: 0.28 | d_m_disc_loss: 0.32 | data:
Epoch 2/50 |# | (28/500) | Total: 0:00:15 | ETA: 0:04:07 | loss: 2.3830 | loss_kp_2d: 0.72 | loss_kp_3d: 0.79 | e_m_disc_loss: 0.04 | d_m_disc_real: 0.03 | d_m_disc_fake: 0.27 | d_m_disc_loss: 0.30 | data:
Epoch 2/50 |# | (29/500) | Total: 0:00:16 | ETA: 0:04:05 | loss: 2.3815 | loss_kp_2d: 0.97 | loss_kp_3d: 1.05 | e_m_disc_loss: 0.05 | d_m_disc_real: 0.03 | d_m_disc_fake: 0.25 | d_m_disc_loss: 0.28 | data:
Epoch 2/50 |# | (30/500) | Total: 0:00:16 | ETA: 0:04:05 | loss: 2.3664 | loss_kp_2d: 0.82 | loss_kp_3d: 0.66 | e_m_disc_loss: 0.08 | d_m_disc_real: 0.03 | d_m_disc_fake: 0.20 | d_m_disc_loss: 0.23 | data:
Epoch 2/50 |# | (31/500) | Total: 0:00:17 | ETA: 0:04:04 | loss: 16.4265 | loss_kp_2d: 433.47 | loss_kp_3d: 0.84 | e_m_disc_loss: 0.32 | d_m_disc_real: 0.03 | d_m_disc_fake: 0.06 | d_m_disc_loss: 0.09 | da
Epoch 2/50 |## | (32/500) | Total: 0:00:17 | ETA: 0:03:58 | loss: 26.1461 | loss_kp_2d: 323.38 | loss_kp_3d: 1.10 | e_m_disc_loss: 0.49 | d_m_disc_real: 0.03 | d_m_disc_fake: 0.01 | d_m_disc_loss: 0.05 | da
Epoch 2/50 |## | (33/500) | Total: 0:00:18 | ETA: 0:03:57 | loss: 41.6130 | loss_kp_2d: 530.82 | loss_kp_3d: 1.06 | e_m_disc_loss: 0.71 | d_m_disc_real: 0.06 | d_m_disc_fake: 0.04 | d_m_disc_loss: 0.10 | da
Epoch 2/50 |## | (34/500) | Total: 0:00:18 | ETA: 0:03:56 | loss: 52.5822 | loss_kp_2d: 409.64 | loss_kp_3d: 1.20 | e_m_disc_loss: 0.78 | d_m_disc_real: 0.14 | d_m_disc_fake: 0.06 | d_m_disc_loss: 0.20 | da
Epoch 2/50 |## | (35/500) | Total: 0:00:19 | ETA: 0:03:56 | loss: 64.3204 | loss_kp_2d: 457.43 | loss_kp_3d: 2.06 | e_m_disc_loss: 0.80 | d_m_disc_real: 0.13 | d_m_disc_fake: 0.07 | d_m_disc_loss: 0.19 | da
Epoch 2/50 |## | (36/500) | Total: 0:00:19 | ETA: 0:03:55 | loss: 70.2876 | loss_kp_2d: 273.15 | loss_kp_3d: 3.72 | e_m_disc_loss: 0.64 | d_m_disc_real: 0.08 | d_m_disc_fake: 0.06 | d_m_disc_loss: 0.14 | da
Epoch 2/50 |## | (37/500) | Total: 0:00:20 | ETA: 0:03:55 | loss: 75.5481 | loss_kp_2d: 255.48 | loss_kp_3d: 7.11 | e_m_disc_loss: 0.92 | d_m_disc_real: 0.07 | d_m_disc_fake: 0.03 | d_m_disc_loss: 0.10 | da
Epoch 2/50 |## | (38/500) | Total: 0:00:20 | ETA: 0:03:54 | loss: 80.9875 | loss_kp_2d: 269.64 | loss_kp_3d: 10.41 | e_m_disc_loss: 0.77 | d_m_disc_real: 0.04 | d_m_disc_fake: 0.02 | d_m_disc_loss: 0.06 | d
Epoch 2/50 |## | (39/500) | Total: 0:00:21 | ETA: 0:03:51 | loss: 83.9370 | loss_kp_2d: 185.57 | loss_kp_3d: 9.21 | e_m_disc_loss: 0.45 | d_m_disc_real: 0.04 | d_m_disc_fake: 0.01 | d_m_disc_loss: 0.05 | da
Epoch 2/50 |## | (40/500) | Total: 0:00:21 | ETA: 0:03:50 | loss: 85.7856 | loss_kp_2d: 150.03 | loss_kp_3d: 7.09 | e_m_disc_loss: 0.20 | d_m_disc_real: 0.03 | d_m_disc_fake: 0.06 | d_m_disc_loss: 0.10 | da
Epoch 2/50 |## | (41/500) | Total: 0:00:22 | ETA: 0:03:55 | loss: 90.0160 | loss_kp_2d: 251.98 | loss_kp_3d: 5.89 | e_m_disc_loss: 0.16 | d_m_disc_real: 0.03 | d_m_disc_fake: 0.11 | d_m_disc_loss: 0.14 | da
Epoch 2/50 |## | (42/500) | Total: 0:00:22 | ETA: 0:03:54 | loss: 93.1862 | loss_kp_2d: 216.82 | loss_kp_3d: 5.28 | e_m_disc_loss: 0.11 | d_m_disc_real: 0.03 | d_m_disc_fake: 0.14 | d_m_disc_loss: 0.17 | da
Epoch 2/50 |## | (43/500) | Total: 0:00:23 | ETA: 0:03:54 | loss: 95.2027 | loss_kp_2d: 172.50 | loss_kp_3d: 6.58 | e_m_disc_loss: 0.16 | d_m_disc_real: 0.03 | d_m_disc_fake: 0.12 | d_m_disc_loss: 0.15 | da
Epoch 2/50 |## | (44/500) | Total: 0:00:23 | ETA: 0:03:51 | loss: 96.1961 | loss_kp_2d: 130.74 | loss_kp_3d: 7.57 | e_m_disc_loss: 0.25 | d_m_disc_real: 0.03 | d_m_disc_fake: 0.06 | d_m_disc_loss: 0.10 | da
Epoch 2/50 |## | (45/500) | Total: 0:00:24 | ETA: 0:03:51 | loss: 96.5522 | loss_kp_2d: 104.14 | loss_kp_3d: 7.58 | e_m_disc_loss: 0.32 | d_m_disc_real: 0.03 | d_m_disc_fake: 0.06 | d_m_disc_loss: 0.10 | da
Epoch 2/50 |## | (46/500) | Total: 0:00:24 | ETA: 0:03:50 | loss: 98.0207 | loss_kp_2d: 156.55 | loss_kp_3d: 6.67 | e_m_disc_loss: 0.43 | d_m_disc_real: 0.03 | d_m_disc_fake: 0.05 | d_m_disc_loss: 0.08 | da
Epoch 2/50 |### | (47/500) | Total: 0:00:25 | ETA: 0:03:50 | loss: 97.9087 | loss_kp_2d: 85.54 | loss_kp_3d: 6.79 | e_m_disc_loss: 0.37 | d_m_disc_real: 0.03 | d_m_disc_fake: 0.06 | d_m_disc_loss: 0.09 | dat
Epoch 2/50 |### | (48/500) | Total: 0:00:25 | ETA: 0:03:49 | loss: 97.6625 | loss_kp_2d: 78.66 | loss_kp_3d: 6.91 | e_m_disc_loss: 0.47 | d_m_disc_real: 0.05 | d_m_disc_fake: 0.07 | d_m_disc_loss: 0.12 | dat
Epoch 2/50 |### | (49/500) | Total: 0:00:26 | ETA: 0:03:48 | loss: 98.7095 | loss_kp_2d: 142.36 | loss_kp_3d: 5.67 | e_m_disc_loss: 0.55 | d_m_disc_real: 0.08 | d_m_disc_fake: 0.04 | d_m_disc_loss: 0.12 | da
Epoch 2/50 |### | (50/500) | Total: 0:00:26 | ETA: 0:03:47 | loss: 98.3281 | loss_kp_2d: 72.09 | loss_kp_3d: 6.76 | e_m_disc_loss: 0.75 | d_m_disc_real: 0.11 | d_m_disc_fake: 0.03 | d_m_disc_loss: 0.14 | dat
Epoch 2/50 |### | (51/500) | Total: 0:00:27 | ETA: 0:03:47 | loss: 98.9621 | loss_kp_2d: 122.80 | loss_kp_3d: 6.97 | e_m_disc_loss: 0.59 | d_m_disc_real: 0.11 | d_m_disc_fake: 0.03 | d_m_disc_loss: 0.14 | da
Epoch 2/50 |### | (52/500) | Total: 0:00:27 | ETA: 0:03:45 | loss: 98.5644 | loss_kp_2d: 71.65 | loss_kp_3d: 5.90 | e_m_disc_loss: 0.67 | d_m_disc_real: 0.12 | d_m_disc_fake: 0.04 | d_m_disc_loss: 0.16 | dat
Epoch 2/50 |### | (53/500) | Total: 0:00:28 | ETA: 0:03:44 | loss: 98.9029 | loss_kp_2d: 109.82 | loss_kp_3d: 5.81 | e_m_disc_loss: 0.65 | d_m_disc_real: 0.09 | d_m_disc_fake: 0.05 | d_m_disc_loss: 0.14 | da
Epoch 2/50 |### | (54/500) | Total: 0:00:28 | ETA: 0:03:44 | loss: 98.7054 | loss_kp_2d: 81.66 | loss_kp_3d: 5.94 | e_m_disc_loss: 0.56 | d_m_disc_real: 0.07 | d_m_disc_fake: 0.06 | d_m_disc_loss: 0.13 | dat
Epoch 2/50 |### | (55/500) | Total: 0:00:29 | ETA: 0:03:43 | loss: 98.1078 | loss_kp_2d: 58.58 | loss_kp_3d: 6.82 | e_m_disc_loss: 0.48 | d_m_disc_real: 0.06 | d_m_disc_fake: 0.06 | d_m_disc_loss: 0.12 | dat
```
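Note that the per-step `loss_kp_2d` jumps from under 2 to over 400 at step 31, while the printed `loss` climbs gradually and then plateaus, which looks like a running average of the spiked term. In adversarial setups like this, a single bad batch can blow up one loss term and destabilize the run. One common guard (not something VIBE does out of the box, as far as I know) is to clip the gradient norm before the generator's optimizer step. A minimal sketch with hypothetical names:

```python
import torch

MAX_GRAD_NORM = 1.0  # assumed value; tune for your setup

def generator_step(model, optimizer, loss):
    # Hypothetical training-step helper, not VIBE's actual trainer code.
    optimizer.zero_grad()
    loss.backward()
    # Rescale gradients so their global L2 norm is at most MAX_GRAD_NORM;
    # this caps the update size when an outlier batch produces a huge loss.
    torch.nn.utils.clip_grad_norm_(model.parameters(), MAX_GRAD_NORM)
    optimizer.step()
```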

This is my cfg:
```
2023-01-05 19:37:06,989 GPU name -> NVIDIA GeForce RTX 3060
2023-01-05 19:37:06,990 GPU feat -> _CudaDeviceProperties(name='NVIDIA GeForce RTX 3060', major=8, minor=6, total_memory=12287MB, multi_processor_count=28)
2023-01-05 19:37:06,990 {'CUDNN': CfgNode({'BENCHMARK': True, 'DETERMINISTIC': False, 'ENABLED': True}),
 'DATASET': CfgNode({'SEQLEN': 16, 'OVERLAP': 0.5}),
 'DEBUG': False,
 'DEBUG_FREQ': 5,
 'DEVICE': 'cuda',
 'EXP_NAME': 'vibe',
 'LOGDIR': 'results/vibe_tests\05-01-2023_19-37-06_vibe',
 'LOSS': {'D_MOTION_LOSS_W': 0.5,
          'KP_2D_W': 300.0,
          'KP_3D_W': 300.0,
          'POSE_W': 60.0,
          'SHAPE_W': 0.06},
 'MODEL': {'TEMPORAL_TYPE': 'gru',
           'TGRU': {'ADD_LINEAR': True,
                    'BIDIRECTIONAL': False,
                    'HIDDEN_SIZE': 1024,
                    'NUM_LAYERS': 2,
                    'RESIDUAL': True}},
 'NUM_WORKERS': 0,
 'OUTPUT_DIR': 'results/vibe_tests',
 'SEED_VALUE': -1,
 'TRAIN': {'BATCH_SIZE': 64,
           'DATASETS_2D': ['Insta'],
           'DATASETS_3D': ['MPII3D'],
           'DATASET_EVAL': 'ThreeDPW',
           'DATA_2D_RATIO': 0.6,
           'END_EPOCH': 50,
           'GEN_LR': 5e-05,
           'GEN_MOMENTUM': 0.9,
           'GEN_OPTIM': 'Adam',
           'GEN_WD': 0.0,
           'LR_PATIENCE': 5,
           'MOT_DISCR': {'ATT': {'DROPOUT': 0.2,
                                 'LAYERS': 3,
                                 'SIZE': 1024},
                         'DIM': 1024,
                         'FEATURE_POOL': 'attention',
                         'HIDDEN_SIZE': 1024,
                         'LR': 0.0001,
                         'MOMENTUM': 0.9,
                         'NUM_LAYERS': 2,
                         'OPTIM': 'Adam',
                         'UPDATE_STEPS': 1,
                         'WD': 0.0001},
           'NUM_ITERS_PER_EPOCH': 500,
           'PRETRAINED': '',
           'PRETRAINED_REGRESSOR': 'data/vibe_data/spin_model_checkpoint.pth.tar',
           'RESUME': '',
           'START_EPOCH': 0}}
```
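One thing worth noting in this config: `KP_2D_W` is 300.0, so if the printed `loss_kp_2d` already includes the weight, the spike from ~1.5 to ~433 corresponds to a raw 2D keypoint error jump from roughly 0.005 to 1.45. A rough sketch of how such weights would typically combine (the exact combination in VIBE's loss code may differ; this is only an illustration based on the cfg values above):

```python
# Loss weights copied from the cfg above.
KP_2D_W = 300.0
KP_3D_W = 300.0
D_MOTION_LOSS_W = 0.5

def weighted_gen_loss(raw_kp_2d, raw_kp_3d, raw_disc):
    # Sketch only: a weighted sum of generator loss terms. With a 300x
    # weight, a raw 2D error jump from ~0.005 to ~1.45 is enough to take
    # the printed (weighted) loss_kp_2d from ~1.5 to ~433.
    return (KP_2D_W * raw_kp_2d
            + KP_3D_W * raw_kp_3d
            + D_MOTION_LOSS_W * raw_disc)
```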

Oratacth added the bug label Jan 5, 2023