Not able to reproduce the finetuning results of InternVideo1 #143

Open
hardlipay opened this issue Jul 11, 2024 · 1 comment

Comments

@hardlipay

Hello, nice work!
I cannot reproduce the MSRVTT finetuned model, even though I set every argument as in your log.

I also checked for possible problems, such as the dataloader and the weights, but I still cannot reproduce it.

I see in your log that you use a pretrained model which could be different from mine. Is that right?

It is the one below. Could you provide the weights? Thank you!

pretrained_path: /mnt/lustre/share_data/liyizhuo/projects/all-in-one/outputs/outputs_cotrain/models/clip_kc_new_L14_vtc_cap_3plusM_step400k_bz1792/ensemble.ckpt
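One way to check that every argument really matches is to diff the `<<< key: value` lines of the two logs instead of comparing them by eye. The following is a minimal sketch, assuming the reference log prints its parameters in the same `<<< key: value` form as the log posted in this issue; `parse_params` and `diff_params` are hypothetical helper names, not part of the InternVideo code.

```python
import re

# Matches lines such as "2024-07-11 11:21:38,063:INFO: <<< batch_size: 512".
PARAM_RE = re.compile(r"<<<\s+(\w+):\s*(.*)$")


def parse_params(log_text):
    """Collect the '<<< key: value' lines of a training log into a dict."""
    params = {}
    for line in log_text.splitlines():
        match = PARAM_RE.search(line)
        if match:
            params[match.group(1)] = match.group(2).strip()
    return params


def diff_params(mine, reference):
    """Return {key: (my_value, reference_value)} for every key that differs.

    A key that is missing from one side is reported with None on that side.
    """
    keys = sorted(set(mine) | set(reference))
    return {
        k: (mine.get(k), reference.get(k))
        for k in keys
        if mine.get(k) != reference.get(k)
    }


# Shortened stand-ins for the two logs; only values quoted in this issue are used.
my_log = (
    "2024-07-11 11:21:38,064:INFO: <<< pretrained_path: "
    "/openbayes/home/InternVideo/dataset/pretrained_weights/InternVideo-MM-L-14.ckpt\n"
    "2024-07-11 11:21:38,064:INFO: <<< seed: 42\n"
)
reference_log = (
    "<<< pretrained_path: /mnt/lustre/share_data/liyizhuo/projects/all-in-one/outputs/"
    "outputs_cotrain/models/clip_kc_new_L14_vtc_cap_3plusM_step400k_bz1792/ensemble.ckpt\n"
)

differences = diff_params(parse_params(my_log), parse_params(reference_log))
for key, (my_value, ref_value) in differences.items():
    print(f"{key}: mine={my_value!r} reference={ref_value!r}")
```

In practice the two strings would be read from the full log files. Path-like arguments will always differ between machines, so the interesting entries are the remaining ones.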

@hardlipay
Author

Here is my log:
2024-07-11 11:21:38,063:INFO: Effective parameters:
2024-07-11 11:21:38,063:INFO: device: cuda:4 n_gpu: 8
2024-07-11 11:21:38,063:INFO: device: cuda:5 n_gpu: 8
2024-07-11 11:21:38,063:INFO: device: cuda:3 n_gpu: 8
2024-07-11 11:21:38,063:INFO: device: cuda:7 n_gpu: 8
2024-07-11 11:21:38,063:INFO: device: cuda:6 n_gpu: 8
2024-07-11 11:21:38,063:INFO: device: cuda:1 n_gpu: 8
2024-07-11 11:21:38,063:INFO: device: cuda:2 n_gpu: 8
2024-07-11 11:21:38,063:INFO: <<< batch_size: 512
2024-07-11 11:21:38,063:INFO: <<< batch_size_val: 16
2024-07-11 11:21:38,063:INFO: <<< cache_dir:
2024-07-11 11:21:38,063:INFO: <<< cdcr: 0
2024-07-11 11:21:38,063:INFO: <<< clip_evl: False
2024-07-11 11:21:38,063:INFO: <<< coef_lr: 0.005
2024-07-11 11:21:38,063:INFO: <<< cross_model: cross-base
2024-07-11 11:21:38,063:INFO: <<< cross_num_hidden_layers: 4
2024-07-11 11:21:38,063:INFO: <<< data_path: ./msrvtt_data/MSRVTT_data.json
2024-07-11 11:21:38,063:INFO: <<< datatype: msrvtt
2024-07-11 11:21:38,063:INFO: <<< dist_url: tcp://127.0.0.1:29500
2024-07-11 11:21:38,063:INFO: <<< do_eval: False
2024-07-11 11:21:38,063:INFO: <<< do_lower_case: False
2024-07-11 11:21:38,063:INFO: <<< do_pretrain: False
2024-07-11 11:21:38,063:INFO: <<< do_train: True
2024-07-11 11:21:38,063:INFO: <<< epochs: 5
2024-07-11 11:21:38,063:INFO: <<< eval_frame_order: 0
2024-07-11 11:21:38,063:INFO: <<< expand_msrvtt_sentences: True
2024-07-11 11:21:38,063:INFO: <<< feature_framerate: 1
2024-07-11 11:21:38,063:INFO: <<< features_path: /openbayes/home/InternVideo/dataset/11_new
2024-07-11 11:21:38,063:INFO: <<< fp16: False
2024-07-11 11:21:38,063:INFO: <<< fp16_opt_level: O1
2024-07-11 11:21:38,064:INFO: <<< freeze_layer_num: 0
2024-07-11 11:21:38,064:INFO: <<< gpu: 0
2024-07-11 11:21:38,064:INFO: <<< gradient_accumulation_steps: 1
2024-07-11 11:21:38,064:INFO: <<< hard_negative_rate: 0.5
2024-07-11 11:21:38,064:INFO: <<< init_model: None
2024-07-11 11:21:38,064:INFO: <<< interaction: no
2024-07-11 11:21:38,064:INFO: <<< linear_patch: 2d
2024-07-11 11:21:38,064:INFO: <<< local_rank: 0
2024-07-11 11:21:38,064:INFO: <<< loose_type: True
2024-07-11 11:21:38,064:INFO: <<< lr: 0.001
2024-07-11 11:21:38,064:INFO: <<< lr_decay: 0.9
2024-07-11 11:21:38,064:INFO: <<< margin: 0.1
2024-07-11 11:21:38,064:INFO: <<< max_frames: 12
2024-07-11 11:21:38,064:INFO: <<< max_words: 77
2024-07-11 11:21:38,064:INFO: <<< mergeclip: False
2024-07-11 11:21:38,064:INFO: <<< mergeweight: 0.5
2024-07-11 11:21:38,064:INFO: <<< n_display: 50
2024-07-11 11:21:38,064:INFO: <<< n_gpu: 1
2024-07-11 11:21:38,064:INFO: <<< n_pair: 1
2024-07-11 11:21:38,064:INFO: <<< negative_weighting: 1
2024-07-11 11:21:38,064:INFO: <<< num_thread_reader: 16
2024-07-11 11:21:38,064:INFO: <<< output_dir: ./ret_mgpu_mydata_finetune_use_pretrained_path
2024-07-11 11:21:38,064:INFO: <<< pretrained_clip_name: /openbayes/home/InternVideo/dataset/pretrained_weights/clip/ViT-B-32.pt
2024-07-11 11:21:38,064:INFO: <<< pretrained_path: /openbayes/home/InternVideo/dataset/pretrained_weights/InternVideo-MM-L-14.ckpt
2024-07-11 11:21:38,064:INFO: <<< rank: 0
2024-07-11 11:21:38,064:INFO: <<< resume_model: None
2024-07-11 11:21:38,064:INFO: <<< sampled_use_mil: False
2024-07-11 11:21:38,064:INFO: <<< seed: 42
2024-07-11 11:21:38,064:INFO: <<< sim_header: meanP
2024-07-11 11:21:38,064:INFO: <<< slice_framepos: 2
2024-07-11 11:21:38,064:INFO: <<< task_type: retrieval
2024-07-11 11:21:38,064:INFO: <<< text_num_hidden_layers: 12
2024-07-11 11:21:38,064:INFO: <<< train_csv: ./msrvtt_data/MSRVTT_train.9k.csv
2024-07-11 11:21:38,064:INFO: <<< train_frame_order: 0
2024-07-11 11:21:38,064:INFO: <<< use_mil: False
2024-07-11 11:21:38,064:INFO: <<< val_csv: ./msrvtt_data/MSRVTT_JSFUSION_test.csv
2024-07-11 11:21:38,064:INFO: <<< video_dim: 1024
2024-07-11 11:21:38,064:INFO: <<< visual_num_hidden_layers: 12
2024-07-11 11:21:38,064:INFO: <<< warmup_proportion: 0.1
2024-07-11 11:21:38,065:INFO: <<< world_size: 8
2024-07-11 11:21:38,065:INFO: <<< wti_arch: 0
2024-07-11 11:21:38,065:INFO: device: cuda:0 n_gpu: 8
2024-07-11 11:21:39,913:INFO: loading archive file /output/InternVideo/InternVideo1/Downstream/Video-Text-Retrieval/modules/cross-base
2024-07-11 11:21:39,913:INFO: Model config {
"attention_probs_dropout_prob": 0.1,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 512,
"initializer_range": 0.02,
"intermediate_size": 2048,
"max_position_embeddings": 128,
"num_attention_heads": 8,
"num_hidden_layers": 4,
"type_vocab_size": 2,
"vocab_size": 512
}
.......
2024-07-11 11:21:55,630:INFO: ***** Running training *****
2024-07-11 11:21:55,631:INFO: Num examples = 180000
2024-07-11 11:21:55,631:INFO: Batch size = 512
2024-07-11 11:21:55,631:INFO: Num steps = 1755
2024-07-11 11:22:32,338:INFO: Reducer buckets have been rebuilt in this iteration.
2024-07-11 11:22:32,344:INFO: Reducer buckets have been rebuilt in this iteration.
2024-07-11 11:22:32,421:INFO: Reducer buckets have been rebuilt in this iteration.
2024-07-11 11:22:32,422:INFO: Reducer buckets have been rebuilt in this iteration.
2024-07-11 11:22:32,423:INFO: Reducer buckets have been rebuilt in this iteration.
2024-07-11 11:22:32,423:INFO: Reducer buckets have been rebuilt in this iteration.
2024-07-11 11:22:32,426:INFO: Reducer buckets have been rebuilt in this iteration.
2024-07-11 11:22:32,425:INFO: Reducer buckets have been rebuilt in this iteration.
2024-07-11 11:24:07,584:INFO: Epoch: 1/5, Step: 50/351, Lr: , Loss: 2.227273, Time/step: 2.635337
2024-07-11 11:25:32,854:INFO: Epoch: 1/5, Step: 100/351, Lr: , Loss: 2.295935, Time/step: 1.705309
2024-07-11 11:27:00,094:INFO: Epoch: 1/5, Step: 150/351, Lr: , Loss: 2.706636, Time/step: 1.744746
2024-07-11 11:28:26,450:INFO: Epoch: 1/5, Step: 200/351, Lr: , Loss: 2.736752, Time/step: 1.727095
2024-07-11 11:29:53,309:INFO: Epoch: 1/5, Step: 250/351, Lr: , Loss: 2.657757, Time/step: 1.737177
2024-07-11 11:31:19,335:INFO: Epoch: 1/5, Step: 300/351, Lr: , Loss: 2.597078, Time/step: 1.720514
2024-07-11 11:32:48,897:INFO: Epoch: 1/5, Step: 350/351, Lr: , Loss: 2.315990, Time/step: 1.791219
2024-07-11 11:32:51,493:INFO: Epoch 1/5 Finished, Train Loss: 2.562695
2024-07-11 11:32:56,494:INFO: Model saved to ./ret_mgpu_mydata_finetune_use_pretrained_path/pytorch_model.bin
2024-07-11 11:32:56,495:INFO: Optimizer saved to ./ret_mgpu_mydata_finetune_use_pretrained_path/pytorch_opt.bin
2024-07-11 11:33:16,583:INFO: sim matrix size: 1000, 1000
2024-07-11 11:33:16,768:INFO: Length-T: 1000, Length-V:1000
2024-07-11 11:33:16,768:INFO: ------------------------------------------------------------
2024-07-11 11:33:16,768:INFO: DSL Text-to-Video:
2024-07-11 11:33:16,768:INFO: >>> R@1: 18.7 - R@5: 43.2 - R@10: 55.7 - Median R: 8.0 - Mean R: 45.0
2024-07-11 11:33:16,768:INFO: DSL Video-to-Text:
2024-07-11 11:33:16,768:INFO: >>> V2T$R@1: 18.0 - V2T$R@5: 42.6 - V2T$R@10: 54.4 - V2T$Median R: 8.5 - V2T$Mean R: 43.7
2024-07-11 11:33:16,768:INFO: ------------------------------------------------------------
2024-07-11 11:33:16,768:INFO: Text-to-Video:
2024-07-11 11:33:16,769:INFO: >>> R@1: 15.7 - R@5: 39.4 - R@10: 52.6 - Median R: 9.0 - Mean R: 48.4
2024-07-11 11:33:16,769:INFO: Video-to-Text:
2024-07-11 11:33:16,769:INFO: >>> V2T$R@1: 15.1 - V2T$R@5: 39.3 - V2T$R@10: 50.7 - V2T$Median R: 10.0 - V2T$Mean R: 48.8
2024-07-11 11:33:16,770:INFO: The best model is: ./ret_mgpu_mydata_finetune_use_pretrained_path/pytorch_model.bin, the R1 is: 15.7000
2024-07-11 11:34:40,019:INFO: Epoch: 2/5, Step: 49/351, Lr: , Loss: 1.967601, Time/step: 1.661535
2024-07-11 11:36:04,756:INFO: Epoch: 2/5, Step: 99/351, Lr: , Loss: 1.975278, Time/step: 1.694740
2024-07-11 11:37:29,064:INFO: Epoch: 2/5, Step: 149/351, Lr: , Loss: 1.755215, Time/step: 1.686095
2024-07-11 11:38:54,342:INFO: Epoch: 2/5, Step: 199/351, Lr: , Loss: 1.821315, Time/step: 1.705289
2024-07-11 11:40:19,421:INFO: Epoch: 2/5, Step: 249/351, Lr: , Loss: 1.728463, Time/step: 1.701576
2024-07-11 11:41:45,084:INFO: Epoch: 2/5, Step: 299/351, Lr: , Loss: 1.622554, Time/step: 1.713192
2024-07-11 11:43:11,891:INFO: Epoch: 2/5, Step: 349/351, Lr: , Loss: 1.491672, Time/step: 1.736059
2024-07-11 11:43:16,443:INFO: Epoch 2/5 Finished, Train Loss: 1.796257
2024-07-11 11:43:19,608:INFO: Model saved to ./ret_mgpu_mydata_finetune_use_pretrained_path/pytorch_model.bin
2024-07-11 11:43:19,609:INFO: Optimizer saved to ./ret_mgpu_mydata_finetune_use_pretrained_path/pytorch_opt.bin
2024-07-11 11:43:36,846:INFO: sim matrix size: 1000, 1000
2024-07-11 11:43:37,040:INFO: Length-T: 1000, Length-V:1000
2024-07-11 11:43:37,040:INFO: ------------------------------------------------------------
2024-07-11 11:43:37,040:INFO: DSL Text-to-Video:
2024-07-11 11:43:37,040:INFO: >>> R@1: 20.9 - R@5: 45.5 - R@10: 56.9 - Median R: 7.0 - Mean R: 38.2
2024-07-11 11:43:37,040:INFO: DSL Video-to-Text:
2024-07-11 11:43:37,040:INFO: >>> V2T$R@1: 21.0 - V2T$R@5: 45.2 - V2T$R@10: 55.9 - V2T$Median R: 7.5 - V2T$Mean R: 38.9
2024-07-11 11:43:37,040:INFO: ------------------------------------------------------------
2024-07-11 11:43:37,040:INFO: Text-to-Video:
2024-07-11 11:43:37,040:INFO: >>> R@1: 20.3 - R@5: 43.1 - R@10: 55.2 - Median R: 8.0 - Mean R: 42.8
2024-07-11 11:43:37,040:INFO: Video-to-Text:
2024-07-11 11:43:37,040:INFO: >>> V2T$R@1: 18.8 - V2T$R@5: 42.0 - V2T$R@10: 54.3 - V2T$Median R: 8.0 - V2T$Mean R: 46.8
2024-07-11 11:43:37,042:INFO: The best model is: ./ret_mgpu_mydata_finetune_use_pretrained_path/pytorch_model.bin, the R1 is: 20.3000
2024-07-11 11:44:59,654:INFO: Epoch: 3/5, Step: 48/351, Lr: , Loss: 1.288813, Time/step: 1.648480
2024-07-11 11:46:27,554:INFO: Epoch: 3/5, Step: 98/351, Lr: , Loss: 1.066350, Time/step: 1.757993
2024-07-11 11:47:53,040:INFO: Epoch: 3/5, Step: 148/351, Lr: , Loss: 1.266637, Time/step: 1.709701
2024-07-11 11:49:20,329:INFO: Epoch: 3/5, Step: 198/351, Lr: , Loss: 1.178898, Time/step: 1.745760
2024-07-11 11:50:46,582:INFO: Epoch: 3/5, Step: 248/351, Lr: , Loss: 1.115819, Time/step: 1.724987
2024-07-11 11:52:14,526:INFO: Epoch: 3/5, Step: 298/351, Lr: , Loss: 1.098051, Time/step: 1.758858
2024-07-11 11:53:43,083:INFO: Epoch: 3/5, Step: 348/351, Lr: , Loss: 1.026722, Time/step: 1.771078
2024-07-11 11:53:49,132:INFO: Epoch 3/5 Finished, Train Loss: 1.159029
2024-07-11 11:53:52,302:INFO: Model saved to ./ret_mgpu_mydata_finetune_use_pretrained_path/pytorch_model.bin
2024-07-11 11:53:52,302:INFO: Optimizer saved to ./ret_mgpu_mydata_finetune_use_pretrained_path/pytorch_opt.bin
2024-07-11 11:54:09,578:INFO: sim matrix size: 1000, 1000
2024-07-11 11:54:09,769:INFO: Length-T: 1000, Length-V:1000
2024-07-11 11:54:09,769:INFO: ------------------------------------------------------------
2024-07-11 11:54:09,769:INFO: DSL Text-to-Video:
2024-07-11 11:54:09,769:INFO: >>> R@1: 24.2 - R@5: 47.6 - R@10: 61.6 - Median R: 6.0 - Mean R: 37.8
2024-07-11 11:54:09,769:INFO: DSL Video-to-Text:
2024-07-11 11:54:09,769:INFO: >>> V2T$R@1: 23.7 - V2T$R@5: 48.1 - V2T$R@10: 61.1 - V2T$Median R: 6.0 - V2T$Mean R: 37.0
2024-07-11 11:54:09,769:INFO: ------------------------------------------------------------
2024-07-11 11:54:09,769:INFO: Text-to-Video:
2024-07-11 11:54:09,769:INFO: >>> R@1: 21.7 - R@5: 48.2 - R@10: 59.9 - Median R: 6.0 - Mean R: 40.4
2024-07-11 11:54:09,769:INFO: Video-to-Text:
2024-07-11 11:54:09,769:INFO: >>> V2T$R@1: 19.0 - V2T$R@5: 44.7 - V2T$R@10: 56.5 - V2T$Median R: 7.0 - V2T$Mean R: 46.6
2024-07-11 11:54:09,771:INFO: The best model is: ./ret_mgpu_mydata_finetune_use_pretrained_path/pytorch_model.bin, the R1 is: 21.7000
2024-07-11 11:55:30,040:INFO: Epoch: 4/5, Step: 47/351, Lr: , Loss: 0.729010, Time/step: 1.601594
2024-07-11 11:56:57,121:INFO: Epoch: 4/5, Step: 97/351, Lr: , Loss: 0.806269, Time/step: 1.741610
2024-07-11 11:58:22,053:INFO: Epoch: 4/5, Step: 147/351, Lr: , Loss: 0.769861, Time/step: 1.698622
2024-07-11 11:59:49,861:INFO: Epoch: 4/5, Step: 197/351, Lr: , Loss: 0.791283, Time/step: 1.756077
2024-07-11 12:01:16,158:INFO: Epoch: 4/5, Step: 247/351, Lr: , Loss: 0.659234, Time/step: 1.725929
2024-07-11 12:02:43,180:INFO: Epoch: 4/5, Step: 297/351, Lr: , Loss: 0.681899, Time/step: 1.740434
2024-07-11 12:04:11,277:INFO: Epoch: 4/5, Step: 347/351, Lr: , Loss: 0.799178, Time/step: 1.761938
2024-07-11 12:04:19,146:INFO: Epoch 4/5 Finished, Train Loss: 0.740242
2024-07-11 12:04:22,161:INFO: Model saved to ./ret_mgpu_mydata_finetune_use_pretrained_path/pytorch_model.bin
2024-07-11 12:04:22,162:INFO: Optimizer saved to ./ret_mgpu_mydata_finetune_use_pretrained_path/pytorch_opt.bin
2024-07-11 12:04:39,896:INFO: sim matrix size: 1000, 1000
2024-07-11 12:04:40,081:INFO: Length-T: 1000, Length-V:1000
2024-07-11 12:04:40,082:INFO: ------------------------------------------------------------
2024-07-11 12:04:40,082:INFO: DSL Text-to-Video:
2024-07-11 12:04:40,082:INFO: >>> R@1: 23.2 - R@5: 49.4 - R@10: 60.7 - Median R: 6.0 - Mean R: 38.2
2024-07-11 12:04:40,082:INFO: DSL Video-to-Text:
2024-07-11 12:04:40,082:INFO: >>> V2T$R@1: 22.0 - V2T$R@5: 48.3 - V2T$R@10: 61.3 - V2T$Median R: 6.0 - V2T$Mean R: 37.8
2024-07-11 12:04:40,082:INFO: ------------------------------------------------------------
2024-07-11 12:04:40,082:INFO: Text-to-Video:
2024-07-11 12:04:40,082:INFO: >>> R@1: 22.0 - R@5: 49.5 - R@10: 60.6 - Median R: 6.0 - Mean R: 41.4
2024-07-11 12:04:40,082:INFO: Video-to-Text:
2024-07-11 12:04:40,082:INFO: >>> V2T$R@1: 20.3 - V2T$R@5: 44.4 - V2T$R@10: 56.2 - V2T$Median R: 8.0 - V2T$Mean R: 48.2
2024-07-11 12:04:40,083:INFO: The best model is: ./ret_mgpu_mydata_finetune_use_pretrained_path/pytorch_model.bin, the R1 is: 22.0000
2024-07-11 12:05:58,897:INFO: Epoch: 5/5, Step: 46/351, Lr: , Loss: 0.583870, Time/step: 1.572555
2024-07-11 12:07:24,675:INFO: Epoch: 5/5, Step: 96/351, Lr: , Loss: 0.448119, Time/step: 1.715497
2024-07-11 12:08:49,759:INFO: Epoch: 5/5, Step: 146/351, Lr: , Loss: 0.492565, Time/step: 1.701642
2024-07-11 12:10:16,849:INFO: Epoch: 5/5, Step: 196/351, Lr: , Loss: 0.521736, Time/step: 1.741791
2024-07-11 12:11:42,517:INFO: Epoch: 5/5, Step: 246/351, Lr: , Loss: 0.593410, Time/step: 1.713338
2024-07-11 12:13:10,964:INFO: Epoch: 5/5, Step: 296/351, Lr: , Loss: 0.528416, Time/step: 1.768792
2024-07-11 12:14:38,727:INFO: Epoch: 5/5, Step: 346/351, Lr: , Loss: 0.520307, Time/step: 1.755247
2024-07-11 12:14:48,573:INFO: Epoch 5/5 Finished, Train Loss: 0.503294
2024-07-11 12:14:50,565:INFO: Model saved to ./ret_mgpu_mydata_finetune_use_pretrained_path/pytorch_model.bin
2024-07-11 12:14:50,565:INFO: Optimizer saved to ./ret_mgpu_mydata_finetune_use_pretrained_path/pytorch_opt.bin
2024-07-11 12:14:58,720:INFO: sim matrix size: 1000, 1000
2024-07-11 12:14:58,905:INFO: Length-T: 1000, Length-V:1000
2024-07-11 12:14:58,905:INFO: ------------------------------------------------------------
2024-07-11 12:14:58,905:INFO: DSL Text-to-Video:
2024-07-11 12:14:58,905:INFO: >>> R@1: 22.6 - R@5: 47.3 - R@10: 59.7 - Median R: 6.0 - Mean R: 39.5
2024-07-11 12:14:58,905:INFO: DSL Video-to-Text:
2024-07-11 12:14:58,905:INFO: >>> V2T$R@1: 21.3 - V2T$R@5: 46.5 - V2T$R@10: 59.7 - V2T$Median R: 7.0 - V2T$Mean R: 40.6
2024-07-11 12:14:58,905:INFO: ------------------------------------------------------------
2024-07-11 12:14:58,905:INFO: Text-to-Video:
2024-07-11 12:14:58,905:INFO: >>> R@1: 22.0 - R@5: 48.5 - R@10: 59.5 - Median R: 6.0 - Mean R: 43.5
2024-07-11 12:14:58,905:INFO: Video-to-Text:
2024-07-11 12:14:58,905:INFO: >>> V2T$R@1: 17.6 - V2T$R@5: 41.8 - V2T$R@10: 53.3 - V2T$Median R: 9.0 - V2T$Mean R: 54.3
2024-07-11 12:14:58,906:INFO: The best model is: ./ret_mgpu_mydata_finetune_use_pretrained_path/pytorch_model.bin, the R1 is: 22.0000
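To follow how text-to-video recall moves across the five evaluations, the `>>> R@1: ...` lines can be pulled out of the log with a regular expression. This is a rough sketch that relies only on the line format visible in the log above; `t2v_recall_per_eval` is a hypothetical helper name, and the sample string is the epoch 1 evaluation copied from this log.

```python
import re

# A "Text-to-Video:" header (optionally prefixed by "DSL ") followed by the next
# ">>> R@1: ..." entry. Video-to-Text entries use the "V2T$" prefix and are skipped.
T2V_RE = re.compile(
    r"(DSL )?Text-to-Video:.*?>>> R@1: ([\d.]+) - R@5: ([\d.]+) - R@10: ([\d.]+)",
    re.DOTALL,
)


def t2v_recall_per_eval(log_text):
    """Return (plain, dsl): lists of (R@1, R@5, R@10) tuples, one per evaluation."""
    plain, dsl = [], []
    for match in T2V_RE.finditer(log_text):
        scores = tuple(float(value) for value in match.groups()[1:])
        (dsl if match.group(1) else plain).append(scores)
    return plain, dsl


sample = (
    "2024-07-11 11:33:16,768:INFO: DSL Text-to-Video:\n"
    "2024-07-11 11:33:16,768:INFO: >>> R@1: 18.7 - R@5: 43.2 - R@10: 55.7 - Median R: 8.0 - Mean R: 45.0\n"
    "2024-07-11 11:33:16,768:INFO: DSL Video-to-Text:\n"
    "2024-07-11 11:33:16,768:INFO: >>> V2T$R@1: 18.0 - V2T$R@5: 42.6 - V2T$R@10: 54.4 - V2T$Median R: 8.5 - V2T$Mean R: 43.7\n"
    "2024-07-11 11:33:16,768:INFO: Text-to-Video:\n"
    "2024-07-11 11:33:16,769:INFO: >>> R@1: 15.7 - R@5: 39.4 - R@10: 52.6 - Median R: 9.0 - Mean R: 48.4\n"
)

plain, dsl = t2v_recall_per_eval(sample)
print("plain:", plain)
print("DSL:  ", dsl)
```

Because the pattern does not depend on line breaks, it should work whether the log has one entry per line or has been pasted as a single run of text.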
