
training details #7

Closed
kuaileqipaoshui opened this issue Mar 17, 2024 · 11 comments

Comments

@kuaileqipaoshui

RuntimeError: probability tensor contains either inf, nan or element < 0

I hit this error during evaluation. I suspect a problem with the installed package versions. Could you share the versions of your installed packages?
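For context on what the error means: `torch.multinomial`, which text generation uses for sampling, rejects any probability row that contains `nan`, `inf`, or a negative entry. A stdlib-only sketch of the same validity check (useful for debugging which rows are bad; this is my paraphrase of the constraint, not code from this repo):

```python
import math

def valid_probability_row(probs):
    """Rough mirror of the checks torch.multinomial performs before
    sampling: every entry must be finite and non-negative, and the
    row must carry positive total mass."""
    if any(math.isnan(p) or math.isinf(p) or p < 0 for p in probs):
        return False
    return sum(probs) > 0

print(valid_probability_row([0.7, 0.2, 0.1]))      # True
print(valid_probability_row([0.5, float("nan")]))  # False
```

A row failing this check usually traces back to `nan` logits produced earlier in the forward pass, which is why mismatched package versions can surface here.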

@ch3cook-fdu
Contributor

Please try building the environment in the following order:

  1. Set up the conda environment:
conda create -n ll3da python=3.8
conda activate ll3da
  2. Install PyTorch:
pip install torch==1.13.1+cu116 torchvision==0.14.1+cu116 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu116
  3. Install the other packages:
pip install h5py scipy cython plyfile 'trimesh>=2.35.39,<2.35.40' 'transformers>=4.37.0'
  4. Build the pointnet++ and GIoU support:
cd third_party/pointnet2
python setup.py install
cd ../../utils
python cython_compile.py build_ext --inplace
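After these steps, a quick way to confirm the pins took effect is to query the installed distributions (a stdlib-only sketch; the package names are just the ones installed above):

```python
from importlib import metadata

def installed_version(pkg):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(pkg)
    except metadata.PackageNotFoundError:
        return None

# After the steps above, these should report the pinned versions
# (e.g. torch -> 1.13.1+cu116):
for pkg in ("torch", "torchvision", "torchaudio", "transformers", "trimesh"):
    print(pkg, installed_version(pkg))
```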

If the issue still appears, please let me know.

@kuaileqipaoshui
Author

> conda activate ll3da

Thanks, I will try. I have a question: looking at the script, the evaluation also uses only the trained generalist checkpoint (--test_ckpt ./ckpts/opt-1.3b/ll3da-generalist/checkpoint.pth); the tuned checkpoints are not used. What are the tuned checkpoints for?

@ch3cook-fdu
Contributor

We train our model on the combination of Nr3D and ScanRefer for describing objects. However, these two datasets are annotated in different styles, so the model needs to be tuned on each dataset separately.


@ch3cook-fdu
Contributor

Since LL3DA is a 3D generalist, it can distinguish different tasks given human interactions. You can directly evaluate on ScanQA with the generalist checkpoint, or try fine-tuning it.

@kuaileqipaoshui
Author

> Since LL3DA is a 3D generalist, it can distinguish different tasks given human interactions. You can directly evaluate on ScanQA with the generalist checkpoint, or try fine-tuning it.

----------------------Evaluation-----------------------
INFO: iou@0.5 matched proposals: [1525 / 2068],
[BLEU-1] Mean: 0.6246, Max: 1.0000, Min: 0.0000
[BLEU-2] Mean: 0.5269, Max: 1.0000, Min: 0.0000
[BLEU-3] Mean: 0.4311, Max: 1.0000, Min: 0.0000
[BLEU-4] Mean: 0.3519, Max: 1.0000, Min: 0.0000
[CIDEr] Mean: 0.5911, Max: 5.4976, Min: 0.0000
[ROUGE-L] Mean: 0.5407, Max: 1.0000, Min: 0.1015
[METEOR] Mean: 0.2519, Max: 1.0000, Min: 0.0448

When I directly evaluate on ScanQA with the generalist checkpoint, I get the result above. The C@0.5 result is very different from the paper, while the other metrics are similar. Why is this?

@ch3cook-fdu
Contributor

It seems the result you listed comes from the ScanRefer dataset for 3D dense captioning.

The results differ mainly because of: (1) randomness in data pre-processing (point downsampling), (2) different PyTorch versions, and (3) randomness in training.

Please refer to: ch3cook-fdu/Vote2Cap-DETR#12 for more information.

Also, you are encouraged to check out the training log to see whether the performance aligns.

Additionally, the performance of 3D dense captioning might differ a little, since we do not distinguish ScanRefer from Nr3D during training. You may want to tune the model on each dataset for 3D dense captioning.
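For reference, metrics like C@0.5 in 3D dense captioning are commonly computed Scan2Cap-style: the captioning score is summed over proposals matched to ground-truth boxes at IoU >= 0.5 and divided by the total number of annotated boxes, so unmatched boxes count as zero (my reading of the usual convention, not code from this repo). This is why the matched-proposal ratio directly scales the reported number:

```python
def metric_at_iou(matched_scores, num_annotations):
    """m@kIoU for 3D dense captioning (Scan2Cap-style convention,
    assumed here): sum the captioning metric over matched proposals,
    then divide by the total number of annotated boxes, so every
    unmatched box contributes 0."""
    return sum(matched_scores) / num_annotations

# With 1525 of 2068 proposals matched at iou@0.5, even a high average
# score on the matched set is scaled down by the ~74% match rate:
print(round(metric_at_iou([0.8] * 1525, 2068), 4))  # 0.5899
```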

@matthewdm0816

Hi, I tried train.generalist.sh, but I can't reproduce performance close to that reported in the paper. The only change is a batch size of 24 instead of 4, to speed up training.

Here are the eval logs on ScanQA, Nr3D, and ScanRefer at the 20th epoch:
----------------------Evaluation-----------------------

[BLEU-1] Mean: 0.3028, Max: 1.0000, Min: 0.0000
[BLEU-2] Mean: 0.1904, Max: 1.0000, Min: 0.0000
[BLEU-3] Mean: 0.1283, Max: 1.0000, Min: 0.0000
[BLEU-4] Mean: 0.0875, Max: 1.0000, Min: 0.0000
[CIDEr] Mean: 0.4818, Max: 8.0511, Min: 0.0000
[ROUGE-L] Mean: 0.2636, Max: 1.0000, Min: 0.0000
[METEOR] Mean: 0.1058, Max: 1.0000, Min: 0.0000
Evaluate [19/32]; Batch [0/1]; Evaluating on iter: 12999; Iter time 261.13; Mem 70618.97MB

----------------------Evaluation-----------------------
INFO: iou@0.5 matched proposals: [712 / 1214],
[BLEU-1] Mean: 0.5626, Max: 1.0000, Min: 0.0006
[BLEU-2] Mean: 0.3753, Max: 0.8165, Min: 0.0000
[BLEU-3] Mean: 0.2223, Max: 0.6583, Min: 0.0000
[BLEU-4] Mean: 0.1339, Max: 0.5756, Min: 0.0000
[CIDEr] Mean: 0.0945, Max: 1.2465, Min: 0.0000
[ROUGE-L] Mean: 0.4495, Max: 0.8299, Min: 0.1843
[METEOR] Mean: 0.2157, Max: 0.5162, Min: 0.0783
Evaluate [19/32]; Batch [0/1]; Evaluating on iter: 12999; Iter time 262.18; Mem 70618.97MB

----------------------Evaluation-----------------------
INFO: iou@0.5 matched proposals: [1506 / 2068],
[BLEU-1] Mean: 0.6056, Max: 1.0000, Min: 0.0000
[BLEU-2] Mean: 0.4881, Max: 1.0000, Min: 0.0000
[BLEU-3] Mean: 0.3775, Max: 0.9410, Min: 0.0000
[BLEU-4] Mean: 0.2926, Max: 0.8654, Min: 0.0000
[CIDEr] Mean: 0.3024, Max: 3.1209, Min: 0.0000
[ROUGE-L] Mean: 0.4990, Max: 0.9412, Min: 0.1015
[METEOR] Mean: 0.2349, Max: 0.5416, Min: 0.0448

The training log is here.

It would be nice if the pre-trained checkpoints and pre-processed point clouds could be made available for download, to minimize the randomness.

@ch3cook-fdu
Contributor

ch3cook-fdu commented Mar 23, 2024

The actual batch size of our original configuration is 4 x 8 GPUs = 32 per iteration. To reproduce our results, we encourage you to train with the exact same config as provided.

Please track the training process by the number of iterations rather than the epoch number. In our experience, training LL3DA for only 13k iterations is far from convergence.

We are actively working on packing the pre-trained weights, please stay tuned.
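The arithmetic above can be sketched as follows (`grad_accum` is a hypothetical extra knob for single-GPU setups, not something this thread mentions):

```python
import math

def effective_batch_size(per_gpu_batch, num_gpus, grad_accum=1):
    """Samples consumed per optimizer iteration under data parallelism.
    grad_accum is an assumed extension for matching the effective batch
    on fewer GPUs via gradient accumulation."""
    return per_gpu_batch * num_gpus * grad_accum

def iterations_per_epoch(num_samples, eff_batch):
    """How many iterations one pass over the data takes."""
    return math.ceil(num_samples / eff_batch)

# The original config: 4 per GPU on 8 GPUs.
print(effective_batch_size(4, 8))  # 32

# A single GPU at batch 4 would need 8x gradient accumulation to match:
print(effective_batch_size(4, 1, grad_accum=8))  # 32
```

This also shows why iteration counts, not epoch numbers, are the comparable quantity: changing the effective batch size changes how many iterations an epoch contains.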

@kuaileqipaoshui
Author

> The actual batch size of our original configuration is 4 x 8 GPUs = 32 per iteration. To reproduce our results, we encourage you to train with the exact same config as provided.
>
> Please track the training process by the number of iterations rather than the epoch number. In our experience, training LL3DA for only 13k iterations is far from convergence.
>
> We are actively working on packing the pre-trained weights, please stay tuned.

When I use the actual batch size of the original configuration (4 x 8 GPUs = 32 per iteration), I find this in the training log:
Epoch [2/32]; Iter [11990/127936]; Loss 1.51; LR 9.79e-05; Iter time 0.46; ETA 14:48:34; Mem 18615.49MB
Loss in not finite. Skip this training step.
Loss in not finite. Skip this training step.
Loss in not finite. Skip this training step.
Loss in not finite. Skip this training step.
Loss in not finite. Skip this training step.
Loss in not finite. Skip this training step.
Loss in not finite. Skip this training step.
Loss in not finite. Skip this training step.
Epoch [3/32]; Iter [12000/127936]; Loss 1.51; LR 9.79e-05; Iter time 0.48; ETA 15:23:59; Mem 18615.49MB
What happened?

@ch3cook-fdu
Copy link
Contributor

ch3cook-fdu commented Mar 26, 2024

Because of mixed-precision training, the training process might not be entirely stable. As long as the model training continues, you can safely ignore this message.
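The guard producing that "Loss in not finite" log line can be sketched like this (a simplification; the actual loop also involves an AMP GradScaler, and the exact helper name here is hypothetical):

```python
import math

def maybe_step(loss, optimizer_step):
    """Skip the weight update when mixed-precision overflow makes the
    loss nan/inf, instead of corrupting the parameters. The message
    text matches the log quoted above (typo included)."""
    if not math.isfinite(loss):
        print("Loss in not finite. Skip this training step.")
        return False
    optimizer_step()
    return True
```

Occasional skipped steps are expected with fp16; they only become a problem when every step is skipped and the loss stops updating.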

ch3cook-fdu changed the title from "version" to "training details" on Mar 26, 2024