training details #7
Please try building the environment in the following order:
If the issue still appears, please let me know.
Thanks, I will try. I have a question: when I look at the script, the evaluation is also based only on the trained checkpoint (--test_ckpt ./ckpts/opt-1.3b/ll3da-generalist/checkpoint.pth); the tuned checkpoints are not used. What is the use of the tuned checkpoints?
We train our model on the combination of Nr3D and ScanRefer for describing objects. However, these two datasets are annotated in different styles, so the model needs to be tuned on each dataset separately.
Since LL3DA is a 3D generalist, it can distinguish different tasks given human interactions. You can directly evaluate on ScanQA with the generalist checkpoint, or try fine-tuning it. |
It seems the results you listed come from the ScanRefer dataset for 3D dense captioning. The results differ mainly because of: 1. randomness in data pre-processing (point downsampling), 2. different PyTorch versions, and 3. randomness in training. Please refer to ch3cook-fdu/Vote2Cap-DETR#12 for more information. You are also encouraged to check the training log to see whether the performance aligns. Additionally, the 3D dense captioning performance might differ a little, since we do not distinguish ScanRefer from Nr3D during training. You may want to tune the model on each dataset for 3D dense captioning only.
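To reduce the randomness mentioned above (point downsampling, training shuffles), a common approach is to seed every random number generator up front. This is a generic sketch of the standard seeding entry points, not the project's actual setup code:

```python
import random

import numpy as np


def seed_everything(seed: int = 42) -> None:
    """Seed Python, NumPy, and (if available) PyTorch RNGs so that
    stochastic steps like point downsampling become repeatable."""
    random.seed(seed)
    np.random.seed(seed)
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass  # torch not installed; only Python and NumPy are seeded
```

Note that even with fixed seeds, different PyTorch/CUDA versions can still produce slightly different results, which is consistent with point 2 above.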
Hi, I tried train.generalist.sh, but I can't reproduce performance close to what is reported in the paper. The only change is a batch size of 24 instead of 4, to speed up training. Here are the eval logs on ScanQA, Nr3D, and ScanRefer at the 20th epoch: [BLEU-1] Mean: 0.3028, Max: 1.0000, Min: 0.0000. The training log is here. It would be nice if the pretrained checkpoints / pre-processed point clouds could be made downloadable to minimize the randomness.
The actual batch size of our original configuration is 4 x 8 GPUs = 32 per iteration. To reproduce our results, we encourage you to train with the exact same config as we provided. Please track the training progress by the number of iterations rather than the epoch number. In our experience, training LL3DA for only 13k iterations is far from convergence. We are actively working on packaging the pre-trained weights, please stay tuned.
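The batch-size arithmetic above can be sketched as follows. The dataset size below is a placeholder assumption for illustration only, not the actual size of the LL3DA training set:

```python
# Effective batch size of the original multi-GPU configuration.
per_gpu_batch = 4
num_gpus = 8
effective_batch = per_gpu_batch * num_gpus  # 32 samples per iteration

# Hypothetical dataset size, for illustration only.
assumed_dataset_size = 100_000
iters_per_epoch = assumed_dataset_size // effective_batch

# With a larger single-GPU batch (e.g. 24), the same number of epochs
# covers far fewer optimizer iterations -- which is why progress should
# be tracked in iterations, not epochs, when judging convergence.
print(effective_batch, iters_per_epoch)
```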
When I use the actual batch size of the original configuration (4 x 8 GPUs = 32 per iteration), I find this in the training log:
Because of mixed precision training, the training process might not be entirely stable. As long as training continues, you can safely ignore this message.
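The behavior behind such messages is that dynamic loss scaling skips any optimizer step whose gradients overflowed and lowers the scale, then carries on. This is a toy pure-Python sketch of that skip-and-rescale logic, not the actual PyTorch GradScaler implementation:

```python
import math


def amp_step(grads, loss_scale, growth=2.0, backoff=0.5):
    """Mimic dynamic loss scaling: if any gradient is inf/nan, skip the
    optimizer update and back off the loss scale; otherwise apply the
    update and grow the scale for the next iteration."""
    if any(math.isinf(g) or math.isnan(g) for g in grads):
        return False, loss_scale * backoff  # step skipped, scale lowered
    return True, loss_scale * growth        # step applied, scale raised


# An overflowing step is silently skipped and training simply continues.
applied, scale = amp_step([0.1, float("inf")], loss_scale=65536.0)
```

In real PyTorch code this corresponds to `scaler.step(optimizer)` followed by `scaler.update()`; the occasional skipped step is expected and harmless as long as loss keeps decreasing overall.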
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
There was an error during the evaluation. I suspect there is a problem with the installed package versions. Could you provide the versions of your installed packages?
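This error is raised during sampling when the model's output probabilities contain inf/nan or negative values. The guard below is a pure-Python illustration of the check (mirroring what `torch.multinomial` rejects), not code from the project:

```python
import math
import random


def safe_sample(probs):
    """Sample an index from a probability vector, rejecting invalid
    distributions the same way torch.multinomial does."""
    if any(math.isnan(p) or math.isinf(p) or p < 0 for p in probs):
        raise RuntimeError(
            "probability tensor contains either inf, nan or element < 0"
        )
    total = sum(probs)
    r = random.uniform(0.0, total)
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r <= acc:
            return i
    return len(probs) - 1
```

In practice this condition usually means the model produced nan logits upstream (e.g. from diverged or unstable training, or a version mismatch), so inspecting the logits right before sampling is a useful first debugging step.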