torch.multiprocessing.spawn.ProcessExitedException: process 3 terminated with signal SIGKILL #19
Comments
How does the RAM usage evolve throughout the training on MegaDepth?
About 50% (of 125 GB total) when I begin training on MegaDepth. But I was not inspecting the RAM usage when the training crashed.
Max RAM usage: 98% before the training crashed.
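For reference, a minimal sketch of how RAM usage could be tracked over iterations with `psutil` (the helper name and where it is called are assumptions, not part of glue-factory):

```python
import psutil  # third-party: pip install psutil

def log_ram_usage(logger, iteration, every=100):
    """Hypothetical helper: log system RAM usage every `every` iterations."""
    if iteration % every != 0:
        return
    mem = psutil.virtual_memory()
    logger.info(
        "[it %d] RAM %.1f%% used (%.1f / %.1f GB)",
        iteration, mem.percent, mem.used / 1e9, mem.total / 1e9,
    )
```

Calling something like this from the training loop makes it easy to see whether memory grows steadily (a leak, e.g. accumulated figures) or spikes at evaluation time.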
When I set `conf.plot` to `None` in the function `do_evaluation()`, everything goes OK; the max RAM usage drops to 75%.
I have optimized how we handle figures during training in PR #30; does this help?
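Independently of what PR #30 actually changed, a minimal sketch of the kind of cleanup that prevents figures from accumulating in RAM is to close each matplotlib figure once it has been written to the logger (the helper below is hypothetical):

```python
import matplotlib.pyplot as plt
from torch.utils.tensorboard import SummaryWriter

def log_and_close_figures(writer: SummaryWriter, figures: dict, step: int):
    """Hypothetical helper: log evaluation figures, then free their memory."""
    for name, fig in figures.items():
        # close=True releases the figure once it has been rendered to the event file
        writer.add_figure(name, fig, global_step=step, close=True)
    # Safety net: drop any figures created through pyplot but never logged
    plt.close("all")
```

Without an explicit close, pyplot keeps a reference to every figure it creates, so memory keeps growing with each evaluation round.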
When I train LightGlue using
```bash
python -m gluefactory.train sp+lg_megadepth \
    --conf gluefactory/configs/superpoint-open+lightglue_megadepth.yaml \
    train.load_experiment=sp+lg_homography \
    data.load_features.do=True --distributed
```
**the process gets killed after:**
```
[10/17/2023 04:26:12 gluefactory INFO] [E 4 | it 1000] loss {total 1.731E+00, last 7.856E-01, assignment_nll 7.856E-01, nll_pos 1.262E+00, nll_neg 3.087E-01, num_matchable 4.165E+02, num_unmatchable 7.160E+02, confidence 2.601E-01, row_norm 8.259E-01}
```
Can you offer me some advice on how to solve this problem? Thanks!