
torch.cuda.OutOfMemoryError #15

Open
iason-r opened this issue Dec 23, 2023 · 19 comments
@iason-r
iason-r commented Dec 23, 2023

I'm using a 16 GB GPU. Is this error related to the size of the dataset?

@JunyuanDeng
Owner

16 GB should be enough. You can try lowering the batch size; the simplest fix is to divide the chunk_size on that line by 10 (chunk_size//10). Of course, this will slow rendering down considerably.
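The trade-off behind this advice can be sketched generically: chunked rendering bounds peak memory by the chunk size instead of the total ray count, at the cost of more iterations. A minimal pure-Python sketch (the function name `render_in_chunks` and the dummy renderer are illustrative, not from the repository):

```python
def render_in_chunks(rays, render_fn, chunk_size):
    """Process rays in fixed-size chunks so peak memory scales with
    chunk_size rather than len(rays). Shrinking chunk_size (e.g. //10)
    trades rendering speed for a smaller memory footprint."""
    outputs = []
    for i in range(0, len(rays), chunk_size):
        outputs.extend(render_fn(rays[i:i + chunk_size]))
    return outputs

# Toy example: a dummy per-chunk "renderer" that squares each value.
rays = list(range(10))
full = render_in_chunks(rays, lambda c: [r * r for r in c], chunk_size=4)
small = render_in_chunks(rays, lambda c: [r * r for r in c], chunk_size=1)
assert full == small == [r * r for r in rays]  # same result, smaller peak memory
```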

@iason-r
Author
iason-r commented Jan 3, 2024

Which file's chunk_size do I need to modify?

@JunyuanDeng
Owner

You can click the hyperlink above.

@iason-r
Author
iason-r commented Jan 3, 2024

After changing chunk_size to chunk_size//10, I still get the error:
Traceback (most recent call last):
File "/home/sucronav/.conda/envs/torch/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/home/sucronav/.conda/envs/torch/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/home/sucronav/renbin/NeRF-LOAM/src/mapping.py", line 112, in spin
self.do_mapping(share_data, tracked_frame)
File "/home/sucronav/renbin/NeRF-LOAM/src/mapping.py", line 179, in do_mapping
bundle_adjust_frames(
File "/home/sucronav/renbin/NeRF-LOAM/src/variations/render_helpers.py", line 398, in bundle_adjust_frames
final_outputs = render_rays(
File "/home/sucronav/renbin/NeRF-LOAM/src/variations/render_helpers.py", line 211, in render_rays
intersections, hits = ray_intersect(
File "/home/sucronav/.local/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/sucronav/renbin/NeRF-LOAM/src/variations/voxel_helpers.py", line 534, in ray_intersect
pts_idx, min_depth, max_depth = svo_ray_intersect(
File "/home/sucronav/.local/lib/python3.8/site-packages/torch/autograd/function.py", line 506, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/home/sucronav/renbin/NeRF-LOAM/src/variations/voxel_helpers.py", line 108, in forward
children = children.expand(S * G, *children.size()).contiguous()
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.61 GiB (GPU 0; 7.79 GiB total capacity; 979.16 MiB already allocated; 2.48 GiB free; 1016.00 MiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
While the program was running, I refreshed nvidia-smi once a second; from what I observed, GPU memory usage never reached 100%.
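As the error text itself suggests, one knob worth trying before shrinking chunk_size further is the allocator's max_split_size_mb setting, which can help when reserved memory far exceeds allocated memory (a fragmentation symptom). A minimal sketch; the value 128 is an arbitrary example, not a tuned recommendation:

```shell
# Cap the size (in MiB) of cached blocks the CUDA caching allocator is
# allowed to split; this is the setting the OOM message points to.
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
# ...then launch the program in this same shell as usual.
```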

@iason-r
Author
iason-r commented Jan 3, 2024

[Screenshot from 2024-01-03 16-18-50]

@JunyuanDeng
Owner

8 GB of VRAM really is a bit small. I've never run the program across two GPUs, so I don't know how to modify it for that. If possible, use a GPU with at least 16 GB of VRAM.

@iason-r
Author
iason-r commented Jan 16, 2024

I'm running it on a 24 GB GPU, but after nearly 24 hours it has only reached 68%. Is that normal?
[Screenshot from 2024-01-16 09-33-17]

@JunyuanDeng
Owner

Yes, it does get slower and slower as it runs; that's something we're currently working to optimize. You can use the subscene branch to speed things up. Remember to fetch the latest git updates.

@iason-r
Author
iason-r commented Jan 17, 2024

Hmm, why do I get OutOfMemory even with 24 GB of VRAM?
insert keyframe
********** current num kfs: 18 **********
tracking frame: 99%|███████████████████████████████████████████████████████████████████████████████████▎| 4501/4540 [47:26:23<44:59, 69.22s/it]Process Process-2:
Traceback (most recent call last):
File "/home/rb/anaconda3/envs/torch/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/home/rb/anaconda3/envs/torch/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/home/rb/NeRF-LOAM/src/mapping.py", line 112, in spin
self.do_mapping(share_data, tracked_frame)
File "/home/rb/NeRF-LOAM/src/mapping.py", line 179, in do_mapping
bundle_adjust_frames(
File "/home/rb/NeRF-LOAM/src/variations/render_helpers.py", line 398, in bundle_adjust_frames
final_outputs = render_rays(
File "/home/rb/NeRF-LOAM/src/variations/render_helpers.py", line 211, in render_rays
intersections, hits = ray_intersect(
File "/home/rb/anaconda3/envs/torch/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/rb/NeRF-LOAM/src/variations/voxel_helpers.py", line 534, in ray_intersect
pts_idx, min_depth, max_depth = svo_ray_intersect(
File "/home/rb/anaconda3/envs/torch/lib/python3.8/site-packages/torch/autograd/function.py", line 506, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/home/rb/NeRF-LOAM/src/variations/voxel_helpers.py", line 108, in forward
children = children.expand(S * G, *children.size()).contiguous()
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 5.50 GiB (GPU 0; 23.69 GiB total capacity; 5.14 GiB already allocated; 5.22 GiB free; 7.45 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

@iason-r
Author
iason-r commented Jan 17, 2024

By the way, earlier the line
sampled_rays_d = frame.rays_d[sample_mask].cuda()
raised "RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)",
so I changed
sample_mask = frame.sample_mask.cuda()
sampled_rays_d = frame.rays_d[sample_mask].cuda()
to
sample_mask = frame.sample_mask.cuda()
sample_mask = sample_mask.cuda()
frame.rays_d = frame.rays_d.cuda()
sampled_rays_d = frame.rays_d[sample_mask]
Does this change affect anything overall?

@JunyuanDeng
Owner

Tried to allocate 5.50 GiB (GPU 0; 23.69 GiB total capacity; 5.14 GiB already allocated; 5.22 GiB free; 7.45 GiB reserved in total by PyTorch)

5 GB already allocated, 7.5 GB reserved; in theory, on a 24 GB card you should still have about 11.5 GB of VRAM left. Are you running other algorithms at the same time? You can run the subscene branch, which will also be faster.
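The arithmetic above can be checked directly against the figures in the traceback. A minimal sketch; the interpretation in the comments is one plausible reading, not a definitive diagnosis:

```python
# Figures from the OOM message above (all GiB).
total, allocated, reserved, free, tried = 23.69, 5.14, 7.45, 5.22, 5.50

# Memory outside PyTorch's reservation (reserved already includes allocated):
outside = total - reserved
assert round(outside, 2) == 16.24

# Yet the driver reports only 5.22 GiB free -- less than the 5.50 GiB
# requested, hence the OOM. The ~11 GiB gap between `outside` and `free`
# is consistent with VRAM held by other processes (or fragmentation).
gap = outside - free
assert round(gap, 2) == 11.02
```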

@iason-r
Author
iason-r commented Jan 20, 2024

Hello, I've been trying to run your package for a while now, but I keep getting stuck on the OutOfMemory problem. While working on it I'd like to study your code. Do you have any advice on learning the code, and on deep learning in general? I'm a first-year master's student; I've been doing project work since enrolling and am just starting research, and I haven't studied deep learning in depth before.

@JunyuanDeng
Owner

If you haven't studied deep learning before, start with the book Dive-into-DL-PyTorch (available in both Chinese and English). You could also study machine learning first, though you can skip that if you're short on time. Once you have the basics of deep learning, if you want to understand the SLAM side, read the book "14 Lectures on Visual SLAM" for the fundamentals. With basic SLAM and deep learning knowledge in hand, read a PyTorch implementation of NeRF to understand how NeRF works, and finally survey the NeRF-SLAM literature, such as this repository: look for the most-cited and most-starred repositories.

@iason-r
Author
iason-r commented Jan 22, 2024

OK, thank you.
I got it running over the last couple of days, but it diverged on several runs; the localization doesn't seem very good.

@JunyuanDeng
Owner

Are you running KITTI? If it's a different scene, you may need to adjust the learning rate.

@iason-r
Author
iason-r commented Jan 22, 2024

KITTI 00.

@hhongwei1009
hhongwei1009 commented Jun 20, 2024

> OK, thank you. I got it running over the last couple of days, but it diverged on several runs; the localization doesn't seem very good.

How did you get it running? I'm also stuck on out of memory.

@boyang9602
boyang9602 commented Sep 2, 2024

@hhongwei1009 What GPU do you have? I used the same one as the author, and it runs fine.
I tried KITTI 09 with the default 09 config; the result is close to the paper's (slightly worse).

APE w.r.t. translation part (m)
(with SE(3) Umeyama alignment)

       max	16.220123
      mean	5.422567
    median	3.834926
       min	0.216227
      rmse	7.052621
       sse	79135.482800
       std	4.509460
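For reference, the table above is the set of summary statistics that evaluation tools such as evo report for APE. They can be reproduced from per-frame absolute translation errors roughly like this (a pure-Python sketch with toy values, not the actual KITTI 09 data; whether the tool uses population or sample standard deviation is an assumption here):

```python
import math
from statistics import mean, median, pstdev

def ape_stats(errors):
    """Summary statistics over per-frame absolute translation errors (m)."""
    sse = sum(e * e for e in errors)  # sum of squared errors
    return {
        "max": max(errors),
        "mean": mean(errors),
        "median": median(errors),
        "min": min(errors),
        "rmse": math.sqrt(sse / len(errors)),
        "sse": sse,
        "std": pstdev(errors),  # population std (assumed convention)
    }

# Toy example (not real KITTI data):
stats = ape_stats([1.0, 2.0, 3.0, 4.0])
assert stats["mean"] == 2.5 and stats["sse"] == 30.0
assert abs(stats["rmse"] - math.sqrt(7.5)) < 1e-12
```

Note that rmse² = mean² + std² holds for these statistics, which is a quick consistency check on any reported table.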

@hhongwei1009

> @hhongwei1009 What GPU do you have? I used the same one as the author, and it runs fine. I tried KITTI 09 with the default 09 config; the result is close to the paper's (slightly worse). [...]

I have a 4080S; I've already given up.
