
Floating point exception (core dumped) #725

Open
Kitsunetic opened this issue Oct 30, 2024 · 12 comments

Comments


Kitsunetic commented Oct 30, 2024

I always get a floating point exception when using SubMConv3d.

Here is my test code:

import torch as th
from spconv.pytorch import SubMConv3d, SparseConvTensor

xyz = th.randint(0, 32, (1000, 4), dtype=th.int64, device='cuda')
xyz[:, 0] = 0
feat = th.randn(1000, 32, device='cuda', dtype=th.float32)
sp = SparseConvTensor(feat, xyz, (32, 32, 32), 1, 1, 1)

conv = SubMConv3d(32, 64, 3).cuda()
conv(sp)

>>> Floating point exception (core dumped)

I'm using PyTorch 2.3.0 with CUDA 11.8, and spconv-cu118==2.3.6.
Is there something wrong in my code, or does someone have a clue?

I have tested with A5000 and RTX 2080 Ti GPUs, but the result was always the same.


shim94kr commented Nov 6, 2024

I'm experiencing the exact same issue.

I've found that it works fine with kernel_size=1, but consistently crashes with kernel_size=3 or any other size.

@Kitsunetic Have you fixed this issue?

@Kitsunetic
Author

> I'm experiencing the exact same issue.
>
> I've found that it works fine with kernel_size=1, but consistently crashes with kernel_size=3 or any other size.
>
> @Kitsunetic Have you fixed this issue?

No, I'm still figuring out the solution.

@shim94kr

shim94kr commented Nov 7, 2024

I found that downgrading PyTorch to version 2.2.2 resolves the issue.

@Kitsunetic
Author

Which CUDA version did you use?


shim94kr commented Nov 7, 2024

I use CUDA 12.1, and I installed spconv-cu120.

@Kitsunetic
Author

Unfortunately, I'm still getting the same issue after retrying on the nvidia/cuda:12.1.0-cudnn8-devel-ubuntu22.04 Docker image with PyTorch 2.2.2 and CUDA 12.1. I have tested with both Ubuntu 22.04 and 20.04. Could you give me more detail about your environment?


shim94kr commented Nov 11, 2024

I set up the environment using the following .yaml file with conda env create -f ***.yaml. This is a different .yaml file from the one referenced in Issue #317, particularly in the torch and torchvision configurations.

name: pointcept
channels:
  - pyg
  - pytorch
  - nvidia/label/cuda-12.1.1
  - nvidia
  - bioconda
  - conda-forge
  - defaults
dependencies:
  - python=3.9
  - pip
  - cuda
  - conda-forge::cudnn
  - gcc=12.1
  - gxx=12.1
  - pytorch=2.2.2
  - torchvision=0.17.2
  - pytorch-cuda=12.1
  - ninja
  - google-sparsehash
  - h5py
  - pyyaml
  - tensorboard
  - tensorboardx
  - yapf
  - addict
  - einops
  - scipy
  - plyfile
  - termcolor
  - timm
  - ftfy
  - regex
  - tqdm
  - matplotlib
  - black
  - open3d
  - pytorch-cluster
  - pytorch-scatter
  - pytorch-sparse
  - pip:
    - torch_geometric
#    - spconv-cu120
    - git+https://github.com/octree-nn/ocnn-pytorch.git
    - git+https://github.com/openai/CLIP.git
    - git+https://github.com/Dao-AILab/flash-attention.git
    - ./libs/pointops
    - ./libs/pointgroup_ops

After this setup, I installed the following additional components:

cd libs/pointops
python setup.py install 
cd ../..

pip install spconv-cu120
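When comparing setups like this across machines, it can help to dump the exact versions on each side. A minimal sketch (env_report is a hypothetical helper, not part of spconv; torch/numpy/spconv are only reported when installed):

```python
import platform
import sys

def env_report() -> dict:
    """Collect version info worth comparing across environments.

    A sketch: optional packages are reported only if importable.
    """
    info = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }
    # Optional extras; skip anything that isn't installed.
    for mod in ("numpy", "torch", "spconv"):
        try:
            info[mod] = __import__(mod).__version__
        except Exception:
            info[mod] = "not installed"
    return info

print(env_report())
```

Running this on both machines and diffing the output narrows down which component actually differs.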

@Kitsunetic
Author

Thank you for sharing.
However, I'm still getting the same error even with an environment based on the provided yaml file.
I suspect this is not only a dependency problem; the broader environment, such as the OS, may also be involved. So I'm still figuring out the reason.
Anyway, thank you again for sharing! If you find another clue, please share it with me!

@JunseoMin

JunseoMin commented Nov 21, 2024

Hi,

I have the same issue...
Please share it with me if you solve this issue.

In my case, the exception occurs when kernel_size = 3.

Thanks!

@Ecalpal

Ecalpal commented Nov 24, 2024

Same as you.
Python 3.11, PyTorch 2.5.0, CUDA 12.1.

> I found that downgrading PyTorch to version 2.2.2 resolves the issue.

and this works
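For anyone else landing here, the downgrade reported in this thread can be sketched roughly as follows. The cu121 index URL and the torchvision pin are assumptions for a CUDA 12.1 setup; adjust them for your CUDA version:

```shell
# Sketch: pin the versions reported to avoid the crash (assumes CUDA 12.1)
pip install "torch==2.2.2" "torchvision==0.17.2" \
    --index-url https://download.pytorch.org/whl/cu121
pip install spconv-cu120
```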

@GCChen97

It seems to be related to numpy versions >= 2.0.0. I installed numpy 1.26.4, and spconv.pytorch.ops.implicit_gemm can now be called without raising Floating point exception (core dumped). Something is wrong with the masks argument of implicit_gemm, which is also a numpy array.
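Building on that finding, a small guard before importing spconv can catch the incompatible numpy early. A sketch: numpy_version_ok is a hypothetical helper name, and the numpy < 2 threshold is based on the report above, not on spconv's documentation:

```python
def numpy_version_ok(version: str) -> bool:
    """True for numpy < 2.0, which reportedly avoids the
    'Floating point exception' in spconv's implicit_gemm path."""
    major = int(version.split(".", 1)[0])
    return major < 2

# Usage sketch: check the installed numpy before touching spconv, e.g.
#   import numpy as np
#   assert numpy_version_ok(np.__version__), \
#       "pin numpy < 2, e.g. pip install 'numpy==1.26.4'"
```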

@JunseoMin

> It seems to be related to numpy versions >= 2.0.0. I installed numpy 1.26.4, and spconv.pytorch.ops.implicit_gemm can now be called without raising Floating point exception (core dumped). Something is wrong with the masks argument of implicit_gemm, which is also a numpy array.

This worked for me! Thanks!
