-
Notifications
You must be signed in to change notification settings - Fork 98
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
How to solve this error when I finetune the model use custom dataset? #12
Comments
There are many possible causes for this error, such as discrepancies between the specified and actual number of GPUs, or an out-of-bounds index somewhere in the code. Without more information, I can't pinpoint the issue. The best solution would be to add the following lines to import os
os.environ['CUDA_LAUNCH_BLOCKING'] = '1' Alternatively, use debug mode to identify the specific location of the problem. |
我也遇到了这个问题,我的原因是因为我自定义的数据集的类别数量在配置文件中写少了,不匹配导致索引超出,然后报了一样的错误 |
感谢告知 |
src/data/datasetset/coco_dataset.py: category2label = kwargs.get('category2label', None) 这里代码建议修改: 使用coco格式的数据集,"category_id"自动从1开始的,而labels值需要从0开始,上述报错就是这个原因。 |
D-FINE/configs/dataset/obj365_detection.yml: evaluator: num_classes: 366 # 上面bug就是因为这个类别数设置的问题。如果是自定义数据集训练应该写成clsnum + 1 |
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [30,0,0], thread: [92,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [30,0,0], thread: [93,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [30,0,0], thread: [94,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [30,0,0], thread: [95,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [30,0,0], thread: [32,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [30,0,0], thread: [33,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [30,0,0], thread: [34,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [30,0,0], thread: [35,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [30,0,0], thread: [36,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [30,0,0], thread: [37,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [30,0,0], thread: [38,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [30,0,0], thread: [39,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [30,0,0], thread: [40,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [30,0,0], thread: [41,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [30,0,0], thread: [42,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [30,0,0], thread: [43,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [30,0,0], thread: [44,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [30,0,0], thread: [45,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [30,0,0], thread: [46,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [30,0,0], thread: [47,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [30,0,0], thread: [48,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [30,0,0], thread: [49,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [30,0,0], thread: [50,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [30,0,0], thread: [51,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [30,0,0], thread: [52,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [30,0,0], thread: [53,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [30,0,0], thread: [54,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [30,0,0], thread: [55,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [30,0,0], thread: [56,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [30,0,0], thread: [57,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [30,0,0], thread: [58,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [30,0,0], thread: [59,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [30,0,0], thread: [60,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [30,0,0], thread: [61,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [30,0,0], thread: [62,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [30,0,0], thread: [63,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [138,0,0], thread: [0,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [138,0,0], thread: [1,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [138,0,0], thread: [2,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [138,0,0], thread: [3,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [138,0,0], thread: [4,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [138,0,0], thread: [5,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [138,0,0], thread: [6,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [138,0,0], thread: [7,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [138,0,0], thread: [8,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [138,0,0], thread: [9,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [138,0,0], thread: [10,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [138,0,0], thread: [11,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [138,0,0], thread: [12,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [138,0,0], thread: [13,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [138,0,0], thread: [14,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [138,0,0], thread: [15,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [138,0,0], thread: [16,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [138,0,0], thread: [17,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [138,0,0], thread: [18,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [138,0,0], thread: [19,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [138,0,0], thread: [20,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [138,0,0], thread: [21,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [138,0,0], thread: [22,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [138,0,0], thread: [23,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [138,0,0], thread: [24,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [138,0,0], thread: [25,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [138,0,0], thread: [26,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [138,0,0], thread: [27,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [138,0,0], thread: [28,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [138,0,0], thread: [29,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [138,0,0], thread: [30,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [138,0,0], thread: [31,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [4,0,0], thread: [96,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [4,0,0], thread: [97,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [4,0,0], thread: [98,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [4,0,0], thread: [99,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [4,0,0], thread: [100,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [4,0,0], thread: [101,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [4,0,0], thread: [102,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [4,0,0], thread: [103,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [4,0,0], thread: [104,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [4,0,0], thread: [105,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [4,0,0], thread: [106,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [4,0,0], thread: [107,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [4,0,0], thread: [108,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [4,0,0], thread: [109,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [4,0,0], thread: [110,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [4,0,0], thread: [111,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [4,0,0], thread: [112,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [4,0,0], thread: [113,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [4,0,0], thread: [114,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [4,0,0], thread: [115,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [4,0,0], thread: [116,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [4,0,0], thread: [117,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [4,0,0], thread: [118,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [4,0,0], thread: [119,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [4,0,0], thread: [120,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [4,0,0], thread: [121,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [4,0,0], thread: [122,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [4,0,0], thread: [123,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [4,0,0], thread: [124,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [4,0,0], thread: [125,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [4,0,0], thread: [126,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [4,0,0], thread: [127,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [42,0,0], thread: [96,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [42,0,0], thread: [97,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [42,0,0], thread: [98,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [42,0,0], thread: [99,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [42,0,0], thread: [100,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [42,0,0], thread: [101,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [42,0,0], thread: [102,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [42,0,0], thread: [103,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [42,0,0], thread: [104,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [42,0,0], thread: [105,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [42,0,0], thread: [106,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [42,0,0], thread: [107,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [42,0,0], thread: [108,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [42,0,0], thread: [109,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [42,0,0], thread: [110,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [42,0,0], thread: [111,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [42,0,0], thread: [112,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [42,0,0], thread: [113,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [42,0,0], thread: [114,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [42,0,0], thread: [115,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [42,0,0], thread: [116,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [42,0,0], thread: [117,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [42,0,0], thread: [118,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [42,0,0], thread: [119,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [42,0,0], thread: [120,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [42,0,0], thread: [121,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [42,0,0], thread: [122,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [42,0,0], thread: [123,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [42,0,0], thread: [124,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [42,0,0], thread: [125,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [42,0,0], thread: [126,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [42,0,0], thread: [127,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [151,0,0], thread: [96,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [151,0,0], thread: [97,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [151,0,0], thread: [98,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [151,0,0], thread: [99,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [151,0,0], thread: [100,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [151,0,0], thread: [101,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [151,0,0], thread: [102,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [151,0,0], thread: [103,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [151,0,0], thread: [104,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [151,0,0], thread: [105,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [151,0,0], thread: [106,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [151,0,0], thread: [107,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [151,0,0], thread: [108,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [151,0,0], thread: [109,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [151,0,0], thread: [110,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [151,0,0], thread: [111,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [151,0,0], thread: [112,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [151,0,0], thread: [113,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [151,0,0], thread: [114,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [151,0,0], thread: [115,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [151,0,0], thread: [116,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [151,0,0], thread: [117,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [151,0,0], thread: [118,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [151,0,0], thread: [119,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [151,0,0], thread: [120,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [151,0,0], thread: [121,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [151,0,0], thread: [122,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [151,0,0], thread: [123,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [151,0,0], thread: [124,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [151,0,0], thread: [125,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [151,0,0], thread: [126,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [151,0,0], thread: [127,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [43,0,0], thread: [96,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [43,0,0], thread: [97,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [43,0,0], thread: [98,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [43,0,0], thread: [99,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [43,0,0], thread: [100,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [43,0,0], thread: [101,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [43,0,0], thread: [102,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [43,0,0], thread: [103,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [43,0,0], thread: [104,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [43,0,0], thread: [105,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [43,0,0], thread: [106,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [43,0,0], thread: [107,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [43,0,0], thread: [108,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [43,0,0], thread: [109,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [43,0,0], thread: [110,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [43,0,0], thread: [111,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [43,0,0], thread: [112,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [43,0,0], thread: [113,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [43,0,0], thread: [114,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [43,0,0], thread: [115,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [43,0,0], thread: [116,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [43,0,0], thread: [117,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [43,0,0], thread: [118,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [43,0,0], thread: [119,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [43,0,0], thread: [120,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [43,0,0], thread: [121,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [43,0,0], thread: [122,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [43,0,0], thread: [123,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [43,0,0], thread: [124,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [43,0,0], thread: [125,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [43,0,0], thread: [126,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [43,0,0], thread: [127,0,0] Assertion
-sizes[i] <= index && index < sizes[i] && "index out of bounds"
failed.[rank1]:[E1030 19:36:07.513873021 ProcessGroupNCCL.cpp:1515] [PG 0 (default_pg) Rank 1] Process group watchdog thread terminated with exception: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with
TORCH_USE_CUDA_DSA
to enable device-side assertions.Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:43 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7f2f86a70f86 in /home/ahs/anaconda3/envs/py310/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7f2f86a1fd10 in /home/ahs/anaconda3/envs/py310/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x7f2f86b4bf08 in /home/ahs/anaconda3/envs/py310/lib/python3.10/site-packages/torch/lib/libc10_cuda.so)
frame #3: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x56 (0x7f2f87d683e6 in /home/ahs/anaconda3/envs/py310/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #4: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0xa0 (0x7f2f87d6d600 in /home/ahs/anaconda3/envs/py310/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #5: c10d::ProcessGroupNCCL::watchdogHandler() + 0x1da (0x7f2f87d742ba in /home/ahs/anaconda3/envs/py310/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #6: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x10c (0x7f2f87d766fc in /home/ahs/anaconda3/envs/py310/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #7: + 0xdbbf4 (0x7f2fd550fbf4 in /home/ahs/anaconda3/envs/py310/bin/../lib/libstdc++.so.6)
frame #8: + 0x76db (0x7f2fd75486db in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #9: clone + 0x3f (0x7f2fd6acc61f in /lib/x86_64-linux-gnu/libc.so.6)
[rank3]:[E1030 19:36:07.515488087 ProcessGroupNCCL.cpp:1515] [PG 0 (default_pg) Rank 3] Process group watchdog thread terminated with exception: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with
TORCH_USE_CUDA_DSA
to enable device-side assertions.Exception raised from c10_cuda_check_implementation at ../c10/cuda/CUDAException.cpp:43 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x96 (0x7f6bcb2adf86 in /home/ahs/anaconda3/envs/py310/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7f6bcb25cd10 in /home/ahs/anaconda3/envs/py310/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #2: c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) + 0x118 (0x7f6bcb388f08 in /home/ahs/anaconda3/envs/py310/lib/python3.10/site-packages/torch/lib/libc10_cuda.so)
frame #3: c10d::ProcessGroupNCCL::WorkNCCL::finishedGPUExecutionInternal() const + 0x56 (0x7f6bcc5a53e6 in /home/ahs/anaconda3/envs/py310/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #4: c10d::ProcessGroupNCCL::WorkNCCL::isCompleted() + 0xa0 (0x7f6bcc5aa600 in /home/ahs/anaconda3/envs/py310/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #5: c10d::ProcessGroupNCCL::watchdogHandler() + 0x1da (0x7f6bcc5b12ba in /home/ahs/anaconda3/envs/py310/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #6: c10d::ProcessGroupNCCL::ncclCommWatchdog() + 0x10c (0x7f6bcc5b36fc in /home/ahs/anaconda3/envs/py310/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so)
frame #7: + 0xdbbf4 (0x7f6c19d4cbf4 in /home/ahs/anaconda3/envs/py310/bin/../lib/libstdc++.so.6)
frame #8: + 0x76db (0x7f6c1bd856db in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #9: clone + 0x3f (0x7f6c1b30961f in /lib/x86_64-linux-gnu/libc.so.6)
W1030 19:36:08.253000 140521457722304 torch/distributed/elastic/multiprocessing/api.py:858] Sending process 96054 closing signal SIGTERM
E1030 19:36:08.468000 140521457722304 torch/distributed/elastic/multiprocessing/api.py:833] failed (exitcode: -6) local_rank: 0 (pid: 96052) of binary: /home/ahs/anaconda3/envs/py310/bin/python
Traceback (most recent call last):
File "/home/ahs/anaconda3/envs/py310/bin/torchrun", line 8, in
sys.exit(main())
File "/home/ahs/anaconda3/envs/py310/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/init.py", line 348, in wrapper
return f(*args, **kwargs)
File "/home/ahs/anaconda3/envs/py310/lib/python3.10/site-packages/torch/distributed/run.py", line 901, in main
run(args)
File "/home/ahs/anaconda3/envs/py310/lib/python3.10/site-packages/torch/distributed/run.py", line 892, in run
elastic_launch(
File "/home/ahs/anaconda3/envs/py310/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 133, in call
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/ahs/anaconda3/envs/py310/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
train.py FAILED
Failures:
[1]:
time : 2024-10-30_19:36:08
host : ahs-SYS-4029GP-TRTC-ZY001
rank : 1 (local_rank: 1)
exitcode : -6 (pid: 96053)
error_file: <N/A>
traceback : Signal 6 (SIGABRT) received by PID 96053
[2]:
time : 2024-10-30_19:36:08
host : ahs-SYS-4029GP-TRTC-ZY001
rank : 3 (local_rank: 3)
exitcode : -6 (pid: 96055)
error_file: <N/A>
traceback : Signal 6 (SIGABRT) received by PID 96055
Root Cause (first observed failure):
[0]:
time : 2024-10-30_19:36:08
host : ahs-SYS-4029GP-TRTC-ZY001
rank : 0 (local_rank: 0)
exitcode : -6 (pid: 96052)
error_file: <N/A>
traceback : Signal 6 (SIGABRT) received by PID 96052
The text was updated successfully, but these errors were encountered: