Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CUDA Error while using Python solver #26

Closed
ocetintas opened this issue Feb 3, 2022 · 7 comments
Closed

CUDA Error while using Python solver #26

ocetintas opened this issue Feb 3, 2022 · 7 comments
Assignees

Comments

@ocetintas
Copy link

Hi,

For some graphs, I am encountering cudaErrorIllegalAddress error while running rama_cuda function. I am adding a script that reproduces this error in the next comment. Here is the full output:

Running solver type PD which offers best compute time versus quality tradeoff.
Going to use NVIDIA GeForce GTX 1050 Ti 6.1, device number 0
3-cycles: found # of triangles: 0, budget: 7004
3-cycles: found # of triangles: 0, budget: 7004
4-cycles: number of expansions: 89675
4-cycles: found # of triangles: 0, budget: 7004
3-cycles: found # of triangles: 0, budget: 7004
4-cycles: number of expansions: 89675
4-cycles: found # of triangles: 0, budget: 7004
5-cycles: number of expansions: 2270526
5-cycles: found # of triangles: 0, budget: 7004
initial energy = 41519
matched sum: 66, rel_increase: 66
matched sum: 132, rel_increase: 0.985075
matched sum: 198, rel_increase: 0.496241
matched sum: 264, rel_increase: 0.331658
matched sum: 330, rel_increase: 0.249057
matched sum: 396, rel_increase: 0.199396
matched sum: 462, rel_increase: 0.166247
matched sum: 528, rel_increase: 0.142549
matched sum: 594, rel_increase: 0.124764
matched sum: 660, rel_increase: 0.110924
# vertices = 2383
# matched edges = 330 / 48523
original A size 2383x2349
contracted A size 2053x2019
energy reduction 330
energy after iteration 0: 41189, #components = 2053
3-cycles: found # of triangles: 0, budget: 2571
matched sum: 72, rel_increase: 72
matched sum: 142, rel_increase: 0.958904
matched sum: 210, rel_increase: 0.475524
matched sum: 278, rel_increase: 0.322275
matched sum: 344, rel_increase: 0.236559
matched sum: 410, rel_increase: 0.191304
matched sum: 474, rel_increase: 0.155718
matched sum: 538, rel_increase: 0.134737
matched sum: 602, rel_increase: 0.118738
matched sum: 666, rel_increase: 0.106136
# vertices = 2053
# matched edges = 333 / 37814
original A size 2053x2019
contracted A size 1720x1687
energy reduction 821
energy after iteration 1: 40368, #components = 1720
3-cycles: found # of triangles: 0, budget: 1792
matched sum: 74, rel_increase: 74
matched sum: 146, rel_increase: 0.96
matched sum: 216, rel_increase: 0.47619
matched sum: 282, rel_increase: 0.304147
matched sum: 346, rel_increase: 0.226148
matched sum: 412, rel_increase: 0.190202
matched sum: 476, rel_increase: 0.154964
matched sum: 540, rel_increase: 0.134172
matched sum: 604, rel_increase: 0.118299
matched sum: 668, rel_increase: 0.105785
# vertices = 1720
# matched edges = 334 / 28267
original A size 1720x1687
contracted A size 1386x1353
energy reduction 1713
energy after iteration 2: 38655, #components = 1386
3-cycles: found # of triangles: 0, budget: 1203
matched sum: 72, rel_increase: 72
matched sum: 142, rel_increase: 0.958904
matched sum: 206, rel_increase: 0.447552
matched sum: 274, rel_increase: 0.328502
matched sum: 338, rel_increase: 0.232727
matched sum: 402, rel_increase: 0.188791
matched sum: 470, rel_increase: 0.168734
matched sum: 538, rel_increase: 0.144374
matched sum: 602, rel_increase: 0.118738
matched sum: 666, rel_increase: 0.106136
# vertices = 1386
# matched edges = 333 / 20020
original A size 1386x1353
contracted A size 1053x1021
energy reduction 3347
energy after iteration 3: 35308, #components = 1053
3-cycles: found # of triangles: 0, budget: 770
matched sum: 76, rel_increase: 76
matched sum: 144, rel_increase: 0.883117
matched sum: 212, rel_increase: 0.468966
matched sum: 278, rel_increase: 0.309859
matched sum: 342, rel_increase: 0.229391
matched sum: 404, rel_increase: 0.180758
matched sum: 468, rel_increase: 0.158025
matched sum: 530, rel_increase: 0.132196
matched sum: 592, rel_increase: 0.116761
matched sum: 654, rel_increase: 0.104553
# vertices = 1053
# matched edges = 327 / 12750
original A size 1053x1021
contracted A size 726x694
energy reduction 5538
energy after iteration 4: 29770, #components = 726
3-cycles: found # of triangles: 0, budget: 452
matched sum: 68, rel_increase: 68
matched sum: 134, rel_increase: 0.956522
matched sum: 200, rel_increase: 0.488889
matched sum: 264, rel_increase: 0.318408
matched sum: 326, rel_increase: 0.233962
matched sum: 388, rel_increase: 0.189602
matched sum: 446, rel_increase: 0.1491
matched sum: 502, rel_increase: 0.12528
matched sum: 556, rel_increase: 0.107356
matched sum: 608, rel_increase: 0.0933573
# vertices = 726
# matched edges = 304 / 6582
original A size 726x694
contracted A size 422x393
energy reduction 6775
energy after iteration 5: 22995, #components = 422
3-cycles: found # of triangles: 0, budget: 217
matched sum: 66, rel_increase: 66
matched sum: 132, rel_increase: 0.985075
matched sum: 194, rel_increase: 0.466165
matched sum: 254, rel_increase: 0.307692
matched sum: 304, rel_increase: 0.196078
matched sum: 350, rel_increase: 0.15082
matched sum: 388, rel_increase: 0.108262
matched sum: 388, rel_increase: 0
# vertices = 422
# matched edges = 194 / 2357
original A size 422x393
contracted A size 228x204
energy reduction 7963
energy after iteration 6: 15032, #components = 228
3-cycles: found # of triangles: 0, budget: 95
matched sum: 64, rel_increase: 64
matched sum: 126, rel_increase: 0.953846
matched sum: 178, rel_increase: 0.409449
matched sum: 214, rel_increase: 0.201117
matched sum: 214, rel_increase: 0
# vertices = 228
# matched edges = 107 / 693
original A size 228x204
contracted A size 121x115
energy reduction 7655
energy after iteration 7: 7377, #components = 121
3-cycles: found # of triangles: 0, budget: 41
matched sum: 62, rel_increase: 62
matched sum: 104, rel_increase: 0.666667
matched sum: 104, rel_increase: 0
# vertices = 121
# matched edges = 52 / 193
original A size 121x115
contracted A size 68x61
energy reduction 6249
energy after iteration 8: 1128, #components = 68
3-cycles: found # of triangles: 0, budget: 21
matched sum: 56, rel_increase: 56
matched sum: 56, rel_increase: 0
# vertices = 68
# matched edges = 28 / 56
original A size 68x61
contracted A size 40x35
energy reduction 4479
energy after iteration 9: -3351, #components = 40
3-cycles: found # of triangles: 0, budget: 10
matched sum: 12, rel_increase: 12
matched sum: 12, rel_increase: 0
# vertices = 40
# matched edges = 6 / 16
original A size 40x35
contracted A size 33x28
energy reduction 151
energy after iteration 10: -3502, #components = 33
terminate called after throwing an instance of 'thrust::system::system_error'
  what():  CUDA free failed: cudaErrorIllegalAddress: an illegal memory access was encountered
Aborted
@ocetintas
Copy link
Author

ocetintas commented Feb 3, 2022

You can reproduce this issue with the objects in this link using the following script:

import pickle
import rama_py

with open("ix0", "rb") as fp:
	ix0 = pickle.load(fp)

with open("ix1", "rb") as fp:
	ix1 = pickle.load(fp)


with open("costs", "rb") as fp:
	costs = pickle.load(fp)

rama_py.rama_cuda(ix0, ix1, costs, rama_py.multicut_solver_options("PD"))

@aabbas90 aabbas90 self-assigned this Feb 3, 2022
@aabbas90
Copy link
Collaborator

aabbas90 commented Feb 4, 2022

Hi, Thanks for reporting. The issue is actually just in the way multicut instance is being created. So in RAMA we expect that node indices always start from 0 and there are no missing node indices. For example on a graph with 1000 nodes, the node indices should be in [0, 999]. After preprocessing your instance to satisfy this requirement with the following code, the instance runs successfully (actually RAMA finds optimal solution!).

import pickle
import rama_py
import numpy as np

with open("ix0", "rb") as fp:
   ix0 = pickle.load(fp)

with open("ix1", "rb") as fp:
   ix1 = pickle.load(fp)


with open("costs", "rb") as fp:
   costs = pickle.load(fp)
ix0 = np.array(ix0)
ix1 = np.array(ix1)
unique_node_ids = np.unique(np.concatenate((ix0, ix1), 0))
if(unique_node_ids[-1] != unique_node_ids.shape[0] - 1):
   print("Mapping node IDs such that they are contiguous and no node is missing.")
   ids_mapping = np.zeros((unique_node_ids[-1] + 1), dtype = np.int64)
   ids_mapping[unique_node_ids] = 1
   ids_mapping = np.cumsum(ids_mapping) - 1
   ix0 = np.take_along_axis(ids_mapping, ix0, 0)
   ix1 = np.take_along_axis(ids_mapping, ix1, 0)

costs = np.array(costs)
rama_py.rama_cuda(ix0, ix1, costs, rama_py.multicut_solver_options("PD"))

@ocetintas
Copy link
Author

Hi, thanks for your quick response! I can confirm that the revised script works. Furthermore, I've conducted few more experiments with similar type of graphs and did not encounter an issue. As you mentioned, one should first map the edge indices so that node ids are consecutive. Afterward, the clusters can be "unmapped" if need be. I close this issue now. Thank you for open-sourcing your great work!

@jpapon
Copy link

jpapon commented May 17, 2022

Hello, sorry for posting on a closed issue, but I ran in to this issue sporadically before I started sanitizing my input as above.
Do you know where this crash might occur if the edge list is missing nodes? Trying to test and reproduce it, but all small examples I tried such as the following did not cause a crash:
el=[0,1,2,4,5] er=[5,4,0,1,2] ew=[1,1,1,1,1]

@aabbas90
Copy link
Collaborator

Hello, sorry for posting on a closed issue, but I ran in to this issue sporadically before I started sanitizing my input as above. Do you know where this crash might occur if the edge list is missing nodes? Trying to test and reproduce it, but all small examples I tried such as the following did not cause a crash: el=[0,1,2,4,5] er=[5,4,0,1,2] ew=[1,1,1,1,1]

Hi, difficult to say. Perhaps you can sanitize the above example you gave and see the results. Then check where the results
start to mismatch (e.g. by

inline void print_vector(const thrust::device_vector<T>& v, const char* name, const int num = 0)
) if you send incorrect format.

@jpapon
Copy link

jpapon commented May 18, 2022

Okay, I can try that. The issue I'm facing is I have a graph with some nodes that are disconnected, and while it usually works fine, it occasionally crashes when I give it that (i.e. an edge list in which a few nodes do not appear).
Renumbering works consistently without issue, so it's not a huge problem, was just curious to see if there was some particular function imposing this constraint.

@aabbas90
Copy link
Collaborator

Hi @jpapon and @ocetintas,

Just wanted to update that I have added a functionality for sanitizing the input graph and perform the inversion already inside the RAMA code. Please check the section https://github.com/pawelswoboda/RAMA/blob/master/README.md#input-format for usage (if you need).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants