Problems in using concurrent solver, both tny and omp not works #780

Open
richardclli opened this issue Jan 22, 2024 · 10 comments

@richardclli

richardclli commented Jan 22, 2024

Describe the bug

SCIPOPT Suite version = 8.1.0
Compile options: TPI=tny
Problem: causes a segmentation fault

SCIPOPT Suite version = 8.1.0
Compile options: TPI=omp
Problem: PySCIPOpt falls back to SCIPsolve() instead of SCIPsolveConcurrent() because SCIPtpiGetNumThreads() returns 1
Workaround: modifying scip.pxi to skip the check lets the concurrent solve run successfully

System

  • OS: OpenSUSE Leap 15.5
  • Version 4.4.0
  • SCIP version 8.1.0
  • How did you install pyscipopt? Self compile both scipopt suite and PySCIPOpt


@Joao-Dionisio
Collaborator

Hello @richardclli! Thanks for your issue. Can you please just confirm whether SCIP is actually using parallelism to solve your problem, and isn't just using a single thread when you call solveConcurrent?

@richardclli
Author

richardclli commented Jan 24, 2024

Yes, concurrent solving is reported in the log, so the TPI=omp build works properly once I bypass the SCIPtpiGetNumThreads() check. I am looking through the source code to find out why SCIPtpiGetNumThreads() returns 1 with TPI=omp, but I have no clues yet.

initializing seeds to 1963210296 in concurrent solver 'scip-2'
initializing seeds to 1332858414 in concurrent solver 'scip-3'
initializing seeds to 1541326760 in concurrent solver 'scip-4'
initializing seeds to 247360965 in concurrent solver 'scip-5'
initializing seeds to 387742462 in concurrent solver 'scip-6'
initializing seeds to 520723434 in concurrent solver 'scip-7'
initializing seeds to 1176648445 in concurrent solver 'scip-8'
starting solve in concurrent solver 'scip-3'
starting solve in concurrent solver 'scip-1'
starting solve in concurrent solver 'scip-2'
starting solve in concurrent solver 'scip-8'
starting solve in concurrent solver 'scip-6'
starting solve in concurrent solver 'scip-7'
starting solve in concurrent solver 'scip-5'
starting solve in concurrent solver 'scip-4'
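
One way to confirm from a log like this that several solvers really started is to count the distinct solver names on the `starting solve` lines. The following is only a minimal sketch that assumes the log format shown above; the helper name `count_started_solvers` is made up for illustration and is not part of SCIP or PySCIPOpt.

```python
import re

# Matches lines such as: starting solve in concurrent solver 'scip-3'
START_LINE = re.compile(r"starting solve in concurrent solver '([^']+)'")


def count_started_solvers(log_text: str) -> int:
    """Return the number of distinct concurrent solvers that reported a start."""
    return len(set(START_LINE.findall(log_text)))


if __name__ == "__main__":
    # Rebuild the eight 'starting solve' lines in the order they appear in the log.
    sample = "\n".join(
        f"starting solve in concurrent solver 'scip-{i}'"
        for i in (3, 1, 2, 8, 6, 7, 5, 4)
    )
    print(count_started_solvers(sample))  # 8 distinct solver names
```

A count greater than 1 suggests that more than one solver instance was launched; it does not by itself show how the work was distributed across cores.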

@Alpha-Girl

Alpha-Girl commented Jan 25, 2024

I am facing the same problem.
Describe the bug

SCIPOPT Suite version = 8.1.0
Compile options: TPI=tny
Problem: causes a segmentation fault, reported as Segmentation fault (core dumped)

System

  • OS: Ubuntu 22.04.3 LTS
  • Version 4.4.0
  • SCIP version 8.1.0
  • How did you install pyscipopt? Self compile both scipopt suite and PySCIPOpt

@Joao-Dionisio
Collaborator

Hey @mmghannam, can you take a look at this? I haven't been able to figure out what's wrong. There have been some problems with this method over time, it seems.

@Joao-Dionisio
Collaborator

Joao-Dionisio commented Feb 14, 2024

Hey, @richardclli @Alpha-Girl! I also need to use solveConcurrent, so I guess this is the best time to look into it :D

Can you give me a step-by-step on how you compiled SCIP with the parallelism option, and how you linked pyscipopt to it?

EDIT: I was finally able to use solveConcurrent. @richardclli, are you sure that when you are running PySCIPOpt you are linking to the correct SCIP?
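
For anyone wanting to check which SCIP library a PySCIPOpt installation is linked against, a rough sketch on Linux is to run `ldd` on the compiled extension module. The assumptions here are that `ldd` is available and that the extension is importable as `pyscipopt.scip`; the helper names `filter_scip_lines` and `linked_scip_libraries` are hypothetical.

```python
import subprocess
from typing import List


def filter_scip_lines(ldd_output: str) -> List[str]:
    """Keep only the ldd output lines that mention a SCIP library."""
    return [
        line.strip()
        for line in ldd_output.splitlines()
        if "scip" in line.lower()
    ]


def linked_scip_libraries(extension_path: str) -> List[str]:
    """Run ldd (Linux only) on a compiled extension and return SCIP-related lines."""
    result = subprocess.run(
        ["ldd", extension_path], capture_output=True, text=True, check=True
    )
    return filter_scip_lines(result.stdout)


if __name__ == "__main__":
    try:
        import pyscipopt.scip as scip_ext  # compiled extension module of PySCIPOpt
    except ImportError:
        print("PySCIPOpt is not importable in this environment")
    else:
        try:
            for line in linked_scip_libraries(scip_ext.__file__):
                print(line)
        except (OSError, subprocess.CalledProcessError) as exc:
            print(f"could not run ldd: {exc}")
```

If the printed path is not the self-compiled SCIP build, the TPI option chosen at compile time may not be the one actually in use.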

@richardclli
Author

richardclli commented Feb 20, 2024

Yes, I am pretty sure about this. I managed to make it work with the following tweaks:

  1. I compiled everything from scratch.
  2. I compiled SCIPOPT with TPI=omp (not tny, which may not work on Linux according to some other discussions; I am not sure of the reason).
  3. I compiled PySCIPOpt after modifying the code to skip the SCIPtpiGetNumThreads() check and call solveConcurrent directly.

Now I am trying to see how well the concurrent solve scales up; I am not sure whether it will work well in an HPC (supercomputing) environment.

@Joao-Dionisio
Collaborator

Interesting, I was able to compile with the tny option in Ubuntu. But please do let me know if the speedup is achieved! Cheers :)

@richardclli
Author

Yes, I can compile it as well, but it produces a core dump immediately when trying to solve.

@liangbug
Contributor

liangbug commented Apr 8, 2024

I debugged it within PySCIPOpt and SCIP. The relevant check in solveConcurrent() is:

if SCIPtpiGetNumThreads() == 1:

With TPI=omp, SCIPtpiGetNumThreads() calls omp_get_num_threads(), which always returns 1 here because the call is not enclosed in a parallel region.
Only SCIPconcurrentSolve() uses the macro TPI_PARA, which is an omp parallel construct.
https://github.com/scipopt/scip/blob/e4d2ae5dfab7d0945c0a4c0c63d21eb60c737839/src/tpi/tpi_openmp.c#L418
https://www.openmp.org/spec-html/5.0/openmpsu111.html

With TPI=tny, SCIPtpiGetNumThreads() returns _threadpool->nthreads, but _threadpool is still NULL at that point, so dereferencing it causes the segmentation fault.
_threadpool is initialized in SCIPsolveConcurrent(), so SCIPtpiGetNumThreads() should not be called before SCIPsolveConcurrent() is executed.
https://github.com/scipopt/scip/blob/e4d2ae5dfab7d0945c0a4c0c63d21eb60c737839/src/tpi/tpi_tnycthrd.c#L577
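
The two failure modes can be mimicked with a small toy model in plain Python. This is only an illustrative sketch, not SCIP's code: `ThreadPool`, `tny_get_num_threads`, `omp_get_num_threads` and `solve_concurrent` are stand-ins written for this example, and the C null-pointer dereference shows up here as an `AttributeError` rather than a segmentation fault.

```python
from typing import Optional


class ThreadPool:
    """Toy stand-in for the tny thread pool; not SCIP's real data structure."""

    def __init__(self, nthreads: int) -> None:
        self.nthreads = nthreads


# Stays None until a concurrent solve creates the pool.
_threadpool: Optional[ThreadPool] = None


def tny_get_num_threads() -> int:
    # Mirrors the reported pattern: read nthreads without checking the pool.
    # In C this dereferences a null pointer; in Python it raises AttributeError.
    return _threadpool.nthreads


def omp_get_num_threads(in_parallel_region: bool, team_size: int) -> int:
    # Outside a parallel region OpenMP reports a team of exactly one thread.
    return team_size if in_parallel_region else 1


def solve_concurrent(nthreads: int) -> int:
    # The pool is only created here, as described for SCIPsolveConcurrent().
    global _threadpool
    _threadpool = ThreadPool(nthreads)
    return tny_get_num_threads()
```

In this model, calling `tny_get_num_threads()` before `solve_concurrent()` fails, and `omp_get_num_threads(False, 8)` reports a single thread no matter how many are configured, which is why a thread-count check placed before the concurrent solve cannot tell whether parallelism is available.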

I suggest removing the following code from def solveConcurrent(self): in scip.pxi:

if SCIPtpiGetNumThreads() == 1:
    warnings.warn("SCIP was compiled without task processing interface. Parallel solve not possible - using optimize() instead of solveConcurrent()")
    self.optimize()
else:

and changing the test in test_model.py from

def test_solve_concurrent():
    s = Model()
    x = s.addVar("x", vtype = 'C', obj = 1.0)
    y = s.addVar("y", vtype = 'C', obj = 2.0)
    c = s.addCons(x + y <= 10.0)
    s.setMaximize()
    s.solveConcurrent()
    assert s.getStatus() == 'optimal'
    assert s.getObjVal() == 20.0

to

# these names need to be imported in test_model.py
from pyscipopt import Model, SCIP_PARAMSETTING, SCIP_STAGE

def test_solve_concurrent():
    s = Model()
    x = s.addVar("x", vtype = 'C', obj = 1.0)
    y = s.addVar("y", vtype = 'C', obj = 2.0)
    c = s.addCons(x + y <= 10.0)
    s.setPresolve(SCIP_PARAMSETTING.OFF)
    s.setMaximize()
    s.solveConcurrent()
    # only check the result if the solve actually left the PROBLEM stage
    if s.getStage() != SCIP_STAGE.PROBLEM:
        assert s.getStatus() == 'optimal'
        assert s.getObjVal() == 20.0

@richardclli
Author

Wow, nice catch.
