Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for METIS to partition a cpgrid #725

Merged
merged 16 commits into from
Jul 4, 2024

Conversation

lisajulia
Copy link
Contributor

@lisajulia lisajulia commented Jun 3, 2024

Add support for METIS to partition a cpgrid.

For the reference manual:
This PR and #733, #736, #737, #738 and OPM/opm-simulators#5477 belong together.
Now, a grid can be partitioned using the METIS partitioner or the Zoltan partitioner (which was the only one available before):

  • To use METIS, start flow with --partition-method=2 --edge-weights-method=2 --imbalance-tol=1.1 --metis-params=metis-params.json".
  • The zoltan imbalance parameter has been renamed to --imbalance-parameter (since now that is the parameter used for both partitioners)
  • With the parameter --partition-method, you can select the partitioning method: 1 is zoltan, 2 is metis.
  • To use METIS, you need to install libmetis-dev.
  • You can give flow a params file, the parameters are listed here file:///home/lnebel/Downloads/manual.pdf
  • I've encountered some problems when Metis is compiled with 32-bit integers and when edge-weights-method=1 (default), see also here: Add support for METIS to partition a cpgrid  #725 (comment)
    For that reason: You either should use Metis with 64-bit integers (for this I had to download metis and compile it myself) or try the log-edge-weights-method.
  • To use ScotchMETIS, you need to install libscotchmetis-dev, then there are no specific METIS parameters available. Then the imbalance parameter is interpreted differently: for scotchmetis, it needs to between 0 and 1 where for metis > 1. To account for that, we subtract 1, when ScotchMETIS is used and the imbalance parameter is > 1.

@lisajulia lisajulia requested a review from blattms June 3, 2024 11:32
@lisajulia
Copy link
Contributor Author

jenkins build this please

@lisajulia lisajulia marked this pull request as draft June 3, 2024 11:34
Copy link
Member

@blattms blattms left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks quite good.

opm/grid/common/MetisPartition.cpp Outdated Show resolved Hide resolved
opm/grid/common/MetisPartition.cpp Outdated Show resolved Hide resolved
opm/grid/common/ZoltanGraphFunctions.cpp Outdated Show resolved Hide resolved
opm/grid/common/ZoltanGraphFunctions.cpp Outdated Show resolved Hide resolved
opm/grid/common/ZoltanGraphFunctions.cpp Outdated Show resolved Hide resolved
opm/grid/common/MetisPartition.cpp Outdated Show resolved Hide resolved
opm/grid/common/MetisPartition.cpp Outdated Show resolved Hide resolved
opm/grid/common/MetisPartition.cpp Outdated Show resolved Hide resolved
@lisajulia lisajulia force-pushed the feature/metis-support branch 2 times, most recently from 3d1f269 to 8b2780b Compare June 4, 2024 15:52
@lisajulia
Copy link
Contributor Author

@blattms: I've added my most recent changes, I've briefly tested SPE9/SPE9_CP.DATA with 4 processes and these are the results:
METIS:
Screenshot from 2024-06-04 17-59-00
Zoltan:
Screenshot from 2024-06-04 19-00-27
'Nothing' (for reference):
Screenshot from 2024-06-04 18-58-14

@lisajulia
Copy link
Contributor Author

lisajulia commented Jun 5, 2024

Update: When doing the "dry run" for the spe11c_cp case on 6 processes I get the picture below (for Zoltan that looks better) and on 14 processes (which is the maximum on my machine) I even get another segmentation fault.. :(
I'll try to reproduce something similar with an "easier" problem and investigate different Metis parameters

Metis, dry run with 6 processes:
Screenshot from 2024-06-05 08-06-26

Zoltan, dry run with 6 processes:
Screenshot from 2024-06-05 08-23-36

@lisajulia
Copy link
Contributor Author

lisajulia commented Jun 6, 2024

@blattms: For a grid without wells, this would be the spot where the edge weights are set, right?

And here for a grid with wells, right?

Where does the vector float *ewgts get filled in the case without wells?

As far as I see it, I'm filling everything correctly, since I have extracted the respective things to this function, which I am calling from the METIS code..

@lisajulia lisajulia force-pushed the feature/metis-support branch from 8b2780b to 20d857c Compare June 14, 2024 08:57
@lisajulia lisajulia marked this pull request as ready for review June 14, 2024 08:58
@blattms
Copy link
Member

blattms commented Jun 14, 2024

hey, maybe we can improve the title a bit. That would be really cool. I am having a look at the rest now,

@lisajulia lisajulia changed the title Feature/metis support Add support for METIS to partition a cpgrid Jun 14, 2024
@lisajulia lisajulia force-pushed the feature/metis-support branch from 20d857c to 711308c Compare June 14, 2024 11:55
@lisajulia
Copy link
Contributor Author

jenkins build this please

Copy link
Member

@blattms blattms left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Just a few comment.

I like the cleanup that you did for the zoltan code. 🦾

Note that in CMakeLists.txt we still require ZOLTAN by default (option REQUIRE_ZOLTAN is ON). Maybe should change that and make sure that either ZOLTAN or METIS is there if the user requires a full-fledged partitioner. This is also fine in a separate PR:

opm/grid/common/MetisPartition.cpp Show resolved Hide resolved
opm/grid/common/MetisPartition.cpp Outdated Show resolved Hide resolved
opm/grid/common/MetisPartition.hpp Outdated Show resolved Hide resolved
opm/grid/common/ZoltanGraphFunctions.cpp Outdated Show resolved Hide resolved
opm/grid/cpgrid/CpGrid.cpp Outdated Show resolved Hide resolved
opm/grid/cpgrid/CpGrid.cpp Show resolved Hide resolved
opm/grid/cpgrid/CpGrid.cpp Outdated Show resolved Hide resolved
@blattms
Copy link
Member

blattms commented Jun 14, 2024

benchmark please

@lisajulia
Copy link
Contributor Author

Update: When doing the "dry run" for the spe11c_cp case on 6 processes I get the picture below (for Zoltan that looks better) and on 14 processes (which is the maximum on my machine) I even get another segmentation fault.. :( I'll try to reproduce something similar with an "easier" problem and investigate different Metis parameters

Metis, dry run with 6 processes: Screenshot from 2024-06-05 08-06-26

Zoltan, dry run with 6 processes: Screenshot from 2024-06-05 08-23-36

For the record: I found the error here: METIS need to be compiled with 64-bit integers for this to work with the default edgeWeightMethod, I've added a check for this https://github.com/OPM/opm-grid/pull/725/files#diff-1c35cc536b7520ac857320ed480551e301d775a8c9d51300ea2e01b76489769aR54

@lisajulia lisajulia force-pushed the feature/metis-support branch from e1ee426 to 6bb1729 Compare June 14, 2024 13:15
@lisajulia lisajulia force-pushed the feature/metis-support branch from a400cb1 to e1c48f7 Compare June 14, 2024 14:32
@lisajulia lisajulia force-pushed the feature/metis-support branch from e1c48f7 to d9bc594 Compare June 14, 2024 14:36
@lisajulia
Copy link
Contributor Author

jenkins build this please

@lisajulia lisajulia force-pushed the feature/metis-support branch from 7a04cf6 to c3e17d1 Compare June 24, 2024 12:06
@lisajulia
Copy link
Contributor Author

jenkins build this opm-common=4115 please

1 similar comment
@lisajulia
Copy link
Contributor Author

jenkins build this opm-common=4115 please

@lisajulia lisajulia force-pushed the feature/metis-support branch from c3e17d1 to a29048d Compare June 25, 2024 07:35
@lisajulia
Copy link
Contributor Author

jenkins build this opm-common=4115 please

@lisajulia
Copy link
Contributor Author

jenkins build this opm-common=4120 please

@lisajulia lisajulia force-pushed the feature/metis-support branch from a29048d to 72f49cf Compare June 25, 2024 07:44
@lisajulia
Copy link
Contributor Author

jenkins build this opm-common=4120 please

1 similar comment
@lisajulia
Copy link
Contributor Author

jenkins build this opm-common=4120 please

@lisajulia lisajulia force-pushed the feature/metis-support branch 3 times, most recently from 2114747 to 136c1ad Compare June 25, 2024 13:27
lisajulia and others added 3 commits June 25, 2024 15:59
For this:
- Include scotch.h in MetisPartition and in the opm-grid-prereqs.cmake
- Manually set the SCOTCH_METIS_VERSION to 5
- Do not make METIS Options available when using scotch-metis
When compiling shared libaries we got linker errors for functions that
were defined multiple times.
…orSpecificCellAndIncrementNeighborCounter and fillNBORGIDAndWeightsForSpecificCellAndIncrementNeighborCounterForGridWithWells for the case ID = long
@lisajulia lisajulia force-pushed the feature/metis-support branch from 136c1ad to eeb7f19 Compare June 25, 2024 14:00
@lisajulia
Copy link
Contributor Author

jenkins build this opm-common=4120 please

@lisajulia
Copy link
Contributor Author

rerun "jenkins build this opm-common=4120 please" as soon as Debian has been updated on Jenkins

@lisajulia
Copy link
Contributor Author

jenkins build this please

@lisajulia
Copy link
Contributor Author

@blattms Finally green!

@blattms blattms merged commit 5fa1688 into OPM:master Jul 4, 2024
1 check passed
@lisajulia lisajulia deleted the feature/metis-support branch July 4, 2024 11:00
@alfbr
Copy link
Member

alfbr commented Jul 11, 2024

I am afraid this broke building on Red Hat 8 over here. Does it introduce a hard dependency on Ptscotch?

@blattms
Copy link
Member

blattms commented Jul 11, 2024

Not intentionally. I did not discover this while reviewing at least. There is a version assumption on metis/scotch somehwere. Maybe Redhat 8 ships an older version?

Would be cool if you would share the build error.

@lisajulia
Copy link
Contributor Author

Uh ok - yes please share the build error and then I have a look!
I've added this line, but that does not have the "Required": https://github.com/OPM/opm-grid/pull/725/files#diff-c7b77f1ec7a3026e3a527b6133ec1dcfc1dc4b5e910ed77b1e2bf350a733cbcdR33

@lisajulia
Copy link
Contributor Author

@alfbr: Can you build flow again with this fix here: OPM/opm-common#4140?
I hope that solves the problem!
If yes, then I'm sorry for the error and thanks for notifying!
If not, then I would need more info to reproduce this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants