Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

{phys}[gompi/2022a] libGridXC v1.1.0 #17768

Merged

Conversation

ahkole
Copy link
Contributor

@ahkole ahkole commented Apr 20, 2023

(created using eb --new-pr)

sources = [SOURCELOWER_TAR_GZ]
checksums = ['e7883e57a4db2438ee59437740291c06e0cfe1c8ae1901e4001f32312307e46a']

configopts = "-DWITH_MPI=ON -DWITH_LIBXC=ON"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, we can't do both mpi and non-mpi version of this anymore? I'll admit I'm not super familiar with this software or the stuff that uses it, perhaps the non-mpi versions was pointless.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, we can't do both mpi and non-mpi version of this anymore? I'll admit I'm not super familiar with this software or the stuff that uses it, perhaps the non-mpi versions was pointless.

That seems to be a downside of the new cmake build system that they have implemented yes. As far as I could find in the installation instructions there no longer is an option to request a multiconfig build and it always builds the library with the same name (so redoing a second time with/without MPI would just overwrite the previous build). I personally never used the non-MPI version (and don't think there is a large use case for it), but nevertheless I could ask the developers if they know if there's still a way to have two versions?

@Micket Micket added the update label Apr 20, 2023
Co-authored-by: Mikael Öhman <micketeer@gmail.com>
@Micket
Copy link
Contributor

Micket commented Apr 21, 2023

Test report by @Micket
FAILED
Build succeeded for 0 out of 1 (1 easyconfigs in total)
vera-c1 - Linux Rocky Linux 8.6, x86_64, Intel Xeon Processor (Skylake), Python 3.6.8
See https://gist.github.com/Micket/3a74c927ce8ff9fc3e88a565defc9b6c for a full test report.

Copy link
Contributor

@Micket Micket left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got a build failure, i suspect out-of-order parallel compilation since it's fortran

  107 |     use xc_f03_lib_m
      |         1
Fatal Error: Cannot open module file xc_f03_lib_m.mod for reading at (1): No such file or directory

I think this needs a parallel = 1

@ahkole
Copy link
Contributor Author

ahkole commented Apr 21, 2023

Got a build failure, i suspect out-of-order parallel compilation since it's fortran

  107 |     use xc_f03_lib_m
      |         1
Fatal Error: Cannot open module file xc_f03_lib_m.mod for reading at (1): No such file or directory

I think this needs a parallel = 1

@Micket Thanks for noticing this! I guess my build just completed without errors due to the luck. I have added the parallel = 1 to force serial compilation.

@Micket
Copy link
Contributor

Micket commented Apr 21, 2023

Test report by @Micket
FAILED
Build succeeded for 0 out of 1 (1 easyconfigs in total)
vera-c1 - Linux Rocky Linux 8.6, x86_64, Intel Xeon Processor (Skylake), Python 3.6.8
See https://gist.github.com/Micket/a65f6e8f924c5378459e1dfa850ce757 for a full test report.

@ahkole
Copy link
Contributor Author

ahkole commented Apr 21, 2023

Test report by @Micket FAILED Build succeeded for 0 out of 1 (1 easyconfigs in total) vera-c1 - Linux Rocky Linux 8.6, x86_64, Intel Xeon Processor (Skylake), Python 3.6.8 See https://gist.github.com/Micket/a65f6e8f924c5378459e1dfa850ce757 for a full test report.

I see that the build still fails for your test. On closer inspection, xc_f03_lib_m is actually a module from the dependency library libxc so this error cannot be caused by a parallel out-of-order build. Are you sure that it is available on the machine that is used to run the test? I looked in the log file of your test and the output of the configure step did seem to mention that the libxc library and corresponding Fortran interface was found but I don't see any -I/path/to/libxc/root/include flags in the compile commands that seem to be used. For example on the cluster I used to test my easyconfig there are flags like this -I/science-nfs/vsm01/projects/B2QC2/.local/easybuild/software/libxc/5.2.3-GCC-11.3.0/include in the logs.

@Micket
Copy link
Contributor

Micket commented Apr 21, 2023

Yes the libxc mod files are there in the dependency.

I spot

-- The Fortran compiler identification is GNU 11.3.0
-- Detecting Fortran compiler ABI info
-- Detecting Fortran compiler ABI info - done
-- Check for working Fortran compiler: /apps/Test2/software/OpenMPI/4.1.4-GCC-11.3.0/bin/mpifort - skipped
-- Found MPI_Fortran: /apps/Test2/software/OpenMPI/4.1.4-GCC-11.3.0/bin/mpifort (found version "3.1")
-- Found MPI: TRUE (found version "3.1")
-- Found PkgConfig: /usr/bin/pkg-config (found version "1.4.2")
-- Checking for module 'libxcf90'
--   Found libxcf90, version 5.2.3
-- Checking for module 'libxc'
--   Found libxc, version 5.2.3
-- Checking for module 'libxcf03'
--   Found libxcf03, version 5.2.3
--    ---> Found F2003 interface for libxc. Using it as default
-- Configuring docs
-- Configuring libgridxc tests...
-- Maximum number of usable MPI ranks: 32
-- Number of MPI ranks in tests: 4
-- Configuring done
-- Generating done
-- Build files have been written to: /dev/shm/libGridXC/1.1.0/gompi-2022a/easybuild_obj

The pkg-config part suggests to me that this should have a builddep on pkgconf 1.8.0 (possibly related to missing include paths)

@Micket
Copy link
Contributor

Micket commented Apr 22, 2023

Hm, no that unfortunately wasn't it.

The failed build has the cmakecache

$ grep -i libxc CMakeCache.txt  | grep -i include
LIBXC_C_INCLUDEDIR:INTERNAL=/apps/Test2/software/libxc/5.2.3-GCC-11.3.0/include
LIBXC_C_INCLUDE_DIRS:INTERNAL=
LIBXC_C_STATIC_INCLUDE_DIRS:INTERNAL=
LIBXC_C_libxc_INCLUDEDIR:INTERNAL=
LIBXC_F03_INCLUDEDIR:INTERNAL=/apps/Test2/software/libxc/5.2.3-GCC-11.3.0/include
LIBXC_F03_INCLUDE_DIRS:INTERNAL=
LIBXC_F03_STATIC_INCLUDE_DIRS:INTERNAL=
LIBXC_F03_libxcf03_INCLUDEDIR:INTERNAL=
LIBXC_F90_INCLUDEDIR:INTERNAL=/apps/Test2/software/libxc/5.2.3-GCC-11.3.0/include
LIBXC_F90_INCLUDE_DIRS:INTERNAL=
LIBXC_F90_STATIC_INCLUDE_DIRS:INTERNAL=
LIBXC_F90_libxcf90_INCLUDEDIR:INTERNAL=

and apparently, the xcf90 target just uses the blank one;

        target_include_directories(
          libxc::xcf90
          INTERFACE
          "${LIBXC_F90_INCLUDE_DIRS}"
        )

:/

@ahkole
Copy link
Contributor Author

ahkole commented Apr 22, 2023

Any idea why this happens? All the _INCLUDEDIR do seem to contain the correct path to the headers.

@Micket
Copy link
Contributor

Micket commented Apr 22, 2023

Could you please give a build a try with the additional build dep ('pkgconf', '1.8.0'), , easier to tell if this is predictable avoiding the OS differences.

@Micket
Copy link
Contributor

Micket commented Apr 22, 2023

I think i figured out what goes wrong here;

  1. We populate CPATH with these include paths already. Though, fortran compilers doesn't read that presumably.
  2. Since it's this path is in CPATH, it's somehow considered a "system path" by pkgconf/pkg-config
  3. pkg-config/pkgconf intentionally excludes "system path"'s from their --cflags
$ pkg-config --cflags libxcf90
(no output)
$ CPATH= pkg-config --cflags libxcf90
-I/apps/Test2/software/libxc/5.2.3-GCC-11.3.0/include

Verified in the source
https://github.com/pkgconf/pkgconf/blob/78f3abc9359cbe08258c381445b445206b2a0485/libpkgconf/client.c#L132

I'm not sure what your pkg-config does to not do this. This behaviour has been around for a while, at least in pkgconf, and from what i can tell pkg-config does the same.

('CMake', '3.23.1'),
]

parallel = 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure this is needed or not

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On my cluster it also succeeds the build without the parallel = 1 (using 64 cores for compilation). But I don't know how robust this is and if it can fail under certain circumstances.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets remove it. Some projects actually do set up dependency tracking for mod files properly, so lets not limit it without knowing it has ever failed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I removed the parallel = 1 again

@ahkole
Copy link
Contributor Author

ahkole commented Apr 22, 2023

Could you please give a build a try with the additional build dep ('pkgconf', '1.8.0'), , easier to tell if this is predictable avoiding the OS differences.

I tried this as you suggested and indeed then the build also fails on my cluster. I guess the reason it worked for me before is because the system pkg-config on this cluster is ancient (version 0.27.1).

preconfigopts = 'CPATH= '  # gfortran ignores CPATH, but pkgconf also excludes dirs from CPATH

Suggested workaround

I also tried this suggested workaround and then the build succeeds again (while keeping pkgconfig 1.8.0 as a build dependency)! I'll update my PR to include the changes.

@ahkole
Copy link
Contributor Author

ahkole commented Apr 22, 2023

Thanks for debugging this @Micket ! I was wondering, could this problem also be fixed without manually having to unset CPATH during the configure step? Would this require a change to either pkg-config, EasyBuild, the Fortran compilers or the build system of libGridXC?

@Micket
Copy link
Contributor

Micket commented Apr 22, 2023

See easybuilders/easybuild-framework#3331 for issue and discussion there, but it hasn't really moved forward in many years, so workaround for now until we can make this drastic change.

Copy link
Contributor

@Micket Micket left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@Micket Micket added this to the next release (4.7.2) milestone Apr 24, 2023
@Micket
Copy link
Contributor

Micket commented Apr 24, 2023

Test report by @Micket
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
vera-c1 - Linux Rocky Linux 8.6, x86_64, Intel Xeon Processor (Skylake), Python 3.6.8
See https://gist.github.com/Micket/a08e4fc69a2fd61e6e2ccf0148ea2c9b for a full test report.

@Micket
Copy link
Contributor

Micket commented Apr 24, 2023

Going in, thanks @ahkole!

@Micket Micket merged commit f14b013 into easybuilders:develop Apr 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants