Build failure for libreadline-8.1-GCCcore-11.2.0 #16014

Closed
0 opened this issue Aug 10, 2022 · 18 comments · Fixed by #16064 or #16270

@0

0 commented Aug 10, 2022

EasyBuild 4.6.0 fails to build libreadline-8.1-GCCcore-11.2.0.eb:

== FAILED: Installation ended unsuccessfully
(build directory: /tmp/easybuild/libreadline/8.1/GCCcore-11.2.0): build failed (first 300 chars):
cmd "/opt/easybuild/sources/generic/eb_v4.6.0/ConfigureMake/config.guess" exited with exit code 1 and output:
/bin/bash: /opt/easybuild/software/ncurses/6.2-GCCcore-11.2.0/lib/libncursesw.so.6:
version `NCURSES6_TINFO_5.0.19991023' not found (required by /usr/lib/libreadline.so.8)

(Line breaks have been added for legibility.)

On this system, /bin/sh is provided by bash, which is configured using --with-curses, so it requires both libreadline.so and libncursesw.so:

$ ldd /bin/sh
        [...]
        libreadline.so.8 => /usr/lib/libreadline.so.8
        libncursesw.so.6 => /usr/lib/libncursesw.so.6

Since config.guess is a shell script, it can't be executed when the EasyBuild libncursesw.so is used together with the system libreadline.so:

$ head -n1 /opt/easybuild/sources/generic/eb_v4.6.0/ConfigureMake/config.guess
#! /bin/sh
$ LD_LIBRARY_PATH=/opt/easybuild/software/ncurses/6.2-GCCcore-11.2.0/lib /bin/sh
/bin/sh: /opt/easybuild/software/ncurses/6.2-GCCcore-11.2.0/lib/libncursesw.so.6:
version `NCURSES6_TINFO_5.0.19991023' not found (required by /usr/lib/libreadline.so.8)
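
As a rough illustration of the mismatch described above, the following Python sketch parses ldd-style output and reports which libraries resolve outside the system library directories. The helper names (parse_ldd, libs_outside) and the list of system prefixes are assumptions made for this example, and the sample text is modelled on the paths shown in this report rather than captured from a real run.

```python
import re

# Matches lines such as "libreadline.so.8 => /usr/lib/libreadline.so.8 (0x...)".
# Entries without an absolute path (e.g. "=> not found") are skipped.
LDD_LINE = re.compile(r"^\s*(\S+)\s+=>\s+(/\S+)")


def parse_ldd(output):
    """Map each shared-library name in ldd output to the path it resolved to."""
    resolved = {}
    for line in output.splitlines():
        match = LDD_LINE.match(line)
        if match:
            resolved[match.group(1)] = match.group(2)
    return resolved


def libs_outside(resolved, system_prefixes=("/usr/lib", "/lib")):
    """Return the libraries that resolved to a path outside the system prefixes."""
    return {
        name: path
        for name, path in resolved.items()
        if not path.startswith(system_prefixes)
    }


# Illustrative input only: system libreadline mixed with an EasyBuild ncurses.
sample = """\
        libreadline.so.8 => /usr/lib/libreadline.so.8
        libncursesw.so.6 => /opt/easybuild/software/ncurses/6.2-GCCcore-11.2.0/lib/libncursesw.so.6
"""
print(libs_outside(parse_ldd(sample)))
```

For the sample input, only libncursesw.so.6 is reported, which is the mixed system/EasyBuild situation that makes /bin/sh fail to start.
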

boegel added this to the 4.x milestone Aug 10, 2022

@boegel
Member

boegel commented Aug 10, 2022

That's... interesting.

I guess this means that we need to run config.guess using LD_LIBRARY_PATH= ./config.guess in the determine_build_and_host_type method of ConfigureMake?

@0 Are you up for giving that a try, by customizing the configuremake.py easyblock, and making EasyBuild use it via --include-easyblocks?
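
For illustration, the idea of running a single command with LD_LIBRARY_PATH emptied can be sketched in plain Python as below. This is not EasyBuild code: the function name is hypothetical, it uses subprocess instead of EasyBuild's own command runner, and an actual change to the configuremake.py easyblock would likely look different.

```python
import os
import subprocess
import sys


def run_with_empty_ld_library_path(cmd):
    """Run cmd with LD_LIBRARY_PATH set to an empty string; return stripped stdout."""
    env = dict(os.environ)
    # Same effect as the shell prefix "LD_LIBRARY_PATH= cmd": the variable is
    # emptied for this one child process only, not for the calling process.
    env["LD_LIBRARY_PATH"] = ""
    result = subprocess.run(cmd, env=env, check=True, capture_output=True, text=True)
    return result.stdout.strip()


# Demonstration: a child Python process reports the value it sees.
probe = [sys.executable, "-c", "import os; print(repr(os.environ['LD_LIBRARY_PATH']))"]
print(run_with_empty_ld_library_path(probe))
```

Because only the child's environment is modified, any later command started in the usual way would still see the original LD_LIBRARY_PATH.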

@boegel
Member

boegel commented Aug 10, 2022

@0 Can you provide some more information about the bash that you're using here? Which operating system are you on (please share the output of eb --show-system-info)? Is this a standard bash installation, or something you've built yourself?

@0
Author

0 commented Aug 11, 2022

> I guess this means that we need to run config.guess using LD_LIBRARY_PATH= ./config.guess in the determine_build_and_host_type method of ConfigureMake?
>
> @0 Are you up for giving that a try, by customizing the configuremake.py easyblock, and making EasyBuild use it via --include-easyblocks?

If I set LD_LIBRARY_PATH to be empty for the duration of run_cmd for config.guess, that step proceeds correctly, and identifies my system as "x86_64-pc-linux-gnu". The configure step, however, then fails in a familiar way:

== FAILED: Installation ended unsuccessfully
(build directory: /tmp/easybuild/libreadline/8.1/GCCcore-11.2.0): build failed (first 300 chars):
cmd " ./configure --prefix=/opt/easybuild/software/libreadline/8.1-GCCcore-11.2.0
--build=x86_64-pc-linux-gnu  --host=x86_64-pc-linux-gnu " exited with exit code 1 and output:
/bin/bash: /opt/easybuild/software/ncurses/6.2-GCCcore-11.2.0/lib/libncursesw.so.6:
version `NCURSES6_TINFO_5.0.19991023' n (took 0 secs)

I tried passing LD_LIBRARY_PATH=... as an argument to ./configure rather than having it set in the environment, but that makes no difference. The configure script sets the variables that are passed in, so it still breaks, just slightly later:

$ LD_LIBRARY_PATH=/opt/easybuild/software/ncurses/6.2-GCCcore-11.2.0/lib ./configure
/bin/sh: /opt/easybuild/software/ncurses/6.2-GCCcore-11.2.0/lib/libncursesw.so.6:
version `NCURSES6_TINFO_5.0.19991023' not found (required by /usr/lib/libreadline.so.8)
$ ./configure LD_LIBRARY_PATH=/opt/easybuild/software/ncurses/6.2-GCCcore-11.2.0/lib
configure: error: cannot run /bin/sh ./support/config.sub

> @0 Can you provide some more information about the bash that you're using here? Which operating system are you on (please share the output of eb --show-system-info)? Is this a standard bash installation, or something you've built yourself?

This is the standard Arch Linux bash.

System information:

* OS:
  -> name: Arch Linux
  -> type: Linux
  -> version: UNKNOWN
  -> platform name: x86_64-unknown-linux

* CPU:
  -> vendor: AMD
  -> architecture: x86_64
  -> family: AMD
  -> arch name: UNKNOWN (archspec is not installed?)

* software:
  -> glibc version: 2.36
  -> Python binary: /usr/bin/python
  -> Python version: 3.10.5

@0
Author

0 commented Aug 11, 2022

I built a statically-linked bash and bind-mounted it over the system bash in a new namespace, but this atrocious workaround is insufficient:

configure: creating ./config.status
config.status: creating Makefile
awk: /opt/easybuild/software/ncurses/6.2-GCCcore-11.2.0/lib/libncursesw.so.6:
version `NCURSES6_TINFO_5.0.19991023' not found (required by /usr/lib/libreadline.so.8)
config.status: error: could not create Makefile

@Micket
Contributor

Micket commented Aug 11, 2022

Some options I see:

  1. Going full RPATH in EB, filtering out LD_LIBRARY_PATH completely from the ground up. I know some centers actually do this, and it seemed like a LOT of work.
  2. RPATH'ing all the binaries/libraries for bash, awk, sed, and whatever else we rely on that happens to be affected here. A less heinous workaround is to do this in a thin Singularity container and build the conflicting easyconfigs in it. It may be possible to run patchelf --set-rpath '/usr/lib/' /usr/bin/bash in a container recipe that otherwise matches the OS.
  3. Working around this problem in EB, specifically for the few core packages that these common OS dependencies rely on, e.g. by not providing the .so.8 symlinks. That might affect other things down the line though, and it would be painful to introduce that change retroactively (anyone who happened to rebuild their ncurses module might drop symlinks that other software was relying on).

@Micket
Contributor

Micket commented Aug 11, 2022

Option 2 wasn't as nice as I thought. bash, awk and friends will remain broken for any user who loads these modules, so fixing it only during the build isn't enough, since these tools are surely used all the time at runtime as well.
I looked for a way to specify library paths for specific binaries only in /etc/ld.so.conf, so one doesn't have to fiddle with OS-installed binaries, but I couldn't find anything (perhaps my google-fu isn't strong enough).

Edit: OK, maybe the workaround is enough to get our own libreadline module built as well; then bash may start working again, since it would no longer use a mix of OS and EB libraries.

@0
Author

0 commented Aug 12, 2022

> 2. RPATH'ing all the binaries/libraries for bash, awk, sed, and what else we might rely on that happens to be affected here.

This seemed like the most accessible of the options, so I took all the dynamically-linked ELFs in /usr/bin that didn't already have RPATH set, and used patchelf to set their RPATH to /usr/lib in an overlay mount. (It was necessary to use --force-rpath to set DT_RPATH instead of DT_RUNPATH, because the latter is overridden by LD_LIBRARY_PATH at runtime.) That was enough to successfully build libreadline:

$ module list
No modules loaded
$ module load Core/GCCcore/11.2.0 Compiler/GCCcore/11.2.0/libreadline
$ module list
Currently Loaded Modules:
  1) Core/GCCcore/11.2.0   2) ncurses/6.2   3) Compiler/GCCcore/11.2.0/libreadline/8.1
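
The patchelf step described before the module listing can be sketched as a dry run in Python. This is a simplified illustration under stated assumptions: the function names are hypothetical, files are selected only by their ELF magic bytes (statically linked binaries and binaries that already carry an RPATH are not filtered out, unlike in the procedure described above), and the commands are only built, never executed.

```python
import os

ELF_MAGIC = b"\x7fELF"


def is_elf(path):
    """Return True if path is a readable regular file starting with the ELF magic."""
    if os.path.islink(path) or not os.path.isfile(path):
        return False
    try:
        with open(path, "rb") as handle:
            return handle.read(4) == ELF_MAGIC
    except OSError:
        # Unreadable files are skipped rather than treated as errors.
        return False


def patchelf_commands(directory, rpath="/usr/lib"):
    """Build (but do not run) one patchelf command per ELF file in directory."""
    commands = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if is_elf(path):
            # --force-rpath makes patchelf write DT_RPATH instead of DT_RUNPATH.
            commands.append(["patchelf", "--force-rpath", "--set-rpath", rpath, path])
    return commands
```

Printing the result of patchelf_commands() for a writable copy or overlay of /usr/bin shows what would be run; executing those commands against the real system directory is not advisable.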

> Option 2 wasn't as nice as i thought. bash, awk and friends will remain broken for any user who loads these modules, so just fixing it during building isn't enough since these are surely used all the time at runtime as well. [...]
> edit: ok maybe the work around is enough to get to own libreadline module as well, then, maybe bash will start working again since it's not a mixed of OS and EB libraries.

With both modules loaded and /usr/bin back to normal, some things that link against both libraries work fine:

$ ldd /bin/sh
        [...]
        libreadline.so.8 => /opt/easybuild/software/libreadline/8.1-GCCcore-11.2.0/lib/libreadline.so.8
        libncurses.so.6 => /opt/easybuild/software/ncurses/6.2-GCCcore-11.2.0/lib/libncurses.so.6
$ /bin/sh -c 'echo OK'
OK

However, some binaries are not content with this arrangement:

$ cal
cal: /opt/easybuild/software/ncurses/6.2-GCCcore-11.2.0/lib/libncursesw.so.6:
version `NCURSES6_TINFO_5.0.19991023' not found (required by cal)

As expected, this setup is also quite fragile:

$ module list
Currently Loaded Modules:
  1) Core/GCCcore/11.2.0   2) ncurses/6.2   3) Compiler/GCCcore/11.2.0/libreadline/8.1
$ module unload Compiler/GCCcore/11.2.0/libreadline/8.1
$ module list
/usr/bin/lua: /opt/easybuild/software/ncurses/6.2-GCCcore-11.2.0/lib/libncursesw.so.6:
version `NCURSES6_TINFO_5.0.19991023' not found (required by /usr/lib/libreadline.so.8)

> 1. going full RPATH in EB, filtering out LD_LIBRARY_PATH completely from the ground up. I know some centers actually do this, and it seemed like a LOT of work.

What would this entail? It sounds like the most robust option.

@boegel
Member

boegel commented Aug 12, 2022

This one definitely belongs in the "how did we not run into this before" category....

Just yesterday, I ran into similar problems due to libtinfo.so.6 now also being symlinked in ncurses (thanks to the changes in #15903). Those changes are not yet part of the latest EasyBuild release, but they definitely seem to make things worse, not better: I ran into trouble building gettext-0.21.eb from scratch in an Ubuntu 20.04 container due to this:

/bin/bash: /apps/Ubuntu-20.04/zen2-ib/software/ncurses/6.2/lib/libtinfo.so.6: no version information available (required by /bin/bash)

There's actually an option 4 too, by simply not installing ncurses with EasyBuild, but assuming it's available in the OS (along with header files, etc.), by configuring EasyBuild with --filter-deps=ncurses.

> > 1. going full RPATH in EB, filtering out LD_LIBRARY_PATH completely from the ground up. I know some centers actually do this, and it seemed like a LOT of work.
>
> What would this entail? It sounds like the most robust option.

It basically boils down to configuring EasyBuild with --rpath (see also https://docs.easybuild.io/en/latest/RPATH-support.html), and also with --filter-env-vars=LD_LIBRARY_PATH.
There probably is some software where RPATH linking is not working correctly yet (most likely resulting in a failing sanity check), so those will have to be fixed (we're interested in fixes like that anyway, since RPATH linking is purposely used for other reasons in some contexts, like EESSI).
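
One possible way to check whether an installed binary actually carries RPATH or RUNPATH entries is to look at the dynamic section printed by readelf -d. The Python sketch below only parses such output; the function name is hypothetical and the sample lines in the usage note are made up for illustration, following the usual readelf layout.

```python
import re

# Matches e.g. " 0x000000000000000f (RPATH)   Library rpath: [/a/lib:/b/lib]"
# and the corresponding "(RUNPATH) ... Library runpath: [...]" form.
ENTRY = re.compile(r"\((RPATH|RUNPATH)\)\s+Library r(?:un)?path:\s+\[(.*)\]")


def search_paths(readelf_output):
    """Collect the RPATH and RUNPATH directories listed in `readelf -d` output."""
    found = {"RPATH": [], "RUNPATH": []}
    for line in readelf_output.splitlines():
        match = ENTRY.search(line)
        if match:
            found[match.group(1)].extend(p for p in match.group(2).split(":") if p)
    return found
```

Feeding it a line such as " 0x000000000000000f (RPATH)   Library rpath: [/opt/example/lib]" (a made-up path) yields that directory under the "RPATH" key. An empty result for both keys would suggest the binary still depends on LD_LIBRARY_PATH or the default search path.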

@boegel
Member

boegel commented Aug 12, 2022

It looks like the problem caused by the libtinfo.so.6 symlink that was added in #15903 can also be resolved by adding --with-versioned-syms to the ncurses configure command.

@0 Can you check if that also resolves the problem you are seeing?

To do so, you should
i) copy the ncurses easyconfig file (using eb --copy-ec ncurses-6.2.eb . for example)
ii) edit the file to add --with-versioned-syms to local_common_configopts (and make sure that still ends with a space)
iii) rebuild that ncurses easyconfig using eb --force, and then something that uses it as a dependency, like gettext or libreadline
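
Step ii) can also be scripted. The Python sketch below appends a flag to the local_common_configopts string and keeps the trailing space. It assumes the variable is assigned as a single-line, double-quoted string, which may not match every ncurses easyconfig; the function name is hypothetical and the option string in the usage note is illustrative, not copied from a real easyconfig.

```python
import re


def add_configure_flag(easyconfig_text, flag, variable="local_common_configopts"):
    """Append flag to a single-line, double-quoted `variable = "..."` assignment."""
    pattern = re.compile(r'^(%s\s*=\s*")([^"]*)(")' % re.escape(variable), re.MULTILINE)

    def _append(match):
        opts = match.group(2)
        if flag in opts.split():
            return match.group(0)  # flag already present, leave the line unchanged
        if opts and not opts.endswith(" "):
            opts += " "
        # Keep a trailing space so that options appended later stay separated.
        return match.group(1) + opts + flag + " " + match.group(3)

    new_text, count = pattern.subn(_append, easyconfig_text, count=1)
    if count == 0:
        raise ValueError("no %s assignment found" % variable)
    return new_text
```

For example, given the illustrative line local_common_configopts = "--with-shared ", calling add_configure_flag(text, "--with-versioned-syms") produces local_common_configopts = "--with-shared --with-versioned-syms ". Editing the file by hand, as described in the steps above, works just as well.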

@Micket
Contributor

Micket commented Aug 12, 2022

> What would this entail? It sounds like the most robust option.

I started thinking about why I hadn't switched over to RPATH + no LD_LIBRARY_PATH myself, before I realized the biggest cost it incurs: users who build software themselves can't rely on LD_LIBRARY_PATH either. They all either have to configure their builds to RPATH everything, define LD_LIBRARY_PATH themselves (yuck), or you'd need some fancy wrapper for the compilers so that you can enforce it the same way EB does (doable, but a non-negligible complication). I'd say this is the biggest downside of this approach.

> However, some binaries are not content with this arrangement:

I will say that calc (and most other tools) breaking while modules are loaded is kind of expected. Even when filtering out LD_LIBRARY_PATH (which certainly helps to a great degree), there are other variables (PYTHONPATH, JAVA_HOME, etc.) that can mess OS stuff up.
If calc were needed in conjunction with other tools, we could also add calc-1.2.3-GCCcore-11.3.0.eb (but this is not an option for bootstrapping toolchains). So it's mostly about the core things we directly depend on from the OS: bash, awk, sed, grep, ls, cd, chmod, etc.

@0
Author

0 commented Aug 15, 2022

> There's actually an option 4 too, by simply not installing ncurses with EasyBuild, but assuming it's available in the OS (along with header files, etc.), by configuring EasyBuild with --filter-deps=ncurses.

This does work, but I think it opens everything built with EasyBuild to the possibility of spontaneous breakage with each system update.

> It looks like the problem caused by the libtinfo.so.6 symlink that was added in #15903 can also be resolved by adding --with-versioned-syms to the ncurses configure command.
>
> @0 Can you check if that also resolves the problem you are seeing?

Yes, adding --with-versioned-syms fixes the problem with building libreadline-8.1-GCCcore-11.2.0.eb. I can now have mixed versions of these libraries without breaking everything:

$ ldd /bin/bash
        [...]
        libreadline.so.8 => /usr/lib/libreadline.so.8
        libncursesw.so.6 => /opt/easybuild/software/ncurses/6.2-GCCcore-11.2.0/lib/libncursesw.so.6

> It basically boils down to configuring EasyBuild with --rpath (see also https://docs.easybuild.io/en/latest/RPATH-support.html), and also with --filter-env-vars=LD_LIBRARY_PATH.

Thank you for the link to the relevant docs! I had misunderstood and assumed that this feature wasn't implemented yet. It's working thus far, and hopefully the alternate universe that EB constructs will remain sufficiently separated from the system packages.

> I started thinking about why I hadn't switched over to RPATH + no LD_LIBRARY_PATH myself, before I realized the biggest cost it incurs: users who build software themselves can't rely on LD_LIBRARY_PATH either. They all either have to configure their builds to RPATH everything, define LD_LIBRARY_PATH themselves (yuck), or you'd need some fancy wrapper for the compilers so that you can enforce it the same way EB does (doable, but a non-negligible complication).

Yes, I can see how that would be annoying, but it seems like the lesser of two problems. If writing custom easyconfigs is a viable option, users can build all their software using EasyBuild to make use of the RPATH machinery.

Thank you, @boegel and @Micket, for the very fast and informative responses! I now have a collection of working solutions to choose from, so feel free to close this issue.

@boegel
Member

boegel commented Sep 9, 2022

#16064 doesn't actually fix this yet since it only updates ncurses easyconfigs using the system toolchain, so re-opening this...

boegel reopened this Sep 9, 2022
boegel modified the milestones: 4.x, release after 4.6.1 Sep 9, 2022

@surak
Contributor

surak commented Sep 19, 2022

I get the same trying to install foss-2022. It stops at libreadline.

/bin/bash: /easybuild/2022/software/ncurses/6.3-GCCcore-11.3.0/lib/libtinfo.so.6: no version information available (required by /bin/bash)
x86_64-pc-linux-gnu

This is the standard bash from Ubuntu 20.04: GNU bash, version 5.0.17(1)-release (x86_64-pc-linux-gnu)

eb --show-system-info
System information (haicluster1.fz-juelich.de):

* OS:
  -> name: Ubuntu
  -> type: Linux
  -> version: 20.04
  -> platform name: x86_64-unknown-linux

* CPU:
  -> vendor: AMD
  -> architecture: x86_64
  -> family: AMD
  -> arch name: UNKNOWN (archspec is not installed?)
  -> model: AMD EPYC 7F72 24-Core Processor
  -> speed: 3200.0
  -> cores: 48
  -> features: 3dnowprefetch,abm,adx,aes,aperfmperf,apic,arat,avic,avx,avx2,bmi1,bmi2,bpext,cat_l3,cdp_l3,clflush,clflushopt,clwb,clzero,cmov,cmp_legacy,constant_tsc,cpb,cpuid,cqm,cqm_llc,cqm_mbm_local,cqm_mbm_total,cqm_occup_llc,cr8_legacy,cx16,cx8,de,decodeassists,extapic,extd_apicid,f16c,flushbyasid,fma,fpu,fsgsbase,fxsr,fxsr_opt,ht,hw_pstate,ibpb,ibrs,ibs,irperf,lahf_lm,lbrv,lm,mba,mca,mce,misalignsse,mmx,mmxext,monitor,movbe,msr,mtrr,mwaitx,nonstop_tsc,nopl,npt,nrip_save,nx,osvw,overflow_recov,pae,pat,pausefilter,pclmulqdq,pdpe1gb,perfctr_core,perfctr_llc,perfctr_nb,pfthreshold,pge,pni,popcnt,pse,pse36,rdpid,rdrand,rdseed,rdt_a,rdtscp,rep_good,sep,sev,sha_ni,skinit,smap,smca,sme,smep,ssbd,sse,sse2,sse4_1,sse4_2,sse4a,ssse3,stibp,succor,svm,svm_lock,syscall,tce,topoext,tsc,tsc_scale,umip,v_vmsave_vmload,vgif,vmcb_clean,vme,vmmcall,wbnoinvd,wdt,xgetbv1,xsave,xsavec,xsaveerptr,xsaveopt,xsaves

* GPU:
  -> NVIDIA
    -> 3x NVIDIA GeForce RTX 3090, 515.65.01

* software:
  -> glibc version: 2.31
  -> Python binary: /usr/bin/python
  -> Python version: 3.8.10

@jacekwu1989

I encountered the same issue while trying to install intel-2022a. Didn't encounter this previously when I was using older versions of EasyBuild though.

/bin/bash: /home/user/.local/easybuild/software/ncurses/6.3-GCCcore-11.3.0/lib/libtinfo.so.6: no version information available (required by /bin/bash)
/bin/bash: -c: line 1: syntax error near unexpected token `('
/bin/bash: -c: line 1: ` ./configure --prefix=/home/user/.local/easybuild/software/libreadline/8.1.2-GCCcore-11.3.0  --build=/bin/bash: /home/user/.local/easybuild/software/ncurses/6.3-GCCcore-11.3.0/lib/libtinfo.so.6: no version information available (required by /bin/bash)'

I'm using the standard bash from Ubuntu 22.04 as well.

eb --show-system-info
System information (workstation1):

* OS:
  -> name: Ubuntu
  -> type: Linux
  -> version: 22.04
  -> platform name: x86_64-unknown-linux

* CPU:
  -> vendor: Intel
  -> architecture: x86_64
  -> family: Intel
  -> arch name: UNKNOWN (archspec is not installed?)
  -> model: Intel(R) Xeon(R) Platinum 8360Y CPU @ 2.40GHz
  -> speed: 3500.0
  -> cores: 144
  -> features: 3dnowprefetch,abm,acpi,adx,aes,aperfmperf,apic,arat,arch_capabilities,arch_perfmon,art,avx,avx2,avx512_bitalg,avx512_vbmi2,avx512_vnni,avx512_vpopcntdq,avx512bw,avx512cd,avx512dq,avx512f,avx512ifma,avx512vbmi,avx512vl,bmi1,bmi2,bts,cat_l3,clflush,clflushopt,clwb,cmov,constant_tsc,cpuid,cpuid_fault,cqm,cqm_llc,cqm_mbm_local,cqm_mbm_total,cqm_occup_llc,cx16,cx8,dca,de,ds_cpl,dtes64,dtherm,dts,epb,ept,ept_ad,erms,est,f16c,flexpriority,flush_l1d,fma,fpu,fsgsbase,fsrm,fxsr,gfni,ht,ibpb,ibrs,ibrs_enhanced,ida,intel_ppin,intel_pt,invpcid,invpcid_single,la57,lahf_lm,lm,mba,mca,mce,md_clear,mmx,monitor,movbe,msr,mtrr,nonstop_tsc,nopl,nx,ospke,pae,pat,pbe,pcid,pclmulqdq,pconfig,pdcm,pdpe1gb,pebs,pge,pku,pln,pni,popcnt,pse,pse36,pts,rdpid,rdrand,rdseed,rdt_a,rdtscp,rep_good,sdbg,sep,sha_ni,smap,smep,smx,split_lock_detect,ss,ssbd,sse,sse2,sse4_1,sse4_2,ssse3,stibp,syscall,tm,tm2,tme,tpr_shadow,tsc,tsc_adjust,tsc_deadline_timer,umip,vaes,vme,vmx,vnmi,vpclmulqdq,vpid,wbnoinvd,x2apic,xgetbv1,xsave,xsavec,xsaveopt,xsaves,xtopology,xtpr

* GPU:
  -> NVIDIA
    -> 1x NVIDIA RTX A2000, 515.65.01

* software:
  -> glibc version: 2.35
  -> Python binary: /usr/bin/python3
  -> Python version: 3.10.4

@Micket
Contributor

Micket commented Sep 20, 2022

The rest of the fixes (for all recent ncurses versions) are in #16270.

@boegel
Member

boegel commented Sep 23, 2022

@surak @jacekwu1989 Can you verify whether the problem you are seeing is indeed resolved when using the ncurses easyconfigs from #16270 to reinstall ncurses?

For example, use eb --from-pr 16270 ncurses-6.3-GCCcore-11.3.0.eb --force

@surak
Contributor

surak commented Sep 23, 2022

Yes. There was a weird bug when installing from the PR, so I patched the files myself and reinstalled them on 2 systems. The bug is gone.

@jacekwu1989

I also encountered a bug when using the PR, but I can't remember what it was now. Patching the easyconfig for ncurses manually solved the problem.
