-
Notifications
You must be signed in to change notification settings - Fork 203
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add support for --filter-rpath-sanity-libs to skip RPATH sanity check for designated libraries #4119
Conversation
…ch as libcuda.so.1
…ty check configurable via a command line argument. Also, use only one regular expression to immediately capture the name of the library that is missing. This simplifies the check to see if it is present in the list if ignoreable libraries.
… using --rpath-filter and 2) checks if the rpath sanity check passes if that same library isn't only filtered while setting RPATH (with --rpath-filter), but also ignored when checking in the RPATH sanity check (with -filter-rpath-sanity-libs)
For completeness: this is an example of what the drivers look like
i.e. here, the |
…ed in a prefix that contains libtoy. Thus, make is filter the pattenr .*ltoy.* so that this always matches, regardless of ltoy.so's prefix
…d libtoy.so. There must be another reason why the test suite was failing to filter this from the linkeage before...
I'm puzzled by the CI failures of the unit tests... I added this to the # test sanity error when --rpath-filter is used to filter a required library
toy_ec_txt = read_file(os.path.join(test_ecs, 't', 'toy', 'toy-0.0.eb'))
toy_ec_txt += "\ndependencies = [('libtoy', '0.0', '', SYSTEM)]"
toy_ec_txt += "\nbuildopts = '-ltoy'"
toy_ec = os.path.join(self.test_prefix, 'toy.eb')
write_file(toy_ec, toy_ec_txt)
error_pattern = r"Sanity check failed\: Library libtoy\.so not found"
self.assertErrorRegex(EasyBuildError, error_pattern, self.test_toy_build, ec_file=toy_ec,
extra_args=['--rpath', '--experimental', '--rpath-filter=.*libtoy.*'],
raise_error=True, verbose=False) What this is meant to do is:
On my local machine, I run Yet, in the EasyBuild CI, this produces a failure. If I understand it correctly, this failure tells me that the last line in the above addition did not actually produce an error at all. That puzzles me: how can a |
…this change should only be temporary, to diagnose the issue
Hm, for some reason it doesn't seem to pick up on
I guess, since the
Yet, this looks as intended, which suggest it is picked up:
|
Ok, this confuses me:
It says EDIT: ah, so there is a |
… right after the compilation, and if that matches our expectations
… that could be a reason for this test failure
…r doesn't pick up on static versions
…s --as-needed by default by explicitely disabling this
…lied in any case...
… the possibilities is that it picks this up from LDFLAGS
Found the issue: Ubuntu passes
on an Ubuntu system. Most systems will end with a Of course, I can pass the |
… This makes sure that regardless of whether --as-needed is passed or not, the library will always be linked by the linker. For now, we still print the full output of the build, so that we can check (once) of it all makes sense now. If it does, I'll do one more commit to limit the test to only the check that parses the error for a certain pattern
…ripped all debugging comments.
…e driver as default libs to be filtered from the RPATH sanity check
…r this index test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
test/framework/toy_build.py
Outdated
error_pattern = r"Sanity check failed\: Library libtoy\.so not found" | ||
self.assertErrorRegex(EasyBuildError, error_pattern, self.test_toy_build, ec_file=toy_ec, | ||
extra_args=['--rpath', '--experimental', '--rpath-filter=.*libtoy.*'], | ||
name='toy2', raise_error=True, verbose=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also run again without --rpath-filter
, for completeness sake?
It's generally a good idea to make sure that the test setup is done fully correctly...
name='toy2', raise_error=True, verbose=False) | |
args = ['--rpath', '--experimental', '--rpath-filter=.*libtoy.*'] | |
self.assertErrorRegex(EasyBuildError, error_pattern, self.test_toy_build, ec_file=toy_ec, | |
extra_args=args, name='toy2', raise_error=True, verbose=False) | |
# works fine if --rpath-filter is not used (since then libtoy is RPATH'ed) | |
self.test_toy_build(ec_file=toy_ec, name='toy2', extra_args=args, raise_error=True) |
You will need to clean up the install dir to make sure the next part of the test still works as expected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to clean? A rebuild will clean the installdir anyway, doesn't it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, indeed
test/framework/toy_build.py
Outdated
# not rpath-ed. Then, we use --filter-rpath-sanity-libs to make sure the RPATH sanity checks ignores | ||
# the fact that libtoy.so is not found. Thus, this build should complete succesfully | ||
args = ['--rpath', '--experimental', '--rpath-filter=.*libtoy.*', '--filter-rpath-sanity-libs=libtoy.so'] | ||
self.test_toy_build(ec_file=toy_ec, name='toy2', extra_args=args, raise_error=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe here we should also do a check on the output of readelf -d
on the binary, and check the RPATH section with a regex (to make sure it doesn't include libtoy
)?
That mainly makes sense if my suggestion above to also run once without --rpath-filter
is applied.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, good idea, but I need some help with this. How do I retrieve the installdir in a test? And what is the appropriate way to run readelf -d
in a test, do we use the run_cmd
function for that? (if so, I need to find where that is defined again... :))
…lter_rpath_sanity_libs, make default RPATH filter as list and have a comment explaining the default
308d40c
to
d52be3b
Compare
…'ldd' and 'readelf -d' on toy-app binary
d52be3b
to
fa7375a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
fixes #4095
This PR will allow a (configurable) list of libraries to be ignored during the RPATH sanity check. This is e.g. needed when cross-compiling an executable that links against
libcuda.so.1
. The way NVIDIA supports cross-compilation is by compiling against a stub library, provided by the CUDA toolkit. The stub library only provides the symbols, but not the imlementation of the actual function that should be used at runtime. For example,<prefix>/CUDA/11.7.0/lib/stubs/libcuda.so.1
contains all the symbols of thelibcuda.so.1
, but no implementations. Thus, it should never be used at runtime: thelibcuda.so.1
that comes with the driver installation is supposed to be used at runtime.That's also why we don't
RPATH
anything that's in astubs
directory (see e.g. #2683 and this snippet of theeasybuild-framework
code).The issue is that the
sanity_check_rpath()
step will still try to unsetLD_LIBRARY_PATH
and see if all libraries are found correctly. On nodes that have the CUDA driver installed, that works fine, aslibcuda.so.1
is picked up from a default location (/usr/lib64/libcuda.so.1
). However, on CPU nodes, this fails. The fact thatlibcuda.so.1
is not found on CPU nodes when cross compiling is entirely legitimate though.A point of discussion is what the default behaviour should be for EasyBuild, i.e. for which default list of libraries it will be 'accepted' if they are not found in the
sanity_check_rpath()
step. In my opinion, it makes sense to includelibcuda.so
,libcuda.so.1
,libnvidia-ml.so
andlibnvidia-ml.so.1
here, as these are the libraries for which only stubs are provided in the CUDA toolkit and are thus supposed to be provided by the driver. For some other libraries (e.g.libcublas.so
), there are stubs in thestubs
directory, but also real ones in the<prefix>/CUDA/11.7.0/lib/
directory. Thus, I assume one would want those RPATH-ed against the latter locations.