Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make ModulesTool.exist more robust w.r.t. module wrappers, aliases, defaults, etc. #3216

Merged
merged 16 commits into from
Apr 30, 2020

Conversation

Flamefire
Copy link
Contributor

@Flamefire Flamefire commented Feb 26, 2020

Allows to find Java/whatver-11 from "module show Java/11" as well as AIModule aliased to TensorFlow/2.1.0 or similar.

The scope widened during the course of this PR trying to fix various test failures with different module tools resulting in a complete implementation which is much more robust.

This now fixes #3215

Proposal: Remove the previous check via module avail and solely rely on module show as the true source which handles broken aliases by returning false instead of True for module avail in Lmod and False otherwise.

@boegel boegel added the change label Feb 29, 2020
@boegel boegel added this to the 4.x milestone Feb 29, 2020
@boegel
Copy link
Member

boegel commented Feb 29, 2020

@Flamefire This would basically also consider the output of module show Java, but I don't see how that helps in general, since this will only produce output for whatever module version is the default?

In addition, this would make things like eb -D significantly slower, since it will do an additional module show for every non-existing module (see also #3203)

Also, the tests should be enhanced.

@Flamefire
Copy link
Contributor Author

I don't understand. If you call module_exist("Java") and module show Java returns a module, then the result of module_exist is True, isn't it?

Currently the problem is: If "Java/11" is an alias for "Java/whatever-11" then module show Java/11 will return ...Java/whetever-11.lua but the before this patch the result of module_exist would be false because it searches for Java/11*.lua (which fails)
With this patch only the part before "/" must match: Java/*.lua

In addition, this would make things like eb -D significantly slower, since it will do an additional module show for every non-existing module (see also #3203)

Why? There is only an additional regexp check, not another module show call

@boegel
Copy link
Member

boegel commented Feb 29, 2020

Ugh, I clearly didn't look at this closely enough... Thanks for clarifying @Flamefire!
And you're right, there's no extra show at all, you're just doing a double take on the output of module show.

In that case, I would:

  • enhance the existing test for exist in test/framework/modules.py (see def test_exist) to cover the case you mentioned
  • add a comment in the code to clarify why we're also considering the basename of the module name in mod_exists_via_show

@Flamefire Flamefire force-pushed the improve_module_exist branch from 5dc3bd6 to 79fce64 Compare March 3, 2020 09:57
@Flamefire
Copy link
Contributor Author

@boegel Done as suggested. I also implemented the show command so that it returns stderr not the combined stdout+stderr to reduce false positives (which can be fatal). Reasoning is that the output we want is contained in stderr (at least for LMod) and checking stderr to can have a path matching the regexp which we don't want.

easybuild/tools/modules.py Outdated Show resolved Hide resolved
easybuild/tools/modules.py Show resolved Hide resolved
@boegel boegel modified the milestones: 4.x, next release (4.1.2?) Mar 8, 2020
@Flamefire Flamefire force-pushed the improve_module_exist branch from 79fce64 to 3ff16d7 Compare March 16, 2020 12:37
# "Java/whatever-11" when searching for "Java/11" (--> basename="Java")
basename = os.path.dirname(mod_name)
if basename:
names_to_check.append(basename)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While looking into this, I noticed this change is not required at all to make the enhanced test pass, because of the fallback mechanism with module_wrapper_exists in ModulesTool.show that was added in #2606 .

So, I'm still puzzled why this change is required at al...

I enhanced the checks in test_exist a bit more, and roll back the changes you made to mod_exists_via_show here, and it still works fine.

See Flamefire/easybuild-framework@improve_module_exist...boegel:improve_module_exist.

I can open a PR to your branch with these changes once I get the test_run_module I added passing with all different module tools.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I propose we roll back this change for now, which is done as a part of Flamefire#2 .

We can further extend test_exist to cover the use cases that are actually broken (for example module aliases as discussed in #3215), before looking into a fix.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I'm still puzzled why this change is required at al...

Because as explained in #3215 the fallback shouldn't be there: It is fuzzy and incomplete. If module show returns a module, then the module exists else it does not, no fallback required.

Just looked at why the tests pass and comparing the #3215 and the docu at https://lmod.readthedocs.io/en/latest/093_modulerc.html:

Fallback considers only:

  • module_version command
  • only module paths and 1 given directory

Fallback hence fails for:

  • module_alias
  • system MODULERC files
  • ~/.modulerc.lua or ~/.modulerc

The last one was the problem I found. Users can't change the files in modulepath but they may want to change aliases via a home-folder modulerc.

Counter-proposal: We keep this change and print a fat warning when a module is not found via module show but via the fallback and gradually get rid of the fallback (My assumption is the warning will never trigger with this change)

Or do you see any reason to keep and improve a hand-rolled implementation of part of LMod/envModules?

More arguments: The fallback is particularly tricky as you noted in #2606 (comment) and I'm not sure that this is addressed: If a module wrapper exists but the underlying module does not, what does happen? I can't find such a 2nd check in

def module_wrapper_exists(self, mod_name, modulerc_fn='.modulerc', mod_wrapper_regex_template=None):

Using module show only avoids this problem altogether.

To go forward faster: Is there anything wrong with this PR? Any possible failure you can see? Any test that fails with it? If you want a test which passes after but not before I guess using module_alias will be enough. Can add that Tuesday.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have any strong objections against your approach, other than that it partially solves the same issue as module_wrapper_exists does.

I agree that your approach is better, thanks a lot for the clarification.
module_wrapper_exists indeed has shortcomings (quite a lot actually, as you point them out), and it's indeed not EasyBuild's task to re-implement the modules tool.

With that in mind, I suggest proceeding as follows:

  1. Merge undo changes to mod_exists_via_show + additional/enhanced tests Flamefire/easybuild-framework#2 to update this PR, mostly for the additional tests (which are largely not directly related to the changes made in this PR, but are still useful). I spent quite a bit of time on the test for run_module in particular, it would be great if that effort doesn't go to waste...

  2. Revert commit a10a6ac that undoes your changes in mod_exists_via_show (or do a git rebase --interactive to kick it out entirely). I can also kick out that commit from undo changes to mod_exists_via_show + additional/enhanced tests Flamefire/easybuild-framework#2 before you merge it, if you prefer.

  3. Add more checks in test_exist to cover the use cases that are not dealt with by module_wrapper_exists, preferably not only module_alias but also the others, if possible. For ~/.modulerc.lua and ~/.modulerc, you can probably cover those in the test by temporarily redefining $HOME?

  4. Remove module_wrapper_exists entirely, since it's no longer needed with your enhanced mod_exists_via_show. I see no real reason to be careful there, the checks in test_exist should be good enough to ensure that nothing gets broken that was covered by module_wrapper_exists.

    If you prefer being cautious, I guess adding a log.deprecated statement in module_wrapper_exists makes sense too, to leave the fallback in place for now but print a clear warning if it's still triggered. If we take this route, the message should clearly mention to report the use case that triggers the warning, so we know we missed something before we go ahead and remove that block of code.

  5. Make your enhanced mod_exists_via_show a little bit stricter, as follows:

    names_to_check.append(basename + '/')

    The regular expression used to check the output of module show is a bit too liberal now in combination with only using the directory part of the module name.

    Using just Java in the fallback check would result in checking with '^\s*\S*/Java.*(\.lua)?:\s*$' for example, while it should really be '^\s*\S*/Java/.*(\.lua)?:\s*$'.
    Maybe we should even use basename + '/.*', and remove the .* from the mod_exists_regex_template values?
    I'm not sure why it's even there at all, we expect an exact match (except for the .lua part), don't we?

  6. Rename basename to something else, since it causes confusion. You use os.path.dirname to define a variable named basename, which clashes directly with os.path.basename which does the exact opposite (return just the filename). :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will work on this Tuesday, some comments now to leave time for decisions:

For ~/.modulerc.lua and ~/.modulerc, you can probably cover those in the test by temporarily redefining $HOME?

I didn't add tests yet for fear of accidentally removing the real file or e.g. missing to undefine it or so... I'll see what I can come up with

Remove module_wrapper_exists entirely,

I'm all in for that. But what about your usual reasoning someone else might be using it?

print a clear warning if it's still triggered. If we take this route, the message should clearly mention to report the use case that triggers the warning, so we know we missed something before we go ahead and remove that block of code.

We must not call it within EasyBuild or make the message only trigger if module_wrapper_exists returns anything because it is currently called if the previous check does not find anything, which is valid: When the module indeed does not exist. Problem: An alias might exist but the underlying module not. module show would return nothing but module_wrapper_exists would return something. So I'd never call it from EB and deprecate it like other functions

Make your enhanced mod_exists_via_show a little bit stricter, as follows:

Why? If $LMOD_CMD show foo/1.2 returns bar_3.4.lua because someone set the alias as such, then this is valid. AFAIK module show returns nothing when the module is not found (maybe an error? But even that not on stderr of the module tool as that is up to the wrapper)
Hence I don't think it needs to be made any stricter. Use case for the loose version: IBM has a custom TensorFlow. Maybe someone makes an alias TensorFlow_PowerAI to TensorFlow/1.13_Anaconda_... or so to distinguish it from official TF

Do you know of anything where the strict check would be required? Then we should add negative tests too that cover that. But until then I would keep it loose

# "Java/whatever-11" when searching for "Java/11" (--> basename="Java")
basename = os.path.dirname(mod_name)
if basename:
names_to_check.append(basename)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have any strong objections against your approach, other than that it partially solves the same issue as module_wrapper_exists does.

I agree that your approach is better, thanks a lot for the clarification.
module_wrapper_exists indeed has shortcomings (quite a lot actually, as you point them out), and it's indeed not EasyBuild's task to re-implement the modules tool.

With that in mind, I suggest proceeding as follows:

  1. Merge undo changes to mod_exists_via_show + additional/enhanced tests Flamefire/easybuild-framework#2 to update this PR, mostly for the additional tests (which are largely not directly related to the changes made in this PR, but are still useful). I spent quite a bit of time on the test for run_module in particular, it would be great if that effort doesn't go to waste...

  2. Revert commit a10a6ac that undoes your changes in mod_exists_via_show (or do a git rebase --interactive to kick it out entirely). I can also kick out that commit from undo changes to mod_exists_via_show + additional/enhanced tests Flamefire/easybuild-framework#2 before you merge it, if you prefer.

  3. Add more checks in test_exist to cover the use cases that are not dealt with by module_wrapper_exists, preferably not only module_alias but also the others, if possible. For ~/.modulerc.lua and ~/.modulerc, you can probably cover those in the test by temporarily redefining $HOME?

  4. Remove module_wrapper_exists entirely, since it's no longer needed with your enhanced mod_exists_via_show. I see no real reason to be careful there, the checks in test_exist should be good enough to ensure that nothing gets broken that was covered by module_wrapper_exists.

    If you prefer being cautious, I guess adding a log.deprecated statement in module_wrapper_exists makes sense too, to leave the fallback in place for now but print a clear warning if it's still triggered. If we take this route, the message should clearly mention to report the use case that triggers the warning, so we know we missed something before we go ahead and remove that block of code.

  5. Make your enhanced mod_exists_via_show a little bit stricter, as follows:

    names_to_check.append(basename + '/')

    The regular expression used to check the output of module show is a bit too liberal now in combination with only using the directory part of the module name.

    Using just Java in the fallback check would result in checking with '^\s*\S*/Java.*(\.lua)?:\s*$' for example, while it should really be '^\s*\S*/Java/.*(\.lua)?:\s*$'.
    Maybe we should even use basename + '/.*', and remove the .* from the mod_exists_regex_template values?
    I'm not sure why it's even there at all, we expect an exact match (except for the .lua part), don't we?

  6. Rename basename to something else, since it causes confusion. You use os.path.dirname to define a variable named basename, which clashes directly with os.path.basename which does the exact opposite (return just the filename). :)

@Flamefire Flamefire force-pushed the improve_module_exist branch 2 times, most recently from 9b01ac8 to 62de10d Compare April 14, 2020 12:01
@Flamefire
Copy link
Contributor Author

Flamefire commented Apr 16, 2020

@boegel Tests failing due to the tests for the fallback printing deprecated messages. Shall I remove those tests?

Edit: Or add "Deprecated functionality" to IGNORE_PATTERNS? Deprecated stuff is already a test failure unless explicitely allowed

Allows to find Java/whatver-11 from "module show Java/11"
@Flamefire Flamefire force-pushed the improve_module_exist branch from 62de10d to 13361a6 Compare April 29, 2020 10:45
@Flamefire
Copy link
Contributor Author

Rebased and removed the succeeding but echoing "DEPRECATED" module_wrapper_exists tests

@Flamefire
Copy link
Contributor Author

@boegel Found multiple issues:

  • module-alias support is different for different versions and tools.
    • module-alias JavaAlias Java/1.8.0_181 is possible for LMod 7+ only
    • module-alias Java/Alias toy/42.1337 works for all but fails the current, restrictive regexp
  • module avail may or may not list aliases (shown for e.g. LMod). If it does, the pointed to modules may not exist, hence module show <mod> would return failure.

I fixed and tested the first case. The second may need a major change and deprecation warning. However it is not totally clear what exists should return for a non-existant aliased module. I'd say False because it cannot be loaded which is usually what is wanted.

@Flamefire Flamefire force-pushed the improve_module_exist branch from 9dbf558 to 45135ab Compare April 29, 2020 15:01
@Flamefire Flamefire force-pushed the improve_module_exist branch 2 times, most recently from b3bd1f3 to 506397b Compare April 30, 2020 07:39
For HMNS the .modulerc needs updating on EnvironmentModules
however the old fallback wrongly considered the current version as correct
(depending on the actual location of the .modulerc) and returned True
even though the module is not usable
Deprecate mod_exists_regex_template in favor for better output parsing
Output parsing of module show is now more accurate as errors are handled
and only first non-whitespace line is checked

Usual outputs for non-existant modules: Empty or `(0):ERROR:...`
For existing modules the first line (after comments etc) contains the
full path of the module file with a colon at the end.
The issue was caused by self.show returning stdout and stderr
As it now returns only stderr the issue is no longer possible
@Flamefire Flamefire force-pushed the improve_module_exist branch from 506397b to 1454dd3 Compare April 30, 2020 08:51
@Flamefire Flamefire changed the title Also check for module basename in module exist Properly handle module output in module_exists_via_show Apr 30, 2020
@easybuilders easybuilders deleted a comment from boegelbot Apr 30, 2020
@easybuilders easybuilders deleted a comment from boegelbot Apr 30, 2020
@easybuilders easybuilders deleted a comment from boegelbot Apr 30, 2020
@easybuilders easybuilders deleted a comment from boegelbot Apr 30, 2020
@easybuilders easybuilders deleted a comment from boegelbot Apr 30, 2020
@easybuilders easybuilders deleted a comment from boegelbot Apr 30, 2020
easybuild/tools/modules.py Outdated Show resolved Hide resolved
easybuild/tools/modules.py Outdated Show resolved Hide resolved
easybuild/tools/modules.py Show resolved Hide resolved
easybuild/tools/modules.py Show resolved Hide resolved
test/framework/modules.py Outdated Show resolved Hide resolved
test/framework/modules.py Show resolved Hide resolved
@Flamefire Flamefire force-pushed the improve_module_exist branch from cdecb65 to 8cb5517 Compare April 30, 2020 14:45
@boegel boegel changed the title Properly handle module output in module_exists_via_show make ModulesTool.exist more robust w.r.t. module wrappers, aliases, defaults, etc. Apr 30, 2020
@boegel boegel merged commit 33d0073 into easybuilders:develop Apr 30, 2020
@Flamefire Flamefire deleted the improve_module_exist branch May 4, 2020 07:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Module alias/wrapper detection is broken/incomplete
2 participants