Java/11 not considered present with HMNS on develop #3335

Closed
zao opened this issue May 12, 2020 · 8 comments · Fixed by #3337

@zao
Contributor

zao commented May 12, 2020

Att: @Flamefire

The changes in #3216 seem to have caused a regression on my Ubuntu 18.04 machine, where something depending on Java/11 fails to notice the installed module.

$ eb R-4.0.0-foss-2020a.eb -rD
*snip*
 * [x] $CFGS/j/Java/Java-11.0.2.eb (module: Core | Java/11.0.2)
 * [ ] $CFGS/j/Java/Java-11.eb (module: Core | Java/11)
 * [x] $CFGS/n/NLopt/NLopt-2.6.1-GCCcore-9.3.0.eb (module: Compiler/GCCcore/9.3.0 | NLopt/2.6.1)
 * [x] $CFGS/l/libsndfile/libsndfile-1.0.28-GCCcore-9.3.0.eb (module: Compiler/GCCcore/9.3.0 | libsndfile/1.0.28)
*snip*
 * [x] $CFGS/s/ScaLAPACK/ScaLAPACK-2.1.0-gompi-2020a.eb (module: MPI/GCC/9.3.0/OpenMPI/4.0.3 | ScaLAPACK/2.1.0)
 * [x] $CFGS/f/foss/foss-2020a.eb (module: Core | foss/2020a)
 * [ ] $CFGS/r/R/R-4.0.0-foss-2020a.eb (module: MPI/GCC/9.3.0/OpenMPI/4.0.3 | R/4.0.0)

This works fine on 4.2.0, but fails on 4.2.0 with #3216 applied and on current develop.

The build attempts to find Core/Java/11, which doesn't resolve with my module path, while Java/11 does.

@Flamefire
Contributor

Sorry, I don't know what HMNS is exactly or how modules are supposed to be named. If the build attempts to find Core/Java/11 and it does not exist, then this is not an issue with #3216 but with HMNS outputting the wrong name, or with your setup. My change only fixes the module_exists(module_name) function to match the behavior of the module tool, i.e. if module_exists returns True, then module load module_name is possible. It sounds like that is working as expected, so I'll leave this to someone familiar with HMNS.

@zao
Contributor Author

zao commented May 12, 2020

Sorry: HierarchicalMNS, the module naming scheme where modules are organized in a hierarchy.

I can ml Java/11 and ml show Java/11. I don't know what EB used to do, but with the PR applied, EB attempts to ml show Core/Java/11, which doesn't resolve.

You can see this in this log: https://gist.github.com/zao/6c90bab96218fcbe8584918b09a0444b

Regardless of whether this is an underlying breakage that was exposed by changing the lookup method, it's a regression.

@zao
Contributor Author

zao commented May 12, 2020

The existence check in 4.2.0 produced True in the same situation:

https://gist.github.com/zao/1d41be949440435974a3d6c5c8b3953d

@Flamefire
Contributor

> Regardless of whether this is an underlying breakage that was exposed by changing the lookup method, it's a regression.

True. I was just pointing out that this is exactly the case: something is broken and tries to use Core/Java/11 as the module name, which is not a valid module. Previously, a (hacky, incomplete and sometimes wrong) fallback was used, which did recognize it as a valid module.

Hence I'm relaying this to someone familiar with HMNS to sort it out.

From your log I see:

== 2020-05-12 15:20:37,413 utilities.py:102 DEBUG Module name Core/Java/11.0.2 validated
== 2020-05-12 15:20:37,413 easyconfig.py:2501 DEBUG Obtained valid full module name Core/Java/11.0.2
== 2020-05-12 15:20:37,413 easyconfig.py:2522 DEBUG Determining short module name for <easybuild.framework.easyconfig.easyconfig.EasyConfig object at 0x7fbfc7dc7c50> (force_visible: False)
== 2020-05-12 15:20:37,413 easyconfig.py:2470 DEBUG No alternative software name specified to determine module name with
== 2020-05-12 15:20:37,413 utilities.py:102 DEBUG Module name Java/11.0.2 validated
== 2020-05-12 15:20:37,413 easyconfig.py:2524 DEBUG Obtained valid short module name Java/11.0.2

I.e. see "Module name Java/11.0.2 validated", which uses the name without the Core/ prefix.

And

Determining full module name for {'external_module_metadata': {}, 'short_mod_name': 'Java/11', 'toolchain': {'version': '', 'name': 'system'}, 'name': 'Java', 'full_mod_name': 'Core/Java/11', 'system': True, 'build_only': False, 'versionsuffix': '', 'version': '11', 'toolchain_inherited': False, 'hidden': False, 'external_module': False}

So I guess the short_mod_name should be used. This is supported by

== 2020-05-12 15:20:54,332 modules.py:1591 DEBUG Paths for 'module show' key '('MODULEPATH=/eb/modules/all:/eb/develop/modules:/eb/modules/all/Core:/opt/Lmod-7.8.8/modulefiles/Linux:/opt/Lmod-7.8.8/modulefiles/Core:/opt/Lmod-7.8.8/lmod/lmod/modulefiles/Core', 'lmod', 'Core/Java/11')': ['/eb/modules/all', '/eb/develop/modules', '/eb/modules/all/Core', '/opt/Lmod-7.8.8/modulefiles/Linux', '/opt/Lmod-7.8.8/modulefiles/Core', '/opt/Lmod-7.8.8/lmod/lmod/modulefiles/Core']

which shows that Core is a subdirectory that is itself a MODULEPATH entry (/eb/modules/all/Core) and hence must not be part of the module name, and by the output of module avail, which lists 'Core/Java/11.0.2', 'Java/11' and 'Java/11.0.2', but no 'Core/Java/11'.
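To make the path argument concrete, here is a minimal, hypothetical model of name resolution against $MODULEPATH. It is not Lmod's actual algorithm: it assumes a name is valid if some MODULEPATH entry joined with the name is a module file, or if that entry holds an alias with that name. The alias table is an assumption standing in for the module wrapper installed by Java-11.eb; the MODULEPATH entries and names are taken from the log and module avail output above.

```python
import posixpath

# Module file on disk (assumed to be a Lua module file, as used with Lmod).
MODULE_FILES = {
    "/eb/modules/all/Core/Java/11.0.2.lua",
}

# Hypothetical alias table: (MODULEPATH entry, alias name) -> target name.
# Stands in for the wrapper that Java-11.eb installs next to Java/11.0.2.
ALIASES = {
    ("/eb/modules/all/Core", "Java/11"): "Java/11.0.2",
}

# Relevant subset of the MODULEPATH from the log.
MODULEPATH = ["/eb/modules/all", "/eb/modules/all/Core"]


def resolves(name, modulepath=MODULEPATH):
    """Return True if `name` maps to a module file or alias in some MODULEPATH entry."""
    for entry in modulepath:
        if posixpath.join(entry, name) + ".lua" in MODULE_FILES:
            return True
        if (entry, name) in ALIASES:
            return True
    return False


for name in ("Core/Java/11.0.2", "Java/11.0.2", "Java/11", "Core/Java/11"):
    print(name, resolves(name))
# Core/Java/11.0.2 True
# Java/11.0.2 True
# Java/11 True
# Core/Java/11 False
```

The real module file is reachable both with and without the Core/ prefix (from /eb/modules/all and from /eb/modules/all/Core respectively), but the alias only exists relative to /eb/modules/all/Core, so Core/Java/11 does not resolve.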

@boegel boegel added this to the next release (4.2.1?) milestone May 12, 2020
@boegel
Member

boegel commented May 12, 2020

When using HierarchicalMNS, EasyBuild checks for the existence of modules with the "full" name (Core/Java/11 in this case), not the short name (Java/11).

The reason for that is that the module may not be visible without loading other modules first, since that's how the module hierarchy works (loading a GCCcore module makes all modules built with it visible).
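The visibility argument can be illustrated with a similar toy model. In this sketch (hypothetical paths, following the layout seen in this issue), NLopt/2.6.1 lives under Compiler/GCCcore/9.3.0, so its short name only resolves after loading GCCcore/9.3.0 has extended $MODULEPATH, while its full name resolves from the top of the hierarchy. The EXTENDS table and the load helper are assumptions that model only the MODULEPATH change; they are not a real module-tool API.

```python
import posixpath

TOP = "/eb/modules/all"

# Module files on disk (hypothetical subset of the hierarchy in this issue).
MODULE_FILES = {
    TOP + "/Core/GCCcore/9.3.0.lua",
    TOP + "/Compiler/GCCcore/9.3.0/NLopt/2.6.1.lua",
}

# MODULEPATH entries a module adds when loaded (assumed HMNS behaviour:
# loading a compiler module exposes the modules built with it).
EXTENDS = {
    "GCCcore/9.3.0": [TOP + "/Compiler/GCCcore/9.3.0"],
}


def resolves(name, modulepath):
    """True if `name` maps to a module file under some MODULEPATH entry."""
    return any(posixpath.join(e, name) + ".lua" in MODULE_FILES for e in modulepath)


def load(name, modulepath):
    """Return the MODULEPATH after loading `name` (models only the path change)."""
    if not resolves(name, modulepath):
        raise ValueError("cannot load " + name)
    return modulepath + EXTENDS.get(name, [])


initial = [TOP, TOP + "/Core"]
print(resolves("NLopt/2.6.1", initial))                         # False: not visible yet
print(resolves("Compiler/GCCcore/9.3.0/NLopt/2.6.1", initial))  # True: full name
after = load("GCCcore/9.3.0", initial)
print(resolves("NLopt/2.6.1", after))                           # True once GCCcore is loaded
```

This is why checking the short name alone is not enough for modules below Core: without loading their parent modules first, only the full name is guaranteed to be resolvable.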

If this got broken, we should really fix it, even if it means re-introducing the fallback.

I'm a bit surprised by this regression though, I could've sworn this was covered by the tests...

@Flamefire
Contributor

This seems to apply only to aliases. See my last sentence above: the underlying module exists both with and without the Core/ prefix, but the alias only exists without it.

> I'm a bit surprised by this regression though, I could've sworn this was covered by the tests...

The tests are (now) written so that module_exists(<foo>) returns True exactly when module show <foo> succeeds, and since ml show Core/Java/11 does not work, False is the correct result. So I'd suggest re-verifying the reasoning for using full module names. Maybe checking for the short module name first already solves this, but as mentioned, I don't know enough about HMNS to tell for sure.

@boegel
Member

boegel commented May 16, 2020

@Flamefire I've fixed this issue in #3337 by restoring the fallback mechanism with ModulesTool.module_wrapper_exists, since it's now clear why it was there...

We can reconsider this going forward, but short term I don't think we can easily avoid checking for the existence of modules using "full" module names, starting from the top of the hierarchy. For top-level modules installed in Core (like Core/Java/11) it may be relatively easy to fix, but it's a different story for modules deeper down in the hierarchy, which require a Core module (and potentially others) to be loaded first before they become visible.
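A minimal sketch of the order of checks in such a fallback, under stated assumptions: shows stands in for "module show <name> succeeds" and wrapped_module is a hypothetical wrapper lookup backed by a toy table. This is not EasyBuild's implementation of ModulesTool.module_wrapper_exists; it only illustrates checking the full name first and, if that fails, resolving a wrapper to the module it points at and checking that instead.

```python
# Names for which `module show` succeeds from the top of the hierarchy
# (hypothetical; taken from the `module avail` output in this issue).
SHOWABLE = {"Core/Java/11.0.2", "Java/11", "Java/11.0.2"}

# Hypothetical wrapper table: module directory -> {alias version: target version},
# standing in for the wrapper that Java-11.eb installs.
WRAPPERS = {"Core/Java": {"11": "11.0.2"}}


def shows(name):
    """Stand-in for 'module show <name> succeeds'."""
    return name in SHOWABLE


def wrapped_module(full_name):
    """Return the module a wrapper named `full_name` points to, or None."""
    mod_dir, _, version = full_name.rpartition("/")
    target = WRAPPERS.get(mod_dir, {}).get(version)
    return mod_dir + "/" + target if target else None


def module_exists(full_name):
    """Check the full name first; if that fails, fall back to a wrapper lookup."""
    if shows(full_name):
        return True
    target = wrapped_module(full_name)
    return target is not None and shows(target)


print(module_exists("Core/Java/11.0.2"))  # True: found directly
print(module_exists("Core/Java/11"))      # True: only via the wrapper fallback
print(module_exists("Core/Java/12"))      # False: neither check succeeds
```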

If we need to discuss this further, please open a dedicated issue for it, since following up in a closed issue isn't going to work out well.

@Flamefire
Contributor

Well, I've put my arguments forward here, in other issues, and in the PR. I haven't seen confirmation that this incomplete solution is indeed required, so nothing has changed.

But with the module-show fix still in place, it works for me.
