Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

take into account maxparallel for individual extensions #3811

Open
wants to merge 5 commits into
base: develop
Choose a base branch
from

Conversation

boegel
Copy link
Member

@boegel boegel commented Aug 30, 2021

No description provided.

…nto account for individual extensions

Co-authored-by: Alexander Grund <Flamefire@users.noreply.github.com>
Flamefire
Flamefire previously approved these changes Aug 30, 2021
Copy link
Contributor

@Flamefire Flamefire left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.
Might also want to remove the min from det_parallelism (systemtools.py) which is likely where the issue I mentioned earlier came from.

@Flamefire
Copy link
Contributor

BTW: As an alternative: Why not call self.set_parallel() in the extensions prerun or in init_ext_instances step? Wasn't that the motivation of #2687 ? (Although it would slightly duplicate some work...) and (the latter) would avoid the need to call check_readiness_step which looks a bit odd in that test (although it is required for the parent ECs parallel check, but that shouldn't exactly be there in that specific test anyway)

@Flamefire
Copy link
Contributor

Might also want to remove the min from det_parallelism (systemtools.py) which is likely where the issue I mentioned earlier came from.

Done (with other changes) in #3812

@boegel
Copy link
Member Author

boegel commented Aug 31, 2021

@Flamefire Just calling self.set_parallel() doesn't work, since set_parallel is a method of EasyBlock which only takes into account the top-level easyconfig parameters.

I don't think it's worth the refactoring at all, the current approach is cleaner imho.

At first sight we're duplicating code, but the self.cfg in extension.py is not the same self.cfg as in easyblock.py, and creating a function just to replace the 3 common lines in both places is overkill imho.

@Flamefire
Copy link
Contributor

This is not exactly true. If you call the EasyBlock.set_parallel with the extension as the self param it will work as expected. There is no such thing as different self.cfg unless I'm missing what you mean exactly.
But yes, we have the ExtensionEasyBlock which I forgot is separate from Extension, and from the former we could use self.set_parallel() without issues.

I was just thinking that it might not be a good idea to rely on check_readiness_step having been called, i.e. the parent ECs parallel being reused.
I actually do have an example for an issue: Parent EC has a lower maxparallel than the extension.
Now the parent ECs parallel the build-option and parents maxparallel are combined into the parents parallel and the extension cannot increase that again.
IMO this is a bug, isn't it?

It is unfortunate that EasyBlock and Extension don't have a common base where we could put set_parallel but IIRC EasyBlock.set_parallel(self) from within Extension should work. Or factoring it out to a function which takes a cfg and log and call that, if that looks to hacky

@boegel
Copy link
Member Author

boegel commented Aug 31, 2021

If you call the EasyBlock.set_parallel with the extension as the self param it will work as expected.

First of all, that would be very hacky to say the least, you should call a method of class A (EasyBlock) with an instance of class B (Extension) when A and B are entirely different things.

Also, how exactly would you do that? To call set_parallel you need to refer to it via an instance of EasyBlock (like self.set_parallel), and you don't even get to pass an alternate self, like you suggest.

The other option would be to create a separate function set_parallel and pass it self.cfg (and log), and then fiddle with it a bit more to avoid doing too much work rather than just a simple comparison between parallel and maxparallel.

Your point about a top-level maxparallel affecting extensions is valid I think, I need to think about that...

@akesandgren
Copy link
Contributor

And does ext1(maxparallel=2) followed by ext2(maxparallel=3) end up doing the right thing when top-level parallel=5?

@Flamefire
Copy link
Contributor

Also, how exactly would you do that? To call set_parallel you need to refer to it via an instance of EasyBlock (like self.set_parallel), and you don't even get to pass an alternate self, like you suggest.

See above: EasyBlock.set_parallel(self) should work. IIRC self.foo() is just syntactic sugar for __class__.foo(self)

The other option would be to create a separate function set_parallel ...

That's what I meant above: "Or factoring it out to a function which takes a cfg and log and call that, if that looks to hacky"

In the end I wouldn't even fiddle with it, just take the set_parallel function as-is and be done with it (besides the new parameters when extracting it). I mean: It was written to be called from other places besides the only current one, wasn't it? It already takes the current cfgs parallel, maxparallel and build option into account and with the merged #3812 it is quite cheap to call

@boegel
Copy link
Member Author

boegel commented Aug 31, 2021

Also, how exactly would you do that? To call set_parallel you need to refer to it via an instance of EasyBlock (like self.set_parallel), and you don't even get to pass an alternate self, like you suggest.

See above: EasyBlock.set_parallel(self) should work. IIRC self.foo() is just syntactic sugar for __class__.foo(self)

@Flamefire OK, I stand corrected, it does work (see below), but I flat out refuse to leverage the set_parallel method like that as it is right now, so refactoring it into a function outside of EasyBlock is worth considering.

>>> class A(object):
...     x = None
...     def test(self, y):
...         self.x = y
...
>>> class B(object):
...     x = None
...
>>> a = A()
>>> b = B()
>>> A.test(b, 1)
>>> a.x
>>> b.x
1

The other option would be to create a separate function set_parallel ...

That's what I meant above: "Or factoring it out to a function which takes a cfg and log and call that, if that looks to hacky"

In the end I wouldn't even fiddle with it, just take the set_parallel function as-is and be done with it (besides the new parameters when extracting it). I mean: It was written to be called from other places besides the only current one, wasn't it? It already takes the current cfgs parallel, maxparallel and build option into account and with the merged #3812 it is quite cheap to call

OK, I'll look into that.

And does ext1(maxparallel=2) followed by ext2(maxparallel=3) end up doing the right thing when top-level parallel=5?

@akesandgren Yes, that does work as expected, every extension inherits the parallel=5 from the parent, which remains unchanged across extensions.

@@ -138,6 +138,12 @@ def __init__(self, mself, ext, extra_params=None):
self.log.debug("Skipping unknown custom easyconfig parameter '%s' for extension %s/%s: %s",
key, name, version, value)

self.cfg['parallel'] = self.master.orig_parallel
par = get_parallel_ec_param_value(self.cfg, self.log)
self.log.info("Setting parallelism: %s", par)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Redundant log.info line

Copy link
Contributor

@Flamefire Flamefire Sep 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually both are redundant as get_parallel_ec_param_value does the logging and the log already has the context of the extension (a bit above though, but still)
Edit: Ah no, that does only debug logging...

write_file(test_ec, test_ec_txt)
ec = process_easyconfig(test_ec)[0]
eb = get_easyblock_instance(ec)

# require to trigger setting of 'parallel' value
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"required"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you move that setting of orig-parallel to the ctor this is not really required anymore, unless you add a top-level maxparallel and test the mentioned case of an extension having a higher value for it and it should still work.

E.g. instead of the EC parallel (why do we have that anyway? Isn't that basically the same as maxparallel?) use a maxparallel of 2 and a build option parallel of 5, then barbar should still yield in a parallel value of 3

@easybuilders easybuilders deleted a comment from boegelbot Sep 1, 2021
@@ -212,6 +213,8 @@ def __init__(self, ec):
self.postmsg = '' # allow a post message to be set, which can be shown as last output
self.current_step = None

self.orig_parallel = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not set this to the correct value here already?

self.log.debug("Desired parallelism: minimum of 'parallel' build option/easyconfig parameter: %s", par)
# keep track of original value for 'parallel', so it can be restored before determining level
# of parallelism that can be used for extensions
self.orig_parallel = self.cfg['parallel']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move to ctor?

@@ -1025,10 +1025,16 @@ def test_init_extensions(self):

test_ec = os.path.join(self.test_prefix, 'test.eb')
test_ec_txt = toy_ec_txt.replace("('barbar', '0.0', {", "('barbar', '0.0', {'easyblock': 'DummyExtension',")
test_ec_txt = test_ec_txt.replace("('barbar', '0.0', {", "('barbar', '0.0', {'maxparallel': 3,")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this changing the same as the line above? Maybe combine both into one? This looks like a C&P mistake

@boegel
Copy link
Member Author

boegel commented Sep 2, 2021

One thing I didn't take into account here: just setting self.cfg['parallel'] does not seem to be enough to also correct the %(parallel)s template value for that extension, which is what we're using in the jax easyconfig (cfr. easybuilders/easybuild-easyconfigs#13851).

So, this needs a bit more love, it'll have to wait until after the EasyBuild v4.4.2 release, this is by no means a blocker...

@boegel boegel modified the milestones: 4.4.2, release after 4.4.2 Sep 2, 2021
@Flamefire
Copy link
Contributor

If we have to do a bit more anyway: Why not deprecate the EC "parallel" param? It is confusing.
We should have only:

  • build option --parallel (or config)
  • EC maxparallel
  • EC property(!) parallel set by the set_parallelism function calculated from the first 2

This would also catch cases where we access ec['parallel'] before we should. Writes to that property (e.g. by easyblocks) should then also update the template values.
Of course for backwards compat we need to handle access to ec['parallel'] (read: use property with fallback to dict, write: set property or dict)

@Flamefire
Copy link
Contributor

One thing I didn't take into account here: just setting self.cfg['parallel'] does not seem to be enough to also correct the %(parallel)s template value for that extension, which is what we're using in the jax easyconfig (cfr. easybuilders/easybuild-easyconfigs#13851).

See #3842 for a change which implements my above described approach.

@akesandgren
Copy link
Contributor

Conflicts here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants