Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add support for implementing pre- and post-step hooks #2343

Merged
merged 18 commits into from
Dec 12, 2017

Conversation

boegel
Copy link
Member

@boegel boegel commented Nov 15, 2017

update: documentation update in easybuilders/easybuild#392

@boegel boegel added this to the 3.5.0 milestone Nov 15, 2017
wpoely86
wpoely86 previously approved these changes Nov 15, 2017
@damianam
Copy link
Member

I've thinking about support for hooks for some time already, but in a different way. The use case I had in mind was for updating Lmod's cache as soon as easybuild finishes. What is different from this, is that it'd be a hook at the very end of the installation process for all the easyconfigs being installed, instead of a hook after any given step for each easyconfig. What do you think @boegel ?

Also, thinking in that use case, wouldn't it make sense to allow to call external scripts, instead of python scripts?

@boegel
Copy link
Member Author

boegel commented Nov 15, 2017

@damianam We could extend this to also cover other hooks, rather than just pre- and post-step hooks, sure.

How would you call a hook that only runs at the very end? exit_hook, final_hook, epilogue_hook? Once we agree on that, extending this is trivial. Proper documentation should be added too, of course.

Note that this way, you can still call external scripts, with a trivial hook implementation like:

from easybuild.tools.run import run_cmd

def post_install_hook(self):
    run_cmd('/tmp/post_install_hook.sh')

You just lose a bit of context that way: which software name/version/toolchain, potentially also environment eb is working in (unless you're careful), etc.

Requiring the hooks that eb picks up on to be written in Python is not restrictive at all, and allows you to pick up the EasyBlock instance (and access self.installdir, etc.) and the parsed easyconfig file self.cfg...

Maybe we should discuss this a bit during the next EB conf call?

@damianam
Copy link
Member

Don't get me wrong, I think that most hooks must be python files, for the precise reason you mention. But as you said, in the use case I had in mind the script would be trivial and all that power won't be needed, which means that having a python script "wrapping" a shell script means you have to maintain both and it is kind of overkill. Not a biggie, obviously, as the wrapper is trivial and the extra work minimal.

@boegel
Copy link
Member Author

boegel commented Nov 15, 2017

I see your point, but should we add further configuration options so people can provide hook scripts in whatever scripting language that is now cool? That would just be overhead on our end, without very little benefit for the EasyBuild users, I think?

@damianam
Copy link
Member

Well, I think that'd be overhead if you plan to provide interfaces. But if you simply call run_cmd("whatever") you can be flexible enough IMO. If you need something else then write a python hook. If you don't need it just execute the command.

@boegel
Copy link
Member Author

boegel commented Nov 15, 2017

So, are you saying we should consider adding a --hook-commands too, so you can do --hook-commands=exit_hook:lmod_update?

@damianam
Copy link
Member

I think the distinction between *.py and *.whatever_else is clear enough to don't require a different option. I'd simply overload --hooks

@boegel
Copy link
Member Author

boegel commented Nov 15, 2017

@damianam So, just use run_cmd on anything that doesn't come via a *.py?
But then you'd need to support providing separate scripts/commands per hook that we support, i.e. make some kind of mapping?

@damianam
Copy link
Member

Isn't --hooks=exit_hook:./whatever.sh enough as a mapping? Or do you mean to provide multiple hooks and scripts per run? If the later, we can either extend the syntax, or just ignore the whole thing, since creating a trivial python hook wrapper would be less trouble (to implement in framework, even though it would be slightly more troublesome for EB users) and equally powerful.

@boegel
Copy link
Member Author

boegel commented Nov 15, 2017

@damianam Something like that yes, but that's already quite a bit more logic required than what's there now. :)

@damianam
Copy link
Member

Ignore my comments then :-). With an exit_hook I am already happy

@easybuilders easybuilders deleted a comment from boegelbot Nov 18, 2017
@boegel
Copy link
Member Author

boegel commented Nov 18, 2017

@damianam Support for a start_hook and end_hook are now also implemented.

@@ -71,8 +71,7 @@
from easybuild.tools.filetools import adjust_permissions, apply_patch, back_up_file, change_dir, convert_name
from easybuild.tools.filetools import compute_checksum, copy_file, derive_alt_pypi_url, diff_files, download_file
from easybuild.tools.filetools import encode_class_name, extract_file, is_alt_pypi_url, mkdir, move_logs, read_file
from easybuild.tools.filetools import remove_file, rmtree2, write_file
from easybuild.tools.filetools import verify_checksum, weld_paths
from easybuild.tools.filetools import remove_file, rmtree2, run_hook, write_file, verify_checksum, weld_paths
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

write_file at the end to preserve alphabetical order?

@@ -2434,6 +2436,9 @@ def run_step(self, step, step_methods):
"""
self.log.info("Starting %s step", step)
self.update_config_template_run_step()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling run_hook when hooks = None would work, but seems odd.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather have the logic for dealing with not having any hooks defined in run_hook rather than having to copy-paste the logic to check whether any hooks are defined in multiple places.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems reasonable.

# obtain a copy of the starting environment so each build can start afresh
# we shouldn't use the environment from init_session_state, since relevant env vars might have been set since
# e.g. via easyconfig.handle_allowed_system_deps
init_env = copy.deepcopy(os.environ)

run_hook('start', hooks)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The labelling is at the moment "freeform". Nothing wrong with it, but I wonder if it makes sense to restrict it and create a list of valid labels? That way one can also inform the user in case of typos in the hooks list. Think of "why is my stat_hook not working?" kind of situation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, good idea.

So, if someone includes stat_hook, they would get an error that stat is not a valid hook, and then spit out a list of known hooks?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep

:param post_step_hook: indicates whether hook to run is a post-step hook
"""
res = None

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The if below means that all hooks, except start and end need to have prefixes. Conceptually, I find it odd. If I create a install_hook I expect it to be executed after the install step. But it needs to be called post_install_hook. That way it is clear what it does, but I wonder if it makes sense to consider too install_hook.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel install_hook is ambiguous. Some may assume it would be called as soon as the install step is triggered (equivalent with the current pre_install_hook), while your assumption is different.
To avoid guessing & confusion, I feel it's better to be explicit...

With your suggestion above in mind, you'd get an error like this when you define install_hook:

ERROR: 'install_hook' is not a valid hook; available hooks are: ..., pre_install_hook, post_install_hook, ...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's a reasonable approach.

@boegel
Copy link
Member Author

boegel commented Nov 22, 2017

@damianam Check for valid defined hooks implemented. We can probably do better w.r.t. the produced error message though...

$ eb --hooks test.py example.eb
== temporary log file in case of crash /tmp/eb-7TsXSG/easybuild-wcd5iy.log
ERROR: Found one or more unknown hooks: stat_hook (known hooks: start_hook, pre_fetch_hook, post_fetch_hook, pre_ready_hook, post_ready_hook, pre_source_hook, post_source_hook, pre_patch_hook, post_patch_hook, pre_prepare_hook, post_prepare_hook, pre_configure_hook, post_configure_hook, pre_build_hook, post_build_hook, pre_test_hook, post_test_hook, pre_install_hook, post_install_hook, pre_extensions_hook, post_extensions_hook, pre_postproc_hook, post_postproc_hook, pre_sanitycheck_hook, post_sanitycheck_hook, pre_cleanup_hook, post_cleanup_hook, pre_module_hook, post_module_hook, pre_permissions_hook, post_permissions_hook, pre_package_hook, post_package_hook, pre_testcases_hook, post_testcases_hook, end_hook)

return res


def run_hook(label, known_hooks, pre_step_hook=False, post_step_hook=False, args=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to have args at all? They aren't used anywhere.


hook_name = hook_prefix + label + '_hook'

for hook in known_hooks:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it make sense to have a dictionary instead of a list here?

@@ -2434,6 +2437,9 @@ def run_step(self, step, step_methods):
"""
self.log.info("Starting %s step", step)
self.update_config_template_run_step()

run_hook(step, self.hooks, pre_step_hook=True, args=[self])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I now understand why args is needed (in a previous comment on a separate commit I didn't :-P). Question is: Can we test it somehow?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean, test that self is being passed as an argument?

That's already done in , see test_toy_build_hooks in toy_build.py where the post_install_hook there which uses self.name and self.version.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean that args seems like a kind of wildcard that I find trouble understanding. It is passed directly to the hook in all cases except the start and end hooks, where run_hook is executed without any of the optional arguments. And then the hook itself might do something with it, but being it either [] or [self] I don't find it very clear what is the purpose of it. I guess that now, with a slightly better understanding of what you are doing here, my comment is simply "can we rename args to something more descriptive?". With a different name I suppose I would understand it from the beginning.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now, only the step hooks get passed an argument (i.e. self, the EasyBlock instance that gives access to all information).

This may change in the future though, if a reason pops up to pass additional arguments.

args is commonly used in Python code, see for example https://pythontips.com/2013/08/04/args-and-kwargs-in-python-explained/.

The idea is that the args list is passed down with *args to "unpack" it when the hook is called.

Hooks should be implemented like this, to be future-proof (i.e. keep working when we may pass down additional (even named) arguments):

def pre_install_hook(self, *args, **kwargs):
    # do something with self

def start_hook(*args, **kwargs):
    # do something (no (named) arguments passed, for now)

I'll clarify this in the yet-to-be-written documentation.

@@ -42,6 +42,7 @@
import fileinput
import glob
import hashlib
import imp
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No longer needed

_log = fancylogger.getLogger('hooks', fname=False)

# this should be obtained via EasyBlock.get_steps(),
# but we can't import from easybuild.framework.easyblock without introducing a cyclic dependency...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But maybe we can push out this list, so we can import the same list on both sides?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'm not too happy with this either...
We could keep this list somewhere else, I'll see if that a good option.


def load_hooks(hooks_path):
"""Load defined hooks (if any)."""
hooks = []
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense to convert this to a dictionary

@damianam
Copy link
Member

@boegel Regarding the error message, why not using SequenceMatcher to suggest the closest known hook? I believe something similar is done somewhere else in framework to print typos and suggestions.

@boegel
Copy link
Member Author

boegel commented Dec 6, 2017

@damianam Great suggestion, thanks! This looks way better/cooler now, see below.

Other remarks are also tackled, can you please give this another look, and also a quick test?

I'll also look into writing up some documentation for this...

$ cat example_hooks.py
def stat_hook():
    pass

def install_hook(self):
    pass

$ eb --hooks example_hooks.py bzip2-1.0.6.eb -df
== temporary log file in case of crash /tmp/eb-ImL6HO/easybuild-bs00D3.log
ERROR: Found one or more unknown hooks:
* stat_hook (did you mean 'start_hook'?)
* install_hook (did you mean 'pre_install_hook', or 'post_install_hook'?)

List of known hooks: start_hook, pre_fetch_hook, post_fetch_hook, pre_ready_hook, post_ready_hook, pre_source_hook, post_source_hook, pre_patch_hook, post_patch_hook, pre_prepare_hook, post_prepare_hook, pre_configure_hook, post_configure_hook, pre_build_hook, post_build_hook, pre_test_hook, post_test_hook, pre_install_hook, post_install_hook, pre_extensions_hook, post_extensions_hook, pre_postproc_hook, post_postproc_hook, pre_sanitycheck_hook, post_sanitycheck_hook, pre_cleanup_hook, post_cleanup_hook, pre_module_hook, post_module_hook, pre_permissions_hook, post_permissions_hook, pre_package_hook, post_package_hook, pre_testcases_hook, post_testcases_hook, end_hook

@@ -72,7 +72,9 @@
from easybuild.tools.filetools import compute_checksum, copy_file, derive_alt_pypi_url, diff_files, download_file
from easybuild.tools.filetools import encode_class_name, extract_file, is_alt_pypi_url, mkdir, move_logs, read_file
from easybuild.tools.filetools import remove_file, rmtree2, verify_checksum, weld_paths, write_file
from easybuild.tools.hooks import run_hook
from easybuild.tools.hooks import BUILD_STEP, CLEANUP_STEP, CONFIGURE_STEP, EXTENSIONS_STEP, FETCH_STEP, INSTALL_STEP
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not to be picky, but wouldn't it make sense to put the steps in a dict?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you envision that exactly? What would be the key, what the value?

Also, dictionaries are mutable, so not a good choice for constants...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd do something like:

steps_dict = {
    'BUILD_STEP': 'build',
    'CLEANUP_STEP': 'cleanup',
    ....
}
steps = frozenset(steps_dict.items())

And then import steps and use dict(steps)['BUILD_STEP'] for instance. It isn't too elegant, but importing every step individually seems like a PITA, and more difficult to maintain.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@damianam That's kind of bypassing the whole point of introducing constants.

The point here is to avoid "magic strings" like 'build', while typos in constants like BUILD_STEP are easier to catch by linters, etc.
With your proposal, we're back to magic strings like 'BUILD_STEP'...

Also, w.r.t. to maintenance: adding additional steps should be rare in the future, removing steps even more so.

I agree that the import of the constants looks a bit ugly though, but I don't see a better option without bypassing the purpose of using constants.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. We can leave it as it is and blame it on the language ;-)

Copy link
Member

@damianam damianam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have discussed sufficiently about this PR and is mature enough. Let's push it in!

@damianam damianam merged commit 349b91e into easybuilders:develop Dec 12, 2017
@boegel boegel deleted the hooks branch December 12, 2017 11:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants