Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move the logic from Cylc commands/scripts to the cylc package #2802

Closed
kinow opened this issue Oct 16, 2018 · 53 comments
Closed

Move the logic from Cylc commands/scripts to the cylc package #2802

kinow opened this issue Oct 16, 2018 · 53 comments
Milestone

Comments

@kinow
Copy link
Member

kinow commented Oct 16, 2018

For reference: python packaging - Command Line Scripts

I have a branch with a work-in-progress setup.py for Cylc (if you have a look at the branch, skip the deleted files and don't mind the number of commits).

In the setup.py, we have the new list of dependencies for Cylc, like isodatetime, and jinja2, matching versions required. And there is also an entry for scripts. The scripts option is configured to list all files under the bin/ folder.

What it means, is that running pip install cylc (not exactly as we haven't published anything there... but just for example) would install the module cylc (i.e. contents of lib/cylc), its dependencies (e.g. versions needed for jinja2, and isodatetime), and also put the Cylc scripts in the $PATH.

Right now users are asked to do that as a part of the installation process, but with this approach that won't be necessary anymore. The scripts would be installed with the new module in the user environment.

Furthermore, users would be able to create virtualenv's, conda environments, or if the deb/rpm packaging works, they could use that option too. And that way they would be able to manage multiple versions of Cylc.

It is also possible to achieve the same with containers, but containers are at a higher level. The option with pip requires simply a Python installation, pip, and virtualenv to support multiple versions (or in the supercomputer perhaps that load module trick...).

One problem with this approach, however, is that it becomes clear that the scripts under the bin/ folder contain too much logic.

This ticket suggests to move that logic to the Cylc module (i.e. under lib/cylc), within appropriate objects for encapsulation. Write tests (which is harder right now), and then later replace the scripts by the Python entry point: console_scripts.

The console_scripts allow to expose functions as scripts.

In other words, cylc scan would be a thin shim, calling actually a function like SomeCylcObject.scan(*args).

The clear downside is the complexity and risk of this change.

But here's a list of benefits:

  • We can test the scripts
  • IMO, it would be easier to maintain
  • Easier to expose, deprecate, replace scripts, as we would change simply the function exposed in the entry point
  • Users that did a import cylc in a Jupyter Notebook, for example, would be able to write code that calls that same function, without having to fork a process
  • The new Web GUI wouldn't have to fork processes (i.e. start a new process, allocate memory, file handlers, I/O, etc), it would simply import files from the syspath, within the Python world, and execute exactly what a user does with cylc scan or any other command (Probably not entirely true... we could still re-use the main() method in most cases?...)

This can be done after the new Web GUI. I would prefer to try to implement this before, as I think the code would look simpler... but on the other hand, if we are not able to implement it right now, I could benchmark the current approach with the scripts, and compare against this approach and validate the last item.

Cheers
Bruno

@matthewrmshin matthewrmshin added this to the soon milestone Oct 16, 2018
@matthewrmshin
Copy link
Contributor

Agree that most logic in bin/ should be moved under lib/cylc/.

@hjoliver
Copy link
Member

Also agree - everything above sounds good to me. We could do it now, if you think it wouldn't delay us too long?

@kinow
Copy link
Member Author

kinow commented Oct 16, 2018

I'm pondering whether I should give it a try while we have some time before the work on Python 3 on isodatetime/Cylc. Will choose a command that I am familiar like run or scan to see how it goes. Thanks!!

@kinow
Copy link
Member Author

kinow commented Nov 16, 2018

While reviewing #2809 I found that scheduler.py uses subprocess.Popen to execute Cylc commands.

Here's a simplified example of what it does:

import os

from subprocess import Popen, PIPE

cmd = ['cylc', 'run', 'suite', '--no-detach', '--verbose']

proc = Popen(
    cmd,
    stdin=open(os.devnull), stdout=PIPE, stderr=PIPE)
if proc.wait():
    print("Error executing command: %s" % proc.communicate()[1])
else:
    comm = proc.communicate()
    print("Done! %s\n%s" % (comm[0], comm[1]))

If not for the sys.exit() calls mixed with the normal logic - which can later be replaced by return and/or raise statements - this could be a drop-in replacement.

from cylc import scheduler_cli
import sys

cmd = ['run', 'suite', '--no-detach', '--verbose']

sys.argv = cmd

scheduler_cli.main()

I used a dummy suite called suite.

[scheduling]
[[dependencies]]
graph="a"

[runtime]
[[a]]

It finished in approximately 3.5 seconds for both scripts, but I suspect the Python version would be faster if the system is a bit more busy (and thus lacking resources to immediately start a process, due to memory/cpu/etc).

But one advantage of the second option, is that we have more control over what is going on, and can investigate issues and optimize the code more easily.

Here's the call graph for the subprocess.Popen example.

bad

In this graph, you can't really see what's going on after the process was started. The only calls you have are the trigger of the process, and then OS and low level calls to check the status of the process.

Here's what it looks when calling the Python code directly.

good

Here it is clear that the script test2.py starts, then on the right hand side you have the HTTP clients and the DAO's to access the local sqlite database. And on the left hand side you have the HTTP suite server program with CherryPy, taking requests from the client.

And from the statistics (I am using the Profiler bundled with PyCharm) one can see 53.5% of the time spent waiting (scheduler.py sleep spends near 2000ms), and 20% in CherryPy's publish method, which is used to coordinate output of threads in CherryPy. These are good numbers to have a look when working on the new Web layer, with Tornado/AIOHTTP/Flask/etc.

But it is much easier to see and debug it without multiple processes. There could be implications on how Cylc currently works, that could prevent us of moving in this direction.

@kinow
Copy link
Member Author

kinow commented Nov 16, 2018

Interestingly memory_profiler seems to indicate less memory used with the option with subprocess.

Filename: test.py

Line #    Mem usage    Increment   Line Contents
================================================
     7     28.6 MiB     28.6 MiB   @profile()
     8                             def test():
     9     28.6 MiB      0.0 MiB       proc = Popen(
    10     28.6 MiB      0.0 MiB           cmd,
    11     28.7 MiB      0.0 MiB           stdin=open(os.devnull), stdout=PIPE, stderr=PIPE)
    12     28.7 MiB      0.0 MiB       if proc.wait():
    13                                     print("Error executing command: %s" % proc.communicate()[1])
    14                                 else:
    15     28.7 MiB      0.0 MiB           comm = proc.communicate()
    16     28.7 MiB      0.0 MiB           print("Done! %s\n%s" % (comm[0], comm[1]))


Filename: test2.py

Line #    Mem usage    Increment   Line Contents
================================================
     5     42.7 MiB     42.7 MiB   @profile
     6                             def test2():
     7     42.7 MiB      0.0 MiB       cmd = ['run', 'suite', '--no-detach', '--verbose']
     8                             
     9     42.7 MiB      0.0 MiB       sys.argv = cmd
    10                             
    11                             
    12     53.6 MiB     10.9 MiB       scheduler_cli.main()

figure_1

figure_2

@hjoliver
Copy link
Member

While reviewing #2809 I found that scheduler.py uses subprocess.Popen to execute Cylc commands.

@kinow - this only happens on #2809, no? (in which case, shouldn't this be a comment on that PR?)

@kinow
Copy link
Member Author

kinow commented Nov 20, 2018

@hjoliver I think the scheduler.py in the master branch does that too. It will put commands in the SubProcPool to be executed via subprocess.Popen to execute cylc ...., like sending messages.

I have a piece of paper I was tracking down some calls in a suite at work. I can confirm tomorrow, but I think I had a branch somewhere experimenting with functools.partials as a naive/poor-man's replacement for the subprocess.Popen, to measure what happens with performance/monitoring/cpu/etc.

That's why I kept this part here. I think the work on #2809 is all good, as far as I could tell. It works like the other parts of the system. I am still planning to test it in my local environment with a Docker container, then send a +1 ⭐

Another by-product of this test, I hope, will be an experiment - or maybe an example - to show how to use Docker with Travis-CI. This way we could have tests with real remote environments if necessary.

@kinow
Copy link
Member Author

kinow commented Nov 20, 2018

(but re-reading the issue description here, maybe it would be better to put it in another issue, sorry)

@hjoliver
Copy link
Member

@kinow - OK understood - I was just taking your statement too literally! (scheduler.py itself does not execute cylc commands in subprocesses ... but it certainly gets other modules to do it).

@kinow
Copy link
Member Author

kinow commented Jan 3, 2019

Note to self: if there is some sort of common API for the commands, it would be nice to add a simple try/catch for KeyboardException. Some commands in Cylc treat this exception, while others don't (e.g. cat-log).

@matthewrmshin
Copy link
Contributor

Hi @kinow cat-log relies on KeyboardInterrupt to terminate sub-processes such as tail -F ... (which would run forever until signalled.)

@kinow
Copy link
Member Author

kinow commented Jan 3, 2019

But I think it prints the error in the consolr output after you press ctrl+c. Or another one does that. I noticed some commands catch KeyboardInterrupt and exit nicely, while others exit and also let the user see the errors in the terminal

@matthewrmshin
Copy link
Contributor

Yes, fully agreed on consistency with KeyboardInterrupt behaviour.

Just re-reading some of the previous comments...

On cylc.scheduler running sub-processes. It is currently a very cheap way to run things in a pool of parallel workers. (Processes in Unix are also more visible and easier to manage via the operating system. E.g. They are more independent, easier to understand and killable by users.) We have tried threading in the past, but we can perhaps try asyncio in the future.

The biggest commands are:

  • cylc jobs-submit - for job submission, which can in theory run as a function in the background if submitting local jobs, but will require running commands on remote hosts via SSH. The current command encapsulates everything in a uniform way. The command does various I/O and calls out to the workload manager submission command e.g. qsub. Turning this into an async function is going to be interesting.
  • cylc jobs-poll - for job poll. Ditto.
  • cylc jobs-kill - for job kill. Ditto - but should not be used as much.
  • Event mail - Using the mail command here for uniformity with the other stuff and also because mail is easier to configure than Python's smtplib module.
  • Job logs retrieval - Using the rsync command - not aware of a pure Python equivalent.
  • Event handler scripts - These are shell scripts at the moment. Until we allow them to be configured as Python logic (see Python API #1962), we are a bit stuck here.
  • External triggers - some can probably be implemented in pure Python, e.g. via suite network API for external suite polling.
  • Anything else?

@sgaist
Copy link
Contributor

sgaist commented Oct 5, 2019

Following the discutions on #3393, about going with entry_points and a central command with subcommands, it looks like click could be used for that but that would require a multistep migration.
I'd propose to start by moving the "pure python" scripts from bin to a scripts subfolder (i.e. cycle/flow/scripts) and create the corresponding entry_points

That would keep the current workflow as is in terms of implementation.

In a second phase, move the cylc script back to python using click and then make all the other commands sub commands of the main click application.

@kinow
Copy link
Member Author

kinow commented Oct 5, 2019

This sounds like a plan! Are you interested in working on a pull request for this?

For personal projects I use click as I find it easier to use. But it would be good if we could implement this first without throwing away the argpase parts.

Would it be ppssible too @sgaist?

@hjoliver
Copy link
Member

hjoliver commented Oct 9, 2019

@sgaist - if you've gone back to the original cylc python script from several years ago, we'll need to be a) careful about changes made since then in the newer bash script; and b) take a very close look at performance in light of comments above. That script is pretty ancient, and probably not well coded. (On the other hand, it is fundamentally quite simple, so it should be easy to streamline it).

@matthewrmshin
Copy link
Contributor

So do you have a strong opinion either way on this @matthewrmshin ?

For that statement, I have no opinion either way. Just want to provide extra context.

The reality is that apart from the occasional cylc help or cylc --version commands, all other cylc ... commands will end up calling a Python cylc-... command, so there is really not much gain either way. The shell script did have the advantage of being able to set up the environment for Python where necessary. However, now that the project is using package managers, the usefulness of the shell script has totally diminished.

For simplicity (and for the future Python API), I fully support moving towards a pure Python system.

@oliver-sanders
Copy link
Member

oliver-sanders commented Oct 10, 2019

I think we should be able to support macOS too, maybe with some tweaking

Let us know what the issues are, I've fixed a couple of simple Mac OS bugs in the past, we aren't great at sticking to POSIX. Couple of installation notes:

  • The Cylc tests will require coreutils (install via brew) accessible without the g prefix (e.g. ls rather than gls - see the homebrew coreutils docs for more info).
  • If you see issues which look like network, domain resolution, etc, this is something Mac OS users have hit before, it's a DNS configuration problem which I don't think we know how to solve yet.

@oliver-sanders
Copy link
Member

we'll need to be a) careful about changes made since then in the newer bash script

The old script may be good for a proof of concept but there have been a few changes, also the old Python script wasn't great.

@hjoliver
Copy link
Member

hjoliver commented Oct 10, 2019

That's a consensus then 👍 back to Python for bin/cylc but maybe start from the current Bash script rather than the (very) old (and not very good) Python script.

@sgaist
Copy link
Contributor

sgaist commented Oct 10, 2019

As requested, the branch is https://github.com/sgaist/cylc-flow/tree/move_to_entry_points

I have made a minimal cylc script using click since it seems to be one of the possible future framework for building the command set.

Note that it's not optimal currently as for example click provides some tools to handle abbreviated commands.

On a side note, did you consider using pre-commit for checking/fixing code before committing ? There are some nice tools like black, bandit, flake8, etc. that could be run directly on the developer machine and might help keeping the code base tidy without having to redo commits.

@kinow
Copy link
Member Author

kinow commented Oct 10, 2019

Might be a good idea the pre commit hooks. JupyterHub and/or pandad I think had some. Feel free to create an issue/PR for it :)

@kinow
Copy link
Member Author

kinow commented Oct 10, 2019

(Working on the frontend today, but will take a look at the branch as soon as I'm done over there. Thanks @sgaist!!!)

@hjoliver
Copy link
Member

hjoliver commented Oct 13, 2019

@sgaist - good idea on the pre-commit hooks. We do run bandit, and style checking, etc., via codacy and github, but errors revealed there do require "redoing commits" as you say. 👍

@hjoliver
Copy link
Member

@sgaist - I took a quick look at your branch. Initial comments:

  • nice
  • "refactor script and simplify it using click" - wow!
  • the result after pip install seems fully functional (I ran a suite, which tests quite a lot).

Some current CLI help functionality is broken compared to master, but presumably easy to fix (or potentially, we could change to a new way):

  • cylc --help doesn't show the top level command help
  • cylc CATEGORY COMMAND --help shows category help instead of command help (I think we decided a long time ago though, that our command categories are not very useful).

Some commands (at least one) missing?

  • cylc jobscript - "unknown utility"

Sounds like you could use click more extensively? (The small bit in the main script looks nice).

Seems like very minor problems.

@cylc/core - if others agree, shall we suggest @sgaist puts up a PR from his branch?

  • last commit "black"s all scripts - need to agree on that, but I imagine we would agree?
  • click seems really nice, with good documentation; and it has a good pedigree - shall we just use it? (Do we really need an exhaustive evaluation of all CLI arg parser tools - I vote not?)

@kinow
Copy link
Member Author

kinow commented Oct 14, 2019

Great work @sgaist !

On the package name, what about cylc.flow.cli, instead of cylc.flow.scripts?

last commit "black"s all scripts - need to agree on that, but I imagine we would agree?

I think it would be better to postpone it. We had discussed black before, and also the 100 characters limit. black's default is 88, we are using 80, the pep had been updated to 100... I think in the end we disagreed on the length and decided to keep things as-is. Discussion on Riot

The black changes could cause issues if we have to revert changes. But I agree adopting it could potentially improve our code quality, reducing maintenance and helping contributors.

click seems really nice, with good documentation; and it has a good pedigree - shall we just use it? (Do we really need an exhaustive evaluation of all CLI arg parser tools - I vote not?)

+1 from me.

@kinow
Copy link
Member Author

kinow commented Oct 14, 2019

Just tested the code, cylc itself has a different output, but my old workflow five worked like a charm with cylc run --no-detach --verbose --debug five. Amazing work @sgaist !

Soon I might be able to invoke some of the Cylc subcommands from Jupyter notebooks :)

@hjoliver
Copy link
Member

On the package name, what about cylc.flow.cli, instead of cylc.flow.scripts?

👍

@TomekTrzeciak
Copy link
Contributor

click seems really nice, with good documentation; and it has a good pedigree - shall we just use it? (Do we really need an exhaustive evaluation of all CLI arg parser tools - I vote not?)

+1 from me.

Click is nice, but we have recently decided for Clize in metoppv/improver#955 for our CLIs if you want to see how it looks like. The main attraction for us is integration with Sphinx (needed some hacking to support Google-style docstrings though).

@oliver-sanders
Copy link
Member

Cool, please raise a PR when ready.

last commit "black"s all scripts - need to agree on that, but I imagine we would agree?

Not keen on this personally but if you're all convinced, ignore me.
(if we really want to do this it would be best done in a quiet time when fewer conflicts would arise)

click seems really nice, with good documentation; and it has a good pedigree - shall we just use it? (Do we really need an exhaustive evaluation of all CLI arg parser tools - I vote not?)

As far as I understand click is just syntatic sugar to optparse so fine. Just two requirements I'd like to flush out first:

  • Need to be able to build a CLI programmatically (which I think we can since its really just optparse).
  • Need Sphinx integration.

On the package name, what about cylc.flow.cli, instead of cylc.flow.scripts?

👍 and shift cylc.flow.terminal and cylc.flow.option_parser (if not removed) there too.

@oliver-sanders
Copy link
Member

The main attraction for us is integration with Sphinx (needed some hacking to support Google-style docstrings though).

I was about to ask just that, how did you get Google-style docstrings to work?

epsy/clize#3

@TomekTrzeciak
Copy link
Contributor

The main attraction for us is integration with Sphinx (needed some hacking to support Google-style docstrings though).

I was about to ask just that, how did you get Google-style docstrings to work?

epsy/clize#3

By subclassing ClizeHelp to insert docstring conversion in the right place:
https://github.com/neilCrosswaite/improver/blob/5645635/lib/improver/cli/__init__.py#L56-L95

@oliver-sanders
Copy link
Member

By subclassing ClizeHelp to insert docstring conversion in the right place

Nice, you should push that to epsy/clize.

@hjoliver
Copy link
Member

hjoliver commented Oct 15, 2019

Arg parser comparitive notes: https://clize.readthedocs.io/en/stable/alternatives.html

After an admittedly pretty superficial look, I vote for click. The decorators are cool. It is also more active as a project. A quick google for sphinx integration found this: https://github.com/click-contrib/sphinx-click (@oliver-sanders or @sadielbartholomew better to judge than me though, on this).

@matthewrmshin
Copy link
Contributor

(So much for The Zen of Python.)

There should be one-- and preferably only one --obvious way to do it.

@hjoliver
Copy link
Member

I guess that only applies to the core language 🤣

@TomekTrzeciak
Copy link
Contributor

After an admittedly pretty superficial look, I vote for click.

Might work well as you probably have different goals for Cylc. One of the things we wanted to achieve is 1-2-1 mapping between our top level python functions APIs and CLIs. This wouldn't work if we can't use "CLI functions" as normal python functions at the same time (i.e., pass non-string arguments to them).

@hjoliver
Copy link
Member

@sgaist - if you can reassure us on the two check-box concerns discussed above #2802 (comment) then please go ahead and post a PR, but back out your commit of black-processed files (which is still not agreed for Cylc). Also, if it is the case that click requires very minimal coding (as appears to be the case on your branch) we could easily swap it out again if necessary, before Cylc-8 is released (mid 2020), so that's not a big deal I think.

@sgaist
Copy link
Contributor

sgaist commented Oct 15, 2019

About the missing commands, there currently are three:

  • graph-diff
  • jobscript
  • scp-transfert

These are shell scripts that must be ported back to python but I wanted to have a proof of concept working before doing the port.

I haven't tested click-sphinx but it looks exactly what would be needed.

One thing I'd like to also note is that the current cylc script is just forwarding calls further mostly like the original shell script did. I have done it like that for the proof of concept.

If click is deemed the way to go, all the other commands could be ported to click as well and then plug in as sub-commands. That way help and parameters handling would be forwarded to the correct command. It will also allow for other cylc projects to add new command without having to modify the main package to support them.

For example, the command_list generated at beginning of the script can be avoided when all commands are implemented through click.

@hjoliver
Copy link
Member

These are shell scripts that must be ported back ...

Ah, roger that.

If click is deemed the way to go, ...

That would indeed be good (and what I was getting at with "Sounds like you could use click more extensively?" above) - can be done as a follow-up to the initial PR.

@sgaist sgaist mentioned this issue Oct 16, 2019
6 tasks
@oliver-sanders
Copy link
Member

Closed by #3413

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants