Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Task messaging ssh #4119

Merged
merged 12 commits into from
Mar 23, 2021
Merged

Task messaging ssh #4119

merged 12 commits into from
Mar 23, 2021

Conversation

datamel
Copy link
Contributor

@datamel datamel commented Mar 11, 2021

Task communication ssh added.

New contact file fields: SCHEDULER_SSH_COMMAND, SCHEDULER_CYLC_PATH, SCHEDULER_USE_LOGIN_SHELL
Removed unused contact file field: COMMS_PROTOCOL_2

These changes close #3327
In light of discussions about comms_timeout at latest Cylc Video Conference, this relate to #4112 (comms_timeout added to ssh comms in this PR)
Requirements check-list

  • I have read CONTRIBUTING.md and added my name as a Code Contributor.
  • Contains logically grouped changes (else tidy your branch by rebase).
  • Appropriate tests are included (unit and/or functional).
  • Already covered by existing tests. (ssh platform added by
  • Appropriate change log entry included.
  • (master branch) I have opened a documentation PR at Update task communication documentation cylc-doc#214.
  • No dependency changes.

@datamel datamel self-assigned this Mar 11, 2021
@datamel datamel added this to the cylc-8.0b1 milestone Mar 11, 2021
@datamel
Copy link
Contributor Author

datamel commented Mar 11, 2021

There is currently an issue with GH actions - looks like the ssh keys are having problems authenticating. Getting a Host key verification failed return from the ssh. I'm looking into this now.

@datamel
Copy link
Contributor Author

datamel commented Mar 12, 2021

To update, the ssh comms is currently untestable on our CI build with the current GH actions setup (we are unable to ssh from the remote to the scheduler for security reasons). Hence the codecov/patch diff is low.

Tests should run locally, with the addition of

#[[_remote_background_indep_ssh]]
uncommented and added to your global-tests.cylc file.

ssh tests can then be run with:
$ etc/bin/run-functional-tests -j 4 -p _remote_background_indep_ssh $(git grep --name-only 'REQUIRE_PLATFORM' tests/functional/)

@datamel
Copy link
Contributor Author

datamel commented Mar 12, 2021

There is currently an issue with GH actions - looks like the ssh keys are having problems authenticating. Getting a Host key verification failed return from the ssh. I'm looking into this now.

Incidentally, @oliver-sanders, I got to the bottom of the tests/functional/platforms/02-host-to-platform-upgrade.t failure - the upgrader was returning a platform with task communication =ssh since it does not take this into consideration.
Failure now solved, I had left some of the ssh platforms uncommented in the global.cylc. Although this has drawn my attention to the need for site global.cylc to be configured correctly and ensuring the correct ordering of platforms - the upgrader can return a platform with a different communication method depending on order. Wondering if this will need to be communicated to users who are continue with using old remote/host configuration but who can not use ZMQ?

@oliver-sanders
Copy link
Member

oliver-sanders commented Mar 12, 2021

To add to Mel's comments on why we can't use the Docker images to test SSH comms at the moment.

This issue proposes [optionally] running the tests themselves in docker (the test host not just the remote hosts) which should enable Scheduler -> Remote -> Scheduled SSH comms (as well as clearing up much of the configuration junk).

#4007

(also adds Python and Bash versions to the test matrix)

Copy link
Member

@hjoliver hjoliver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, tested as working 👍 🎉

cylc/flow/network/client_factory.py Outdated Show resolved Hide resolved
Copy link
Member

@oliver-sanders oliver-sanders left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, working on getting the tests passing on local platforms...

Two commands have not been converted to use the new get_client interface:

  • cylc scan.
  • cylc tui.

Neither are important (as they are unlikely to be used in remote jobs), however, it would be good to convert them to match the other commands. cylc scan cannot be converted at the moment as is requies the async_request interface which is not yet implemented. No biggie, have punted to another issue #4129, should be simple, can tackle later.

Could do with addressing the first comment bellow.

cylc/flow/network/client.py Outdated Show resolved Hide resolved
cylc/flow/network/ssh_client.py Outdated Show resolved Hide resolved
cylc/flow/remote.py Outdated Show resolved Hide resolved
@oliver-sanders
Copy link
Member

Ran tests against local platforms, all good, added these two to my regular test matrix:

  • _remote_background_shared_ssh
  • _remote_background_indep_ssh

@oliver-sanders
Copy link
Member

Two conflicts with the cylc pause changes.

datamel and others added 11 commits March 22, 2021 13:53
Add env variable TASK_COMMS_METHOD.
Update contact file with additional fields for ssh task comm.
Add functional test for ssh comms
Make error message more informative in ssh_client.py
Update cylc/flow/network/client_factory.py

Co-authored-by: Hilary James Oliver <hilary.j.oliver@gmail.com>

Update cylc/flow/network/ssh_client.py

Co-authored-by: Oliver Sanders <oliver.sanders@metoffice.gov.uk>
@datamel
Copy link
Contributor Author

datamel commented Mar 22, 2021

Two conflicts with the cylc pause changes.

Now resolved.

Copy link
Member

@oliver-sanders oliver-sanders left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rerun tests, all passing 👍!

@oliver-sanders oliver-sanders modified the milestones: cylc-8.0b1, cylc-8.0b0 Mar 23, 2021
@oliver-sanders oliver-sanders merged commit 3ae64fa into cylc:master Mar 23, 2021
@datamel datamel deleted the task-messaging-ssh branch March 23, 2021 13:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

zmq: implement SSH tunnelling
3 participants