Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve comm #2582

Merged
merged 7 commits into from
Apr 19, 2018
Merged

Improve comm #2582

merged 7 commits into from
Apr 19, 2018

Conversation

matthewrmshin
Copy link
Contributor

@matthewrmshin matthewrmshin commented Feb 21, 2018

cylc client: new command to call network API client.

When SSH messaging is already set up for a task remote, other client
commands will pick up the set up automatically without having to use
--use-ssh and related options. Client logic will run locally, with
only network logic done via cylc client --use-ssh.

cylc message:

  • Expect suite name and task job (cycle/task/submit) as arguments.
  • Can now send multiple messages at different severities.
  • API method put_messages (replaces put_message).
    • Multiple messages in one call.
    • Event time as separate argument.
    • Task job (cycle/task/submit) instead of ID (task.cycle).
  • Backward compatible if called with 1 or 2 arguments.
    • Assume new interface when being called with 3 or more arguments.

Server program will ignore task messages from old submits.

Reduce job failed message to just say failed/SIGNAL, etc.

  • Easier to parse, and more similar to started and succeeded messages.

cylc remote-init: now handles SSH comm set up. Job files no longer
contain SSH comm settings.

Rationalise cylc.task_message and cylc.network.httpclient. Location
of network logic and retry can now be found in the latter.

Create a suite UUID on start up of a new run. This is stored in the
suite runtime database file and written to the contact file. Restart will
load and use the same UUID. The UUID is exported to the environment
of a job script. In a job environment, the client will only attempt a
connection to the suite if the UUID value in the environment matches
with that of the contact file.

Close #439. Close #2214. Close #2502. Close #2528. Tick final box in and close #2302.

@matthewrmshin matthewrmshin added this to the next release milestone Feb 21, 2018
@matthewrmshin matthewrmshin self-assigned this Feb 21, 2018
@matthewrmshin
Copy link
Contributor Author

matthewrmshin commented Feb 21, 2018

(May close #2505 as well. Need some testing.)

Edited: not really. This change will only deal with the cylc message part of the issue raised in #2505. More work is required to solve that issue fully.

@matthewrmshin
Copy link
Contributor Author

(Trying to figure out how to write an automated test for the suite UUID change. Otherwise, this PR is good for review.)

@matthewrmshin
Copy link
Contributor Author

(Re-based.)

@matthewrmshin
Copy link
Contributor Author

(Re-based. Codacy failure is nonsense.)

@matthewrmshin matthewrmshin force-pushed the comm-ssh branch 2 times, most recently from afadadb to 881762a Compare March 1, 2018 10:17
@matthewrmshin
Copy link
Contributor Author

(Codacy failure is nonsense.)

Copy link
Member

@oliver-sanders oliver-sanders left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still working my way through this but looks pretty good so far.

if func_name == 'put_messages':
results = self._call_server(func_name, payload=payload)
elif func_name == 'put_message': # API 1, 7.5.0 compat
cycle, name = payload['task_job'].split('/')[0:2]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be easier to provide these as separate dictionary keys to avoid having to split the value in multiple places.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The task job or task id is used here for consistency with the other API methods that deal with tasks. We should definitely improve this in the future.

@cherrypy.expose
@cherrypy.tools.json_in()
@cherrypy.tools.json_out()
def put_messages(self, task_job=None, event_time=None, messages=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should these be args rather than kwargs (else we could end up passing 'None' strings into the scheduler or does it make little difference anyway?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes no difference here, because the data actually come in via the payload JSON, not really via the arguments.

command.append(user_at_host)

# Pass cylc version through.
command += ['env', quote(r'CYLC_VERSION=%s' % CYLC_VERSION)]
if 'CYLC_UTC' in os.environ:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a change in behaviour?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is functionality ported from the old SSH functionality of cylc.task_message. It is beneficial to other commands as well.

@matthewrmshin
Copy link
Contributor Author

(Sorry to keep re-basing here. Still trying to get Travis CI to pass.)

@matthewrmshin
Copy link
Contributor Author

(Travis CI has finally passed, but still reporting as in progress for some reason.)

bin/cylc-client Outdated
suite, options.owner, options.host, options.port,
options.comms_timeout, my_uuid=options.set_uuid,
print_uuid=options.print_uuid, auth=auth)
kwargs = json.load(sys.stdin)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whilst cylc client is an internal command it would be nice if we could call it without providing stdin if it is not required i.e.:

cylc client get_suite_state_summary <reg>

Rather than:

cylc client get_suite_state_summary <reg> <<< '{}'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

json.load raises a ValueError on an empty file. Although the error message can be used to deduce an empty file, the message may change in the future and break out logic here.

I have now introduced a -n option for those API functions that do not expect extra keyword arguments. It is not as clever, but is good enough.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This solution might work https://stackoverflow.com/a/3763257.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only as a solution between interactive versus batch usage. Not quite the intention here.

bin/cylc-client Outdated
print_uuid=options.print_uuid, auth=auth)
kwargs = json.load(sys.stdin)
sys.stdin.close()
sys.stdout.write(json.dumps(getattr(pclient, func)(**kwargs), indent=4))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for pretty printing!

@matthewrmshin
Copy link
Contributor Author

(Re-based and resolved conflicts.)

Copy link
Member

@hjoliver hjoliver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(started reviewing; I'll try to finish tomorrow.)

bin/cylc-help Outdated
@@ -270,6 +270,8 @@ control_commands['set-verbosity'] = ['set-verbosity']
control_commands['broadcast'] = ['broadcast', 'bcast']
control_commands['ext-trigger'] = ['ext-trigger', 'external-trigger']
control_commands['checkpoint'] = ['checkpoint']
control_commands['message'] = ['message', 'task-message']
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'message' is already categorized as a task_command

@matthewrmshin
Copy link
Contributor Author

I have added some doc strings to the method where Codacy is reporting the security issues related to usage of subprocess.Popen - to justify that there should be no issue. (When #2503 is done, we should move this new Popen call from cylc.network.httpclient to use the new remote_cylc_cmd function in cylc.remote?)

@matthewrmshin
Copy link
Contributor Author

(Re-based + Resolved Conflicts.)

@hjoliver
Copy link
Member

move this new Popen call from cylc.network.httpclient to use the new remote_cylc_cmd function in cylc.remote?

yep

Copy link
Member

@hjoliver hjoliver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks really good, no problems found. I still need to do a bit of testing...

typeset run_err_script="$3"
shift 3
typeset run_err_script="$2"
shift 2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments documenting these arguments needs updating (in the comments just above).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

self.owner = arg.replace('--user=', '')
elif arg.startswith('--host='):
self.host = arg.replace('--host=', '')
elif arg == '--ssh-cylc=':
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ssh command generated for ssh task messaging is screwed up. This line is the problem - it should use startswith() not ==.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Member

@hjoliver hjoliver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me now.

Generate new suite UUID on normal run. Load previous value on restart.

In a task job environment, check that the environment value of the suite
UUID matches the one in the suite contact file. Don't send message on
mismatch.
cylc client: new command to call network API client.
* Replace usage of `curl` in tests with this command.

When SSH messaging is already set up for a task remote, other client
commands will pick up the set up automatically without having to use
`--use-ssh` and related options. Client logic will run locally, with
only network logic done via `cylc client --use-ssh`.

cylc message:
* Expect suite name and task job (cycle/task/submit) as arguments.
* Can now send multiple messages at different severities.
* API method `put_messages`.
  * Multiple messages in one call.
  * Event time as separate argument.
  * Task job (cycle/task/submit) instead of ID (task.cycle).

Server program will ignore task messages from old submits.

Reduce job failed message to just say `failed/SIGNAL`.

cylc remote-init: now handles SSH comm set up. Job files no longer
contain SSH comm settings.

Rationalise `cylc.task_message` and `cylc.network.httpclient`. Location
of network logic and retry can now be found in the latter.
@matthewrmshin
Copy link
Contributor Author

Branch re-based. Conflicts resolved. Now use new logic of #2503 for SSH cylc client call.

Copy link
Collaborator

@sadielbartholomew sadielbartholomew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review in progress - on priority so will complete tomorrow. General proposal as I have understood it so far seems very sensible. Reporting a few possible minor issues I have spotted at this stage.

bin/cylc-client Outdated
#!/usr/bin/env python

# THIS FILE IS PART OF THE CYLC SUITE ENGINE.
# Copyright (C) 2008-2016 NIWA
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy New Year x2: 2016 2018

bin/cylc-client Outdated
if remrun():
sys.exit(0)

import cylc.flags
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused import (assuming the standardcylc.flags.debug catch in the main() function is not necessary/advised as this command is only for internal use)?

bin/cylc-message Outdated
failure. Applications run by task jobs can use this command to report messages
and to report registered task outputs.

Messages can be specified as arguments. A '-' indicates that messages the
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something is amiss with the second sentence: '...that messages the...' seems likely.

# * Exhausted number of tries.
# * Contact info file not found, suite probably not running.
# Don't bother with retry, suite restart will poll any way.
if i >= max_tries or isinstance(exc, ClientInfoError):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ClientInfoError is raised separately prior to it's base exception class ClientError on 287:288, so I believe the second condition will never be True as it would have already have been caught?

Fix copyright year.
Fix doc for new commands.
Fix bad exception handling.
Remove pointless warning message.
@matthewrmshin
Copy link
Contributor Author

Minor issues and comments addressed.

Copy link
Collaborator

@sadielbartholomew sadielbartholomew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review complete. Test battery passes locally for me, with new tests suitable. Feedback from first sub-review addressed appropriately. I have several new comments, though they are largely minor so can be addressed separately once this has been merged if this would help ease the PR 'logjam', hence approving.

"""Call server via "cylc client --use-ssh".

Call "cylc client --use-ssh" using `subprocess.Popen`. Payload and
arguments of the API method are serialized as JSON and is written to a
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few tiny docstring issues that we could ignore as it still makes sense but ideally we could change: '...is are written...' here & then 'even gets a chance' on line 534.

json.dump(kwargs, stdin)
stdin.seek(0)
else:
stdin = None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stdin = open(os.devnull) instead unless there is a reason that it is unnecessary or wrong in this case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now handled by remote_cylc_cmd, which will set this to open(os.devnull) and add -n to the SSH command line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add some extra comments.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see, sorry - my bad! By all means add some comments if you think others might not see this but having had a second look it seems fairly obvious.

# Check mismatch suite UUID
env_suite = os.getenv(self.srv_files_mgr.KEY_NAME)
env_uuid = os.getenv(self.srv_files_mgr.KEY_UUID)
if (self.suite and env_suite and env_uuid and
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would I think be easier to comprehend the list of conditions if env_suite == self.suite came before the env_uuid truth condition so that line 628 is a related group (or an re-ordering along a similar line).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. It proves that our brains are wired differently. I'll make the change to keep you happy.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need, it's your code & obviously an inconsequential change. It seemed a much more natural grouping to me personally but I can only use my own brain - I can only guess how others might regard things :)

@cherrypy.tools.json_in()
@cherrypy.tools.json_out()
def put_messages(self, task_job=None, event_time=None, messages=None):
"""Put task messages.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This description should be improved with a bit more context.

@@ -81,21 +85,24 @@ def remote_cylc_cmd(cmd, user=None, host=None, capture=False,
# The command is read from the site/user global config file, but we check
# above that it ends in 'cylc', and in any case the user could execute
# any such command directly via ssh.
proc = Popen(command, stdout=stdout, stdin=open(os.devnull))
if stdin is None:
stdin = open(os.devnull)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could just set this at the end of the block on line 54 (unless it is particularly important that the code there deals entirely with the commands & the stdin/out in the subsequent code).

job_log_name = os.path.join(
glbl_cfg().get_derived_host_item(suite, 'suite job log directory'),
'job')
job_status_file = None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this necessary? And if so shouldn't it get set to an empty file & not None so that if not caught as an IOError the write() method isn't called on a non-file object on lines 111 etc?

if job_id > 1:
# If os.getppid() returns 1, the original job process
# is likely killed already
job_status_file.write('%s=%s\n' % (CYLC_JOB_PID, job_id))
Copy link
Collaborator

@sadielbartholomew sadielbartholomew Apr 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems quite unwieldy to keep applying the string formatting template '%s=%s\n' so could we assign it as a regex to a constant to apply?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is actually clearer in the context here to use a string literal for format strings instead of introducing a constant.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough! I was mainly thinking it would make it much simpler if we ever wanted to adjust the 'template' but would probably make the code a little less clear, yes.

# is likely killed already
job_status_file.write('%s=%s\n' % (CYLC_JOB_PID, job_id))
job_status_file.write('%s=%s\n' % (CYLC_JOB_INIT_TIME, event_time))
elif message == TASK_OUTPUT_SUCCEEDED:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This elif statements use the same format & variables to write to job_status_file as the following two, with only the 'message' changing, so 112:127 could definitely be consolidated (at some point in the future at least).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not worth doing as part of this change, given that this is not really new code (- just re-indented), and it is unlikely that we can reduce the line count by consolidation.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point.

@@ -56,6 +58,11 @@ def remote_init(uuid_str, rund):
tarhandle.close()
finally:
os.chdir(oldcwd)
if indirect_comm:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if indirect_comm is not None given default is None?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a boolean, so can be set to False as well.

env no_proxy=* curl -A "${AGENT}" -v --cacert "${SRV_D}/ssl.cert" \
--digest -u "cylc:$(<"${SRV_D}/passphrase")" \
"https://${HOST}:${PORT}/get_latest_state"
cylc client -n --set-uuid="${UUID}" 'get_latest_state' "${SUITE_NAME}"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could (eventually) create a function for a run_ok test of incremental name calling this exact command (as is done throughout this test file) after sleep 1 or other activity to make this test a bit more intuitive & tidy .

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed especially if we end up doing more of these.

And removal of some unnecessary logic.
@matthewrmshin
Copy link
Contributor Author

All review issues fixed or addressed.

@sadielbartholomew
Copy link
Collaborator

Re-approving after looking at the one response commit. Advised to merge given trivial nature of changes in the commits subsequent to review from @hjoliver who is now on short-term leave.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants