Return execution run status from Request Experiment Execution #257

Open
mwes opened this issue Sep 23, 2020 · 1 comment

@mwes
Contributor

mwes commented Sep 23, 2020

The current request experiment execution path should return a JSON response that follows this template:

{
	"message": "The request was successful",
	"result": {
		"executionId": "7LYepwB1ewLb3",
		"msg": {
			"container_search_string": [...],
			"default_parameters": {
				...
			}
		}
	},
	"status": "success",
	"version": "1.6.1"
}

This means the request was received and accepted by the reactor and assigned the execution id 7LYepwB1ewLb3, which appears as executionId under the result field.

This response gives no visibility into errors that occur during the execution itself. To see those, we need to check the execution status with a GET request to:

https://api.sd2e.org/actors/v2/control-annotator.prod/executions/7LYepwB1ewLb3?x-nonce=$NONCE

The nonce has been provided to you in a side channel. The response is also JSON:

{
  "message": "Actor execution retrieved successfully.", 
  "result": {
    "cpu": 18671696018, 
    "exitCode": 1, 
    "finalState": {
      "Dead": false, 
      "Error": "", 
      "ExitCode": 1, 
      "FinishedAt": "2020-09-23T17:12:46.826Z", 
      "OOMKilled": false, 
      "Paused": false, 
      "Pid": 0, 
      "Restarting": false, 
      "Running": false, 
      "StartedAt": "2020-09-23T17:12:40.026Z", 
      "Status": "exited"
    }, 
    "finishTime": "2020-09-23T17:12:46.826Z", 
    "id": "7LYepwB1ewLb3", 
    "io": 17517, 
    "messageReceivedTime": "2020-09-23T17:12:39.147Z", 
    "runtime": 7, 
    "startTime": "2020-09-23T17:12:39.552Z", 
    "status": "COMPLETE", 
    "workerId": "7KMApj3jrNg5k"
  }, 
  "status": "success", 
  "version": "1.6.1"
}
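
As a rough sketch of how a client might consume this response (not the actual IP implementation): the helper names `execution_url` and `summarize_execution` are made up for illustration, and the only fields assumed are the ones visible in the sample above.

```python
from urllib.parse import quote, urlencode

# Base URL and actor id are taken from the example URL in this issue.
BASE_URL = "https://api.sd2e.org/actors/v2"
ACTOR_ID = "control-annotator.prod"


def execution_url(execution_id, nonce, suffix=""):
    """Build the execution URL; the helper name is hypothetical."""
    path = f"{BASE_URL}/{ACTOR_ID}/executions/{quote(execution_id, safe='')}{suffix}"
    return f"{path}?{urlencode({'x-nonce': nonce})}"


def summarize_execution(response):
    """Pick out the fields of interest from a decoded status response."""
    result = response["result"]
    status = result["status"]
    finished = status == "COMPLETE"
    exit_code = result.get("exitCode")
    return {
        "execution_id": result["id"],
        "status": status,
        "finished": finished,
        # exitCode is only meaningful once the execution is COMPLETE.
        "exit_code": exit_code if finished else None,
        "failed": finished and exit_code != 0,
    }
```

The decoded JSON from the GET request would be passed straight to `summarize_execution`; how the request is issued (and how the nonce is stored) is left to the caller.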

Note the exitCode and status fields:

  • A non-zero exit code indicates an execution error
  • status can be one of: ["SUBMITTED", "COMPLETE"]

After being submitted for execution, the reactor processes the request and transitions to the COMPLETE state when done.

The exitCode is valid only once status is "COMPLETE".
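
One way to wait for a valid exit code is to poll the status endpoint until the execution is COMPLETE. The sketch below is only an illustration: `fetch_status` is a hypothetical callable that returns the decoded status JSON for an execution id, and the interval and timeout defaults are arbitrary.

```python
import time


def wait_for_completion(fetch_status, execution_id,
                        interval=2.0, timeout=60.0, sleep=time.sleep):
    """Poll until status is COMPLETE, then return the exit code.

    fetch_status is a hypothetical callable: execution_id -> decoded JSON dict.
    """
    waited = 0.0
    while True:
        result = fetch_status(execution_id)["result"]
        if result["status"] == "COMPLETE":
            # Only now is the exit code meaningful.
            return result["exitCode"]
        if waited >= timeout:
            raise TimeoutError(
                f"execution {execution_id} still {result['status']} after {timeout} s"
            )
        sleep(interval)
        waited += interval
```

Passing `sleep` as a parameter keeps the loop easy to test without real delays.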

For non-zero exit codes, we can pull (and show) logs for the execution via:

https://api.sd2e.org/actors/v2/control-annotator.prod/executions/7LYepwB1ewLb3/logs?x-nonce=$NONCE

{
  "message": "Logs retrieved successfully.", 
  "result": {
    "logs": "..."
  }, 
  "status": "success", 
  "version": "1.6.1"
}
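
A possible sketch of the "pull logs only on failure" step, assuming the status and logs responses have the shapes shown above. `http_get_json` and `logs_if_failed` are illustrative names, not existing IP functions.

```python
import json
from urllib.request import urlopen

# URL pattern copied from the example in this issue.
LOGS_URL = ("https://api.sd2e.org/actors/v2/control-annotator.prod"
            "/executions/{execution_id}/logs?x-nonce={nonce}")


def http_get_json(url):
    """Plain GET that decodes a JSON body."""
    with urlopen(url) as resp:
        return json.load(resp)


def logs_if_failed(execution_result, nonce, get_json=http_get_json):
    """Return the log text for a failed execution, otherwise None.

    execution_result is the "result" object of the status response.
    """
    if execution_result.get("status") != "COMPLETE":
        return None  # exit code is not valid yet
    if execution_result.get("exitCode") == 0:
        return None  # success, nothing to show
    url = LOGS_URL.format(execution_id=execution_result["id"], nonce=nonce)
    return get_json(url)["result"]["logs"]
```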

This gives visibility into whether a reactor execution succeeded or failed and, if it failed, what the nature of the error was.

@tramyn tramyn self-assigned this Sep 23, 2020
@tramyn tramyn added new feature New feature medium priority "Need to do" or "Should do" labels Sep 23, 2020
@tramyn tramyn added this to the 2.10 milestone Sep 28, 2020
@tramyn
Collaborator

tramyn commented Oct 21, 2020

IP has been updated to use the new TACC endpoint to address #252. This issue, however, will require more changes based on the Slack conversation between @mwes and @mwvaughn on 10/21/2020. As mentioned in that conversation, #252 is in a good state for @mwes to use for milestone 2.10. @mwes will continue to help other users debug the state of an experiment execution until #257 is resolved.

New workflow that this issue will need to build off of:

  • An ER document can have multiple execution ids assigned to an experiment. IP will need to make a request to TACC's endpoint to get the list of request_id values that match an experiment_reference_url_for_xplan.
  • Depending on the request_id selected, IP will need to map request_id -> execution_id
  • The corresponding execution_id can then be passed into TACCGoAccessor.get_status_of_experiment to get the status of an experiment execution. Information returned from this function should be reported back to the user in a human-readable form.
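
The last two steps could look roughly like the sketch below. Everything here is an assumption for illustration: the request_id -> execution_id mapping is modeled as a plain dict, `get_status` stands in for TACCGoAccessor.get_status_of_experiment and is assumed to return the "result" object of the status response, and the message wording is a placeholder.

```python
def describe_execution(result):
    """Turn the "result" object of a status response into a one-line summary."""
    status = result["status"]
    if status != "COMPLETE":
        return f"Execution {result['id']} is still in progress (status: {status})."
    if result["exitCode"] == 0:
        return f"Execution {result['id']} completed successfully."
    return f"Execution {result['id']} failed with exit code {result['exitCode']}."


def report_for_request(request_id, request_to_execution, get_status):
    """Map a request_id to its execution_id and describe that execution.

    request_to_execution: hypothetical dict of request_id -> execution_id.
    get_status: hypothetical callable, execution_id -> status "result" dict.
    """
    execution_id = request_to_execution.get(request_id)
    if execution_id is None:
        return f"No execution found for request {request_id}."
    return describe_execution(get_status(execution_id))
```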

@tramyn tramyn modified the milestones: 2.10, 2.11 Oct 21, 2020
@tramyn tramyn modified the milestones: 3.1, 3.2, 3.3 Jan 28, 2021
@jakebeal jakebeal modified the milestones: 3.4, 4.0 Jun 14, 2021