Return execution run status from Request Experiment Execution #257

mwes · 2020-09-23T20:42:00Z

The current request experiment execution path should return a JSON response that follows this template:

{
	"message": "The request was successful",
	"result": {
		"executionId": "7LYepwB1ewLb3",
		"msg": {
			"container_search_string": [...],
			"default_parameters": {
				...
			}
		}
	},
	"status": "success",
	"version": "1.6.1"
}

This response means the reactor received and accepted the request and assigned it the execution id 7LYepwB1ewLb3, shown as executionId under the result field.

This response does not reveal any errors in the execution itself. For that, we need to check the execution status with a GET request to:

https://api.sd2e.org/actors/v2/control-annotator.prod/executions/7LYepwB1ewLb3?x-nonce=$NONCE

The nonce has been provided to you in a side-channel. This request also returns JSON:

{
  "message": "Actor execution retrieved successfully.", 
  "result": {
    "cpu": 18671696018, 
    "exitCode": 1, 
    "finalState": {
      "Dead": false, 
      "Error": "", 
      "ExitCode": 1, 
      "FinishedAt": "2020-09-23T17:12:46.826Z", 
      "OOMKilled": false, 
      "Paused": false, 
      "Pid": 0, 
      "Restarting": false, 
      "Running": false, 
      "StartedAt": "2020-09-23T17:12:40.026Z", 
      "Status": "exited"
    }, 
    "finishTime": "2020-09-23T17:12:46.826Z", 
    "id": "7LYepwB1ewLb3", 
    "io": 17517, 
    "messageReceivedTime": "2020-09-23T17:12:39.147Z", 
    "runtime": 7, 
    "startTime": "2020-09-23T17:12:39.552Z", 
    "status": "COMPLETE", 
    "workerId": "7KMApj3jrNg5k"
  }, 
  "status": "success", 
  "version": "1.6.1"
}

Note the exitCode and status fields:

A non-zero exitCode indicates an execution error.
status can be one of: ["SUBMITTED", "COMPLETE"]

After being submitted for execution, the reactor processes the request and transitions to "COMPLETE" when done.

The exitCode is only valid once status is "COMPLETE".
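The check described above could be sketched roughly as follows. This is only an illustration: the base URL is copied from the example request, while the function names and the "pending" / "succeeded" / "failed" labels are made up for this sketch and are not part of the reactor API or of IP.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Base URL copied from the example request in this issue.
BASE_URL = "https://api.sd2e.org/actors/v2/control-annotator.prod"


def status_url(execution_id, nonce):
    """Build the execution-status URL for one execution id (hypothetical helper)."""
    query = urlencode({"x-nonce": nonce})
    return f"{BASE_URL}/executions/{quote(execution_id, safe='')}?{query}"


def fetch_status(execution_id, nonce, opener=urlopen):
    """GET the status document and parse it as JSON.

    `opener` is injectable so the function can be exercised without network access.
    """
    with opener(status_url(execution_id, nonce)) as resp:
        return json.load(resp)


def summarize_execution(response):
    """Map a parsed status response to (state, message).

    The state labels are assumptions of this sketch, not API values.
    """
    result = response.get("result", {})
    execution_id = result.get("id")
    status = result.get("status")
    if status != "COMPLETE":
        # exitCode is not meaningful until the execution has completed.
        return "pending", f"Execution {execution_id} is {status}; exit code not yet valid."
    exit_code = result.get("exitCode")
    if exit_code == 0:
        return "succeeded", f"Execution {execution_id} completed successfully."
    return "failed", f"Execution {execution_id} failed with exit code {exit_code}."
```

Applied to the sample response above (status "COMPLETE", exitCode 1), summarize_execution would report the execution as failed.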

For non-zero exit codes, we can pull (and show) logs for the execution via:

https://api.sd2e.org/actors/v2/control-annotator.prod/executions/7LYepwB1ewLb3/logs?x-nonce=$NONCE

{
  "message": "Logs retrieved successfully.", 
  "result": {
    "logs": "..."
  }, 
  "status": "success", 
  "version": "1.6.1"
}
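A possible way to wire this up is to fetch logs only when the status response shows a completed execution with a non-zero exit code. The sketch below assumes the two response shapes shown in this issue; logs_url, extract_logs and logs_if_failed are hypothetical names, not existing IP functions.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Base URL copied from the example request in this issue.
BASE_URL = "https://api.sd2e.org/actors/v2/control-annotator.prod"


def logs_url(execution_id, nonce):
    """Build the logs URL for one execution id (hypothetical helper)."""
    query = urlencode({"x-nonce": nonce})
    return f"{BASE_URL}/executions/{quote(execution_id, safe='')}/logs?{query}"


def extract_logs(response):
    """Return the log text from a parsed logs response, or "" if it is absent."""
    return response.get("result", {}).get("logs", "")


def logs_if_failed(status_response, nonce, opener=urlopen):
    """Fetch logs only for a COMPLETE execution with a non-zero exit code.

    Returns None when the execution is still running or exited with 0.
    `opener` is injectable so the logic can be tested without network access.
    """
    result = status_response.get("result", {})
    if result.get("status") != "COMPLETE" or result.get("exitCode") in (0, None):
        return None
    with opener(logs_url(result["id"], nonce)) as resp:
        return extract_logs(json.load(resp))
```

Skipping the logs request for successful or unfinished executions keeps the common path to a single status call.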

This gives visibility into whether a reactor execution succeeded or failed and, if it failed, what the nature of the error was.


tramyn · 2020-10-21T22:32:00Z

IP has been updated to use the new TACC endpoint to address #252. This issue, however, will require more changes based on the Slack conversation between @mwes and @mwvaughn on 10/21/2020. As mentioned in that conversation, #252 is in a good state for @mwes to use for milestone 2.10. @mwes will continue to help other users debug the state of an experiment execution until #257 is resolved.

New workflow that this issue will need to build off of:

An ER document can have multiple execution ids assigned to an experiment. IP will need to make a request to TACC's endpoint to get the list of request_id values that match an experiment_reference_url_for_xplan.
Depending on the request_id selected, IP will need to map request_id -> execution_id.
The corresponding execution_id can then be passed into TACCGoAccessor.get_status_of_experiment to get the status of an experiment execution. Information returned from this function should be reported back to the user in a human-readable form.
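As a rough sketch of how these three steps might fit together: the TACC endpoint for listing request ids and the return shape of TACCGoAccessor.get_status_of_experiment are not specified here, so both are passed in as callables. Everything named below (report_execution_status, list_requests, select_request, get_status, and the assumption that the status call returns the parsed JSON shown earlier in this issue) is hypothetical.

```python
def report_execution_status(experiment_reference_url, list_requests, select_request, get_status):
    """Return a human-readable status line for one selected request.

    All three callables are assumptions of this sketch:
      list_requests(url)  -> dict mapping request_id -> execution_id
      select_request(ids) -> the request_id chosen by the user
      get_status(exec_id) -> parsed status response, e.g. a wrapper around
                             TACCGoAccessor.get_status_of_experiment
    """
    # Step 1: look up the requests recorded for this experiment reference.
    mapping = list_requests(experiment_reference_url)
    if not mapping:
        return f"No execution requests found for {experiment_reference_url}."

    # Step 2: map the selected request_id to its execution_id.
    request_id = select_request(sorted(mapping))
    execution_id = mapping[request_id]

    # Step 3: fetch the execution status and phrase it for the user.
    result = get_status(execution_id).get("result", {})
    status = result.get("status")
    if status != "COMPLETE":
        return f"Request {request_id} (execution {execution_id}) is still {status}."
    exit_code = result.get("exitCode")
    outcome = "succeeded" if exit_code == 0 else f"failed with exit code {exit_code}"
    return f"Request {request_id} (execution {execution_id}) {outcome}."
```

Passing the lookups in as callables keeps the reporting logic independent of whichever endpoint and accessor signature are eventually settled on.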

tramyn self-assigned this Sep 23, 2020

tramyn added the "new feature" and "medium priority" labels Sep 23, 2020

tramyn added this to the 2.10 milestone Sep 28, 2020

tramyn mentioned this issue Oct 15, 2020

Update intent parser execution target #252

Closed

tramyn modified the milestones: 2.10, 2.11 Oct 21, 2020

tramyn modified the milestones: 3.1, 3.2, 3.3 Jan 28, 2021

jakebeal modified the milestones: 3.4, 4.0 Jun 14, 2021
