Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clear Job#finished_time when it is opened (#32605) #32755

Merged

Conversation

benwtrent
Copy link
Member

When a job is opened, a new action is taken now where the cluster state is updated, such that the job that was opened has its finished_time set to null.

Verified this works locally and through new integration test.

Copy link
Contributor

@dimitris-athanasiou dimitris-athanasiou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Left a few comments.

// Step 6. Clear job finished time once the job is started and respond
ActionListener<OpenJobAction.Response> clearJobFinishTime = ActionListener.wrap(
response -> {
if(response.isAcknowledged()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: space after if

@@ -574,6 +592,35 @@ public void onTimeout(TimeValue timeout) {
});
}

private void clearJobFinishedTime(String jobId, ActionListener<OpenJobAction.Response> listener) {
clusterService.submitStateUpdateTask("clearing_job_finish_time [" + jobId + "]", new ClusterStateUpdateTask() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: it seems that in most places we use hyphens to name the cluster state update tasks. It would be more consistent to rename this to clear-finished-time-for-job-{job_id}.

clusterService.submitStateUpdateTask("clearing_job_finish_time [" + jobId + "]", new ClusterStateUpdateTask() {
@Override
public ClusterState execute(ClusterState currentState) {
XPackPlugin.checkReadyForXPackCustomMetadata(currentState);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check is unnecessary as if a job is being opened we are certain there is ML Metadata already installed.


@Override
public void onFailure(String source, Exception e) {
listener.onFailure(e);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am tempted to handle failure differently here. Even if the cluster state update failed, the job has successfully opened. It might be more misleading to return an error to the user, as they may think the job has not been opened. I wonder if logging the error should be enough. Any thoughts @droberts195 ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I too considered just logging the issue and still letting the end user know that it succeeded. Definitely a contentious decision that needs discussion.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should just log the error. As @dimitris-athanasiou said, if it fails here the job will have a status of open. So anyone who reacted to the failure to open by trying again would be bemused to find out that they couldn't open the job the second time because it was already open.

@@ -0,0 +1,57 @@
package org.elasticsearch.xpack.ml.integration;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing license header

import static org.hamcrest.CoreMatchers.nullValue;
import static org.hamcrest.Matchers.is;

public class ReopenJobResetsFinishedTimeIT extends MlNativeAutodetectIntegTestCase {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! :-)

@davidkyle
Copy link
Member

The build failed with checkstyle

[ant:checkstyle] [ERROR] /var/lib/jenkins/workspace/elastic+elasticsearch+pull-request/x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/action/TransportOpenJobAction.java:58:8: Unused import - org.elasticsearch.xpack.core.ml.action.FinalizeJobExecutionAction. [UnusedImports]

If you want to run this check locally use ./gradlew precommit

@colings86
Copy link
Contributor

@benwtrent Can you please add labels to this PR, including version, area and change type labels

@colings86 colings86 added the :ml Machine learning label Aug 10, 2018
@elasticmachine
Copy link
Collaborator

Pinging @elastic/ml-core

@@ -611,7 +607,8 @@ public ClusterState execute(ClusterState currentState) {

@Override
public void onFailure(String source, Exception e) {
listener.onFailure(e);
logger.error(source + "Failed to clear finished_time due to [" + e.getMessage() + "]", e);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We tend to start the messages with the job_id. So, this could become:

logger.error("[" + jobId + "] Failed to clear finished_time; source [" + source + "]", e);

@@ -593,10 +590,9 @@ public void onTimeout(TimeValue timeout) {
}

private void clearJobFinishedTime(String jobId, ActionListener<OpenJobAction.Response> listener) {
clusterService.submitStateUpdateTask("clearing_job_finish_time [" + jobId + "]", new ClusterStateUpdateTask() {
clusterService.submitStateUpdateTask("clearing-job-finish-time [" + jobId + "]", new ClusterStateUpdateTask() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry for insisting on that but so that task names are consistent (should make it easier to search through them) could you change to "clearing-finish-time-for-job-" + jobId?

Copy link
Contributor

@dimitris-athanasiou dimitris-athanasiou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@benwtrent benwtrent merged commit b08416b into elastic:master Aug 10, 2018
benwtrent added a commit that referenced this pull request Aug 10, 2018
* Clear Job#finished_time when it is opened (#32605)

* not returning failure when Job#finished_time is not reset

* Changing error log string and source string
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Aug 13, 2018
…listeners

* elastic/master: (58 commits)
  [ML] Partition-wise maximum scores (elastic#32748)
  [DOCS] XContentBuilder#bytes method removed, using BytesReference.bytes(docBuilder) (elastic#32771)
  HLRC: migration get assistance API (elastic#32744)
  Add a task to run forbiddenapis using cli (elastic#32076)
  [Kerberos] Add debug log statement for exceptions (elastic#32663)
  Make x-pack core pull transport-nio (elastic#32757)
  Painless: Clean Up Whitelist Names (elastic#32791)
  Cat apis: Fix index creation time to use strict date format (elastic#32510)
  Clear Job#finished_time when it is opened (elastic#32605) (elastic#32755)
  Test: Only sniff host metadata for node_selectors (elastic#32750)
  Update scripted metric docs to use `state` variable (elastic#32695)
  Painless: Clean up PainlessCast (elastic#32754)
  [TEST] Certificate NONE not allowed in FIPS JVM (elastic#32753)
  [ML] Refactor ProcessCtrl into Autodetect and Normalizer builders (elastic#32720)
  Access build tools resources (elastic#32201)
  Tests: Disable rolling upgrade tests with system key on fips JVM (elastic#32775)
  HLRC: Ban LoggingDeprecationHandler (elastic#32756)
  Fix test reproducability in AbstractBuilderTestCase setup (elastic#32403)
  Only require java<version>_home env var if needed
  Tests: Muted ScriptDocValuesDatesTests.testJodaTimeBwc
  ...
@benwtrent benwtrent deleted the bug/clear-finished_time-on-job-open branch November 26, 2018 16:40
@jimczi jimczi added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants