-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Uncouple persistent task state and status #31031
Conversation
Pinging @elastic/es-distributed |
Requiring a first review from Boaz, then I'd like to have Martijn and Zachary to have a look. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great. I left some minor suggestions.
private final P params; | ||
@Nullable | ||
private final Status status; | ||
private final @Nullable P params; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
params are not nullable any more - please make sure this doesn't revive it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure
@@ -403,8 +397,9 @@ public XContentBuilder toXContent(XContentBuilder builder, ToXContent.Params xPa | |||
if (params != null) { | |||
builder.field("params", params, xParams); | |||
} | |||
if (status != null) { | |||
builder.field("status", status, xParams); | |||
if (state != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we never use xcontent for cross node communication. I think we can use state and be tolerant of reading status when parsing? (until 7.0)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think it should work. Rollup/ML jobs can be snapshotted as part of the global cluster state, but if it still parses status
as a deprecated field it should be OK.
@@ -215,8 +215,8 @@ public void onFailure(Exception e) { | |||
} | |||
} | |||
|
|||
|
|||
public static class Status implements Task.Status { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need this one? How bad would it be to remove it and let the task implementation add one if they need to?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's the default status implementation for allocated persistent task and it just shows the internal state (started/completed/pending cancel) of the persistent task.
I think that we don't really need it and I'd prefer to see it removed. But this change was already big, and removing this class means to take care of some backward compatibility (this status is used as part of the TaskInfo object which is serialized over the network when using the Tasks API and also potentially persisted in TaskResult).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. makes sense
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left one more bwc comment
@@ -94,8 +92,8 @@ public PersistentTasksCustomMetaData(long lastAllocationId, Map<String, Persiste | |||
ObjectParser<TaskDescriptionBuilder<PersistentTaskParams>, String> parser = new ObjectParser<>("named"); | |||
parser.declareObject(TaskDescriptionBuilder::setParams, | |||
(p, c) -> p.namedObject(PersistentTaskParams.class, c, null), new ParseField("params")); | |||
parser.declareObject(TaskDescriptionBuilder::setStatus, | |||
(p, c) -> p.namedObject(Status.class, c, null), new ParseField("status")); | |||
parser.declareObject(TaskDescriptionBuilder::setState, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need the status too for bwc? can you see what it takes to have a BWC test which shuts down a cluster with a persistent task and then does a full restart and check it started again with the right state? This can be a follow up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's already here with new ParseField("state", "status")
? It produces a deprecation warning but allows to parses status
as it was state
. Since this is a named object, status
will be parsed a as PersistentTaskState
object and this is not an issue.
can you see what it takes to have a BWC test which shuts down a cluster with a persistent task and then does a full restart and check it started again with the right state?
Sure. In addition to this, I'd like to add a basic bwc unit test for this status/state renaming and the associated deprecation warning in 6.x.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's already here with new ParseField("state", "status")?
Heh. I don't know how I missed it. Sorry for the noise.
Sure. In addition to this, I'd like to add a basic bwc unit test for this status/state renaming and the associated deprecation warning in 6.x.
I think we can do this as a follow up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@@ -94,8 +92,8 @@ public PersistentTasksCustomMetaData(long lastAllocationId, Map<String, Persiste | |||
ObjectParser<TaskDescriptionBuilder<PersistentTaskParams>, String> parser = new ObjectParser<>("named"); | |||
parser.declareObject(TaskDescriptionBuilder::setParams, | |||
(p, c) -> p.namedObject(PersistentTaskParams.class, c, null), new ParseField("params")); | |||
parser.declareObject(TaskDescriptionBuilder::setStatus, | |||
(p, c) -> p.namedObject(Status.class, c, null), new ParseField("status")); | |||
parser.declareObject(TaskDescriptionBuilder::setState, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's already here with new ParseField("state", "status")?
Heh. I don't know how I missed it. Sorry for the noise.
Sure. In addition to this, I'd like to add a basic bwc unit test for this status/state renaming and the associated deprecation warning in 6.x.
I think we can do this as a follow up?
5906cb7
to
67154c3
Compare
67154c3
to
5b7bc08
Compare
This pull request removes the relationship between the state of persistent task (as stored in the cluster state) and the status of the task (as reported by the Task APIs and used in various places) that have been confusing for some time (#29608). In order to do that, a new PersistentTaskState interface is added. This interface represents the persisted state of a persistent task. The methods used to update the state of persistent tasks are renamed: updatePersistentStatus() becomes updatePersistentTaskState() and now takes a PersistentTaskState as a parameter. The Task.Status type as been changed to PersistentTaskState in all places were it make sense (in persistent task customs in cluster state and all other methods that deal with the state of an allocated persistent task).
* master: 992c788 Uncouple persistent task state and status (#31031) 8c6ee7d Describe how to add a plugin in Dockerfile (#31340) 1c5cec0 Remove http status code maps (#31350) 87a676e Do not set vm.max_map_count when unnecessary (#31285) e5b7137 TEST: getCapturedRequestsAndClear should be atomic (#31312) 0324103 Painless: Fix bug for static method calls on interfaces (#31348) d6d0727 QA: Fix resolution of default distribution (#31351) fcf1e41 Extract common http logic to server (#31311) 6dd81ea Build: Fix the license in the pom zip and tar (#31336) 8f886cd Treat ack timeout more like a publish timeout (#31303) 9b29327 [ML] Add description to ML filters (#31330) f7a0caf SQL: Fix build on Java 10 375d09c [TEST] Fix RemoteClusterClientTests#testEnsureWeReconnect 4877cec More detailed tracing when writing metadata (#31319) bbfe1ec [Tests] Mutualize fixtures code in BaseHttpFixture (#31210)
* 6.x: 6d711fa (origin/6.x, 6.x) Uncouple persistent task state and status (#31031) f0f16b7 [TEST] Make SSL restrictions update atomic (#31050) 652193f Describe how to add a plugin in Dockerfile (#31340) bb568ab TEST: getCapturedRequestsAndClear should be atomic (#31312) 6f94914 Do not set vm.max_map_count when unnecessary (#31285) c12f3c7 Painless: Fix bug for static method calls on interfaces (#31348) 21128e2 QA: Fix resolution of default distribution (#31351) df17a83 Build: Fix the license in the pom zip and tar (#31336) 3e76b15 Treat ack timeout more like a publish timeout (#31303)
Thanks @bleskes. I'll open a follow up PR with the tests you asked for. |
This pull request adds a full cluster restart test for a Rollup job. The test creates and starts a Rollup job on the cluster and checks that the job already exists and is correctly started on the upgraded cluster. This test allows to test that the persistent task state is correctly parsed from the cluster state after the upgrade, as the status field has been renamed to state in #31031. The test undercovers a ClassCastException that can be thrown in the RollupIndexer when the timestamp as a very low value that fits into an integer. When it's the case, the value is parsed back as an Integer instead of Long object and (long) position.get(rollupFieldName) fails.
This pull request adds a full cluster restart test for a Rollup job. The test creates and starts a Rollup job on the cluster and checks that the job already exists and is correctly started on the upgraded cluster. This test allows to test that the persistent task state is correctly parsed from the cluster state after the upgrade, as the status field has been renamed to state in #31031. The test undercovers a ClassCastException that can be thrown in the RollupIndexer when the timestamp as a very low value that fits into an integer. When it's the case, the value is parsed back as an Integer instead of Long object and (long) position.get(rollupFieldName) fails.
This pull request adds a full cluster restart test for a Rollup job. The test creates and starts a Rollup job on the cluster and checks that the job already exists and is correctly started on the upgraded cluster. This test allows to test that the persistent task state is correctly parsed from the cluster state after the upgrade, as the status field has been renamed to state in #31031. The test undercovers a ClassCastException that can be thrown in the RollupIndexer when the timestamp as a very low value that fits into an integer. When it's the case, the value is parsed back as an Integer instead of Long object and (long) position.get(rollupFieldName) fails.
This pull request removes the relationship between the state of persistent task (as stored in the cluster state) and the status of the task (as reported by the Task APIs and used in various places) that have been confusing for some time (#29608).
In order to do that, a new
PersistentTaskState
interface is added. This interface represents the persisted state of a persistent task. The methods used to update the state of persistent tasks are renamed:updatePersistentStatus()
becomesupdatePersistentTaskState()
and now takes aPersistentTaskState
as a parameter. TheTask.Status
type as been changed toPersistentTaskState
in all places were it make sense (in persistent task customs in cluster state and all other methods that deal with the state of an allocated persistent task).Since the task status and task state are both named writeables, the change should be backward compatible and does not require extra serialization/deserialization logic. A node could deserialize the state of a persistent task as a named writable
xpack/ml/job
of classTask.Status
while a more recent node will deserialize the same object as a named writablexpack/ml/job
of classPersistentTaskState
.This applies to all existing persistent tasks: RollUpJobTask, DatafeedTask, JobTask and upcoming ShardFollowNodeTask.
Machine Learning
Luckily the persistent tasks in ML never exposed their custom states (
DatafeedState
,JobTaskStatus
) in the Task APIs. It means that the status reported in the Task API through the TaskInfo object has always been (and is still) the defaultPersistentTasksNodeService.Status
. No change is needed, andDatafeedState
andJobTaskStatus
have been changed to implementPersistentTaskState
instead ofTask.Status
.JobTaskStatus
is also renamed toJobTaskState
.RollUp
RollUpJobTask uses the same RollupJobStatus for its status and its persisted state. Since it is about to be released in 6.3.0 it's too late to change this. Therefore
RollupJobStatus
has been changed to implement both Task.Status and PersistentTaskState and declared as such in named writebale registry, making it possible to be serialized/deserialized under both interfaces. It means that it does not require anything for bwc for now, but we maybe want to change that in the future as the role of RollupJobStatus is still confusing.