Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ML] Make APIs cancellable for the ML and Transforms plugin #88010

Open
19 of 30 tasks
benwtrent opened this issue Jun 24, 2022 · 11 comments
Open
19 of 30 tasks

[ML] Make APIs cancellable for the ML and Transforms plugin #88010

benwtrent opened this issue Jun 24, 2022 · 11 comments
Labels
:ml/Transform Transform :ml Machine learning Team:ML Meta label for the ML team

Comments

@benwtrent
Copy link
Member

benwtrent commented Jun 24, 2022

Many of the ML APIs should be cancellable as they should not continue executing once the caller disconnects the HTTP connection.

Here is the list:

* Denotes more important APIs to get done as they are heavier requests used by many automatic processes

For ML

For transforms

@benwtrent benwtrent added :ml Machine learning :ml/Transform Transform labels Jun 24, 2022
@elasticmachine elasticmachine added the Team:ML Meta label for the ML team label Jun 24, 2022
@elasticmachine
Copy link
Collaborator

Pinging @elastic/ml-core (Team:ML)

benwtrent added a commit that referenced this issue Jun 28, 2022
Many machine learning stats APIs make multiple searches per call. Making them cancellable allows for those searches to be cancelled if the HTTP connection is closed.

This improves scalability and performance.

Relates #88010
@droberts195 droberts195 added the good first issue low hanging fruit label Jun 7, 2023
@pequolau
Copy link

pequolau commented Aug 9, 2023

Hi, can I work on this part?

  • Estimate model memory

@droberts195
Copy link
Contributor

@pequolau sure, if you could work on the estimate model memory endpoint that would be great.

@DaveCTurner
Copy link
Contributor

Seems like a good plan, but if these things are heavy enough to need to be cancellable then they should also not be running on SAME (i.e. a transport worker, NB #97916). Also it's not enough to just add a RestCancellableNodeClient wrapper, you also have to explicitly check whether the task is cancelled throughout the execution. I looked at a handful of the changes linked from the OP and I didn't see any that did either of these things.

@droberts195
Copy link
Contributor

but if these things are heavy enough to need to be cancellable then they should also not be running on SAME (i.e. a transport worker

Is that true even if the processing on SAME is to immediately call another async API? Many of these APIs will start off by doing a search. Even though the search may take a while, initiating it should be fast. And then the thread that initiated the search should be freed up again.

Also it's not enough to just add a RestCancellableNodeClient wrapper, you also have to explicitly check whether the task is cancelled throughout the execution.

If a task is cancelled does it automatically cancel its child tasks?

The idea behind a lot of these changes was that they boil down to a series of searches. So if a parent task that's cancelled automatically cancels its child tasks, and the search action explicitly checks whether it's cancelled throughout its execution, then this change would tie the dropping of the HTTP connection that called the ML endpoint to the cancellation functionality that already exists in search.

It's possible that we do still have a problem. I guess a cancelled task results in the onFailure handler of listeners being called? But there are a few places in the ML actions where we proceed to the next action in a chain of actions even if the onFailure handler of one action is called. In these cases we probably need to be checking whether the task is cancelled before proceeding to the next (now pointless) action in the chain and just clean up instead.

@DaveCTurner
Copy link
Contributor

Is that true even if the processing on SAME is to immediately call another async API?

It Depends™ - often that's not going to cause noticeable problems, but there can still be nontrivial work involved within the initial phases of the next API called (which also doesn't fork if called via NodeClient, because #97916). I think the reasoning should be the other way round: we should reserve SAME just for those cases where the overhead of forking is quantifiably too much, and by default we should always pick a different pool.

If a task is cancelled does it automatically cancel its child tasks?

Ah good point, yes, as long as those tasks are explicitly registered as its child tasks. Which in many cases they are.

I came here from #98333 which definitely has both of these problems tho.

@pequolau

This comment was marked as off-topic.

@DaveCTurner

This comment was marked as off-topic.

@akagra
Copy link

akagra commented Sep 6, 2023

Hey,
I have started contributing to open source recently and I can start on this part as this problem contains "good first issue" tag. Should I start on this problem?

@akagra
Copy link

akagra commented Sep 6, 2023

Hey @droberts195 shall I work on some of its feature? I am first time contributor and looking for a gate to enter into the open source contribution.

@vimalJD
Copy link

vimalJD commented Sep 25, 2023

Is this project is ongoing? Let me know further (DOCS) to contribute on it as "good first issue".

@droberts195 droberts195 removed the good first issue low hanging fruit label Sep 26, 2023
prwhelan added a commit to prwhelan/elasticsearch that referenced this issue Jul 16, 2024
Validate will initiate a search request.  In the event that the search
request needs to be cancelled, rather than manually stopping the task,
cancelling the Validate task will now propagate the cancel request to
the Search task.

Relate elastic#88010
prwhelan added a commit that referenced this issue Jul 18, 2024
Validate will initiate a search request.  In the event that the search
request needs to be cancelled, rather than manually stopping the task,
cancelling the Validate task will now propagate the cancel request to
the Search task.

Relate #88010

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
ioanatia pushed a commit to ioanatia/elasticsearch that referenced this issue Jul 22, 2024
Validate will initiate a search request.  In the event that the search
request needs to be cancelled, rather than manually stopping the task,
cancelling the Validate task will now propagate the cancel request to
the Search task.

Relate elastic#88010

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
salvatore-campagna pushed a commit to salvatore-campagna/elasticsearch that referenced this issue Jul 23, 2024
Validate will initiate a search request.  In the event that the search
request needs to be cancelled, rather than manually stopping the task,
cancelling the Validate task will now propagate the cancel request to
the Search task.

Relate elastic#88010

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
salvatore-campagna pushed a commit to salvatore-campagna/elasticsearch that referenced this issue Jul 23, 2024
Validate will initiate a search request.  In the event that the search
request needs to be cancelled, rather than manually stopping the task,
cancelling the Validate task will now propagate the cancel request to
the Search task.

Relate elastic#88010

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:ml/Transform Transform :ml Machine learning Team:ML Meta label for the ML team
Projects
None yet
Development

No branches or pull requests

7 participants