Avoid Request Locks With Task Lag #2244

Merged: 9 commits merged into master on Dec 1, 2021
Conversation

@WH77 (Contributor) commented Nov 16, 2021

This PR prevents request locks from being acquired by lower-priority, possibly long-running components while the request has late pending tasks, so that higher-priority components like the offer scheduler can acquire them. It does not preempt lower-priority components, so lag for a request may still build up while they're holding the lock.

List of components using the request lock:

  • SingularityMesosStatusUpdateHandler
  • SingularityStartup
  • DeployResource
  • SingularityAutoScaleSpreadAllPoller
  • SingularityCleaner (lb/requests/tasks)
  • SingularityCrashLoopChecker
  • SingularityExpiringUserActionPoller
  • SingularityScheduler
  • SingularitySchedulerPoller
  • SingularityUpstreamChecker
  • SingularityDeployHistoryPersister
  • SingularityRequestHistoryPersister
  • SingularityMesosOfferScheduler

Somewhat concerned about this change introducing difficult bugs, because previously lock acquisition would wait for the lock forever instead of short-circuiting. Not sure if it would be better to throw an error instead of logging and returning.
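For reference, a minimal sketch of the gating behavior described above. All names here (RequestLockGuard, tryRunWithRequestLock, lateTasksByRequestId) are illustrative assumptions, not the actual Singularity classes touched by this PR:

```java
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Illustrative only: hypothetical names, not the classes changed by this PR.
public class RequestLockGuard {
  private static final Logger LOG = LoggerFactory.getLogger(RequestLockGuard.class);

  private final Map<String, Long> lateTasksByRequestId; // requestId -> late-task marker, refreshed elsewhere

  public RequestLockGuard(Map<String, Long> lateTasksByRequestId) {
    this.lateTasksByRequestId = lateTasksByRequestId;
  }

  /**
   * Low-priority callers skip the lock (log + return false) when the request has late
   * pending tasks, leaving it free for high-priority components like the offer scheduler.
   */
  public boolean tryRunWithRequestLock(String requestId, boolean highPriority, ReentrantLock lock, Runnable action) {
    if (!highPriority && lateTasksByRequestId.containsKey(requestId)) {
      LOG.debug("Skipping low-priority lock acquisition for {} due to task lag", requestId);
      return false; // short-circuit instead of waiting on the lock forever
    }
    lock.lock();
    try {
      action.run();
      return true;
    } finally {
      lock.unlock();
    }
  }
}
```

Returning false rather than throwing matches the "log and return" option mentioned above; callers then have to tolerate an occasionally skipped iteration.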

cc - @ssalinas, @pschoenfelder

@ssalinas (Member) left a comment

I'm with you on being worried about these pollers getting stuck for too long. What if each poller also kept track of how long it had been since it last ran, and, if it's been too long, grabbed the high-priority lock instead? "Too long" here could be in the 10m+ range for things like persisters, maybe less for some of the others.
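A rough sketch of that escalation idea (hypothetical names and thresholds; nothing here is from the actual diff):

```java
// Hypothetical sketch of "escalate to the high-priority lock after being starved too long".
public class PollerPriorityTracker {
  private final long maxLowPriorityWaitMillis; // e.g. 10 minutes for persisters, less for other pollers
  private volatile long lastSuccessfulRunMillis = System.currentTimeMillis();

  public PollerPriorityTracker(long maxLowPriorityWaitMillis) {
    this.maxLowPriorityWaitMillis = maxLowPriorityWaitMillis;
  }

  /** Normally low priority, but escalate if the poller hasn't managed to run recently. */
  public boolean shouldUseHighPriorityLock() {
    return System.currentTimeMillis() - lastSuccessfulRunMillis > maxLowPriorityWaitMillis;
  }

  /** Call after each successful poll so the starvation clock resets. */
  public void markSuccessfulRun() {
    lastSuccessfulRunMillis = System.currentTimeMillis();
  }
}
```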


private void updateLateTasksByRequestId() {
  long now = System.currentTimeMillis();
  if (now - lastUpdate > 1000L * configuration.getRequestCacheTtl()) {
@ssalinas (Member):

Do we need to care about request cache ttl at all here? Shouldn't this be running on the leader only?

@WH77 (Contributor, Author):

This was mostly me not wanting to add another configuration option, and thinking that request cache ttl was an appropriate value for this.

Comment on lines 30 to 31:
    updateLateTasksByRequestId();
    return lateTasksByRequestId.containsKey(requestId);
@ssalinas (Member):

I'm a bit worried about this being called in a super tight loop. I know it's mostly going to no-op on the if statement. But does it make sense to be updating this on a thread that is trying to grab a lock vs. just doing it in the background? Then we can be sure every check is just a map.containsKey and never has to worry about heavier calculations on an active path.

@WH77 (Contributor, Author):

My original thinking was that a scheduled thread would be overkill for this, but you're right that this is going to be a hot method and that I should add a background thread specifically for this.
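A sketch of what that background refresh could look like (hypothetical class; it assumes the late-task data can be pulled through some supplier, which is not necessarily how the PR loads it):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical sketch: keep the late-task map warm on a scheduled thread so the lock
// acquisition path only ever does a ConcurrentHashMap#containsKey lookup.
public class LateTaskCache {
  private final Map<String, Long> lateTasksByRequestId = new ConcurrentHashMap<>();
  private final ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();

  public LateTaskCache(Supplier<Map<String, Long>> lateTaskLoader, long refreshSeconds) {
    executor.scheduleAtFixedRate(
      () -> {
        Map<String, Long> latest = lateTaskLoader.get();
        lateTasksByRequestId.keySet().retainAll(latest.keySet()); // drop requests that caught up
        lateTasksByRequestId.putAll(latest);
      },
      0,
      refreshSeconds,
      TimeUnit.SECONDS
    );
  }

  /** Hot path: no heavier calculation, just a map lookup. */
  public boolean hasLateTasks(String requestId) {
    return lateTasksByRequestId.containsKey(requestId);
  }
}
```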

@WH77 (Contributor, Author) commented Nov 19, 2021

@ssalinas - I opted to allow low-priority lock requests to go through if the request lock isn't currently locked, to avoid pollers getting stuck due to task lag that is unrelated to request lock usage.

Do you think that I should still force them to become high priority after enough delay, or is this change enough?
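A sketch of that check (hypothetical helper; LateTaskCache is the illustrative cache from the sketch above, and the per-request lock is assumed here to be a ReentrantLock):

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: a low-priority caller only defers to task lag when the request lock
// is actually contended; an uncontended lock is taken normally even if the request is lagging.
public class LowPriorityLockGate {
  public static boolean shouldSkip(String requestId, ReentrantLock lock, LateTaskCache lateTasks) {
    boolean currentlyContended = lock.isLocked() || lock.hasQueuedThreads();
    return currentlyContended && lateTasks.hasLateTasks(requestId);
  }
}
```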

@pschoenfelder (Contributor):

Looks good. I think your suggestion about allowing low priority to run when initially unlocked makes sense.

I may have missed some convo on this last week — has anything been discussed about Steve's suggestion to have each poller determine priority based on time since last run? Not necessary, just curious. Maybe a good future improvement?

@ssalinas (Member) left a comment:

🚢

@WH77 merged commit 8e4a548 into master on Dec 1, 2021
@WH77 deleted the task-lag-guardrail branch on December 1, 2021 19:19
@ssalinas added this to the 1.5.0 milestone on May 4, 2022