server: stop after client disconnect #7939

Merged
merged 13 commits into from
May 13, 2020


@langmartin langmartin commented May 12, 2020

stop after client disconnected

part 2, the scheduling side. See also #7793

  • heartbeat mechanism already waits until heartbeat_grace has expired
    before marking the node as down
  • lost allocs that should be delayed:
    • create a reschedule delay
    • new wait eval to reschedule by handleDelayedReschedules
    • original alloc updated to stop/lost
    • wait eval causes new alloc at the end of the block period
  • job validation knows you can't yeet system jobs
  • lost allocs with configured migration can't be migrated, so it's
    safe to just treat them as lost

closes #2185

Meta map[string]string
Services []*Service
ShutdownDelay *time.Duration `mapstructure:"shutdown_delay"`
StopAfterClientDisconnect *time.Duration `mapstructure:"stop_after_client_disconnect"`

This is the only new field, the rest are whitespace differences

@langmartin langmartin force-pushed the f-server-stop-after-client-disconnect branch from 3d65429 to 97cf331 Compare May 12, 2020 21:46
@langmartin langmartin marked this pull request as ready for review May 12, 2020 21:49
@langmartin langmartin requested a review from tgross May 12, 2020 21:49

@tgross tgross left a comment

Mostly LGTM. The tests are great!

If we can get rid of that stringly-typed field I think this would be solid.

return true
}

// WaitClientStop uses the reschedule delay mechanism to block rescheduling until

I forget whether this was in the RFC or not, but either way this is a good catch. We would have had some really wild behavior without this.

Also, I like how this API gives us a time.Time rather than ticking the clock over; not having to wait in tests is 👍

@@ -6512,6 +6512,13 @@ func (t *Template) Warnings() error {
return mErr.ErrorOrNil()
}

// AllocState records a single event that changes the state of the whole allocation
type AllocState struct {
Field string

There doesn't seem to be any value in use for Field other than "ClientStatus". I imagine this was intended for future extensibility. But given we'll always control the values, I think it'd be better to have it be an enum-style const so that we have type-checking around this field?

1. a global AllocStates to track status changes with timestamps. We
   need this to track the time at which the alloc became lost
   originally.

2. ShouldClientStop() and WaitClientStop() to actually do the math

This was set up to only update allocs to lost if the DesiredStatus had
already been set by the scheduler. It seems like the intention was to
update the status from any non-terminal state, and not all lost allocs
have been marked stop or evict by this point.
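
The change described above can be pictured with the following sketch. It is an assumption-laden simplification: `Alloc`, `clientTerminal`, and `markLost` are hypothetical, and the only claim taken from the message is that the lost update should apply to any non-terminal client status regardless of DesiredStatus.

```go
package main

import "fmt"

// Alloc is a simplified, hypothetical allocation record.
type Alloc struct {
	ID            string
	DesiredStatus string
	ClientStatus  string
}

// clientTerminal reports whether the client status can no longer change.
func clientTerminal(status string) bool {
	switch status {
	case "complete", "failed", "lost":
		return true
	}
	return false
}

// markLost sets ClientStatus to "lost" on every non-terminal alloc of a down
// node, without first requiring DesiredStatus to be stop or evict.
// It returns the number of allocs it updated.
func markLost(allocs []*Alloc) int {
	updated := 0
	for _, a := range allocs {
		if clientTerminal(a.ClientStatus) {
			continue
		}
		a.ClientStatus = "lost"
		updated++
	}
	return updated
}

func main() {
	allocs := []*Alloc{
		{ID: "a1", DesiredStatus: "run", ClientStatus: "running"},
		{ID: "a2", DesiredStatus: "stop", ClientStatus: "pending"},
		{ID: "a3", DesiredStatus: "run", ClientStatus: "complete"},
	}
	fmt.Println(markLost(allocs))
}
```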
@langmartin langmartin force-pushed the f-server-stop-after-client-disconnect branch from 7b7c3f4 to a53af87 Compare May 13, 2020 15:25

@tgross tgross left a comment

LGTM!

@langmartin langmartin removed this from the 0.11.2 milestone May 13, 2020
@langmartin langmartin merged commit cd6d344 into master May 13, 2020
@langmartin langmartin deleted the f-server-stop-after-client-disconnect branch May 13, 2020 20:39

github-actions bot commented Jan 6, 2023

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jan 6, 2023
Development

Successfully merging this pull request may close these issues.

Kill Allocations when client is disconnected from servers