Refactor blocking/migrating allocations #3007

Merged: schmichael merged 12 commits into master from b-blocking-refactor-2 on Aug 15, 2017

@schmichael (Member) commented Aug 11, 2017

Two major components:

  1. Rip all blocked/migrating logic out of client/client.go
  2. Stick as much as possible into a new single-purpose struct in alloc_watcher.go

AllocRunner now handles the blocking/migrating logic by calling into the new alloc_watcher code. This allows the Client's add/remove alloc code to be much simpler (with less locking!).

(Speaking of locking, the new locking in alloc runner around the blocked and migrating bools is really verbose and annoying, but I think we can remove it in the future when we offer a better mechanism to expose alloc states.)

The prevAllocWatcher interface has 3 implementations:

1. local for blocking and moving data locally
2. remote for blocking and moving data from another node
3. noop for allocs that don't need to block
Fixup some TODOs and formatting left from new prevAllocWatcher code.
Add test for that case, add comments, remove debug logging
@schmichael changed the title from "[WIP] Refactor blocking/migrating allocations" to "Refactor blocking/migrating allocations" on Aug 12, 2017
}

// Migrate from previous local alloc dir to destination alloc dir.
func (p *localPrevAlloc) Migrate(ctx context.Context, dest *allocdir.AllocDir) error {
Contributor

This function returns an error, but below only nil is returned. Should errors be returned as well as logged?

Member Author

Great question. I think I'm going to change this to return the error from allocdir.Move, but always call allocdir.Destroy since it's a cleanup method we always want to call.

Since the caller logs errors from this method and continues, nothing will change functionally, but it will make this local Migrate method work like the remote Migrate method.


tg := alloc.Job.LookupTaskGroup(alloc.TaskGroup)

if prevAR != nil {
Contributor

It might make sense to have two different constructors with names indicating the difference in what kind of AllocWatcher they create (or add comments explaining this).

Member Author

While this conditional is ugly, it is intentionally put inside this unified constructor so that the same logic doesn't have to be duplicated in client.go. There are only 2 places that call into this new func, but I'd rather let them ignore these conditions and leave it up to this file to handle it all.

Definitely up for debate...
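A minimal sketch of the unified-constructor approach, with simplified hypothetical inputs: the previous alloc ID, whether the previous alloc runner is on this node (`prevLocal`, standing in for `prevAR != nil`), and the task group's ephemeral-disk `sticky`/`migrate` flags. Tying local watchers to `sticky` and remote watchers to `migrate` is an assumption here, as are the type bodies and the illustrative `Kind` method.

```go
package main

import "fmt"

// Kind is an illustrative method so callers can see which watcher they got.
type prevAllocWatcher interface{ Kind() string }

type noopPrevAlloc struct{}

type localPrevAlloc struct {
	prevAllocID string
	sticky      bool // assumption: local moves are governed by sticky disks
}

type remotePrevAlloc struct {
	prevAllocID string
	migrate     bool // assumption: remote moves are governed by migrate
}

func (noopPrevAlloc) Kind() string   { return "noop" }
func (localPrevAlloc) Kind() string  { return "local" }
func (remotePrevAlloc) Kind() string { return "remote" }

// newPrevAllocWatcher keeps all the branching in one place so callers
// (client.go in the PR) always get a watcher and never check its kind.
func newPrevAllocWatcher(prevAllocID string, prevLocal, sticky, migrate bool) prevAllocWatcher {
	if prevAllocID == "" {
		return noopPrevAlloc{} // nothing to wait for or migrate
	}
	if prevLocal {
		return localPrevAlloc{prevAllocID: prevAllocID, sticky: sticky}
	}
	return remotePrevAlloc{prevAllocID: prevAllocID, migrate: migrate}
}

func main() {
	fmt.Println(newPrevAllocWatcher("", false, false, false).Kind())   // noop
	fmt.Println(newPrevAllocWatcher("abc", true, true, false).Kind())  // local
	fmt.Println(newPrevAllocWatcher("abc", false, false, true).Kind()) // remote
}
```

The trade-off raised in review still applies: two named constructors would make the distinction explicit at each call site, whereas this version hides it behind a single entry point.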

// terminated (and therefore won't send updates to the listener)
prevStatus terminated

// prevWaitCh is closed when the previous alloc is GC'd which is a
Contributor

GC'd -> garbage collected

@@ -80,7 +80,14 @@ type AllocRunner struct {
vaultClient vaultclient.VaultClient
consulClient ConsulServiceAPI

otherAllocDir *allocdir.AllocDir
prevAlloc prevAllocWatcher
Contributor

Comment

@@ -188,6 +191,11 @@ func (d *AllocDir) Snapshot(w io.Writer) error {

// Move other alloc directory's shared path and local dir to this alloc dir.
func (d *AllocDir) Move(other *AllocDir, tasks []*structs.Task) error {
if !d.built {
// Enfornce the invariant that Build is called before Move
Contributor

Spelling

@@ -297,16 +297,14 @@ func (a *AllocGarbageCollector) MakeRoomFor(allocations []*structs.Allocation) e
}

// MarkForCollection starts tracking an allocation for Garbage Collection
func (a *AllocGarbageCollector) MarkForCollection(ar *AllocRunner) error {
if ar == nil {
Contributor

Any insight on why you removed this?

Member Author

It is called in a very limited number of places, all of which handle nil AllocRunners.


return resp.Node, nil
}

Contributor

😍

}

r.waitingLock.Lock()
r.blocked = false
Contributor

See comment in watcher. I'd like to not have this.

type prevAllocWatcher interface {
// Wait for previous alloc to terminate
Wait(context.Context) error

Contributor

Add IsWaiting and IsMigrating methods so the alloc runner can just call into the watcher; then we can skip storing that state in the alloc runner and remove the locks.

Member Author

Moved. Still pretty obnoxiously ugly code that would be nice to roll into more holistic (sub-)state handling someday, but at least it's tucked away out of sight in the hopefully rarely viewed alloc_watcher.go file.
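The reviewer's suggestion can be sketched as a watcher that owns its waiting/migrating flags, so the alloc runner only queries them and no longer keeps its own bools and lock. This assumes a hypothetical `stateWatcher` type guarding two bools with a `sync.RWMutex`; the actual alloc_watcher.go may store this state differently.

```go
package main

import (
	"fmt"
	"sync"
)

// stateWatcher is a hypothetical watcher that tracks its own progress.
type stateWatcher struct {
	mu        sync.RWMutex
	waiting   bool
	migrating bool
}

// IsWaiting reports whether the watcher is blocked on the previous alloc.
func (w *stateWatcher) IsWaiting() bool {
	w.mu.RLock()
	defer w.mu.RUnlock()
	return w.waiting
}

// IsMigrating reports whether the watcher is copying the previous alloc's data.
func (w *stateWatcher) IsMigrating() bool {
	w.mu.RLock()
	defer w.mu.RUnlock()
	return w.migrating
}

// setWaiting and setMigrating are called by the watcher itself as it
// moves through the wait and migrate phases.
func (w *stateWatcher) setWaiting(v bool) {
	w.mu.Lock()
	w.waiting = v
	w.mu.Unlock()
}

func (w *stateWatcher) setMigrating(v bool) {
	w.mu.Lock()
	w.migrating = v
	w.mu.Unlock()
}

func main() {
	w := &stateWatcher{}
	w.setWaiting(true)
	fmt.Println(w.IsWaiting(), w.IsMigrating()) // true false
	w.setWaiting(false)
	w.setMigrating(true)
	fmt.Println(w.IsWaiting(), w.IsMigrating()) // false true
}
```

The locking does not disappear, but it is confined to one small type instead of being spread across the alloc runner.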

}
}

if done() {
Contributor

Just return ctx.Err()?

Member Author

Good call

// migrate a remote alloc dir to local node
func (p *remotePrevAlloc) migrateAllocDir(ctx context.Context, nodeAddr string) (*allocdir.AllocDir, error) {
// Create the previous alloc dir
prevAllocDir := allocdir.NewAllocDir(p.logger, filepath.Join(p.config.AllocDir, p.prevAllocID))
Contributor

What cleans this?

Member Author

This method cleans it up on errors; otherwise the caller does. Clarified in the docstring.

Since alloc runner just logs these errors and continues, there's no reason not to return it.
@schmichael merged commit c459b6a into master on Aug 15, 2017
@schmichael deleted the b-blocking-refactor-2 branch on Aug 15, 2017 17:57
@github-actions bot locked as resolved and limited conversation to collaborators on Mar 26, 2023