Persisted shared ports during inplace update #9736

Merged
merged 1 commit into from
Jan 8, 2021

drewbailey
Contributor

Ensure that allocation AllocatedResources Shared Ports are not dropped during inplace updates.

When performing in-place updates, the scheduler creates a shallow copy of the allocation and updates a few fields. It's not immediately clear to me why we do a shallow copy instead of something like allocation.Copy(). The AllocatedSharedResources Ports were not copied over, which caused issues after the plan was applied.
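
To illustrate the failure mode, here is a minimal sketch in Go. The types and function names are simplified, hypothetical stand-ins and not the scheduler's actual structs; only the field names DiskMB and Ports come from this PR. It assumes the new allocation's shared resources are rebuilt field by field, so any field that is not set explicitly ends up empty.

```go
package main

import "fmt"

// Port, SharedResources, and Alloc are hypothetical, simplified stand-ins.
type Port struct {
	Label string
	Value int
}

type SharedResources struct {
	DiskMB int64
	Ports  []Port
}

type Alloc struct {
	ID     string
	Shared SharedResources
}

// rebuildDroppingPorts shallow-copies the alloc, then rebuilds Shared
// with only DiskMB set. Ports is left at its zero value (nil).
func rebuildDroppingPorts(old *Alloc, diskMB int64) *Alloc {
	updated := new(Alloc)
	*updated = *old // shallow copy
	updated.Shared = SharedResources{DiskMB: diskMB}
	return updated
}

// rebuildKeepingPorts does the same but carries the existing ports over.
func rebuildKeepingPorts(old *Alloc, diskMB int64) *Alloc {
	updated := new(Alloc)
	*updated = *old // shallow copy
	updated.Shared = SharedResources{
		DiskMB: diskMB,
		Ports:  old.Shared.Ports,
	}
	return updated
}

func main() {
	old := &Alloc{
		ID: "a1",
		Shared: SharedResources{
			DiskMB: 300,
			Ports:  []Port{{Label: "api", Value: 8080}},
		},
	}
	fmt.Println(len(rebuildDroppingPorts(old, 300).Shared.Ports)) // 0
	fmt.Println(len(rebuildKeepingPorts(old, 300).Shared.Ports))  // 1
}
```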

As a side effect, removing or updating a task group service stanza in a way that resulted in an in-place update would error, preventing the service from being registered or deregistered. This issue was reported in #9360.

This error was displayed in the logs like so:

    2021-01-06T12:49:47.127-0500 [ERROR] client.alloc_runner: error running update hooks: alloc_id=13ff640d-7426-4763-1be1-427a3594dec3 error="1 error occurred:
        * update hook "group_services" failed: error getting address for check "service: \"example-service\" check": invalid port "api": port label not found
"
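
For context, a rough sketch of why an empty port list produces this kind of error. The lookup function below is hypothetical and is not the client's actual code; only the error wording is taken from the log above. With no shared ports on the allocation, resolving the label "api" has nothing to match against.

```go
package main

import (
	"errors"
	"fmt"
)

// Port is a hypothetical stand-in for an allocated port mapping.
type Port struct {
	Label string
	Value int
}

var errLabelNotFound = errors.New("port label not found")

// portForLabel is a hypothetical helper that resolves a port label
// against the ports recorded on an allocation.
func portForLabel(ports []Port, label string) (int, error) {
	for _, p := range ports {
		if p.Label == label {
			return p.Value, nil
		}
	}
	return 0, fmt.Errorf("invalid port %q: %w", label, errLabelNotFound)
}

func main() {
	// Ports were dropped, so the lookup fails.
	_, err := portForLabel(nil, "api")
	fmt.Println(err) // invalid port "api": port label not found

	// Ports were preserved, so the lookup succeeds.
	port, err := portForLabel([]Port{{Label: "api", Value: 8080}}, "api")
	fmt.Println(port, err) // 8080 <nil>
}
```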

fixes #9360
fixes #9735

Member

@nickethier nickethier left a comment

🎉

    DiskMB:   int64(update.TaskGroup.EphemeralDisk.SizeMB),
    Ports:    update.Alloc.AllocatedResources.Shared.Ports,
    Networks: update.Alloc.AllocatedResources.Shared.Networks,
Contributor Author

@nickethier pinging in case you happen to know: I noticed Networks was dropped too. I'm guessing we want to carry over all fields.

Member

I think so, but I think we'd want a copy of it. I wonder if we should just set Shared on line 730 to update.Alloc.AllocatedResources.Shared.Copy()?

Contributor Author

Yeah, it's not clear to me why DiskMB is update.TaskGroup.EphemeralDisk.SizeMB instead of update.Alloc.AllocatedResources.Shared.DiskMB.

Member

Ah, I see. Above, Ports and Networks are set from the existing alloc because those cannot be updated inplace, and that is guarded by tasksUpdated.
The same is happening when setting DiskMB here. So I think we can leave it as is, but I would still add a .Copy() to the Networks.
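
A small sketch of why the .Copy() matters. The Network and Networks types here are hypothetical stand-ins, and this Copy method is an illustration rather than the project's own implementation. Assigning a slice of pointers directly shares the underlying elements with the existing alloc, while a deep copy keeps the two independent.

```go
package main

import "fmt"

// Network and Networks are hypothetical, simplified stand-ins.
type Network struct {
	Mode string
	IP   string
}

type Networks []*Network

// Copy returns a deep copy, so the result shares no elements with ns.
func (ns Networks) Copy() Networks {
	if ns == nil {
		return nil
	}
	out := make(Networks, len(ns))
	for i, n := range ns {
		if n != nil {
			c := *n
			out[i] = &c
		}
	}
	return out
}

func main() {
	existing := Networks{{Mode: "bridge", IP: "10.0.0.1"}}

	aliased := existing       // shares the same *Network elements
	copied := existing.Copy() // independent elements

	aliased[0].IP = "10.0.0.2"
	fmt.Println(existing[0].IP) // 10.0.0.2, changed through the alias

	copied[0].IP = "10.0.0.3"
	fmt.Println(existing[0].IP) // 10.0.0.2, unaffected by the copy
}
```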

AllocatedSharedResources were not being copied over to the new
allocation struct the scheduler makes during inplace updates. This
caused downstream issues after the plan was applied, namely the shared
ports were dropped causing issues with service
registration/deregistration.

test that shared ports are preserved

change log, also carry over shared network

copy networks
@drewbailey drewbailey force-pushed the b/persist-allocated-resources branch from a481ae0 to 1cc88d7 Compare January 8, 2021 13:43
@drewbailey drewbailey merged commit 9c3ce6b into master Jan 8, 2021
@drewbailey drewbailey deleted the b/persist-allocated-resources branch January 8, 2021 14:00
@tgross tgross added this to the 1.0.2 milestone Jan 11, 2021
backspace pushed a commit that referenced this pull request Jan 22, 2021
github-actions bot commented Dec 4, 2022

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 4, 2022
Successfully merging this pull request may close these issues.

Allocated Resources Shared ports dropped on inplace update
system job leaks service registration
4 participants