
stale allocation data leads to incorrect (and even negative) metrics #5637

Merged: 2 commits merged into master from b-5570-negative-metrics on May 7, 2019

Conversation

@cgbaker (Contributor) commented on May 1, 2019

client: was not using up-to-date client state when determining which allocs count towards allocated resources

This is the second issue where using AllocRunner.Alloc() instead of AllocRunner.AllocState() meant that the ClientState had the wrong data. @schmichael previously noted how nefarious this is and that we should look for a solution. I haven't done anything to address the underlying issue in this PR; I've just made the simplest fix to correct the metrics calculation. In fact, the call to AllocState() may be a little more expensive than necessary here; we only need to know whether the allocation is still resident.

fixes #5570
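
For context, a rough sketch of the idea (not the committed patch): when summing allocated resources, only count allocations whose up-to-date client status is non-terminal. The allocRunner interface and countedAllocs helper below are illustrative stand-ins for this sketch, not Nomad's real types; only the terminal-status constants come from nomad/structs.

package client

import "github.com/hashicorp/nomad/nomad/structs"

// allocRunner is an illustrative stand-in for the client's alloc runner.
// CurrentClientStatus models reading the up-to-date status (what
// ar.AllocState().ClientStatus provides), as opposed to the possibly stale
// ar.Alloc().ClientStatus.
type allocRunner interface {
	Alloc() *structs.Allocation
	CurrentClientStatus() string
}

// countedAllocs returns the allocations that should still count towards the
// node's allocated-resources metrics: those the client considers resident.
func countedAllocs(ars []allocRunner) []*structs.Allocation {
	out := make([]*structs.Allocation, 0, len(ars))
	for _, ar := range ars {
		switch ar.CurrentClientStatus() {
		case structs.AllocClientStatusComplete,
			structs.AllocClientStatusFailed,
			structs.AllocClientStatusLost:
			// Terminal on the client: counting these allocations is what
			// produced the inflated (and even negative) metrics in #5570.
			continue
		}
		out = append(out, ar.Alloc())
	}
	return out
}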

@cgbaker (Contributor, Author) commented on May 1, 2019

Per the comment above about using AllocState(), we could maybe get the info we need like this instead:

// Compute the ClientStatus
if ar.state.ClientStatus != "" {
	// The client status is being forced
	a.ClientStatus, a.ClientDescription = ar.state.ClientStatus, ar.state.ClientDescription
} else {
	a.ClientStatus, a.ClientDescription = getClientStatus(taskStates)
}

client/client.go (Outdated)
@@ -2766,6 +2766,8 @@ func (c *Client) allAllocs() map[string]*structs.Allocation {
 	allocs := make(map[string]*structs.Allocation, len(ars))
 	for _, ar := range ars {
 		a := ar.Alloc()
+		as := ar.AllocState()
+		a.ClientStatus = as.ClientStatus
Member:

I don't think this is safe as you're modifying ar.alloc. I think you need to copy it first.

This also gives allAllocs pretty special behavior. It returns a view of each alloc not available anywhere else. At a minimum we should comment this unusual special case, but since there only appears to be a single caller I think we should probably scrap this method entirely and inline the behavior in the caller (or write a one-off function we can unit test).
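
For illustration, the copy-first variant of this hunk might look something like the following sketch (not the committed code; it assumes Allocation's Copy() deep-copy helper and the surrounding allocs map from allAllocs):

a := ar.Alloc().Copy() // deep copy so we don't mutate the runner's stored alloc
a.ClientStatus = ar.AllocState().ClientStatus
allocs[a.ID] = a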

Contributor (Author):

Agreed with the criticism and the suggestion.

@notnoop (Contributor) left a comment:

Good catch! Thanks for digging into this!

@langmartin (Contributor) left a comment:

lgtm, fwiw

CpuShares: 768,
},
Memory: structs.AllocatedMemoryResources{
MemoryMB: 768,
Contributor:

I figured out right here why you were doubling the resource settings through this test, and I like it. Maybe worth a comment up near the top just to draw attention to it?

Contributor (Author):

I actually started to write a comment and it felt like I was ruining the joke ;)

@schmichael (Member) left a comment:

Such unfortunately subtle code. Thanks for cleaning this up!

Co-Authored-By: cgbaker <cgbaker@hashicorp.com>
@cgbaker merged commit 4b54e27 into master on May 7, 2019
@cgbaker deleted the b-5570-negative-metrics branch on May 9, 2019
@github-actions (bot) commented:

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions bot locked as resolved and limited conversation to collaborators on Feb 11, 2023
Linked issue (may be closed by merging this pull request): Negative metrics in 0.9.0 for service tasks (#5570)