Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Timing-related fixes #2029

Merged
merged 5 commits into from
Oct 31, 2019
Merged

Timing-related fixes #2029

merged 5 commits into from
Oct 31, 2019

Conversation

ssalinas
Copy link
Member

@ssalinas ssalinas commented Oct 29, 2019

Few that will be in this PR

  • When skip healthchecks expires, any tasks that were initially marked healthy due to that should remain as healthy. Accomplishing this with a dummy healthcheck result, since pulling in request history to check at the time of startup would pull a dep on mysql on the critical scheduling path
  • Fix handling of baragon request already in queue with different params. This should not result in zero upstreams when rolling back a deploy (odd failover race condition)
  • Fix race condition where pending deploy gets stuck even though it already has a result in zk (odd failover mode)

if ((method != LoadBalancerMethod.CHECK_STATE && method != LoadBalancerMethod.PRE_ENQUEUE) &&
result.state == BaragonRequestState.FAILED
&& result.message.orElse("").contains("is already enqueued with different parameters")) {
LOG.info("Baragon request {} already in the queue, will fetch current state instead", loadBalancerRequestId.getId());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bit of a nit, but is there any cleaner way to check for this state other than matching the against the message string? Or if we do have to do a string match, can we put the string into a constant somewhere?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the longer term way would be to make baragon return a different status code. Right now this is the only thing that differentiates it from other non-specific failures. Can definitely put it in a constant for the time being

@baconmania
Copy link
Contributor

🚢

@ssalinas ssalinas added this to the 1.1.0 milestone Oct 31, 2019
@ssalinas ssalinas merged commit 91462c9 into master Oct 31, 2019
@ssalinas ssalinas deleted the more_race_conditions branch October 31, 2019 14:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants