Handle stale jobs more carefully before purging them. #4615

Merged 1 commit on Feb 11, 2020

Conversation

@jezdez (Member) commented Feb 6, 2020

What type of PR is this? (check all applicable)

  • Refactor
  • Feature
  • Bug Fix
  • New Query Runner (Data Source)
  • New Alert Destination
  • Other

Description

If a worker dies, a job's ended_at value can be None. This caused the
purge task to fail on every run, so the purge never completed.
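
A minimal sketch of the failure mode and the handling described above. It uses a hypothetical `FakeJob` stand-in rather than real job objects, and `FAILURE_TTL` and `find_stale_jobs` are names assumed for illustration, not taken from the patch.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

FAILURE_TTL = 7 * 24 * 60 * 60  # assumed TTL in seconds, illustration only


@dataclass
class FakeJob:
    """Hypothetical stand-in for a failed job record."""
    job_id: str
    ended_at: Optional[datetime]


def find_stale_jobs(failed_jobs: List[Optional[FakeJob]], now: datetime) -> List[FakeJob]:
    stale_jobs = []
    for failed_job in failed_jobs:
        # The job may no longer exist in the backing store.
        if failed_job is None:
            continue
        # A worker that died may never have saved ended_at; without this
        # check, `now - None` below raises TypeError and aborts the purge.
        if failed_job.ended_at is None:
            stale_jobs.append(failed_job)
            continue
        age = (now - failed_job.ended_at).total_seconds()
        if age > FAILURE_TTL:
            stale_jobs.append(failed_job)
    return stale_jobs


now = datetime(2020, 2, 6, 12, 0, 0)
jobs = [
    FakeJob("dead-worker", None),
    None,
    FakeJob("old", now - timedelta(days=30)),
    FakeJob("recent", now - timedelta(minutes=5)),
]
stale_ids = [job.job_id for job in find_stale_jobs(jobs, now)]
```

Here the job with no `ended_at` is collected as stale alongside the job that is older than the assumed TTL, while the missing entry and the recent job are skipped.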

Related Tickets & Documents

Mobile & Desktop Screenshots/Recordings (if there are UI changes)

@jezdez requested review from arikfr and rauchy on February 6, 2020 at 10:56
stale_jobs = []
for failed_job in failed_jobs:
    # the job may not actually exist anymore in Redis
    if not failed_job:
Contributor

How about we just compact these?
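
Compacting here presumably means dropping the missing (None) entries from the fetched list before the loop, so the `if not failed_job` branch is no longer needed. A sketch, assuming a `compact` helper with that meaning; the definition below is hypothetical and stands in for whatever utility the project actually uses.

```python
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def compact(items: Iterable[Optional[T]]) -> List[T]:
    """Hypothetical helper: keep only truthy entries."""
    return [item for item in items if item]


# Entries that no longer exist are assumed to come back as None.
fetched = ["job-a", None, "job-b", None]
existing = compact(fetched)
```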

# the job could have an empty ended_at value in case
# of a worker dying before it can save the ended_at value,
# in which case we also consider them stale
if not failed_job.ended_at:
Contributor

This feels more like an or conditional and less like a multi-branch statement to me.

for job in stale_jobs:
    job.delete()
stale_jobs = []
for failed_job in failed_jobs:
Contributor

If you share my thoughts on the other couple of comments, this whole block might be better represented by a filter on failed_jobs.

Member Author

I'm not sure I follow, what do you mean by "a filter on failed_jobs"?

Contributor

I just mean that it feels like stale_jobs is just a sub-list of failed_jobs that satisfies a predicate. Something like:

is_stale = lambda job: (
    job.ended_at is None
    or (datetime.utcnow() - job.ended_at).total_seconds() > settings.JOB_DEFAULT_FAILURE_TTL
)
stale_jobs = filter(is_stale, compact(failed_jobs))
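
One detail worth checking in a predicate like this is how the job's age is measured. On a `timedelta`, the `seconds` attribute holds only the seconds component (whole days are kept separately in `days`), while `total_seconds()` returns the full duration. A small sketch with made-up timestamps:

```python
from datetime import datetime, timedelta

ended_at = datetime(2020, 1, 1, 0, 0, 0)
now = ended_at + timedelta(days=2, seconds=30)
elapsed = now - ended_at

seconds_component = elapsed.seconds      # only the seconds part, days excluded
full_duration = elapsed.total_seconds()  # whole duration in seconds
```

For a TTL longer than a day, a comparison against the `seconds` component alone could never exceed it, so `total_seconds()` is the safer reading of "older than the TTL".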

Member Author

While you may be right that this is another way to write it, I don't see this as more readable. But it's up to you, feel free to change the patch the way you like it better.

Member

I'm fine with keeping @jezdez's implementation as is, as it leaves room for explaining the different steps.

Contributor

👍

Don't feel strongly either way, but generally I'm more in the camp of having descriptive variable / function / lambda names instead of comments (i.e. is_stale = worker_died or too_old). They just expire slower than comments.
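
The naming idea can be sketched roughly as follows. This is only an illustration, not code from the patch: `FAILURE_TTL` is an assumed constant and the jobs are simple stand-in objects with an `ended_at` attribute.

```python
from datetime import datetime, timedelta
from types import SimpleNamespace

FAILURE_TTL = 3600  # assumed TTL in seconds, for illustration only


def is_stale(job, now):
    worker_died = job.ended_at is None
    # Short-circuit so the subtraction never runs on a missing ended_at.
    too_old = (not worker_died
               and (now - job.ended_at).total_seconds() > FAILURE_TTL)
    return worker_died or too_old


now = datetime(2020, 2, 6, 12, 0, 0)
dead = SimpleNamespace(ended_at=None)
old = SimpleNamespace(ended_at=now - timedelta(hours=2))
fresh = SimpleNamespace(ended_at=now - timedelta(minutes=1))
results = [is_stale(job, now) for job in (dead, old, fresh)]
```

The variable names carry the explanation that the comments carry in the multi-branch version, at the cost of folding both cases into one expression.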
