fix(cron): Log long running jobs #45804

Merged: 1 commit merged on Jun 12, 2024


ChristophWurst (Member) commented on Jun 12, 2024:

  • Resolves: a production instance with database transactions lasting for days. One theory is that a cron job is stuck.

Summary

If cron jobs take a very long time to complete, they start to run in parallel. That's because jobs are only reserved for 12h; after that, we assume the job has failed and start it again. In faulty situations, this can lead to ever-increasing server load.
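A minimal sketch of the reservation-expiry behaviour described above, assuming a job counts as free again once its reservation is older than 12 hours. `Job`, `reserved_at`, `is_available` and `pick_job` are hypothetical names for illustration, not Nextcloud's actual job list API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Assumed reservation window, matching the 12h mentioned in the PR.
RESERVATION_TIMEOUT = timedelta(hours=12)


@dataclass
class Job:
    # Hypothetical job record; reserved_at is None when no cron run holds it.
    name: str
    reserved_at: Optional[datetime] = None


def is_available(job: Job, now: datetime) -> bool:
    """A job may be picked if it is unreserved or its reservation has expired."""
    if job.reserved_at is None:
        return True
    return now - job.reserved_at > RESERVATION_TIMEOUT


def pick_job(job: Job, now: datetime) -> bool:
    """Reserve the job for this cron run if available; return True on success."""
    if is_available(job, now):
        # A stale reservation is simply overwritten; the old run keeps going.
        job.reserved_at = now
        return True
    return False
```

In this model the first run is never told that its reservation was taken over, so once the 12h window passes a second copy of the job starts next to the first one.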

Here is an example:

[Six screenshots from 2024-06-12, taken between 09:40:13 and 09:40:48]

It looks like a job executes an expensive query over and over. After 12h the query time doubles, after 24h it triples, after 36h it quadruples, and so on. It's not yet clear whether that is really what is happening.
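The doubling, tripling, quadrupling pattern is what you would expect if a never-finishing job gains one extra copy every time its 12h reservation expires. A rough sketch of that model, assuming query time scales linearly with the number of copies; `concurrent_runs` is a hypothetical helper, and the model ignores the few minutes between a reservation expiring and the next cron tick.

```python
import math


def concurrent_runs(hours_elapsed: float, reservation_hours: float = 12.0) -> int:
    """Copies of a never-finishing job running after `hours_elapsed` hours.

    Assumes one copy starts at t=0 and a new copy starts each time the
    reservation window elapses.
    """
    if hours_elapsed < 0:
        raise ValueError("hours_elapsed must be non-negative")
    return math.floor(hours_elapsed / reservation_hours) + 1


# If every copy runs the same expensive query and load scales linearly,
# the observed query time is proportional to the number of copies.
for hours in (0, 12, 24, 36):
    print(hours, concurrent_runs(hours))
```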

I've tried to be reasonable with the log level so we don't spam the logs too much:

  • Job executes longer than 5m -> debug
  • Job executes longer than 20m -> info
  • Job executes longer than 1h20m -> warning
  • Job executes longer than 5h20m -> error
  • Job executes longer than 10h40m -> fatal
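The thresholds above can be sketched as a lookup from a job's runtime to a log level, checking the most severe threshold first. This is an illustrative mapping, assuming "longer than" means strictly greater; `log_level_for` and `THRESHOLDS` are hypothetical names, not the code merged in this PR.

```python
from datetime import timedelta
from typing import Optional

# Thresholds from the list above, ordered from most to least severe.
THRESHOLDS = [
    (timedelta(hours=10, minutes=40), "fatal"),
    (timedelta(hours=5, minutes=20), "error"),
    (timedelta(hours=1, minutes=20), "warning"),
    (timedelta(minutes=20), "info"),
    (timedelta(minutes=5), "debug"),
]


def log_level_for(runtime: timedelta) -> Optional[str]:
    """Return the level to log a job's runtime at, or None if it is short."""
    for threshold, level in THRESHOLDS:
        if runtime > threshold:
            return level
    return None
```

The steps are 5, 20, 80 and 320 minutes (each four times the previous one), followed by 640 minutes. Note that 10h40m is still below the 12h reservation window, so the most severe level is reached before the job would be picked up again.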

TODO

  • Add logging

Checklist

Signed-off-by: Christoph Wurst <christoph@winzerhof-wurst.at>
ChristophWurst merged commit 8e3a049 into master on Jun 12, 2024
164 checks passed
ChristophWurst deleted the fix/cron/log-long-running-jobs branch on June 12, 2024, 10:07

ChristophWurst (Member, Author):

/backport to stable29

ChristophWurst (Member, Author):

/backport to stable28

ChristophWurst (Member, Author):

/backport to stable27

ChristophWurst (Member, Author):

/backport to stable27

AndyScherzinger (Member):

/backport to stable28

marinofaggiana (Member):

@AndyScherzinger can we have a backport for 25?

AndyScherzinger (Member):

/backport to stable26

AndyScherzinger (Member):

/backport to stable25

AndyScherzinger (Member):

@marinofaggiana I don't know; let's try and see if the bot can create them 🤞

AndyScherzinger (Member):

So the PR could be created, but according to the bot it is incomplete: #46706. @marinofaggiana, best align with @ChristophWurst to get them wrapped up for 25 and 26.

marinofaggiana (Member):

ok

blizzz mentioned this pull request on Jul 24, 2024

ChristophWurst (Member, Author):

Let's try porting from 27, where I already had to resolve conflicts: #45855 (comment).

ChristophWurst (Member, Author):

That worked. @AndyScherzinger @marinofaggiana, if you need more backports, use stable27 as the base.

Labels: 3. to review (Waiting for reviews), bug

4 participants