Massive load on agents table since 2.7 #4030

Closed
3 tasks done
fernandrone opened this issue Aug 13, 2024 · 0 comments · Fixed by #4031
Labels
bug Something isn't working

Component

server, other

Describe the bug

Since updating to 2.7 we've observed a massive spike in database load. Most of it seems to come from updates to the agents table. More specifically, the last_work field of several agents is updated multiple times per second.

This is greatly impacting database performance (which seems to feed into #3999).

I haven't done a full profiling, but my guess is that this comes from this change: https://github.com/woodpecker-ci/woodpecker/pull/3844/files#diff-0f4ca4733649eb6707a0dd7e0ca0083cdc587b5cdced5b3ac051fc32cc9353cbR361-R368. If I understand correctly, every time a log line is persisted to the database, the corresponding agent is updated as well. The frequency of these updates seems quite high.

For context, we were running Woodpecker 2.3 just fine with a PostgreSQL database on an Amazon RDS db.t4g.small (2 vCPU and 2 GiB RAM). After 2.7 we had to upgrade to a db.t4g.xlarge (4 vCPU and 8 GiB RAM), and it's still struggling on CPU.

We're running on Kubernetes with 10 agents and up to 10 workflows per agent.

Steps to reproduce

  1. Install Woodpecker 2.7
  2. Run multiple Workflows
  3. Observe multiple updates on the agents table

Expected behavior

From #3844 this seems to be the intended behavior. However, it clearly comes at a cost.

Maybe we could update the agents only every X minutes instead of on every log line/every second (I'm not sure how it's implemented right now and need to look deeper). Possibly the same applies to log_entries updates; their frequency might be just a tad too high. Of course there are risks here (e.g. losing logs).

We will test an internal build with the last_work update on every log line disabled and see how that goes.

System Info

Woodpecker 2.7
Kubernetes installation

Additional context

Here's the Amazon Performance Insights view of our database. On the evening of Friday the 9th we updated Woodpecker from 2.3 to 2.7, but barely any pipelines ran then or over the weekend. Then on Monday you can see that the load is many times higher than in the previous week.

[image: Amazon Performance Insights chart of database load]

Most of it is from INSERTs coming from log_entries and agents. However, a more detailed analysis shows that, in query volume, log_entries updates have not increased much after the update, whereas agents updates have increased tremendously.

On Monday we increased the database from a t4g.small to a t4g.xlarge, which helped but did not solve the issue.

Validations

  • Read the docs.
  • Check that there isn't already an issue that reports the same bug to avoid creating a duplicate.
  • Checked that the bug isn't fixed in the next version already [https://woodpecker-ci.org/faq#which-version-of-woodpecker-should-i-use]
@fernandrone fernandrone added the bug Something isn't working label Aug 13, 2024
@qwerty287 qwerty287 added this to the 3.0.0 milestone Aug 13, 2024