Massive load on agents table since 2.7 #4030

fernandrone · 2024-08-13T17:49:44Z

Component

server, other

Describe the bug

Since updating to 2.7 we've observed a massive spike in database load. Most of it seems to come from updates to the agents table. More specifically, the last_work field of several agents is updated multiple times per second.

This is greatly impacting database performance (which seems to feed into #3999).

I haven't done a full profiling, but my guess is that this comes from this change: https://github.com/woodpecker-ci/woodpecker/pull/3844/files#diff-0f4ca4733649eb6707a0dd7e0ca0083cdc587b5cdced5b3ac051fc32cc9353cbR361-R368. If I understand correctly, every time a log line is persisted to the database, the respective agent is updated as well. The frequency of these updates seems quite high.

For context, we were running Woodpecker 2.3 with a PostgreSQL database on an Amazon RDS db.t4g.small (2 vCPU and 2 GiB RAM) just fine. After 2.7 we had to upgrade to a db.t4g.xlarge (4 vCPU and 8 GiB RAM), and it's still struggling on CPU.

We're running on Kubernetes with 10 agents and up to 10 workflows per agent.

Steps to reproduce

Install Woodpecker 2.7
Run multiple Workflows
Observe multiple updates on the agents table

Expected behavior

From #3844 this seems to be the intended behavior. However, it clearly comes at a cost.

Maybe we could update the agents only every X minutes instead of on every log line/every second (I'm not sure how it's implemented right now and need to look deeper). Possibly the same could be said for the log_entries updates; their frequency might be just a tad too high. Of course, there are risks here (e.g. losing logs).

We will test an internal version with the last_work update on every log line disabled and see how that goes.

System Info

Woodpecker 2.7
Kubernetes installation

Additional context

Here's our Amazon Performance Insights view of the database. On the evening of Friday the 9th we updated Woodpecker from 2.3 to 2.7, but barely any pipelines ran then or during the weekend. Then on Monday you can see that the load is many times higher than the previous week.

Most of it comes from INSERTs on log_entries and agents. However, a more detailed analysis shows that, in query volume, log_entries updates have not increased much after the update, while agents updates have increased tremendously.

On Monday we increased the database from a t4g.small to t4g.xlarge which helped but did not solve the issue.

Validations

Read the docs.
Check that there isn't already an issue that reports the same bug to avoid creating a duplicate.
Checked that the bug isn't fixed in the next version already [https://woodpecker-ci.org/faq#which-version-of-woodpecker-should-i-use]

fernandrone added the bug Something isn't working label Aug 13, 2024

qwerty287 added this to the 3.0.0 milestone Aug 13, 2024

anbraten mentioned this issue Aug 14, 2024

Only update agent.LastWork if not done recently #4031

Merged

lafriks closed this as completed in #4031 Aug 14, 2024
