Shrinking database/Blocking database operations Give False Downtime #2470

Open
2 tasks done
JacksonChen666 opened this issue Dec 25, 2022 · 20 comments
Labels
area:core issues describing changes to the core of uptime kuma bug Something isn't working

Comments

@JacksonChen666

⚠️ Please verify that this bug has NOT been raised before.

  • I checked and didn't find similar issue

🛡️ Security Policy

Description

One of my monitors said it was down because: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call?

I was deleting a monitor that probably had a lot of data (I previously had history retention set to 365 days. Insane, I know), so the deletion took a long time, which in turn caused the monitor to be reported as down.

The issue can also be caused by a manually triggered shrink database operation.
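
The error text suggests that checks fail while waiting for a database connection, not because the monitored service is unreachable. A minimal toy model of that mechanism, assuming a pool with a single connection and an acquire timeout (TinyPool and its methods are hypothetical names, and this is not Knex's actual implementation):

```javascript
// Toy single-connection pool with an acquire timeout.
// Hypothetical sketch for illustration only; not how Knex is implemented.
class TinyPool {
  constructor(acquireTimeoutMs) {
    this.acquireTimeoutMs = acquireTimeoutMs;
    this.busy = false;   // true while some caller holds the connection
    this.waiters = [];   // callers queued for the connection
  }

  // Resolves once the connection is free; rejects if the wait exceeds the timeout.
  acquire() {
    if (!this.busy) {
      this.busy = true;
      return Promise.resolve();
    }
    return new Promise((resolve, reject) => {
      const waiter = { resolve, timer: null };
      waiter.timer = setTimeout(() => {
        // Give up waiting and leave the queue.
        this.waiters = this.waiters.filter((w) => w !== waiter);
        reject(new Error("Timeout acquiring a connection"));
      }, this.acquireTimeoutMs);
      this.waiters.push(waiter);
    });
  }

  // Hands the connection to the next waiter, or marks it free.
  release() {
    const next = this.waiters.shift();
    if (next) {
      clearTimeout(next.timer);
      next.resolve();
    } else {
      this.busy = false;
    }
  }
}
```

In this model, if one caller holds the connection longer than the timeout, as a long delete or a VACUUM might, every other caller's acquire() rejects even though nothing is wrong with the monitored services.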

Related

👟 Reproduction steps

  1. Have some monitors (any type should be fine, as long as the check interval is short, e.g. at most 20 seconds)
  2. Have a large database (say >512MB)
  3. Shrink database (Settings > Monitor History > Shrink database)
  4. Observe the behavior

👀 Expected behavior

The monitors continue to be reported as "up", and the correct data is saved later (if needed).
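
One way to picture the "save the correct data later" part is a small in-memory buffer that keeps check results while the database is busy and writes them once it is available again. This is only a sketch under stated assumptions: HeartbeatBuffer, record, flush and the save callback are hypothetical names that do not exist in Uptime Kuma, and the write is synchronous here for brevity, whereas real database writes would be asynchronous.

```javascript
// Hypothetical helper; none of these names exist in Uptime Kuma.
class HeartbeatBuffer {
  constructor(save) {
    this.save = save;    // function that persists one heartbeat and may throw
    this.pending = [];   // heartbeats waiting for the database to be available
  }

  // Queue a check result and try to persist everything queued so far.
  // Returns the number of heartbeats still waiting.
  record(heartbeat) {
    this.pending.push(heartbeat);
    return this.flush();
  }

  // Write queued heartbeats in order; stop at the first storage error
  // so nothing is lost or reordered.
  flush() {
    while (this.pending.length > 0) {
      try {
        this.save(this.pending[0]);
      } catch (err) {
        break; // database still busy, retry on the next call
      }
      this.pending.shift();
    }
    return this.pending.length;
  }
}
```

With this shape, a storage error delays the write but does not change the recorded status of the check itself.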

😓 Actual Behavior

The monitors are considered "down" because a blocking database operation is happening.

🐻 Uptime-Kuma Version

1.19.0

💻 Operating System and Arch

macOS 13.1

🌐 Browser

LibreWolf 108.0.1-1

🐋 Docker Version

No response

🟩 NodeJS Version

v16.18.1

📝 Relevant log output

Dec 25 12:36:43 laptop-server npm[1618271]: 2022-12-25T11:36:43Z [RATE-LIMIT] INFO: remaining requests: 20
Dec 25 12:37:06 laptop-server npm[1618271]: 2022-12-25T11:37:06Z [MONITOR] WARN: Monitor #6 'mastodon/mcrblgng (micro.)': Pending: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call? | Max retries: 12 | Retry: 1 | Retry Interval: 60 seconds | Type: keyword
Dec 25 12:37:10 laptop-server npm[1618271]: 2022-12-25T11:37:10Z [MONITOR] WARN: Monitor #7 'peertube (videos.)': Pending: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call? | Max retries: 12 | Retry: 1 | Retry Interval: 30 seconds | Type: keyword
Dec 25 12:37:10 laptop-server npm[1618271]: 2022-12-25T11:37:10Z [MONITOR] WARN: Monitor #40 'conduit (conduit.hazmat.)': Pending: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call? | Max retries: 15 | Retry: 1 | Retry Interval: 60 seconds | Type: keyword
Dec 25 12:37:11 laptop-server npm[1618271]: 2022-12-25T11:37:11Z [MONITOR] WARN: Monitor #36 'unbound DNS server (telemetry)': Pending: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call? | Max retries: 2 | Retry: 1 | Retry Interval: 30 seconds | Type: keyword
Dec 25 12:37:12 laptop-server npm[1618271]: Trace: KnexTimeoutError: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call?
Dec 25 12:37:12 laptop-server npm[1618271]:     at Client_SQLite3.acquireConnection (/home/uptime/uptime-kuma/node_modules/knex/lib/client.js:305:26)
Dec 25 12:37:12 laptop-server npm[1618271]:     at async Runner.ensureConnection (/home/uptime/uptime-kuma/node_modules/knex/lib/execution/runner.js:259:28)
Dec 25 12:37:12 laptop-server npm[1618271]:     at async Runner.run (/home/uptime/uptime-kuma/node_modules/knex/lib/execution/runner.js:30:19)
Dec 25 12:37:12 laptop-server npm[1618271]:     at async RedBeanNode.findOne (/home/uptime/uptime-kuma/node_modules/redbean-node/dist/redbean-node.js:515:19)
Dec 25 12:37:12 laptop-server npm[1618271]:     at async Function.handleStatusPageResponse (/home/uptime/uptime-kuma/server/model/status_page.js:23:26)
Dec 25 12:37:12 laptop-server npm[1618271]:     at async /home/uptime/uptime-kuma/server/routers/status-page-router.js:16:5 {
Dec 25 12:37:12 laptop-server npm[1618271]:   sql: undefined,
Dec 25 12:37:12 laptop-server npm[1618271]:   bindings: undefined
Dec 25 12:37:12 laptop-server npm[1618271]: }
Dec 25 12:37:12 laptop-server npm[1618271]:     at process.<anonymous> (/home/uptime/uptime-kuma/server/server.js:1779:13)
Dec 25 12:37:12 laptop-server npm[1618271]:     at process.emit (node:events:513:28)
Dec 25 12:37:12 laptop-server npm[1618271]:     at emit (node:internal/process/promises:140:20)
Dec 25 12:37:12 laptop-server npm[1618271]:     at processPromiseRejections (node:internal/process/promises:274:27)
Dec 25 12:37:12 laptop-server npm[1618271]:     at processTicksAndRejections (node:internal/process/task_queues:97:32)
Dec 25 12:37:13 laptop-server npm[1618271]: If you keep encountering errors, please report to https://github.com/louislam/uptime-kuma/issues
Dec 25 12:37:13 laptop-server npm[1618271]: 2022-12-25T11:37:13Z [MONITOR] WARN: Monitor #44 'prometheus (prometheus.)': Pending: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call? | Max retries: 12 | Retry: 1 | Retry Interval: 60 seconds | Type: keyword
Dec 25 12:37:14 laptop-server npm[1618271]: 2022-12-25T11:37:14Z [AUTH] INFO: Successfully logged in user jackson. IP=176.241.52.131
Dec 25 12:37:15 laptop-server npm[1618271]: 2022-12-25T11:37:15Z [RATE-LIMIT] INFO: remaining requests: 20
Dec 25 12:37:19 laptop-server npm[1618271]: 2022-12-25T11:37:19Z [MONITOR] WARN: Monitor #34 'ntfy localhost': Failing: Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call? | Interval: 20 seconds | Type: http | Down Count: 0 | Resend Interval: 15
Dec 25 12:37:43 laptop-server npm[1618271]: 2022-12-25T11:37:43Z [RATE-LIMIT] INFO: remaining requests: 20
@JacksonChen666 JacksonChen666 added the bug Something isn't working label Dec 25, 2022
@louislam
Owner

louislam commented Jan 1, 2023

Due to the limitations of SQLite, it may be an unsolvable bug.

Maybe I will change to MySQL in 2.0.0.

@JacksonChen666
Author

@louislam if you're considering supporting other databases, i would personally suggest considering postgresql and mysql and the pros/cons

i don't have mysql on my server, because nothing uses it. pretty much everything i run (peertube, mastodon, synapse (matrix homeserver)) uses postgres.

anyways, here's some already existing issues/comments:

@kosssi
Contributor

kosssi commented Jan 19, 2023

I'm in the same configuration as @JacksonChen666.
I really hope that Uptime Kuma 2.0 will not be limited to MySQL and will be compatible with Postgres.
But given the ideas for version 2 on your dashboard, I'm afraid that's not the case.
Did you make a decision on it @louislam and if so could you explain why?

PS: Thank you very much for the creation and maintenance of this tool, I deploy it as part of the CHATONS in France.

@manuelkamp

Is there an update on this issue, or any workaround? After some months of flawless usage, it is now unusable for me, because with every action a lot of monitors wrongly show as down with this error.

@CommanderStorm
Collaborator

@JacksonChen666 This will be resolved in the 1.23-release given that #2800 and #3380 were merged.

@harryzcy
Contributor

harryzcy commented Sep 15, 2023

MariaDB support is merged already. And I'm submitting Postgres support in #3748

@CommanderStorm CommanderStorm added the area:core issues describing changes to the core of uptime kuma label Dec 7, 2023

@Saibamen
Contributor

Saibamen commented Feb 19, 2024

I think this is already resolved in the 1.23-release given that #2800 and #3380 were merged

@CommanderStorm: No. Today I read your comment here and clicked the Shrink database button. After a few minutes I was able to see the frontend page without the backend-connection bug, but the list of monitors kept loading for another 3-4 minutes (I only have 53 monitors), and after that our Slack was spammed with 🔴 DOWN messages for all monitors. My customer almost had a heart attack because of this...

DB size before shrink: 860 MB
DB size after shrink: 838.4 MB
History data retention: 30 days (I changed it today from 90)
Kuma Version: 1.23.11

@CommanderStorm
Collaborator

Yea, indeed shrinking is not the same as deleting monitors. Should have read more carefully, sorry about that

@Saibamen
Contributor

Saibamen commented Feb 20, 2024

FYI: Today, after the clear-old-data job ran (reminder: I changed history retention from 90 to 30 days), my DB size went from ~838.4 MB to 780.6 MB

And I wonder if this text is correct:

Trigger database VACUUM for SQLite. If your database is created after 1.10.0, AUTO_VACUUM is already enabled and this action is not needed.

But I checked the changes in 1.10.0 (here), and await R.exec("PRAGMA auto_vacuum = FULL"); was added to the connect() function in PR #794 (here), so I think recreating the database from scratch is not needed, because AUTO_VACUUM will be set right after the backend starts (or after every Kuma version update).

Please correct me if I'm wrong

@chakflying
Collaborator

The description is not entirely accurate, I don't remember exactly but there is some slight difference between a manually triggered VACUUM and AUTO_VACUUM. But I also don't know how to write a better description so it is how it is.

@Saibamen
Contributor

I've just checked the database: PRAGMA auto_vacuum; returns 2, so it is set to INCREMENTAL. There is no need to recreate the database file, as you might think after reading the description under the Shrink Database button.
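
For reference, SQLite reports the auto_vacuum mode as an integer: 0 is NONE, 1 is FULL, and 2 is INCREMENTAL. A small lookup sketch for reading that value (describeAutoVacuum is a hypothetical helper name, not part of Uptime Kuma):

```javascript
// Integer values returned by SQLite's "PRAGMA auto_vacuum;" query.
const AUTO_VACUUM_MODES = new Map([
  [0, "NONE"],
  [1, "FULL"],
  [2, "INCREMENTAL"],
]);

// Hypothetical helper: translate the pragma's integer result into its mode name.
function describeAutoVacuum(value) {
  const mode = AUTO_VACUUM_MODES.get(value);
  if (mode === undefined) {
    throw new RangeError(`unexpected auto_vacuum value: ${value}`);
  }
  return mode;
}
```

Called with the value from the comment above, describeAutoVacuum(2) gives "INCREMENTAL".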

@Saibamen
Contributor

But I also don't know how to write a better description so it is how it is.

Maybe just this for now:

Trigger database VACUUM for SQLite. AUTO_VACUUM is already enabled and this action is not needed in most cases.


OK, in documentation (https://www.sqlite.org/pragma.html#pragma_auto_vacuum) we have this:

Auto-vacuum does not defragment the database nor repack individual database pages the way that the VACUUM command does. In fact, because it moves pages around within the file, auto-vacuum can actually make fragmentation worse.

IMO, we can write:

Trigger VACUUM for SQLite to defragment and repack database. Remember, AUTO_VACUUM is already enabled, but this does not defragment the database nor repack individual database pages.

or just:

Trigger database VACUUM for SQLite. AUTO_VACUUM is already enabled but this does not defragment the database nor repack individual database pages the way that the VACUUM command does.

@chakflying
Collaborator

The last one sounds pretty good to me.

@Saibamen
Contributor

OK, I will create a PR to change en.js for this, tonight or tomorrow

@Saibamen
Contributor

Done in #4508

@k-matti

k-matti commented Dec 26, 2024

Knex: Timeout acquiring a connection. The pool is probably full. Are you missing a .transacting(trx) call? still occurs for me
