Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Push monitor status parameter overrides retries #3208

Closed
2 tasks done
dallyger opened this issue May 30, 2023 · 3 comments · Fixed by #3814
Closed
2 tasks done

Push monitor status parameter overrides retries #3208

dallyger opened this issue May 30, 2023 · 3 comments · Fixed by #3814
Labels
bug Something isn't working

Comments

@dallyger
Copy link

⚠️ Please verify that this bug has NOT been raised before.

  • I checked and didn't find similar issue

🛡️ Security Policy

Description

Hi,

thank you for this project. Helps a lot.

We have a couple scripts, where we push the result to push monitors.
If they fail, we push with status=down and attach the error message via the msg= parameter.
This works beautifully.

However, this will ignore the "Retry" option and immediately mark the monitor as "Down", causing notifications to go off. But most of the time, this is a temporary issue, which is gone in a minute or two.
This makes those notifications less impactful, as users ignore them thinking "Oh it's most likely just a temporary issue" or worse, silencing them.

It's not clear if this is a bug or intended behavior (we explicitly say state=down).
But even if this is a bug, someone will probably rely on this behavior.

I think the best action would be to add a third state (up, down and pending) for opting into the retry functionality in push monitors (and update docs in the UI).

Maybe someone has a scenario like this and therefore relies on the current behavior:

  • Custom script monitoring something and sends a heartbeat
  • Retry set to prevent frequent "No heartbeat in time window" notifications
  • Send heartbeat with state=down to get notified immediately

👟 Reproduction steps

  • Create Uptime-Kuma instance
docker run --rm -it --name uk -P --env=NODE_ENV=development louislam/uptime-kuma:1
  • Visit https://localhost:49153 (Port may differ)
  • Create User
  • Create Push Monitor with "Retries" set to positive number like 5
  • Send heartbeat with status=down
curl 'http://localhost:49153/api/push/dOS7AMWHLg?status=down&msg=OK&ping='

Reproduction environment (required fields represent the production server):
Uptime-Kuma: 1.21.3
OS: Pop!_OS 22.04 LTS (x86_64)
Docker: 20.10.21, build 20.10.21-0ubuntu1~22.04.3

👀 Expected behavior

Monitor switching state to "Pending", as the retry option is not surpassed.

😓 Actual Behavior

Monitor switching state to "Down", ignoring the retry option.
Switching from "Down" to "Pending" if no heartbeats are sent after.

🐻 Uptime-Kuma Version

1.21.3

💻 Operating System and Arch

Linux 5.10.0-19-amd64 #1 SMP Debian 5.10.149-2 (2022-10-21) x86_64 GNU/Linux

🌐 Browser

Firefox Developer Edition 114.0b9

🐋 Docker Version

20.10.5+dfsg1, build 55c4c88

🟩 NodeJS Version

v16.20.0 (from docker image)

📝 Relevant log output

2023-05-30T13:00:56Z [AUTH] INFO: Successfully logged in user uk. IP=172.17.0.1
2023-05-30T13:00:56Z [SETTINGS] DEBUG: Get Setting: initServerTimezone: null
2023-05-30T13:00:56Z [SERVER] DEBUG: emit initServerTimezone
2023-05-30T13:00:56Z [GENERALSOCKETHANDLER] DEBUG: Timezone: Europe/Berlin
2023-05-30T15:00:56+02:00 [SETTINGS] DEBUG: Get Setting (cache): primaryBaseURL: null
2023-05-30T15:00:56+02:00 [SETTINGS] DEBUG: Get Setting: serverTimezone: Europe/Berlin
2023-05-30T15:01:12+02:00 [SETTINGS] DEBUG: Get Setting (cache): trustProxy: null
2023-05-30T15:01:12+02:00 [ENTRY] DEBUG: Request Domain: 127.0.0.1
2023-05-30T15:01:13+02:00 [SETTINGS] DEBUG: Cache Cleaner is just started.
2023-05-30T15:01:24+02:00 [MANAGE] INFO: Resume Monitor: 1 User ID: 1
2023-05-30T15:01:24+02:00 [MONITOR] INFO: Added Monitor: undefined User ID: 1
2023-05-30T15:01:34+02:00 [ROUTER] DEBUG: /api/push/ called at 2023-05-30 15:01:34.090
2023-05-30T15:01:34+02:00 [ROUTER] DEBUG: PreviousStatus: 0
2023-05-30T15:01:34+02:00 [ROUTER] DEBUG: Current Status: 0
2023-05-30T15:01:34+02:00 [UPTIMECACHELIST] DEBUG: clearCache: 1
2023-05-30T15:01:34+02:00 [UPTIMECACHELIST] DEBUG: addUptime: 1 24
2023-05-30T15:01:34+02:00 [UPTIMECACHELIST] DEBUG: addUptime: 1 720
2023-05-30T15:01:40+02:00 [ROUTER] DEBUG: /api/push/ called at 2023-05-30 15:01:40.029
2023-05-30T15:01:40+02:00 [ROUTER] DEBUG: PreviousStatus: 0
2023-05-30T15:01:40+02:00 [ROUTER] DEBUG: Current Status: 1
2023-05-30T15:01:40+02:00 [UPTIMECACHELIST] DEBUG: clearCache: 1
2023-05-30T15:01:40+02:00 [UPTIMECACHELIST] DEBUG: addUptime: 1 24
2023-05-30T15:01:40+02:00 [UPTIMECACHELIST] DEBUG: addUptime: 1 720
2023-05-30T15:01:41+02:00 [ROUTER] DEBUG: /api/push/ called at 2023-05-30 15:01:41.683
2023-05-30T15:01:41+02:00 [ROUTER] DEBUG: PreviousStatus: 1
2023-05-30T15:01:41+02:00 [ROUTER] DEBUG: Current Status: 0
2023-05-30T15:01:41+02:00 [UPTIMECACHELIST] DEBUG: clearCache: 1
2023-05-30T15:01:41+02:00 [UPTIMECACHELIST] DEBUG: addUptime: 1 24
2023-05-30T15:01:41+02:00 [UPTIMECACHELIST] DEBUG: addUptime: 1 720
2023-05-30T15:01:42+02:00 [ROUTER] DEBUG: /api/push/ called at 2023-05-30 15:01:42.574
2023-05-30T15:01:42+02:00 [ROUTER] DEBUG: PreviousStatus: 0
2023-05-30T15:01:42+02:00 [ROUTER] DEBUG: Current Status: 0
2023-05-30T15:01:42+02:00 [UPTIMECACHELIST] DEBUG: clearCache: 1
2023-05-30T15:01:42+02:00 [UPTIMECACHELIST] DEBUG: addUptime: 1 24
2023-05-30T15:01:42+02:00 [UPTIMECACHELIST] DEBUG: addUptime: 1 720
2023-05-30T15:01:43+02:00 [ROUTER] DEBUG: /api/push/ called at 2023-05-30 15:01:43.737
2023-05-30T15:01:43+02:00 [ROUTER] DEBUG: PreviousStatus: 0
2023-05-30T15:01:43+02:00 [ROUTER] DEBUG: Current Status: 1
2023-05-30T15:01:43+02:00 [UPTIMECACHELIST] DEBUG: clearCache: 1
2023-05-30T15:01:43+02:00 [UPTIMECACHELIST] DEBUG: addUptime: 1 24
2023-05-30T15:01:43+02:00 [UPTIMECACHELIST] DEBUG: addUptime: 1 720
2023-05-30T15:02:12+02:00 [SETTINGS] DEBUG: Get Setting (cache): trustProxy: null
2023-05-30T15:02:12+02:00 [ENTRY] DEBUG: Request Domain: 127.0.0.1
2023-05-30T15:02:13+02:00 [SETTINGS] DEBUG: Cache Cleaner is just started.
2023-05-30T15:02:13+02:00 [SETTINGS] DEBUG: Cache Cleaner deleted: title
2023-05-30T15:02:13+02:00 [SETTINGS] DEBUG: Cache Cleaner deleted: dnsCache
2023-05-30T15:02:13+02:00 [SETTINGS] DEBUG: Cache Cleaner deleted: entryPage
2023-05-30T15:02:13+02:00 [SETTINGS] DEBUG: Cache Cleaner deleted: cloudflaredTunnelToken
2023-05-30T15:02:13+02:00 [SETTINGS] DEBUG: Cache Cleaner deleted: checkUpdate
2023-05-30T15:02:13+02:00 [SETTINGS] DEBUG: Cache Cleaner deleted: checkBeta
2023-05-30T15:02:13+02:00 [SETTINGS] DEBUG: Cache Cleaner deleted: trustProxy
2023-05-30T15:02:13+02:00 [SETTINGS] DEBUG: Cache Cleaner deleted: primaryBaseURL
2023-05-30T15:02:13+02:00 [SETTINGS] DEBUG: Cache Cleaner deleted: disableAuth
2023-05-30T15:02:13+02:00 [SETTINGS] DEBUG: Cache Cleaner deleted: serverTimezone
@dallyger dallyger added the bug Something isn't working label May 30, 2023
@chakflying
Copy link
Collaborator

Related: #1863

Thinking about how to fix this: Since the push route handling and the heartbeat loop are separate, there is currently no way to track the number of retries in the push route. What is the best way to get this information? We can recursively query the database for the last beat until we find the last successful beat, but it will build up a stack and get slow if the number of retries is high.

I guess we could query the last successful beat, then based on the returned ID use a second query to get all the beats since then.

There is also a problem that this way retries would persist across restart, which would be different to other monitors. We could mitigate this by saving the server restart time and adding that to the query, but yeah another point of complexity.

@chakflying
Copy link
Collaborator

chakflying commented Jun 1, 2023

I like your idea of being able to set pending for the push route, but there is an issue as the heartbeat loop would be unaware of this, so you can get more than your set number of "retries" before the timer set it as down.

@ebricca
Copy link

ebricca commented Aug 28, 2023

even a push calling "status=pending" would be ok in my case ..
it would allow me to represent services which are degraded / or busy with a msg notice ..

(the calling script itself could be made to tell after x times to set it to down ..)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants