-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add invalid captcha and messageChannel sync status health monitoring #10029
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR Summary
This PR implements health monitoring for message channel sync status and invalid captcha attempts, with metrics exposed for Prometheus integration.
- Added
HealthCacheService
with time-windowed counters (5-minute aggregation window) for tracking message channel sync jobs and invalid captcha attempts - Added new endpoints
/healthz/message-channel-sync-job-by-status
and/healthz/invalid-captcha
inHealthController
for Prometheus metrics - Integrated health monitoring in
MessageChannelSyncStatusService
to track sync status transitions (ONGOING, ACTIVE, FAILED_UNKNOWN, FAILED_INSUFFICIENT_PERMISSIONS) - Added
EngineHealth
namespace toCacheStorageNamespace
enum with 10-minute TTL for health metrics storage
12 file(s) reviewed, 6 comment(s)
Edit PR Review Bot Settings | Greptile
packages/twenty-server/src/engine/core-modules/health/health.controller.ts
Show resolved
Hide resolved
packages/twenty-server/src/engine/core-modules/health/health.controller.ts
Outdated
Show resolved
Hide resolved
packages/twenty-server/src/engine/core-modules/health/constants/time-window.constant.ts
Outdated
Show resolved
Hide resolved
packages/twenty-server/src/engine/core-modules/health/health-cache.service.ts
Show resolved
Hide resolved
packages/twenty-server/src/engine/core-modules/health/health-cache.service.ts
Outdated
Show resolved
Hide resolved
packages/twenty-server/src/engine/core-modules/health/health-cache.service.ts
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall seems good to me, but I wonder if the counter is the right strategy: do we have a stable enough traffic to know how to juge absolute numbers? (absolute numbers of invalid captcha for instance)
What will prometheus do with these numbers ? Compare them?
packages/twenty-server/src/engine/core-modules/captcha/captcha.guard.ts
Outdated
Show resolved
Hide resolved
packages/twenty-server/src/engine/core-modules/health/constants/time-window.constant.ts
Outdated
Show resolved
Hide resolved
packages/twenty-server/src/engine/core-modules/health/health.controller.ts
Show resolved
Hide resolved
packages/twenty-server/src/engine/core-modules/health/health.controller.ts
Outdated
Show resolved
Hide resolved
packages/twenty-server/src/engine/core-modules/health/health.controller.ts
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great !! looks cool
This is great! We can merge like this but FYI we probably could have done something more efficient if it was more tightly coupled with Redis (we have an abstraction layer but decided to get rid of other implementations so we might as well couple it now...). We don't need to optimize for high throughput scenario but I think if we had a very high volume then the number here would be wrong (you do a read then a write so since Node runs on multiple threads you could have two concurrent read/write operations on the same number). Redis reference for increments: And with a different implementation we could have used sorted sets with each timestamp as an entry In any case, it doesn't really matter because we'd still get the good order of magnitude even if there was a concurrency issue! Great work!!! We should definitely display all those metrics in the Server Admin Panel |
Context :
We want to implement some counters to monitor server health. First counters will track : messageChannel sync status during job execution and invalid captcha.
How :
Counters are stored in cache and grouped by one-minute windows. Controllers are created for each metric, aggregating counter over a five-minutes window.
Endpoints are public and will be queried by Prometheus.
closes twentyhq/core-team-issues#55