Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added Clickhouse #412

Merged
merged 4 commits into from
May 13, 2024
Merged

Added Clickhouse #412

merged 4 commits into from
May 13, 2024

Conversation

xogoodnow
Copy link
Contributor

Added alerts for Clickhosue

_data/rules.yml Show resolved Hide resolved
dist/rules/clickhouse/embedded-exporter.yml Show resolved Hide resolved
severity: critical
- name: ClickHouse No Available Replicas
description: No available replicas in ClickHouse.
query: "ClickHouseErrorMetric_NO_AVAILABLE_REPLICA == 1"
Copy link
Owner

@samber samber May 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would add a for: 3m as well

a short unavailability (restart?) seems ok for me

same thing in the query below

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this could help avoid false positives as well

_data/rules.yml Outdated
severity: critical
- name: ClickHouse High Network Traffic
description: Network traffic is unusually high, may affect cluster performance.
query: "ClickHouseMetrics_NetworkSend > 1000 or ClickHouseMetrics_NetworkReceive > 1000"
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this default value, seems very high to me, but my own cluster is not in production right now, so I cannot compare

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The number could be deemed reasonable for a mid-to-large sized production level cluster but i stated in a comment to others would adjust this value based on their specifications.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can decrease it so it would be useful for almost all clusters right out of the box and in case of a larger cluster, the value could be raised.

_data/rules.yml Outdated
- name: ClickHouse Authentication Failures
description: Authentication failures detected, indicating potential security issues or misconfiguration.
query: "increase(ClickHouseErrorMetric_AUTHENTICATION_FAILED[5m]) > 0"
severity: critical
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would use "warning" or "info" instead

A critical alert must be checked early, this rule is "just" a security notification.

WDYT ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, critical might sound a little excessive here :)

@samber
Copy link
Owner

samber commented May 12, 2024

Thanks for your first contribution!

I made some comments ;)

Added reasonable time periods for each query to avoid false positives and in some cased give the system a short window to try to solve the issue.
Also changed the severity level of authentication alerts from critical to info which seems more appropriate
Copy link
Contributor Author

@xogoodnow xogoodnow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made excellent points.
Cheers

xogoodnow added 2 commits May 13, 2024 11:36
I made a few adjustments in time periods.
See if they seem reasonable or not
IMHO, replication alerts must be sent right away.
@xogoodnow
Copy link
Contributor Author

I made some changes to the alerts.
Thank you for your time.
Cheers

@xogoodnow xogoodnow requested a review from samber May 13, 2024 08:11
@samber samber merged commit 2547288 into samber:master May 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants