Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: health-checks for Elasticsearch #13322

Merged
merged 2 commits into from
May 28, 2023
Merged

Conversation

ebuildy
Copy link
Contributor

@ebuildy ebuildy commented Apr 23, 2023

argocd application go to "degraged" state as soon as elasticsearch cluster health go to yellow state. Which is wrong, yellow state should not be considered as a broken state:

"when a new index is created, the cluster will go to yellow state, because replicas are not ready"

This can flood notifications with wrong alerts, because elasticsearch can create / move indices all the time (we got more than 10 wrong alerts per day).

To me, this is a bug on elasticsearch side (I mean cluster health check should wait a little time before changing color), but according discussion with elastic team at https://discuss.elastic.co/t/cluster-health-wrong-yellow-spikes-because-new-index/330124/6, this is normal, and yellow "spike" should not be considered as error.

The current argocd health check implementation is reporting error as soon as elasticsearch cluster health changes to yellow, I edit it to consider yellow state as "progressing" app state.

A better solution will be to add a kind a buffer, in order to give some time before the argocd application degrades.

Relate:

@codecov
Copy link

codecov bot commented Apr 23, 2023

Codecov Report

Patch coverage has no change and project coverage change: -0.01 ⚠️

Comparison is base (edf9916) 49.01% compared to head (d86ae69) 49.01%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #13322      +/-   ##
==========================================
- Coverage   49.01%   49.01%   -0.01%     
==========================================
  Files         247      247              
  Lines       42685    42685              
==========================================
- Hits        20924    20921       -3     
- Misses      19645    19647       +2     
- Partials     2116     2117       +1     

see 2 files with indirect coverage changes

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

ebuildy added 2 commits April 23, 2023 19:20
Signed-off-by: Thomas Decaux <ebuildy@gmail.com>
Signed-off-by: ebuildy <ebuildy@gmail.com>
Signed-off-by: ebuildy <ebuildy@gmail.com>
@ebuildy ebuildy marked this pull request as ready for review April 24, 2023 00:19
@drpaneas
Copy link
Contributor

I agree, the yellow state in an Elasticsearch cluster should not be considered an error state.

In Elasticsearch, cluster health is represented by three states: green, yellow, and red. A green state means that all primary and replica shards are active and allocated in the cluster. A yellow state means that all primary shards are active and allocated, but some replica shards are either missing or initializing. A red state means that one or more primary shards are missing or failed to allocate, indicating a serious problem.

While a yellow state indicates that there may be some issues with replica shard allocation, it does not necessarily mean that there is a critical error in the cluster. Depending on your specific use case and requirements, a yellow state may be acceptable and normal.

However, if the yellow state persists for an extended period of time or is accompanied by other issues, it may be worth investigating further to ensure the stability and reliability of the Elasticsearch cluster.

That said I am fine with this PR.

/lgtm

Copy link
Contributor

@ashutosh16 ashutosh16 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@crenshaw-dev crenshaw-dev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @ebuildy!

@crenshaw-dev crenshaw-dev merged commit d01d67b into argoproj:master May 28, 2023
yyzxw pushed a commit to yyzxw/argo-cd that referenced this pull request Aug 9, 2023
* fix: health-checks for Elasticsearch

Signed-off-by: Thomas Decaux <ebuildy@gmail.com>
Signed-off-by: ebuildy <ebuildy@gmail.com>

* fix tests

Signed-off-by: ebuildy <ebuildy@gmail.com>

---------

Signed-off-by: Thomas Decaux <ebuildy@gmail.com>
Signed-off-by: ebuildy <ebuildy@gmail.com>
tesla59 pushed a commit to tesla59/argo-cd that referenced this pull request Dec 16, 2023
* fix: health-checks for Elasticsearch

Signed-off-by: Thomas Decaux <ebuildy@gmail.com>
Signed-off-by: ebuildy <ebuildy@gmail.com>

* fix tests

Signed-off-by: ebuildy <ebuildy@gmail.com>

---------

Signed-off-by: Thomas Decaux <ebuildy@gmail.com>
Signed-off-by: ebuildy <ebuildy@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants