Add a counter to node stat api to track shard going from idle to non-idle #12737

Closed
wants to merge 16 commits into from

ruai0511 (Contributor) commented Mar 18, 2024

Description

By default, shards refresh automatically every second. When a shard receives no search requests for more than 30 seconds, however, it enters a search-idle state in which the implicit periodic refresh is suspended to improve performance (more information on the search idle feature here). This introduces a problem: after a shard goes idle, the next search request must first force a refresh so that it sees the latest data, and this extra step increases that request's latency.
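To make the latency trade-off concrete, here is a minimal, hypothetical Java model of the behavior described above. It is not OpenSearch's IndexShard implementation: the class and method names are invented for illustration, time is passed in explicitly as milliseconds so the model is deterministic, and the 30-second threshold mirrors the default described above (in OpenSearch it is configurable through the index.search.idle.after index setting).

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical, simplified model of search-idle behavior.
 * NOT OpenSearch's IndexShard code; names and structure are illustrative.
 */
final class SearchIdleShardModel {
    // Assumed default: a shard becomes search-idle after 30s without searches.
    static final long DEFAULT_SEARCH_IDLE_AFTER_MILLIS = TimeUnit.SECONDS.toMillis(30);

    private final long searchIdleAfterMillis;
    private long lastSearchAccessMillis;
    private boolean hasUnrefreshedWrites; // writes not yet visible to searches

    SearchIdleShardModel(long searchIdleAfterMillis, long nowMillis) {
        this.searchIdleAfterMillis = searchIdleAfterMillis;
        this.lastSearchAccessMillis = nowMillis;
    }

    boolean isSearchIdle(long nowMillis) {
        return nowMillis - lastSearchAccessMillis >= searchIdleAfterMillis;
    }

    void onWrite() {
        hasUnrefreshedWrites = true;
    }

    /** Periodic (e.g. every 1s) refresh task. Returns true if a refresh ran. */
    boolean scheduledRefresh(long nowMillis) {
        if (isSearchIdle(nowMillis)) {
            return false; // idle: skip the implicit refresh, writes stay pending
        }
        boolean refreshed = hasUnrefreshedWrites;
        hasUnrefreshedWrites = false;
        return refreshed;
    }

    /**
     * A search arrives. Returns true if it had to force a refresh first,
     * i.e. the extra-latency case described above.
     */
    boolean onSearch(long nowMillis) {
        boolean wasIdle = isSearchIdle(nowMillis);
        lastSearchAccessMillis = nowMillis; // the shard is active again
        if (wasIdle && hasUnrefreshedWrites) {
            hasUnrefreshedWrites = false; // forced refresh before serving the search
            return true;
        }
        return false;
    }
}
```

In this model, a search that arrives while the shard is idle and has unrefreshed writes returns true from onSearch; that is the request that pays for the forced refresh.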

We want to monitor how often idle shards are reactivated. This PR introduces a counter, search_idle_waken_up_total, and exposes it in the node stats API.
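A counter like this only needs to fire on the idle-to-active transition. The sketch below is hypothetical: SearchIdleWakeUpStats and onSearch are illustrative names, not the PR's code, and the JSON rendering only shows the field name from the description, not where the field actually sits in the node stats response.

```java
import java.util.concurrent.atomic.LongAdder;

/**
 * Hypothetical sketch of a search-idle wake-up counter.
 * Class and method names are illustrative, not the PR's actual code.
 */
final class SearchIdleWakeUpStats {
    private final LongAdder searchIdleWakenUpTotal = new LongAdder();

    /**
     * Called when a search reaches a shard. A search always leaves the shard
     * active, so a wake-up happened exactly when the shard was idle before it.
     */
    void onSearch(boolean shardWasSearchIdle) {
        if (shardWasSearchIdle) {
            searchIdleWakenUpTotal.increment();
        }
    }

    long total() {
        return searchIdleWakenUpTotal.sum();
    }

    /** Renders the value under the field name given in the description. */
    String toJsonField() {
        return "\"search_idle_waken_up_total\":" + total();
    }
}
```

LongAdder suits this because many search threads on a node may record wake-ups concurrently; it makes increments cheap under contention at the cost of a slightly more expensive read (sum).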

Related Issues

Resolves #12678

Check List

  • New functionality includes testing.
  • All tests pass
  • New functionality has been documented.
  • New functionality has javadoc added
  • Failing checks are inspected and point to the corresponding known issue(s) (See: Troubleshooting Failing Builds)
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)
  • Public documentation issue/PR created

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

…o non-idle

Signed-off-by: Ruirui Zhang <mariazrr@amazon.com>
github-actions bot added the labels enhancement (Enhancement or improvement to existing feature or request) and Search (Search query, autocomplete ...etc) Mar 18, 2024
ruai0511 changed the title from "Add a counter to node stat api to track when a shard goes from idle t…" to "Add a counter to node stat api to track shard going from idle to non-idle" Mar 18, 2024
Signed-off-by: Ruirui Zhang <mariazrr@amazon.com>

github-actions bot commented Mar 18, 2024

Compatibility status:

Checks if related components are compatible with change ea595d2

Incompatible components

Skipped components

Compatible components:

  • https://github.com/opensearch-project/custom-codecs.git
  • https://github.com/opensearch-project/asynchronous-search.git
  • https://github.com/opensearch-project/performance-analyzer-rca.git
  • https://github.com/opensearch-project/flow-framework.git
  • https://github.com/opensearch-project/job-scheduler.git
  • https://github.com/opensearch-project/cross-cluster-replication.git
  • https://github.com/opensearch-project/reporting.git
  • https://github.com/opensearch-project/security.git
  • https://github.com/opensearch-project/opensearch-oci-object-storage.git
  • https://github.com/opensearch-project/geospatial.git
  • https://github.com/opensearch-project/k-nn.git
  • https://github.com/opensearch-project/neural-search.git
  • https://github.com/opensearch-project/common-utils.git
  • https://github.com/opensearch-project/security-analytics.git
  • https://github.com/opensearch-project/anomaly-detection.git
  • https://github.com/opensearch-project/performance-analyzer.git
  • https://github.com/opensearch-project/notifications.git
  • https://github.com/opensearch-project/index-management.git
  • https://github.com/opensearch-project/ml-commons.git
  • https://github.com/opensearch-project/observability.git
  • https://github.com/opensearch-project/alerting.git
  • https://github.com/opensearch-project/sql.git

Signed-off-by: Ruirui Zhang <mariazrr@amazon.com>
Signed-off-by: Ruirui Zhang <mariazrr@amazon.com>

❌ Gradle check result for 975b082: null

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

❌ Gradle check result for b6e9385: null

❌ Gradle check result for 7912bec:

❌ Gradle check result for 259e193:

❌ Gradle check result for 6544064:

❌ Gradle check result for 0fa5f0f:

❌ Gradle check result for 523ab42:

ruai0511 and others added 12 commits March 19, 2024 10:11
Signed-off-by: Ruirui Zhang <mariazrr@amazon.com>
…rch-project#12672)

* Simplify remote directory cleanup after snapshot delete to avoid concurrent cleanup task runs for same shard.

Signed-off-by: Harish Bhakuni <hbhakuni@amazon.com>

* Address PR Comments.

Signed-off-by: Harish Bhakuni <hbhakuni@amazon.com>

---------

Signed-off-by: Harish Bhakuni <hbhakuni@amazon.com>
Co-authored-by: Harish Bhakuni <hbhakuni@amazon.com>
…pensearch-project#12577)

* handle unexpected exception on success callback of translog upload

Signed-off-by: Varun Bansal <bansvaru@amazon.com>
…pensearch-project#8776)

* BaseGatewayShardAllocator changes for Assigning the batch of shards

Signed-off-by: Gaurav Chandani <chngau@amazon.com>
Co-authored-by: Aman Khare <amkhar@amazon.com>
…2642)

* Adds support for asynchronous gauge metric type

Signed-off-by: Gagan Juneja <gjjuneja@amazon.com>

* Adds change log

Signed-off-by: Gagan Juneja <gjjuneja@amazon.com>

* incorporate pr comments

Signed-off-by: Gagan Juneja <gjjuneja@amazon.com>

* fixes build errors

Signed-off-by: Gagan Juneja <gjjuneja@amazon.com>

---------

Signed-off-by: Gagan Juneja <gjjuneja@amazon.com>
Signed-off-by: Gagan Juneja <gagandeepjuneja@gmail.com>
Co-authored-by: Gagan Juneja <gjjuneja@amazon.com>
…ator (opensearch-project#12653)

* [Tiered caching] Supporting removal function on EhcacheDiskCache iterator

Signed-off-by: Sagar Upadhyaya <sagar.upadhyaya.121@gmail.com>

* Minor refactoring in unit test

Signed-off-by: Sagar Upadhyaya <sagar.upadhyaya.121@gmail.com>

---------

Signed-off-by: Sagar Upadhyaya <sagar.upadhyaya.121@gmail.com>
…p for shallow snapshot deletion. (opensearch-project#12701)

Signed-off-by: Harish Bhakuni <hbhakuni@amazon.com>
* Decouple remote state configuration

Signed-off-by: Sooraj Sinha <soosinha@amazon.com>

❌ Gradle check result for ea595d2: FAILURE

❕ Gradle check result for 4fb4185: UNSTABLE

  • TEST FAILURES:
      2 org.opensearch.cluster.coordination.AwarenessAttributeDecommissionIT.testConcurrentDecommissionAction

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

Successfully merging this pull request may close these issues.

[Feature Request] Add metrics to track latency issue due to search idle feature