[BUG] Observability plugin calling createIndex from onNodeStarted method in 2.x branches #1883

Open
shwetathareja opened this issue Nov 11, 2024 · 4 comments
Assignees
Labels
bug Something isn't working

Comments

@shwetathareja

What is the bug?

The Observability plugin implements the ClusterPlugin interface:

class ObservabilityPlugin : Plugin(), ActionPlugin, ClusterPlugin, SystemIndexPlugin {

In onNodeStarted it calls ObservabilityIndex.afterStart, which calls createIndex:

override fun afterStart() {
// create default index
createIndex()

ClusterPlugin.onNodeStarted is called during node bootstrap (Node.java), so this call causes an unhandled exception in the main thread:

    pluginsService.filterPlugins(ClusterPlugin.class).forEach(plugin -> plugin.onNodeStarted(clusterService.localNode()));

At this point the node is not fully initialized and the cluster has not formed, so why is the plugin trying to create an index here?
The main branch doesn't have the same code either. Why is there a divergence?

OpenSearchTimeoutException[java.util.concurrent.TimeoutException: Timeout waiting for task.]; nested: TimeoutException[Timeout waiting for task.];
Likely root cause: java.util.concurrent.TimeoutException: Timeout waiting for task.
        at org.opensearch.common.util.concurrent.BaseFuture$Sync.get(BaseFuture.java:257)
        at org.opensearch.common.util.concurrent.BaseFuture.get(BaseFuture.java:82)
        at org.opensearch.common.util.concurrent.FutureUtils.get(FutureUtils.java:94)
        at org.opensearch.action.support.AdapterActionFuture.actionGet(AdapterActionFuture.java:79)
        at org.opensearch.action.support.AdapterActionFuture.actionGet(AdapterActionFuture.java:68)
        at org.opensearch.observability.index.ObservabilityIndex.createIndex(ObservabilityIndex.kt:108)
        at org.opensearch.observability.index.ObservabilityIndex.afterStart(ObservabilityIndex.kt:91)
        at org.opensearch.observability.ObservabilityPlugin.onNodeStarted(ObservabilityPlugin.kt:84)
        at org.opensearch.plugins.ClusterPlugin.onNodeStarted(ClusterPlugin.java:111)
        at org.opensearch.node.Node.lambda$start$37(Node.java:1768)
        at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)
        at org.opensearch.node.Node.start(Node.java:1768)
        at org.opensearch.bootstrap.Bootstrap.start(Bootstrap.java:339)
        at org.opensearch.bootstrap.Bootstrap.init(Bootstrap.java:413)
        at org.opensearch.bootstrap.OpenSearch.init(OpenSearch.java:181)
        at org.opensearch.bootstrap.OpenSearch.execute(OpenSearch.java:172)
        at org.opensearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:104)
        at org.opensearch.cli.Command.mainWithoutErrorHandling(Command.java:138)
        at org.opensearch.cli.Command.main(Command.java:101)
        at org.opensearch.bootstrap.OpenSearch.main(OpenSearch.java:138)
        at org.opensearch.bootstrap.OpenSearch.main(OpenSearch.java:104)

How can one reproduce the bug?
If cluster manager election takes long enough for the createIndex call to time out, the result is an unhandled exception in the main thread.
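The failure mode can be illustrated without OpenSearch at all. The following is a minimal sketch in plain Java, not the plugin's code: `BlockingStartupDemo`, `createIndexRequest`, and `startupHookTimesOut` are hypothetical names, and the never-completed future stands in for an index-creation request that cannot be served until a cluster manager is elected.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical stand-in for the startup path; none of these names come from OpenSearch.
class BlockingStartupDemo {

    // Models an index-creation request that cannot complete until a cluster
    // manager is elected. The future is never completed, which simulates an
    // election that outlasts the caller's timeout.
    static CompletableFuture<Boolean> createIndexRequest() {
        return new CompletableFuture<>();
    }

    // Mimics a startup hook that blocks the calling thread on the response.
    // Returns true if the wait timed out.
    static boolean startupHookTimesOut(long timeoutMillis) {
        try {
            createIndexRequest().get(timeoutMillis, TimeUnit.MILLISECONDS);
            return false;
        } catch (TimeoutException e) {
            // In the issue, the equivalent exception is not caught and
            // propagates out of the startup call chain on the main thread.
            return true;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("timed out: " + startupHookTimesOut(100));
    }
}
```

The sketch catches the timeout only so it can report it. In the stack trace above, the timeout raised under `actionGet` is not caught anywhere in the startup call chain.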

@mengweieric

Involving @YANG-DB to take a look, as it seems the change was introduced by him.

@YANG-DB

YANG-DB commented Nov 26, 2024

@shwetathareja @mengweieric we need to investigate further how to solve this.
I'll update once we have better insight.

@cwperks

cwperks commented Nov 26, 2024

@YANG-DB This is how security solves the same problem: https://github.com/opensearch-project/security/blob/main/src/main/java/org/opensearch/security/configuration/ConfigurationRepository.java#L188-L192

The security index is initialized within onNodeStarted, but the initialization waits for the cluster state to be ready.
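As a rough illustration of that idea (not the security plugin's actual code), the startup hook can hand the work to a background thread that waits for readiness instead of blocking the caller. This is a minimal sketch in plain Java: `DeferredIndexInit`, `initWhenReady`, and the `clusterReady` supplier are hypothetical names, and the polling loop is an assumption chosen to keep the example self-contained. A real plugin would likely rely on cluster state callbacks rather than polling.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

// Hypothetical helper; these names are illustrative and not part of OpenSearch.
class DeferredIndexInit {

    // Starts a daemon thread that polls clusterReady and runs createIndex
    // once it returns true. The caller returns immediately.
    static Thread initWhenReady(BooleanSupplier clusterReady, Runnable createIndex, long pollMillis) {
        Thread worker = new Thread(() -> {
            try {
                while (!clusterReady.getAsBoolean()) {
                    Thread.sleep(pollMillis);
                }
                createIndex.run();
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop without creating the index.
                Thread.currentThread().interrupt();
            }
        }, "deferred-index-init");
        worker.setDaemon(true);
        worker.start();
        return worker;
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicBoolean ready = new AtomicBoolean(false);
        CountDownLatch created = new CountDownLatch(1);

        // Returns at once even though the "cluster" is not ready yet.
        initWhenReady(ready::get, created::countDown, 10);
        System.out.println("created before ready: " + (created.getCount() == 0));

        // Simulate the cluster state becoming ready.
        ready.set(true);
        System.out.println("created after ready: " + created.await(2, TimeUnit.SECONDS));
    }
}
```

With this shape, a slow cluster manager election delays index creation but cannot throw a timeout into the thread that runs the startup hook.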

@YANG-DB

YANG-DB commented Nov 26, 2024

Thanks @cwperks, I'll look into it soon.
