[Extensions] Add ClusterStateRequest parameter to cluster state transport request #7066

dbwiddis · 2023-04-10T05:35:21Z

Companion PR on SDK: opensearch-project/opensearch-sdk-java#668

Description

Adds the ability to send a ClusterStateRequest parameter when requesting Cluster State, to limit the values returned (and reduce transport bandwidth). To get the old behavior, use new ClusterStateRequest().all() as the parameter.
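As a rough illustration of the idea in the description, here is a minimal sketch of how request flags can narrow what is returned. It is a toy model, not the OpenSearch classes: `StateRequestSketch`, its three section flags, and the sample data are all hypothetical and only mimic the `all()` / `clear()` style of opting in and out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model only: these are NOT the OpenSearch classes. */
public final class StateRequestSketch {
    // Hypothetical section flags; every section is requested by default.
    private boolean nodes = true;
    private boolean metadata = true;
    private boolean routingTable = true;

    /** Request every section (the "old behavior" described above). */
    public StateRequestSketch all() {
        nodes = true;
        metadata = true;
        routingTable = true;
        return this;
    }

    /** Request nothing, so callers can opt back in to individual sections. */
    public StateRequestSketch clear() {
        nodes = false;
        metadata = false;
        routingTable = false;
        return this;
    }

    public StateRequestSketch metadata(boolean include) {
        this.metadata = include;
        return this;
    }

    /** Copies only the requested sections out of a full state map. */
    public Map<String, String> apply(Map<String, String> fullState) {
        Map<String, String> response = new LinkedHashMap<>();
        if (nodes) response.put("nodes", fullState.get("nodes"));
        if (metadata) response.put("metadata", fullState.get("metadata"));
        if (routingTable) response.put("routing_table", fullState.get("routing_table"));
        return response;
    }

    /** Made-up sample data standing in for a real cluster state. */
    public static Map<String, String> sampleState() {
        Map<String, String> state = new LinkedHashMap<>();
        state.put("nodes", "node-1,node-2");
        state.put("metadata", "index settings and mappings");
        state.put("routing_table", "shard assignments");
        return state;
    }

    public static void main(String[] args) {
        Map<String, String> full = sampleState();
        // Everything versus metadata only.
        System.out.println(new StateRequestSketch().all().apply(full).keySet());
        System.out.println(new StateRequestSketch().clear().metadata(true).apply(full).keySet());
    }
}
```

The smaller the set of requested sections, the less has to be serialized and sent over the transport, which is the bandwidth saving this PR is after.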

Issues Resolved

Fixes SDK opensearch-project/OpenSearch#354

Check List

New functionality includes testing.
- All tests pass
New functionality has been documented.
- New functionality has javadoc added
Commits are signed per the DCO using --signoff
~~Commit changes are listed out in CHANGELOG.md file (See: Changelog)~~

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

github-actions · 2023-04-10T05:59:15Z

Gradle Check (Jenkins) Run Completed with:

RESULT: FAILURE ❌
URL: https://build.ci.opensearch.org/job/gradle-check/13729/
CommitID: 50f0569
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green.
Is the failure a flaky test unrelated to your change?

shwetathareja · 2023-04-10T08:17:31Z

@dbwiddis : Looking for more context on the change here.
Different API requests support filtering of response depending on the request param. ClusterStateRequest was one which you handled explicitly. To take an example, NodeStatsRequest is another such request.

So with extensions, each request has to be explicitly added in the ExtensionsTransportHandler. What is the thought process here?

dbwiddis · 2023-04-10T21:48:20Z

> @dbwiddis : Looking for more context on the change here. Different API requests support filtering of response depending on the request param. ClusterStateRequest was one which you handled explicitly. To take an example, NodeStatsRequest is another such request.
>
> So with extensions, each request has to be explicitly added in the ExtensionsTransportHandler. What is the thought process here?

Great question, @shwetathareja. A more general approach may eventually be needed for "transport actions", but I'm not sure it's clear yet what that approach is.

In Extensions, our first choice is to use existing clients for these requests. Most API requests are in the specification and automatically generate the appropriate Request/Response classes in the various clients, such as opensearch-java. However, this is not one such request:

- It is not listed in the Cluster APIs, which are the set of APIs currently handled by the clients.
- There is a REST API for it, the RestClusterStateAction, but the routes starting with /_cluster/state are not handled by any existing clients.
  - Further, it's very hard to even find documentation for this API on the OpenSearch website. Entering "/_cluster/state" into the search box there yields many pages, none of which list that API. I had to resort to a Google search, which doesn't point to any OpenSearch documentation that includes the API. The one OpenSearch page linked doesn't include it; there are multiple third-party sites, and one for AOS, listing it as a common command but not explaining how to use it.
  - The clients do provide performRequest() functionality to send an arbitrary REST request, and I tried to do this. See "Why not to get cluster state with ClusterStateRequest using RestClient" (opensearch-sdk-java#667) for the result of a few hours of my effort prior to submitting this PR. It was possible to recreate the needed /_cluster/state API call to send the request. The problem was that the result of that request is just JSON that needed to be parsed back into a ClusterStateResponse object.
    - And this is simply not practical (again, see that draft PR for how far I got and where I gave up). There are plenty of toXContent methods converting that response into XContent, but very few fromXContent methods for parsing the response JSON generated in performRequest() into an actual ClusterStateResponse object. I won't say it's impossible, but I will say that the effort is just not worth it. The correct answer, if we want to allow deserialization of the JSON, is to implement fromXContent() parsing in the ClusterStateResponse and all its subordinate objects (including any which implement the Custom interface, which would mean adding another interface method there and requiring such parsing to be implemented on all existing Custom implementations, which would be a breaking change).
    - It would be nice if the clients implemented this, which I assume requires adding this API to the Cluster API spec. I don't know why it's not there already. This is probably the right answer, but it doesn't unblock our plugin migration and extension development efforts.
So this takes us back to the only real option: using the Transport APIs. The ClusterStateRequest and ClusterStateResponse implement Writeable, and it's easy to convert them back and forth into the byte streams needed to send them to a handler. This opens up two current possibilities:
- Directly send them as they are in this PR, and handle them explicitly as individual exceptions to our general rule of using clients, since (a) there's no client implementation and (b) the manual parsing is untenable.
- Send them using the Extension ProxyAction capability (implemented as ExtensionTransportAction for plugins and RemoteExtensionTransportAction for extensions) which is currently designed to serialize requests/responses like this for execution on another extension. It's not too much work to extend that code to execute OpenSearch actions, and this is a possibility. I made a few draft attempts at integrating with that code before submitting this PR, but they started to get overly complex.
I don't think this is going to be the last such request, and I do think there's probably room for a more generic handling of multiple TransportActions since all the involved classes (ActionRequest/ActionResponse) are serializable.
- There's some middle ground between single use cases here, and overly broad "we accept anything you can serialize" like the ExtensionTransportAction setup. I do expect that as we find more use cases we'll end up combining them at some reasonable point. I don't want to design that now until I see how big the problem is going to be.
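
The appeal of the Writeable route is that writing and reading are symmetric by construction. The following is a minimal sketch of that symmetry using plain JDK streams rather than the OpenSearch stream classes; `FilterRequest` and its two fields are hypothetical and exist only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Toy model only: mirrors the write/read symmetry of a Writeable-style class with JDK streams. */
public final class WriteableRoundTripSketch {

    /** Hypothetical request carrying an index pattern and a metadata flag. */
    public static final class FilterRequest {
        public final String indexPattern;
        public final boolean metadata;

        public FilterRequest(String indexPattern, boolean metadata) {
            this.indexPattern = indexPattern;
            this.metadata = metadata;
        }

        /** Reading constructor: consumes fields in exactly the order writeTo emits them. */
        public FilterRequest(DataInputStream in) throws IOException {
            this.indexPattern = in.readUTF();
            this.metadata = in.readBoolean();
        }

        public void writeTo(DataOutputStream out) throws IOException {
            out.writeUTF(indexPattern);
            out.writeBoolean(metadata);
        }
    }

    /** Serializes the request into the bytes that would travel over the wire. */
    public static byte[] toBytes(FilterRequest request) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            request.writeTo(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Rebuilds the request on the receiving side. */
    public static FilterRequest fromBytes(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            return new FilterRequest(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        FilterRequest copy = fromBytes(toBytes(new FilterRequest("results-*", true)));
        System.out.println(copy.indexPattern + " " + copy.metadata);
    }
}
```

Because every field written has a matching read, nothing is lost in transit, which is exactly what the JSON path lacks without fromXContent support.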

I hope this explains my thought process. I'm sure @saratvemulapalli probably has some comments on this as he's trying to integrate the transport handling with protobuf.

Signed-off-by: Daniel Widdis <widdis@gmail.com>

github-actions · 2023-04-10T23:34:07Z

Gradle Check (Jenkins) Run Completed with:

RESULT: SUCCESS ✅
URL: https://build.ci.opensearch.org/job/gradle-check/13811/
CommitID: ff4380c

codecov-commenter · 2023-04-10T23:38:06Z

Codecov Report

Merging #7066 (ff4380c) into main (3ba333b) will decrease coverage by 0.15%.
The diff coverage is 53.40%.


@@             Coverage Diff              @@
##               main    #7066      +/-   ##
============================================
- Coverage     70.77%   70.63%   -0.15%     
- Complexity    59367    59447      +80     
============================================
  Files          4825     4852      +27     
  Lines        284369   285165     +796     
  Branches      41021    41112      +91     
============================================
+ Hits         201263   201423     +160     
- Misses        66603    67182     +579     
- Partials      16503    16560      +57

Impacted Files	Coverage Δ
...search/client/SearchPipelineRequestConverters.java	`0.00% <0.00%> (ø)`
...eline/common/SearchPipelineCommonModulePlugin.java	`0.00% <0.00%> (ø)`
...ion/admin/cluster/node/info/NodesInfoResponse.java	`3.03% <0.00%> (-0.10%)`	⬇️
...search/action/search/GetSearchPipelineRequest.java	`0.00% <0.00%> (ø)`
...earch/action/search/GetSearchPipelineResponse.java	`0.00% <0.00%> (ø)`
.../org/opensearch/client/support/AbstractClient.java	`31.10% <0.00%> (-0.66%)`	⬇️
...search/cluster/service/ClusterManagerTaskKeys.java	`0.00% <ø> (ø)`
...pensearch/common/settings/FeatureFlagSettings.java	`50.00% <ø> (ø)`
...a/org/opensearch/extensions/ExtensionsManager.java	`46.83% <0.00%> (-0.19%)`	⬇️
...a/org/opensearch/plugins/SearchPipelinePlugin.java	`0.00% <0.00%> (ø)`
... and 39 more

... and 469 files with indirect coverage changes


shwetathareja · 2023-04-11T09:10:56Z

@dbwiddis Thanks for all the details. It helped me understand better.

> The correct answer if we want to allow deserialization of the JSON is to implement fromXContent() parsing in the ClusterStateResponse and all its subordinate objects

Yes, if we want to expose the whole of cluster state to users, then implementing it via fromXContent() is the right way forward.

But I wonder whether we want to expose all _cluster/state API constructs to clients. I foresee a lot of challenges:

- To start with, there is the "custom" data that you mentioned above. Plugins can add any custom data to the cluster state, and it would be a breaking change if fromXContent were enforced. Also, plugins can choose not to send certain data in the API response by marking it private.
- Today, these internal data structures are updated with only server-side backward compatibility in mind. Changing anything in those structures could now break clients and would result in too much overhead.
- If you look at the toXContent methods of some of the classes, you will notice that not all fields are serialized, only a subset, which might not be sufficient to deserialize the object back into the current ClusterState data structures. For example, anywhere Index is serialized in routing information, only its name is serialized, not the Index UUID. Metadata classes already have fromXContent implemented, as it is also used to serialize to and deserialize from disk.
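
The Index name/UUID point can be pictured with a small sketch. This is a toy model, not the OpenSearch Index class or its toXContent output: `IndexRef`, the JSON shape, and the sample values are hypothetical, and the string handling assumes names that need no JSON escaping.

```java
/** Toy model only: shows why a one-way, partial rendering cannot be parsed back losslessly. */
public final class LossySerializationSketch {

    /** Hypothetical stand-in for an index reference with a name and a UUID. */
    public static final class IndexRef {
        public final String name;
        public final String uuid;

        public IndexRef(String name, String uuid) {
            this.name = name;
            this.uuid = uuid;
        }
    }

    private static final String PREFIX = "{\"index\":\"";
    private static final String SUFFIX = "\"}";

    /** Renders only the name, the way a display-oriented serializer might. */
    public static String toJson(IndexRef ref) {
        return PREFIX + ref.name + SUFFIX;
    }

    /** Best-effort parse: the UUID was never written, so it cannot be restored. */
    public static IndexRef fromJson(String json) {
        String name = json.substring(PREFIX.length(), json.length() - SUFFIX.length());
        return new IndexRef(name, null);
    }

    public static void main(String[] args) {
        IndexRef original = new IndexRef("my-index", "made-up-uuid-123");
        IndexRef parsed = fromJson(toJson(original));
        System.out.println(parsed.name + " / " + parsed.uuid);
    }
}
```

Any consumer that needs the UUID would have to obtain it from somewhere else, which is why adding fromXContent alone would not make the response round-trippable.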

I see this API more as a debugging aid, and providing it as an official Cluster API in the clients could cause a lot of maintenance overhead. Until the community feels the need, with real use cases, we should refrain from exposing it.

I also want to warn that the response of this API could be huge, on the order of MBs (for a large cluster it could easily be > 100MB depending on the mappings). This could cause significant overhead in terms of CPU, memory, and keeping transport threads busy if it is accessed frequently over the transport across extensions. Today, plugins can access the in-memory object directly from ClusterService as opposed to making transport calls. Also, if each extension ends up storing the deserialized cluster state on its side, the overall memory overhead could be significant. I would like to brainstorm/discuss access to cluster state across extensions further, @dbwiddis / @saratvemulapalli.

> There's some middle ground between single use cases here, and overly broad "we accept anything you can serialize" like the ExtensionTransportAction setup. I do expect that as we find more use cases we'll end up combining them at some reasonable point. I don't want to design that now until I see how big the problem is going to be.

Yes, I would also like to understand the use cases better before we design a generic solution for exposing any TransportAction.

dbwiddis · 2023-04-11T15:19:53Z

> Yes, i would also like to understand the use cases better here before we design a generic solution here for exposing any TransportAction.

Thanks for all the great insight, @shwetathareja ... some of that belongs on the bug opensearch-project/documentation-website#3784 I filed as well.

We had previously (see the diff) returned the state() from the ClusterService as you mentioned.

As for the specific use case of this request, we are migrating the Anomaly Detection plugin to an extension and have encountered this code in the existing plugin:

        ClusterStateRequest clusterStateRequest = new ClusterStateRequest()
            .clear()
            .indices(AnomalyDetectionIndices.ALL_AD_RESULTS_INDEX_PATTERN)
            .metadata(true)
            .local(true)
            .indicesOptions(IndicesOptions.strictExpand());

        adminClient.cluster().state(clusterStateRequest, ActionListener.wrap(clusterStateResponse -> {

The ClusterStateResponse from the ClusterService does not allow filtering by indices, and as you indicated, the response can be quite large for large clusters with many mappings.

How would you suggest filtering the ClusterService response to implement this use case?
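
To make the question concrete, here is a minimal sketch of what filtering on the extension side would involve if it already held the full list of index names. Everything in it is hypothetical: the class, the `results-*` pattern, and the sample names are illustrative and are not the actual AD constants. It also only filters after the full state has been transferred, so it does nothing to reduce transport size.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Toy model only: filters index names by a simple '*' wildcard after the fact. */
public final class IndexPatternFilterSketch {

    /** Converts a wildcard such as "results-*" into a regex, quoting every literal part. */
    static Pattern toRegex(String wildcard) {
        String regex = Arrays.stream(wildcard.split("\\*", -1))
            .map(Pattern::quote)
            .collect(Collectors.joining(".*"));
        return Pattern.compile(regex);
    }

    /** Keeps only the names that match the wildcard in full. */
    public static List<String> filter(List<String> indexNames, String wildcard) {
        Pattern pattern = toRegex(wildcard);
        return indexNames.stream()
            .filter(name -> pattern.matcher(name).matches())
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up index names for illustration.
        List<String> names = Arrays.asList("results-1", "results-2", "other-index");
        System.out.println(filter(names, "results-*"));
    }
}
```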

shwetathareja · 2023-04-12T08:54:51Z

Thanks @dbwiddis for sharing the AnomalyDetection Plugin use case.
So, it looks like it was always making a local transport call to the node to fetch the AD indices metadata. With your current change, it will be status quo for the AD plugin and will not add any extra overhead.

> How would you suggest filtering the ClusterService response to implement this use case?

For now, for AD we can keep the logic as is.

The bigger question for later is that plugins which had direct access to ClusterService and were using ClusterState would now need to resort to transport calls.

dblock · 2023-04-12T20:21:34Z

It looks like we are trying to expose cluster state for an extension to know whether an index exists. This seems like a valid concern. Extensions should be able to create/update/delete/check for existence of indices. It should be a first class API.

Cluster state is an internal construct of a cluster-based system. Cluster state or nodes wouldn't exist in a serverless environment, for example, which tells me that it is not an API that should need to be exposed to extensions.

dbwiddis · 2023-04-12T20:25:21Z

Good point, @dblock ... we should probably just use the Index API for this particular use case. I was trying to migrate existing code (Minimizing the diff helps migration) but this looks like a case where we're digging too deep into the internals and we should use a better (and supported) API for it.

shwetathareja · 2023-04-13T05:01:06Z

@dbwiddis : If you are planning to use the TransportAction for the get index API (use local=true for that as well), note that this API doesn't return system indices by default, and the AD index would be a system index. There is a workaround using headers, which you should check first (it could be tricky).

Also, how is AD using the response today? There could be differences between the cluster state API response and the get index API response for an index.

@dblock This is an interesting discussion about how system indices should be accessed by extensions. Should it be the same as for any other customer indices? Ideally not, otherwise customers can mess them up. Are we relying on access control alone?
Also, is it only indices that extensions would access, or other metadata from the cluster state as well?

dbwiddis · 2023-04-13T05:35:14Z

Based on feedback about sending cluster state over transport, I'm closing this PR in favor of targeted API requests for the information we need.

dbwiddis · 2023-04-13T05:41:29Z

> Also, how is AD using the response as of today. There could be difference in cluster state API response and get index API.

@shwetathareja there are two uses in the AD extension.

The one cited in this thread, which this PR was intended to implement, is for a user-provided index.

The system index use case goes through the cluster service state() method (which this PR was intended to replace!). That probably returns far too much information, but it apparently does include system indices. I'm sure we can figure out a way to make those calls more efficient. See opensearch-project/opensearch-sdk-java#674

saratvemulapalli · 2023-04-13T16:52:54Z

> I hope this explains my thought process. I'm sure @saratvemulapalli probably has some comments on this as he's trying to integrate the transport handling with protobuf.

Cluster state is used by AD to find index information. +1 to @dblock: an API to get an index should solve the problem.
If it is a system index, @peternied has a proposal to expose access to system indices for extensions [1].

> I would like to brainstorm/ discuss more on access of cluster state across extensions @dbwiddis / @saratvemulapalli

I am sure there could be other cases where extensions would need to access cluster state, but we haven't seen anything other than AD for now. Given all the feedback here, exposing it as an API is not worth it until that day comes.

[1] opensearch-project/security#2530

dbwiddis requested review from reta, anasalkouz, andrross, Bukhtawar, CEHENKLE, dblock, gbbafna, setiah, kartg, kotwanikunal, mch2, nknize, owaiskazi19, Rishikesh1159, ryanbogan, saratvemulapalli, shwetathareja, dreamer-89, tlfeng, VachaShah and xuezhou25 as code owners April 10, 2023 05:35

dbwiddis mentioned this pull request Apr 10, 2023

Add ClusterStateRequest parameter to cluster state transport request opensearch-project/opensearch-sdk-java#668

Closed

dbwiddis added the skip-changelog label Apr 10, 2023

dbwiddis mentioned this pull request Apr 10, 2023

[FEATURE] Add support in the SDK to retrieve IndexingPressure information from OpenSearch opensearch-project/opensearch-sdk-java#655

Open

dbwiddis added 2 commits April 10, 2023 15:30

Add ClusterStateRequest parameter to cluster state transport request

73b65ac

Signed-off-by: Daniel Widdis <widdis@gmail.com>

Fix tests to account for no-op client

ff4380c

Signed-off-by: Daniel Widdis <widdis@gmail.com>

dbwiddis force-pushed the cluster-state-request branch from 50f0569 to ff4380c Compare April 10, 2023 23:01

dbwiddis mentioned this pull request Apr 14, 2023

[BUG] Cluster State API is undocumented and unavailable in clients opensearch-project/documentation-website#3784

Open

dbwiddis mentioned this pull request Apr 12, 2023

[FEATURE] Replace calls to SDKClusterService state() with more targeted calls opensearch-project/opensearch-sdk-java#674

Open

dbwiddis closed this Apr 13, 2023

dbwiddis mentioned this pull request May 19, 2023

[PROPOSAL] Eliminate all Transport calls from Extensions to OpenSearch opensearch-project/opensearch-sdk-java#767

Open

dbwiddis mentioned this pull request Oct 12, 2023

[Discussion] Support Sycamore as a Python extension opensearch-project/opensearch-sdk-py#62

Open
