[FEATURE] Add support in the SDK to retrieve IndexingPressure information from OpenSearch #655

joshpalis · 2023-04-06T20:43:34Z

Is your feature request related to a problem?

The ADResultBulkTransportAction handles bulk indexing requests to OpenSearch to index Anomaly Results. This is used for multi-entity detectors.

This transport action requires an object of type IndexingPressure to be injected via Guice. The object tracks incoming index requests per shard/node in the cluster and provides memory accounting. Anomaly Detection uses this indexing pressure here to calculate an indexing pressure limit, which then determines whether to continue indexing Anomaly Results or to queue them for later.
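
For illustration, a minimal sketch of the kind of check described above. The class name, the single threshold, and the choice to sum the combined coordinating/primary bytes with the replica bytes are assumptions made for this example, not the Anomaly Detection code:

```java
// Hypothetical helper: names and threshold semantics are illustrative assumptions.
final class IndexingPressureCheckSketch {

    enum Decision { INDEX_NOW, QUEUE_FOR_LATER }

    /**
     * Compares the fraction of the indexing pressure limit currently in use
     * against a caller-supplied threshold between 0 and 1.
     */
    static Decision decide(long combinedCoordinatingAndPrimaryBytes,
                           long replicaBytes,
                           long limitBytes,
                           double threshold) {
        if (limitBytes <= 0) {
            throw new IllegalArgumentException("limitBytes must be positive");
        }
        // Assumption: pressure is the share of the limit taken by in-flight bytes.
        double used = (double) (combinedCoordinatingAndPrimaryBytes + replicaBytes) / limitBytes;
        return used <= threshold ? Decision.INDEX_NOW : Decision.QUEUE_FOR_LATER;
    }
}
```

With a placeholder threshold of 0.6, a call such as `decide(0L, 0L, 53687091L, 0.6)` would index immediately, while a node with most of its limit in use would queue the results.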

What solution would you like?

The SDK should provide a mechanism to request the state of the IndexingPressure instantiated in OpenSearch and enable extensions to retrieve this information from the ExtensionRunner to perform these calculations. This workflow should follow a similar design to the SDK's sendClusterStateRequest.

Here is a high-level overview of what the workflow should look like:

- The SDK's ExtensionRunner should define a method called sendIndexingPressureRequest(TransportService) that instantiates an IndexingPressureResponseHandler and uses the transport service to send a request to the ExtensionsManager.
- The ExtensionsManager should include the IndexingPressureService as a class field and provide a method to set this field after the IndexingPressureService is instantiated here in Node.java.
- A request handler should be registered to handle an indexing pressure request. It receives the request and provides the following information from the IndexingPressureService's ShardIndexingPressure object:
  - getCurrentCombinedCoordinatingAndPrimaryBytes()
  - getCurrentCoordinatingBytes()
  - getCurrentPrimaryBytes()
  - getCurrentReplicaBytes()
- The response should be an object containing the data listed above.
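
As a rough sketch of what such a response object could carry, here is a plain Java value class holding the four byte counts. The class name is hypothetical. To keep the example dependency-free, java.io data streams stand in for the StreamInput/StreamOutput serialization that an actual OpenSearch transport response would use:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical response shape: the name and layout are assumptions, not SDK code.
final class IndexingPressureResponseSketch {
    final long combinedCoordinatingAndPrimaryBytes;
    final long coordinatingBytes;
    final long primaryBytes;
    final long replicaBytes;

    IndexingPressureResponseSketch(long combined, long coordinating, long primary, long replica) {
        this.combinedCoordinatingAndPrimaryBytes = combined;
        this.coordinatingBytes = coordinating;
        this.primaryBytes = primary;
        this.replicaBytes = replica;
    }

    // Writes the four counters in a fixed order (stand-in for writeTo(StreamOutput)).
    byte[] toBytes() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeLong(combinedCoordinatingAndPrimaryBytes);
            out.writeLong(coordinatingBytes);
            out.writeLong(primaryBytes);
            out.writeLong(replicaBytes);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Reads the counters back in the same order (stand-in for a StreamInput constructor).
    static IndexingPressureResponseSketch fromBytes(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            long combined = in.readLong();
            long coordinating = in.readLong();
            long primary = in.readLong();
            long replica = in.readLong();
            return new IndexingPressureResponseSketch(combined, coordinating, primary, replica);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The request handler on the OpenSearch side would populate the fields from the four getters listed above, and the extension's response handler would read them back in the same order.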

Do you have any additional context?

Within Node.java, the IndexingPressureService is instantiated here and is then bound to Guice here. Internally it holds an object of type ShardIndexingPressure here, which extends the IndexingPressure class.

dbwiddis · 2023-04-12T22:54:29Z

This is available in the Nodes API:

GET /_nodes/<node_id>/stats/<metric>/<index_metric>

Should be a quick add of the appropriate client call in SDK client similar to Field Mappings.

dbwiddis · 2023-04-12T23:35:46Z

See example

      "indexing_pressure" : {
        "memory" : {
          "current" : {
            "combined_coordinating_and_primary_in_bytes" : 0,
            "coordinating_in_bytes" : 0,
            "primary_in_bytes" : 0,
            "replica_in_bytes" : 0,
            "all_in_bytes" : 0
          },
          "total" : {
            "combined_coordinating_and_primary_in_bytes" : 40256,
            "coordinating_in_bytes" : 40256,
            "primary_in_bytes" : 45016,
            "replica_in_bytes" : 0,
            "all_in_bytes" : 40256,
            "coordinating_rejections" : 0,
            "primary_rejections" : 0,
            "replica_rejections" : 0
          },
          "limit_in_bytes" : 53687091
        }
      },

dbwiddis · 2023-04-13T20:14:12Z

Have tried and failed a few approaches, recording here for posterity:

- RestHighLevelClient doesn't have a built-in for this.
- OpenSearchClient (Java Client) has a built-in for Node Stats, but the response object doesn't include indexing_pressure.

This leaves two options:

1. Use performRequestAsync with the endpoint GET /_nodes/stats/indexing_pressure/ and manually parse the returned JSON (above, looks pretty easy to parse).
2. Do like we do with cluster state and pull the IndexingPressure object from OpenSearch.
   - Big Problem: I think it's node-local, and a transport request will just return the value from whichever node handles the request, so getting it for all nodes (like we can with REST) isn't ideal.

So assuming I go with option 1, I'm thinking to create an SDK-side Transport Action that does the request.
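
A rough sketch of the manual parsing in option 1, kept dependency-free by pulling the numeric fields out of the first "current" object with a regular expression. The class and method names are hypothetical, and a real implementation would more likely use a proper JSON parser and loop over every node in the response:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for the indexing_pressure fragment shown earlier in this thread.
final class IndexingPressureJsonSketch {

    // Captures the body of the first flat "current" : { ... } object.
    private static final Pattern CURRENT_BLOCK =
        Pattern.compile("\"current\"\\s*:\\s*\\{([^}]*)\\}");

    // Captures "name" : 123 pairs inside that body.
    private static final Pattern LONG_FIELD =
        Pattern.compile("\"(\\w+)\"\\s*:\\s*(\\d+)");

    /** Returns the byte counters of the first "current" object, or an empty map if none is found. */
    static Map<String, Long> parseCurrent(String json) {
        Map<String, Long> fields = new LinkedHashMap<>();
        Matcher block = CURRENT_BLOCK.matcher(json);
        if (!block.find()) {
            return fields;
        }
        Matcher field = LONG_FIELD.matcher(block.group(1));
        while (field.find()) {
            fields.put(field.group(1), Long.parseLong(field.group(2)));
        }
        return fields;
    }
}
```

Calling `parseCurrent` on the response body would give a map keyed by names such as `coordinating_in_bytes` and `replica_in_bytes`. Only the first node's values are read in this sketch.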

However, that leads to the follow-on question: what are we going to do with this information? In the AD application, the indexing pressure is considered on the node which is currently executing the request. In the AD Extension we're processing data on a remote node and sending it back via REST calls, and we don't know which node will handle them (Hello Hash Ring?).

So I'm thinking this issue needs to be paused pending Hash Ring implementation. Thoughts, anyone?

dbwiddis · 2023-04-13T21:20:00Z

Useful blog post, I think this is the way forward: https://opensearch.org/blog/shard-indexing-backpressure-in-opensearch

Current code is local-node based and just measures memory as a signal of whether to index.

Instead we should query the REST APIs documented in this post and use the unthrottled/soft limit thresholds as our criterion.

dbwiddis · 2023-04-13T22:08:37Z

Putting this issue back into the backlog for now. Future plans:

- In the AD extension where indexing pressure is called, just skip the check for now. We can do our initial performance tests without any throttling. It's entirely possible that allowing OpenSearch to distribute its requests on its own may provide better performance than the highly localized per-node throttling that currently exists. If we end up with too many requests, we can collect data on when/where/why to better address the problem.
- Eventually we should add this capability, but it should be based on the "primary metrics" leading indicators of node health per the blog post.

owaiskazi19 · 2023-04-13T23:10:00Z

Isn't this issue a blocker for multi-entity detectors?
cc: @joshpalis

joshpalis · 2023-04-13T23:14:18Z

Yes, indexing pressure is needed for the ADResultBulkTransportAction. This action is executed by the MultiEntityResultHandler to bulk index AD results for HCAD.

dbwiddis · 2023-04-13T23:14:40Z

Not if throttling is not needed.

If it is, we can look at the above API to determine what thresholds to use.

Existing code assumes you are on the node doing the work, not injecting via API.

dbwiddis · 2023-04-13T23:53:10Z

Summary: replicating the AD code has no meaning on an extension. We may need a different API call to decide whether to skip some bulk indexing, but I don't know what thresholds we should use with the new call. We should try performance testing without any limit. If we end up needing to add one, the test will help us know what to use.

joshpalis added enhancement New feature or request untriaged labels Apr 6, 2023

joshpalis mentioned this issue Apr 6, 2023

Migrate AnomalyResultAction/TransportAction to SDK using the SDKClient #626

Closed

23 tasks

joshpalis added good first issue Good for newcomers CCI Part of the College Contributor Initiative more-challenging Good issues for new contributors that present a small challenge and removed untriaged labels Apr 7, 2023

dbwiddis removed good first issue Good for newcomers CCI Part of the College Contributor Initiative more-challenging Good issues for new contributors that present a small challenge labels Apr 12, 2023

dbwiddis self-assigned this Apr 12, 2023

dbwiddis mentioned this issue Apr 13, 2023

[FEATURE] Add support for indexing pressure stats opensearch-project/opensearch-java#453

Open

dbwiddis mentioned this issue Apr 14, 2023

Implement TransportNodesAction equivalent for SDK #683

Open

4 tasks

owaiskazi19 mentioned this issue May 1, 2023

[META] Multi node support for AD Extension #720

Open

4 tasks

owaiskazi19 mentioned this issue Jun 29, 2023

[FEATURE] Pending work items to launch extensions #848

Open

7 tasks
