Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PROPOSAL] Eliminate all Transport calls from Extensions to OpenSearch #767

Open
dbwiddis opened this issue May 19, 2023 · 5 comments
Open
Labels
enhancement New feature or request RFC Issues requesting major changes

Comments

@dbwiddis
Copy link
Member

dbwiddis commented May 19, 2023

What/Why

What are you proposing?

Extensions presently rely on transport calls to OpenSearch for both initialization and ongoing actions. This requires the extension to maintain an open (and secure) network connection with OpenSearch.

Originating Transport connections from OpenSearch to an extension is a reasonable option (but see #746 for a discussion of making this an optional alternative to REST or other (gRPC?) communication layer). However, an Extension making a transport call back to OpenSearch introduces multiple complexities such as:

  • It needs a host and port to connect to, and these are often/likely secured inside a private network behind a load balancer, bastion host, or other security layer
  • It only connects to one of multiple cluster managers, so the Extension's state (registered actions, registered REST handlers, etc.) are only updated on one cluster manager
  • It requires the extension to identify itself and introduces the complexity of sharing certificates/keys/etc.

All of the initialization calls can be replaced by a response to the initialization call.

Other uses of transport should be replaced by equivalent REST requests.

What users have asked for this feature?

This issue was highlighted during performance testing setup (#652) and has prompted still-unfixed bugs associated with the lack of valid OpenSearch IP address (#761) and collisions with transport service headers (#771). A continuous connection also is incompatible with a serverless design (#746).

What problems are you trying to solve?

  • Additional complexity with security, and a "back door" potential for disrupting the cluster outside a load balancer
  • The need to maintain (and monitor and recover from failure of) a connection between Extensions and OpenSearch
  • Resource exhaustion/resource leaks with a connection port per extension instance
  • The single-point-of-failure with extensions only communicating with a single OpenSearch cluster manager and the need to replicate any changes
  • The need to maintain a state, which blocks development of serverless solutions

What is the developer experience going to be?

SDK developers will need to rely on the REST layer for any communications originated to OpenSearch

OpenSearch will need a new REST endpoint for extensions to communicate with; however the byte streams used can be the same "Writeable" or protobuf bytes, just in the body of an HTTP request.

Extension developers will see no change; all extension communication was already intended to be over REST.

Are there any security considerations?

This should reduce the security considerations of having both REST and Transport, making the communications only via REST.

Are there any breaking changes to the API

We are still pre-release behind a feature flag so it's in line with the continuously changing breaking development.

What is the user experience going to be?

Users will not have to know any of the complex details of where nodes live and how they are reachable. They simply provide a single endpoint, probably a load balancer, and the SDK handles the rest.

Are there breaking changes to the User Experience?

No users have experienced it yet!

Why should it be built? Any reason not to?

It's a good solution to a wide range of problems that would avoid many other patchwork fixes.

What will it take to execute?

Replace existing Transport calls originating from Extensions with alternatives.

The following call occurs during initialization and should be replaced by including the information into the Initialization Request class:

  • sendEnvironmentSettingsRequest()

The following calls occur during initialization and should be replaced by including the information they communicate into the initialization Response class:

  • sendRegisterRestActionsRequest()
  • sendRegisterCustomSettingsRequest()
  • sendRegisterTransportActionsRequest()

The following calls are presently unimplemented, planned for the future:

  • sendExtensionDependencyRequest() could be included as part of initialization request
  • sendClusterSettingsRequest() could be easily replaced by a REST call

The following calls occur as part of calls from the SDKClient and its ecosystem. These implementations would be the hardest part of this proposal.

Any remaining open questions?

This should likely be part of a larger-scale integrated design for serverless which will employ a similar transport/REST/gRPC equivalent in the other direction.

@dbwiddis
Copy link
Member Author

This diagram summarizes the constraints driving this request.

Currently only "Extension B" is a tenable solution. "Extension A" is unreachable from the internet without some extra setup/roles/permissions/etc. on the account, which is not trivial. "Extension C" has no way to initiate transport connections with OpenSearch without some sort of middle layer. That middle layer already exists as the REST API accessible via load balancer. To enable transport we would have to add our own intermediate Bastion Host to forward requests. Possible, but unnecessary.

Screenshot 2023-05-25 at 3 52 50 PM

@stephen-crawford
Copy link

Hi @dbwiddis, after taking a cursory look at your write up and diagram, I like the idea.

From a Security perspective I see a couple points that we will need to consider. The biggest security matter I see arising is verification of the identity of the extension. From what I know of the SDK, my understanding is that the current security implementation is handled by the TLS changes that @cwperks introduced. With these changes, a cluster manager can verify that an extension is who they say they are by verifying the certificates. Swapping to an entirely REST based system will require a different approach. To my knowledge, there is nothing in place that would allow extensions to make REST requests which bypass the generally implemented REST security features of the cluster. That means that if the extension wants to make a REST request, and you have Security enabled, the extension will need to provide the same credentials as would be expected of any other REST request.

Fortunately, I believe that the Service Account concept which was introduced in tho the Security Plugin will help address this requirement. The Service Account can be used to generate auth tokens for the extension which can be used to auth its REST requests. However, the Service Account is only created on extension registration and does not have any of the immediate safeguards the TLS system has. You cannot use the information associated with the Service Account for initialization because it will not yet exist. We will not have any mechanism for verifying an extension is who they say they are on first installation.

@dbwiddis
Copy link
Member Author

dbwiddis commented Jun 2, 2023

The biggest security matter I see arising is verification of the identity of the extension.

Whatever identity certificates are needed could be sent with the original/initiating REST request as part of the body.

Swapping to an entirely REST based system will require a different approach.

We're not going entirely REST; we're just limiting the Transport connections to those originated from OpenSearch.

@cwperks
Copy link
Member

cwperks commented Jun 5, 2023

It needs a host and port to connect to, and these are often/likely secured inside a private network behind a load balancer, bastion host, or other security layer

How does the discovery process currently work when a node joins the cluster? Would that be the same problem as a new node wanting to join a cluster?

It only connects to one of multiple cluster managers, so the Extension's state (registered actions, registered REST handlers, etc.) are only updated on one cluster manager

The security plugin has solved a similar problem where the security configuration needs to be sync'ed on all the nodes. If a new user is created then all nodes need to be aware of the new user.

The way the security plugin solves this is that it stores the configuration in a system index and on any modification to the index it sends a TransportAction to all nodes of the cluster to tell it to reload the security configuration from the index into memory. There is some time needed for propagation, but its how all nodes of the cluster become synchronized to changes in the security index.

@dbwiddis
Copy link
Member Author

dbwiddis commented Jun 5, 2023

How does the discovery process currently work when a node joins the cluster?

The node sends out a unicast ping to the discovery seed host, and says "Hi, can I join your cluster?" See https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/discovery/

Would that be the same problem as a new node wanting to join a cluster?

As long as the node's in the subnet, it can send info to the discovery seed host.

However, for an extension outside the subnet, it cannot send a message to the internal address.

The security plugin has solved a similar problem where the security configuration needs to be sync'ed on all the nodes. If a new user is created then all nodes need to be aware of the new user.

@owaiskazi19 is looking to solve something similar here for the extension API

The way the security plugin solves this is that it stores the configuration in a system index and on any modification to the index it sends a TransportAction to all nodes of the cluster to tell it to reload the security configuration from the index into memory.

Yeah, we've discussed using a system (or any) index for this and that might work. Looking at other options as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request RFC Issues requesting major changes
Projects
None yet
Development

No branches or pull requests

4 participants