Add design doc for Control Plane and Data Plane Separation #344

Merged (3 commits) on Jan 17, 2023

@kate-osborn kate-osborn commented Dec 22, 2022

Design doc for issue #292

@kate-osborn kate-osborn requested a review from a team as a code owner December 22, 2022 21:25
@github-actions github-actions bot added the documentation Improvements or additions to documentation label Dec 22, 2022

@pleshakov pleshakov left a comment

Overall this makes sense to me. I left a number of questions and comments

design/control-data-plane-separation/design.md (3 review threads, outdated and resolved)
@brianehlert brianehlert left a comment

A few comments:

  • control plane per gatewayClass/namespace - wouldn't this be burdensome both to manage and in resource consumption / overhead? I don't think we win anything if we introduce a control plane that is as heavy as, or potentially heavier than, the data plane it is managing. This is in regard to the overall resource consumption of the control plane vs. the data plane. I'm not really a fan of having multiple control planes in this way (I'm envisioning a control plane deployment with replicas, scaled per gatewayClass / namespace, which seems entirely excessive).
  • control plane resiliency - can agent reconnect if a control plane pod is recycled?
  • Why the requirement for agent as an image? "produce a container image as an artifact"
  • How does the agent at the data plane discover a control plane? Wouldn't any form of auto-discovery make having many control planes a rather complex discover scenario?
  • There is a use case for being able to push nginx access logs to an external collector (not sending them to stdout and mixing them with operational logs). Or do we wait for OpenTelemetry to solve that?
  • The customer providing mTLS certs for agent is burdensome. There should be a way to generate a key with enough to not share anything common.
  • Why does the agent need a service account token in addition to an mTLS cert? Isn't this redundant auth for the sake of being redundant? Maybe the cert is only TLS and not mTLS.
  • "agent container" - is it implied that it will run as a sidecar? Is there a reason why? It seems like additional complexity. We should not introduce sidecars just because we can; it creates the additional complexity of managing two images (data plane image, agent image) instead of just one.
  • What is the resource burden of the agent on the data plane?

@pleshakov pleshakov left a comment

👍

@kate-osborn

@brianehlert

> control plane per gatewayClass/namespace - wouldn't this be burdensome both to manage and in resource consumption / overhead? I don't think we win anything if we introduce a control plane that is as heavy as, or potentially heavier than, the data plane it is managing. This is in regard to the overall resource consumption of the control plane vs. the data plane. I'm not really a fan of having multiple control planes in this way (I'm envisioning a control plane deployment with replicas, scaled per gatewayClass / namespace, which seems entirely excessive).

GatewayClass is a cluster-scoped resource, so I wouldn't suggest one control plane per namespace. When we started working on NKG, we made a conscious decision to support only a single GatewayClass resource. Given this, it follows that we would need one control plane per GatewayClass. If there's a strong use case for supporting multiple GatewayClasses, we can revisit this.

> control plane resiliency - can agent reconnect if a control plane pod is recycled?

Yes. If it's connected to a Pod and that Pod restarts or is terminated, it will attempt to reconnect using the control plane's service name.
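
For illustration only, the reconnect behavior described above could look roughly like the loop below. This is a minimal sketch, not the agent's actual implementation: the `dial` callback, the exponential backoff policy, and the function names are all assumptions introduced here. The only detail taken from the discussion is that the agent retries against the control plane's Service name rather than a specific Pod.

```go
package main

import (
	"context"
	"time"
)

// nextBackoff doubles the current delay and caps it at max.
// The doubling policy is an assumption made for this sketch.
func nextBackoff(cur, max time.Duration) time.Duration {
	next := cur * 2
	if next > max {
		return max
	}
	return next
}

// reconnect calls dial until it succeeds or ctx is cancelled, and returns the
// number of attempts made. dial is a hypothetical callback that is expected to
// connect to the control plane's Service DNS name, so that whichever Pod
// currently backs the Service can answer.
func reconnect(ctx context.Context, dial func(context.Context) error, initial, max time.Duration) (int, error) {
	delay := initial
	for attempt := 1; ; attempt++ {
		if err := dial(ctx); err == nil {
			return attempt, nil
		}
		// Wait before the next attempt, but stop early if the caller cancels.
		select {
		case <-ctx.Done():
			return attempt, ctx.Err()
		case <-time.After(delay):
		}
		delay = nextBackoff(delay, max)
	}
}
```

Because the retry targets the Service name, the agent does not need to know which control plane Pod replaced the one it was connected to.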

> Why the requirement for agent as an image? "produce a container image as an artifact"

Because otherwise, we would have to maintain a Dockerfile for the agent and produce an additional container image every release.

> How does the agent at the data plane discover a control plane? Wouldn't any form of auto-discovery make having many control planes a rather complex discover scenario?

The agent reads the control plane address from its config or a command-line argument on startup. Our installation manifests will specify the DNS name of the control plane Service either in the config or with a CLI argument.
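
As a rough sketch of that lookup order, assuming a hypothetical `--control-plane-addr` flag and a hypothetical `control_plane_addr` config key (neither name comes from the agent or the design doc), the resolution could be written as:

```go
package main

import (
	"errors"
	"flag"
	"strings"
)

// resolveControlPlaneAddr returns the control plane address for the agent.
// It prefers the hypothetical --control-plane-addr flag and falls back to a
// hypothetical "control_plane_addr: host:port" line in the config contents.
func resolveControlPlaneAddr(args []string, config string) (string, error) {
	fs := flag.NewFlagSet("agent", flag.ContinueOnError)
	addr := fs.String("control-plane-addr", "", "DNS name and port of the control plane Service")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	if *addr != "" {
		return *addr, nil
	}

	// Fall back to the config: split each line at the first colon only, so
	// the port in "host:port" stays part of the value.
	for _, line := range strings.Split(config, "\n") {
		key, value, found := strings.Cut(line, ":")
		if found && strings.TrimSpace(key) == "control_plane_addr" {
			if v := strings.TrimSpace(value); v != "" {
				return v, nil
			}
		}
	}
	return "", errors.New("control plane address not set in flags or config")
}
```

With this shape there is no discovery step at all: the installation manifests decide the address, and the agent fails fast if neither source provides one.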

> There is a use case for being able to push nginx access logs to an external collector (not sending them to stdout and mixing them with operational logs). Or do we wait for OpenTelemetry to solve that?

My understanding is that this will be implemented with an nginx module, so it shouldn't pertain to the agent work.

> The customer providing mTLS certs for agent is burdensome. There should be a way to generate a key with enough to not share anything common.

Not sure what you mean by "There should be a way to generate a key with enough to not share anything common." We can provide a K8s Job that generates self-signed certificates for testing and development.
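
To make the self-signed option concrete, here is a minimal sketch of what such a Job's program might do, using only the Go standard library. The function name, the choice of an ECDSA P-256 key, and the certificate fields are assumptions for illustration; storing the result in a Kubernetes Secret is left out, and certificates produced this way are suitable for testing and development only.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"math/big"
	"time"
)

// selfSignedCert generates a key pair and a self-signed certificate for
// dnsName, returning both PEM-encoded. Intended for test and dev use only.
func selfSignedCert(dnsName string, validFor time.Duration) (certPEM, keyPEM []byte, err error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	// Use a random 128-bit serial number.
	serial, err := rand.Int(rand.Reader, new(big.Int).Lsh(big.NewInt(1), 128))
	if err != nil {
		return nil, nil, err
	}
	now := time.Now()
	tmpl := &x509.Certificate{
		SerialNumber:          serial,
		Subject:               pkix.Name{CommonName: dnsName},
		DNSNames:              []string{dnsName},
		NotBefore:             now,
		NotAfter:              now.Add(validFor),
		KeyUsage:              x509.KeyUsageDigitalSignature,
		ExtKeyUsage:           []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth, x509.ExtKeyUsageClientAuth},
		BasicConstraintsValid: true,
	}
	// Passing tmpl as both template and parent makes the certificate self-signed.
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, nil, err
	}
	keyDER, err := x509.MarshalECPrivateKey(key)
	if err != nil {
		return nil, nil, err
	}
	certPEM = pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der})
	keyPEM = pem.EncodeToMemory(&pem.Block{Type: "EC PRIVATE KEY", Bytes: keyDER})
	return certPEM, keyPEM, nil
}
```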

> Why does the agent need a service account token in addition to an mTLS cert? Isn't this redundant auth for the sake of being redundant? Maybe the cert is only TLS and not mTLS.

I guess that could be seen as redundant. If we use the API token to verify the identity of the agent, then we could probably use TLS instead of mTLS.
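
A sketch of what the TLS-plus-token alternative could look like on the agent side follows. It uses plain HTTPS as a stand-in transport, which is an assumption, as are the function names and the idea of sending the token in an `Authorization` header. The point is only that the client verifies the server certificate but presents no client certificate, and proves its own identity with the token instead.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"net/http"
	"strings"
)

// serverOnlyTLS builds a client TLS config that verifies the control plane's
// certificate against caPEM but sets no client certificate (TLS, not mTLS).
func serverOnlyTLS(caPEM []byte) (*tls.Config, error) {
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, errors.New("no CA certificates found in PEM input")
	}
	return &tls.Config{RootCAs: pool, MinVersion: tls.VersionTLS12}, nil
}

// newAgentRequest attaches the service account token as a bearer credential.
// In a Pod the token is normally mounted at
// /var/run/secrets/kubernetes.io/serviceaccount/token; reading it is left to
// the caller in this sketch.
func newAgentRequest(url, token string) (*http.Request, error) {
	token = strings.TrimSpace(token)
	if token == "" {
		return nil, errors.New("empty service account token")
	}
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	return req, nil
}
```

The control plane would still need to validate the token on its side, for example through the Kubernetes TokenReview API; that part is not shown here.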

> "agent container" - is it implied that it will run as a sidecar? Is there a reason why? It seems like additional complexity. We should not introduce sidecars just because we can; it creates the additional complexity of managing two images (data plane image, agent image) instead of just one.

I used agent and data plane interchangeably in this doc. There will only be one container that is running both the agent and nginx.

> What is the resource burden of the agent on the data plane?

This is still unknown. We will have to run some benchmarks once we have it implemented. Presumably, if we drop all the unneeded features in the agent, it should be lightweight.

@kate-osborn kate-osborn merged commit 69fcc8a into main Jan 17, 2023
@kate-osborn kate-osborn deleted the data-plane-separation-design branch January 17, 2023 18:10