Add design doc for Control Plane and Data Plane Separation #344

kate-osborn · 2022-12-22T21:25:32Z

Design doc for issue #292

pleshakov

Overall this makes sense to me. I left a number of questions and comments.

design/control-data-plane-separation/design.md

brianehlert

A few comments:

control plane per gatewayClass/namespace - wouldn't this be burdensome both to manage and in resource consumption/overhead? I don't think we win anything if we introduce a control plane that is as heavy as, or even has the potential to be heavier than, the data plane it is managing. This is in regard to the overall resource consumption of the control plane vs. the data plane. I'm not really a fan of having multiple control planes in this way (I'm envisioning a control plane deployment with replicas, scaled per gatewayclass/namespace, which seems entirely excessive).
control plane resiliency - can agent reconnect if a control plane pod is recycled?
Why the requirement for agent as an image? "produce a container image as an artifact"
How does the agent at the data plane discover a control plane? Wouldn't any form of auto-discovery make having many control planes a rather complex discovery scenario?
There is a use case for being able to push nginx access logs to an external collector (rather than sending them to stdout and mixing them with operational logs). Or do we wait for OpenTelemetry to solve that?
The customer providing mTLS certs for agent is burdensome. There should be a way to generate a key with enough to not share anything common.
Why does the agent need a service account token in addition to an mTLS cert? Isn't this redundant auth for the sake of being redundant? Maybe the cert is only TLS and not mTLS.
"agent container" - is it implied that it will run as a sidecar? Is there a reason why? It seems like additional complexity. Not to mention that we should not introduce sidecars just because we can; it creates the additional complexity of managing two images (data plane image, agent image) instead of just one.
What is the resource burden of the agent on the data plane?

design/control-data-plane-separation/design.md

pleshakov

👍

kate-osborn · 2023-01-10T21:24:28Z

@brianehlert

control plane per gatewayClass/namespace - wouldn't this be burdensome both to manage and in resource consumption/overhead? I don't think we win anything if we introduce a control plane that is as heavy as, or even has the potential to be heavier than, the data plane it is managing. This is in regard to the overall resource consumption of the control plane vs. the data plane. I'm not really a fan of having multiple control planes in this way (I'm envisioning a control plane deployment with replicas, scaled per gatewayclass/namespace, which seems entirely excessive).

GatewayClass is a cluster-scoped resource, so I wouldn't suggest one control plane per namespace. When we started working on NKG, we made a conscious decision to support only a single GatewayClass resource. Given this, it follows that we would need one control plane per GatewayClass. If there's a strong use case for supporting multiple GatewayClasses, we can revisit this.
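To illustrate what "one control plane per GatewayClass" implies, here is a minimal sketch of a control plane instance that only acts on Gateways whose `spec.gatewayClassName` matches the class it was configured to own. The `gateway` struct and `gatewaysForClass` function are hypothetical simplifications (a real implementation would use the Gateway API Go types and controller watch filters), and the class name `nginx` used in testing is only an example.

```go
package main

// gateway is a hypothetical, trimmed-down view of a Gateway API Gateway,
// keeping only the fields this sketch needs.
type gateway struct {
	Namespace        string
	Name             string
	GatewayClassName string // mirrors spec.gatewayClassName
}

// gatewaysForClass returns only the Gateways that reference the GatewayClass
// owned by this control plane instance; all others are skipped.
func gatewaysForClass(all []gateway, className string) []gateway {
	var owned []gateway
	for _, gw := range all {
		if gw.GatewayClassName == className {
			owned = append(owned, gw)
		}
	}
	return owned
}
```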

control plane resiliency - can agent reconnect if a control plane pod is recycled?

Yes. If the agent is connected to a Pod and that Pod restarts or is terminated, the agent will attempt to reconnect using the control plane's Service name.

Why the requirement for agent as an image? "produce a container image as an artifact"

Because otherwise we would have to maintain a Dockerfile for the agent and produce an additional container image every release.

How does the agent at the data plane discover a control plane? Wouldn't any form of auto-discovery make having many control planes a rather complex discovery scenario?

The agent reads the control plane address from its config or a command-line argument on startup. Our installation manifests will specify the DNS name of the control plane Service, either in the config or with a CLI argument.
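A minimal sketch of that startup step using Go's standard `flag` package. The flag name `-control-plane-addr`, the `agentConfig` type, and the rule that a command-line value overrides the config-file value are all assumptions for illustration; `fileAddr` stands in for whatever the agent read from its config file.

```go
package main

import (
	"errors"
	"flag"
	"io"
)

// agentConfig is a hypothetical subset of the agent's startup configuration.
type agentConfig struct {
	ControlPlaneAddr string // DNS name and port of the control plane Service
}

// parseAgentFlags resolves the control plane address from the command line,
// falling back to fileAddr (the value from the config file, if any).
func parseAgentFlags(args []string, fileAddr string) (agentConfig, error) {
	fs := flag.NewFlagSet("agent", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage text out of the agent's logs
	addr := fs.String("control-plane-addr", "", "DNS name and port of the control plane Service")
	if err := fs.Parse(args); err != nil {
		return agentConfig{}, err
	}

	cfg := agentConfig{ControlPlaneAddr: fileAddr}
	if *addr != "" {
		cfg.ControlPlaneAddr = *addr // assumed: CLI value takes precedence
	}
	if cfg.ControlPlaneAddr == "" {
		return agentConfig{}, errors.New("control plane address not set in config or flags")
	}
	return cfg, nil
}
```

For instance, `parseAgentFlags([]string{"--control-plane-addr=nginx-gateway.nginx-gateway.svc:8443"}, "")` yields that address; the Service name and port are hypothetical.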

There is a use case for being able to push nginx access logs to an external collector (rather than sending them to stdout and mixing them with operational logs). Or do we wait for OpenTelemetry to solve that?

My understanding is that this will be implemented with an nginx module, so it shouldn't pertain to the agent work.

The customer providing mTLS certs for agent is burdensome. There should be a way to generate a key with enough to not share anything common.

Not sure what you mean by "There should be a way to generate a key with enough to not share anything common." We can provide a K8s Job that generates self-signed certificates for testing and development.

Why does the agent need a service account token in addition to an mTLS cert? Isn't this redundant auth for the sake of being redundant? Maybe the cert is only TLS and not mTLS.

I guess that could be seen as redundant. If we use the API token to verify the identity of the agent, then we could probably use TLS instead of mTLS.

"agent container" - is it implied that it will run as a sidecar? Is there a reason why? It seems like additional complexity. Not to mention that we should not introduce sidecars just because we can; it creates the additional complexity of managing two images (data plane image, agent image) instead of just one.

I used "agent" and "data plane" interchangeably in this doc. There will be only one container, running both the agent and nginx.

What is the resource burden of the agent on the data plane?

This is still unknown. We will have to run some benchmarks once it is implemented. Presumably, if we drop all the unneeded features in the agent, it should be lightweight.

kate-osborn requested a review from a team as a code owner December 22, 2022 21:25

github-actions bot added the documentation label Dec 22, 2022

kate-osborn requested a review from pleshakov December 22, 2022 21:26

pleshakov reviewed Jan 6, 2023

kate-osborn requested a review from pleshakov January 9, 2023 22:58

brianehlert reviewed Jan 10, 2023

pleshakov reviewed Jan 10, 2023

design/control-data-plane-separation/design.md (resolved)

pleshakov approved these changes Jan 10, 2023

kate-osborn force-pushed the data-plane-separation-design branch from f700c5a to 08de0ba January 13, 2023 21:32

kate-osborn added 3 commits January 17, 2023 11:05

Add design doc for Control Plane and Data Plane Separation (a7f42e3)
Various updates (31c2bcf)
Remove mTLS requirements; use token auth only (01e7258)

kate-osborn force-pushed the data-plane-separation-design branch from 08de0ba to 01e7258 January 17, 2023 18:05

kate-osborn merged commit 69fcc8a into main Jan 17, 2023

kate-osborn deleted the data-plane-separation-design branch January 17, 2023 18:10

kate-osborn mentioned this pull request Jan 17, 2023

Separate control and data planes #292 (Closed)
