Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sidecar: Add design callouts and alternatives discussed in the past 2 years #1913

Merged
merged 11 commits into from
Sep 2, 2020

Conversation

rata
Copy link
Member

@rata rata commented Jul 29, 2020

Hi!

This PR adds a "Proposal decisions to discuss" section to discuss some design issues/callouts from the KEP, as discussed on SIG-node meetings. It also adds the "Alternatives" section with the alternatives considered in the past 2+ years this KEP is open.

The motivation, goals, etc. sections are also improved to clarify the KEP scope and motivation. As this PR is quite long, however, I didn't add more sections that will make it longer. Let me know if you think it is worth adding some specific section (as part of this PR).

My idea of the "Proposal decisions to discuss" section was to use it only to discuss and agree alternatives, and completely remove it afterwards (updating the KEP accordingly). We can capture what is relevant from the decision making process in the alternatives section also. This is why, in part, this section is long: I tried to raise all the concerns I could find, to make sure we are all at the same page.

The "Alternatives" section part discussing the other design alternatives was purely done by @Joseph-Irving and @mrbobbytables . The rest of the commits were kindly reviewed by @Joseph-Irving too :)

Please let me know if I can help in any way. Thanks for your time and help so far! :)

Tagging @derekwaynecarr as talked on SIG-node

/cc @sjenning @SergeyKanzhelev

@k8s-ci-robot k8s-ci-robot added do-not-merge/invalid-commit-message Indicates that a PR should not merge because it has an invalid commit message. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. labels Jul 29, 2020
@rata rata force-pushed the rata/sidecar-kep branch from 38d3a36 to a0127ad Compare July 29, 2020 20:04
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/invalid-commit-message Indicates that a PR should not merge because it has an invalid commit message. label Jul 29, 2020
@rata
Copy link
Member Author

rata commented Jul 29, 2020

Ups, forgot to cc derek to the bot too.

/cc @derekwaynecarr


An example in the open source world about this is the [Istio CNI
plugin][istio-cni-plugin]. This was created as an alternative for the
initContainer hack that the service mesh needs to do. The initContainer will

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The init container is not just a hack to avoid the startup problem, but also a permissions issue. Even if we could guarantee the service mesh starts first, we would still retain the init container, to avoid running as root.

Copy link
Member Author

@rata rata Jul 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see. The capabilities for NET_ADMIN and NET_RAW are added to the initContainer only, so the sidecar proxy can run just without them. Thanks for the clarification, will update the text

alternative requires that nodes have the CNI plugin installed, effectively
coupling the service mesh app with the infrastructure.

This KEP removes the need for a service mesh to use either an initContainer or a

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not true, see above comment.

Copy link
Member Author

@rata rata Jul 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for pointing it out.

I think it still makes sense to discuss the other callouts in the PR, so will update this part later to reflect what you pointed out. Thanks! :)

* Recommend users to delay starting their apps by using a script to wait for
the service mesh to be ready. The goal of a service mesh to augment the app
functionality without modifying it, that goal is lost in this case.
* To guarantee that traffic goes via the services mesh, an initContainer is

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My view of the problem is close but slightly different

  • If we set up iptables first (before pod starts with istio-cni, for example, or as the first init container), all init containers will fail as traffic is blackholed since the sidecar is not running
  • If we set up iptables as last init container, then previous init containers will not have traffic blackholed but will also not use the mesh. In many clusters, this means they cannot connect to any services, which require mTLS, which will no longer happen

Neither of these are addressed by the KEP. A third option is potentially running the iptables in the actual sidecar, but this is not feasible due to permissions.

Note - I would be very (very!) happy with the changes in the KEP. But I don't want to confuse that this solves all service mesh startup ordering issues

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, yes, none of these are addressed (sorry, I wasn't aware that was an important problem). It is a non-goal currently to run initContainer with sidecars containers.

The third option you mention, besides capabilities needed for the redirect, will not run concurrent with initContainers, so traffic is either blackholed or not, but initContainers traffic is not going through the istio sidecar either.

The question is, then, do we want to have initContainers with sidecar containers in-scope or not? What do you think @howardjohn ?

The trade-off, from my POV, is:

  • Having it in-scope for this will solve/improve that problem (making this KEP way more complicated to move forward)
    vs
  • Having it out of scope will not solve that problem but will make advances with the rest

IMHO, it makes sense to move forward with this proposal (i.e. not solve the sidecars running concurrently with initContainers) but worries me a little that, if we later want to solve the initContainers with sidecar problem, the API might end up messy.

If we implement this KEP and in the future want to extend this in some way to address it, we can technically do it using other container types. For example, type: daemon (probably a better name :-D) can be used to mean that (those containers starts before initContainers too). Or type: OrderedPhase with an Order: <int> and, let's say, using Order: -1 can mean "before initContainers". However, it starts to get messy if you have containers in the Containers array that start before initContainers.

So, that seems like a messy thing to me. If we want to go the initContainer+sidecars route, we might want to have a new sidecarContainers array that start before initContainers too. That was discussed in the past (only the array, I think, not starting before initContainers IIRC), it is mentioned in the alternatives section and why it was discarded, though.

At the same time, I'm not sure there are many sidecar types that might want to run concurrent with initContainers. Are there any other use case? Maybe if you want to collect metrics/logs from the the initContainers using a sidecar? Not sure there are many users for this, though. Although, maybe it doesn't matter/hurt to most of sidecars. What do others think?

Taking a step back, a question from someone out of the service mesh world: does the service mesh really wants to be part of the pod, if the life-cycle it needs is so different? I guess probably yes for latency and to easily transparently redirect traffic that might not be possible otherwise. But just making sure there are no other solutions and we really want to think about this problem when the service mesh lives in the pod :)

Trying to be pragmatic, we can move to alpha stage without "initContainers + sidecars" (i.e. as it is proposed now) and gather feedback from users. We can gather feedback to see if it is worth doing a sidecarContainers array (or any other alternative that seems best to address the feedback)?

What do others think?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we implemented the KEP as is I would be very happy, it would improve a lot of issues for a lot of people. I would be even more happy if we also fixed the init container issue, but not at the expense of not fixing the startup/shutdown. no need to let perfect be the enemy of good.

It would probably make sense to future proof this so it can be done, but not worry about it right now. Having a new type seems feasible to me.

As far as other use cases - all service meshes require this, as far as I know, and logging seems like something you would want for init containers as well. Honestly most sidecars are feasible to run with the init containers, for all the same reasons you want them to run with the main container. For example, some people have SQL proxies - same is needed if you use SQL in init container.

Taking a step back, a question from someone out of the service mesh world: does the service mesh really wants to be part of the pod, if the life-cycle it needs is so different

I don't think we need it to run in the pod, but there are no other existing alternatives that would work.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commenting from other thread on

It was nice to have, but not many real use cases nor a big deal.

This claim is basically saying init containers that do networking is not a real use case, which isn't true in my opinion. There are very clear use cases for doing so, otherwise init containers would not likely exist 🙂. I think its more accurate to say that the people you talked to happen to not have users complain about them much, but I can't imagine that generalizes well to the broader Kubernetes community

Copy link
Member Author

@rata rata Jul 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I agree that perfect is the enemy of good, I'm open to opinions whether sidecars make more sense before initContainers or after.

But yeah, IMHO, the best solution might be not so "theoretical": implement either one of them and gather user feedback :-)

@derekwaynecarr What do you think?

Copy link
Member Author

@rata rata Aug 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SergeyKanzhelev the biggest change is semantics and how we want to reflect that. But the whole KEP needs to be changed, IMHO, if we want that instead.

Furthermore, those are not sidecars as we know them today, so we need to define how these "new" kind of sidecars behave a little bit. For example, some things we may need to decide from the top of my head:

  1. Will we use the current containers array for that? In that case, the initContainers array will start before the containers not type sidecar only (they are meant to start before any other container in the pod, the name initContainers might become confugsing. We might be able to clarify with docs). That might be unexpected, so we might want to consider using a sidecarContainers array to avoid confusions. Not really sure, something to sleep on it.
  2. Today sidecar containers (as a concept, not the type as it is not repesnt) start after initContainers have run. If someone is using the initContainers to initialize something for sidecar containers (totally legit), another way to provide that functionality might be needed. Another initSidecarContainers thingy or something else? This can become weird
  3. What to do if a sidecar crashes while some initContainers are running? Just restarts, I guess? Then initContainers don't have the guarantee of a sidecar running either (as it can crash)

Thinking about this, I have mixed feelings, but I think I lean towards improving sidecars as we know them: sidecars that start after initContainers and use the alpha phase to gather feedback and revisit that. If we see that, for example, both things make sense, we might want to add a different type that starts before initContainers.
What do you think?

But I'm very open to thoughts :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry for delayed response. Somehow all notifications broke for me. It's great to have a ping at a meeting.

I though about these concerns. I understand the main concern is that sidecar containers that has a dependency on init containers would not work as expected when one simply adds a new type to the container? The assumption that everything will just work once conatiner marked as sidecar, may not be true. What if existing sidecar container has some synchronization logic that checks for payload to start and keeps it in "not started" state. This will break the entire pod initialization.

The same time both important scenarios - logs and mesh would need an init containers support. If we will conclude that the way to support init containers is to introduce a similar type field on a container, why wouldn't we do it now. Sidecars which needs to work before init containers will just work. Sidecars which needs to work after init container may wait for the first readiness probe and treat it as an indication that initialization was complete.

Same as @howardjohn I believe that this KEP is great in addressing many issues. I just want to make sure there is no quick win by simply changing startup order for sidecars to run before init. And figuring out details like what to do when sidecars crashed during the initialization.

Perhaps there may be a section in the doc saying why we do not want to run sidecars before init.

Copy link
Member Author

@rata rata Aug 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry for delayed response. Somehow all notifications broke for me. It's great to have a ping at a meeting.

Ups, thanks! :)

I though about these concerns. I understand the main concern is that sidecar containers that has a dependency on init containers would not work as expected when one simply adds a new type to the container? The assumption that

Yes, exactly, that is one concern I raised.

everything will just work once conatiner marked as sidecar, may not be true. What if existing sidecar container has some synchronization logic that checks for payload to start and keeps it in "not started" state. This will break the entire pod initialization.

Not sure if we are saying the same here. That is exactly my point, if the sidecar has a dependency on an initContainer doing something before it starts, it will be completely broken.

Or am I misunderstanding you?

The same time both important scenarios - logs and mesh would need an init containers support. If we will conclude that the way to support init containers is to introduce a similar type field on a container, why wouldn't we do it now.

I think both options can work: we can do it now or we can have an alpha without initContainers, see if what we did effectively solves the problems that we think it solves or needs something else. After that, try to expand to initContainers and just focus on the problems of doing that, as we know what we have mostly works and the problems of extending it to that are quite different.

I lean towards to keep the current scope (more on why later in this comment, this really doesn't seem like a quick win as you say). We can try in the next few days if we figure out a way of extending it so we have sidecars + initContianer running concurrently, but if we can't find something obvious that will work in the next few days I'd fall back to the current scope.

What do you think?

Sidecars which needs to work before init containers will just work.

No, they won't on all cases. This is why it is not trivial and I'd rather focus on one problem at a time, and gather feedback.

Istio is one example, they mention in this thread that the service mesh sidecar should be running before initContainers but that itself needs some initialization first. They said:

A third option is potentially running the iptables in the actual sidecar, but this is not feasible due to permissions.

(To add some more context: the problem they refer to is the permissions needed in the container to run the iptables rules. They don't want to have those permissions on the sidecar pod, and instead have them in the initContainer or CNI plugin.)

They either need the CNI plugin to work in this case (for the iptables thing) or an initContainer that runs before sidecars, to create the iptables rules, then start the sidecars and then start the regular initContainer (as they want the sidecar service mesh to be running for other initContainers) and then continue with the initialization.

But yeah, Istio has a way out with the CNI plugin, as they only want some iptables rules. That is, of course, not the case of all sidecars AFAIK (like populating volumes, etc.). Any sidecar relying on an initContainer won't work if started before initContainers, if they also want to run during initContainer phase (as a service mesh does).

Some hacks can be done, like auto-inject the initContainer to be run first. But that has also several problems. Because then several applications want to be "the first" and there is a race injecting there. The analogous is alread happening to inject the last initContainers. Here is an example of such a bug report (it is also in some part of the text this PR adds :)): linkerd/linkerd2#4758 (comment).

So, those workarounds don't seem reliable and will create issues.

Maybe it can be a limitation: if your sidecar runs concurrently during other initContainers run, then you can't have initContainers. 🤔

Sidecars which needs to work after init container may wait for the first readiness probe and treat it as an indication that initialization was complete.

Not sure what you mean exactly with this, can you please elaborate? Wait for which readiness? How that would be explicit in the pod spec?

I just want to make sure there is no quick win by simply changing startup order for sidecars to run before init

IMHO, it doesn't seem like a quick win, but a path to explore that will take time. I think people will be opinionated, with very valid reasons, about having containers in the containers array starting before initContainers vs creating a new array (sidecarContainers), etc.

The more we talk about it, the more I think it will be just easier if we keep the current scope for the alpha phase (sidecars don't run during initContainers), see if the mechanism we add for that are effective for those use cases and then focus on extending this to initContainers (we can make it a show-stopper for beta graduation if that makes anyone more comfortable).

If extending this for sidecar+initContainer is difficult, it might be easier once we gather feedback and benefit users that don't need that shortly. After all, all sidecar containers today just run after initContainers (it is not possible otherwise), they will benefit. And we can just focus on the problem of extending it afterwards and how it is best solved, after having solved the "easier" problem (that is not so easy either, IMHO) and have a better understanding and feedback.

If extending it is easy, then we either figure it out in the next days (and we add it into the scope) or we add it later, it is easy :)

What do you think?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Istio is one example, they mention in this thread that the service mesh sidecar should be running before initContainers but that itself needs some initialization first.

It would actually work fine I think. The sidecar does not need iptables set up. Our only requirement is the iptables must be set up before the app container. So:

  • init, then app/sidecar in parallel is ok (but not ideal - hence the kep)
  • sidecar, then init, then app is ok
  • sidecar, then app, then init is not ok

I can see hypothetical scenarios that do require init containers to run first though, like if we use init containers to provision certificates. It could be worked around though, for example have the sidecar get ready without the certs and have the init container poll to ensure it doesn't exit until the sidecar has picked up the certs. Note this is still a much better scenario because it involves the sidecar/init container to make a change (2 images, likely owned by Istio/Linkerd/) rather than every single application container.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rata I think we are on the same page understanding pros and cons of init and sidecars running in parallel. I think we prioritize different things while weighting options. So my questions is mostly to understand your priorities and understand the complexity.

The way I see it - sidecar containers are something (as you mentioned earlier) that almost like "outside the pod" feature that represent an ambient state of a pod. Something that always exists and does not block termination. So developer should not think of synchronizing with it. And sidecar owner has guarantees that all app logic will be executed when sidecar is active. And I see init and regular containers as a developer's created app logic.

If developer wants to create it's own sidecar that is an app logic (like queue management or config synchronization), and it depends on some Init containers, this will be a little bit more involved as it will require a bit of synchronization with Init containers. I don't see the need for this synchronization implementation as a big deal. Today sidecars like this has are probably implementing even more logic.

As for semantic - we can call this KEP "daemon" or "ambient" containers instead of "sidecar" if this will make semantic clearer.

Finally for the implementation complexity - are you talking about kubelet logic complexity or complexity for end user?

I also pinged you on Slack in case you want to discuss it in-person before writing more text =).

This proposal aims to:
* Allow Kubernetes Jobs to have sidecar containers that run continuously
without any coupling between containers in the pod.
* Allow pods to start a subset of containers first and, only when those are

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this is starting all of the contains at once right? Its not ordered like init containers?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, exactly

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me know if you think I should clarify that :)

the `containers` array.

For most users we talked about (some big companies and linkerd, will try to
contact istio soon) this doesn't seem like a problem. But wanted to make this

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comment above on how this impacts Istio (and I suspect linkerd - I am surprised by this sentence).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I had a call with some Linkerd devs and my understanding was that it wasn't very important to them. It was nice to have, but not many real use cases nor a big deal. For other companies using a Kubernetes fork with sidecars I talked to, they just don't need the sidecars to run with initContainers either. That functionality is not present on their fork.

But let's continue the initContainers + sidecars conversation there :-)

> ...
> The Istio CNI plugin performs the Istio mesh pod traffic redirection in the Kubernetes pod lifecycle’s network setup phase, thereby removing the requirement for the NET_ADMIN and NET_RAW capabilities for users deploying pods into the Istio mesh. The Istio CNI plugin replaces the functionality provided by the istio-init container.

In other words, when using the CNI plugin it seems that InitContainer don't use

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its worse - the init container cannot do any networking at all

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, didn't know that drawback from the CNI plugin. Thanks!

1. Run preStop hooks for non-sidecars
1. Send SIGTERM for non-sidecars
1. Run preStop hooks for sidecars
1. Send SIGTERM for sidecars
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@howardjohn Another question, if it is not too much trouble: will this alternative work for you guys?

To make it short: the current KEP proposes to run preStop hooks for sidecars as the first step in the shutdown sequence, to let service mesh drain traffic. This PR proposes (the reasons are explained ^) to modify that behavior and have a TerminationHook that is run first. It will be the same, just under a new field called TerminationHook.

IIUC, this is needed for some service mesh (I'm working with a client with a in-house service mesh that needs this, for example) and I think Istio needs it, but wanted to confirm.

Does Istio needs this and would it work for you guys?

PS: Linkerd, for example, doesn't need this as they use kubernetes services and pods in terminating state are removed from the endpoints objects (except for services with externalTrafficPolicy: Local, but that is addressed in another KEP), so that alone drains the service mesh connections. Not 100% sure in which case Istio currently is.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we have a need to run any prestop hooks. I am not sure why we would start draining traffic. Isn't that what the KEP is fixing? The application shuts down, at which point all connections are closed. Then we can just terminate the sidecar immediately.

We certainly need this draining today but I don't think we do if this is implemented.

Copy link
Member Author

@rata rata Jul 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't you need to de-register the pod from service discovery or something, when pod is about to terminate, so the app can just finish in-flight request and stop, without receiving new incoming requests via the service mesh?

On shutdown just having:

  1. SIGTERM non-sidecars containers (using preStop hooks and all as expected)
  2. SIGTERM sidecars containers (using preStop hooks and all as expected)

will work for istio?

I mean, the current KEP says:

PreStop Hooks will be sent to sidecars before containers are terminated. This will be useful in scenarios such as when your sidecar is a proxy so that it knows to no longer accept inbound requests but can continue to allow outbound ones until the the primary containers have shut down.

Istio doesn't need that at all? (either being called preStop hooks or TerminationHook)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without Istio when pod termination is triggered:

  • Pod starts to terminate
  • Pod is removed from Endpoint
  • Endpoint update is sent to kube-proxy, stops sending traffic to this pod
  • Pod fully shuts down

Hopefully in that order, but not guaranteed!! There is no guarantee of zero downtime here. See https://blog.sebastian-daschner.com/entries/zero-downtime-updates-kubernetes for more detailed explaination.

With Istio when pod termination is triggered:

  • Pod starts to terminate
  • Pod is removed from Endpoint
  • Endpoint update is sent to istio-proxy, stops sending traffic to this pod
  • Pod fully shuts down

You'll notice this is really the same.

A valid solution to making the race conditions mentioned in the blog is adding a preStop hook - it doesn't have to be the sidecars preStop hook though. Any of:

  • Telling users to add preStop hook if they want this behavior (this doesn't violate the transparency goal of service mesh, as the same issue is present without service mesh)
  • Add the preStop hook to our sidecar
  • Add the preStop hook to the users container (we control injection, we can modify whatever we want).

Copy link
Member Author

@rata rata Jul 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the detailed answer!

@Joseph-Irving do you remember why that part (running preStop hook twice) was added to the KEP?

I mean, I know use cases where that is needed for the very same reason the KEP says, but thought Istio/Linkerd would need it too, but it doesn't seem to be the case. I'm okay keeping it, though, I know use cases for this.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rata to be clear, I'm not saying that TerminationHook wouldn't be useful, I suspect it would be. It just isn't clear that it would be absolutely necessary for Istio, assuming the sidecar is guaranteed to be up and ready for the duration of the application container.

Effectively, I'm echoing @howardjohn's "valid solution", which just uses preStop hooks. Specifically, the option where we inject a preStop hook into the application container, which effectively puts the sidecar in lameduck mode. This would basically just be a curl command.

If TerminationHook were available, however, it would be cleaner WRT injection since we could handle lameducking in the sidecar's TerminationHook rather than mucking about with the application container.

Copy link
Member Author

@rata rata Aug 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nmittler Cool, thanks a lot! A curl command seems like a trade-off worth considering. Thanks again for the input :-)

For other reviewers, here goes a tl; dr: The TerminationHook field is a proposal to remove calling two times the preStop hooks, due to the problems that doing that creates (see this very same section the comment is on for additional context). Given that implementation-wise is quite similar to what we already have in the code, I think keeping it can be simple too.

Just let us know if you think it will be simpler to remove it for now (either from an implementation POV or from a KEP design POV) so we settle on one or the other.

The main reason to add this 2 years ago was istio, but there are today some other in-house service mesh apps I'm working with that must have this to work properly. Although Istio now is fine without it, having it seems better too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The concept of a Termination hook may be broadly useful so perhaps it makes sense to remove it from this KEP and develop it independently? It has no dependency on this proposal. It would also simplify this KEP if we removed the pre-stop behaviour and just focus on startup/shutdown ordering.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 ... that's really the thrust of much of my discussion above. I think a separate KEP makes a lot of sense here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with either of the choices (keep it or leave it in this KEP) as long as we move forward with this KEP :-D. I'd like to know what @derekwaynecarr or others from SIG-node think?

Removing this from the KEP effectively removes a step from the shutdown sequence and that changes how we might want to split the time in the shutdown sequence. For example, this affects other callouts like this one). So, if this needs to be added as a different KEP, it will change the time for the shutdown sequence for sidecar containers as described in this KEP. IMHO, we can remove it for now but if we want to add it later, we may want to add it to this KEP.


Rodrigo will reach out to Istio devs to see if the situation changed since 2018.

[istio-bug-report]: https://github.com/kubernetes/kubernetes/issues/65502
Copy link
Member Author

@rata rata Jul 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@howardjohn one last question to address this ^ section, if it is not too much to ask.

Do you still need to use, as the bug says, some scripts for other applications in the pod to wait for the service mesh to be ready? Or was the situation changed in these 2 years?

If it changed a lot and you can share your current pain points and which ones this KEP effectively address, it would be great.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @nmittler that opened the bug, maybe? :)

Comment on lines 1026 to 1041
##### Suggestion

Confirm with users that is okay to not have sidecars during the initContainers
phase and they don't foresee any reason to add them in the near future.

It seems likely that is not a problem and this is a win for them, as they can
remove most of the hacks. However, it seems worth investigating if the CNI
plugin is a viable alternative for most service mesh and, in that case, how much
they will benefit from this sidecar KEP.

It seems likely that they will benefit for two reason: (a) this might
be simpler or model better what service mesh need during startup, (b) they still
need to solve the problem on shutdown, where the service mesh needs to drain
connections first and be running until others terminate. These are needed for
graceful shutdown and allowing other containers to use the network on shutdown,
respectively.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got a little confused by this section. The CNI plugin is only being used just for setting up the iptables forwarding. The main goal of this KEP (IMO), i.e. setting the proper startup and shutdown ordering among sidecars and regular containers, remains a critical need regardless of whether the forwarding rules are set through CNI or through an initContainer.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, sorry, it was a misunderstanding from my side. As it was explained in this comment, will update the text regarding this :)

Comment on lines 912 to 916
##### Alternative 1: Add a per container fatalToPod field

One option is to add a `fatalToPod` bool field _per container_. This will mean
that if the given container crashes, that is fatal to the pod so the pod is
killed.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds like a useful feature, no just for sidecars, that can be tackled separately 👍

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but if we don't add it now, you can create jobs that will be stalled there forever. Kubernetes won't have any built-in mechanism to prevent them from collecting those jobs.

I mentioned that alternative too.

Any reasons to prefer to have "stalled pods"?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't prefer stalled pods; just thinking on keeping the KEP as focused as possible. I wouldn't mind if stalled pods were a known inconvenient edge case say during alpha. And if the fatalToPod type has broader applicability than just sidecars IMHO it makes more sense to implement it separately.


Pods with sidecar containers only change the behaviour of the startup and
shutdown sequence of a pod: sidecar containers are started before non-sidecars
and stopped after non-sidecars.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd emphasize here that they forcefully stopped when all other containers finished. E.g.:

Suggested change
and stopped after non-sidecars.
and stopped after non-sidecars. Sidecar containers are also terminated when no non-sidecar containers finished.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, thanks! Will amend to current patches, if that is okay to you :)

* Reimplement an init-system alike semantics for pod containers
startup/shutdown
* Allow sidecar containers to run concurrently with initContainers

Allowing multiple containers to run at once during the init phase - this could be solved using the same principal but can be implemented separately. //TODO write up how we could solve the init problem with this proposal
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a duplication of an item above Allow sidecar containers to run concurrently with initContainers?

Copy link
Member Author

@rata rata Aug 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I just left it as it had a "TODO" note. I can remove it as part of this PR, if you think it is better. Let me know :)

Aim to use alternative 1, while in parallel we collect more feedback from the
community.

It will be nice to do nothing (alternative 2), but probably the first step
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but probably the first step

It can always be added after the feedback. So we can go with do nothing and add exactly what is required later.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. If all agree on that, LGTM :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was added by request of istio devs, though. Let's see what they say here: #1913 (comment)

And then decide :)

@derekwaynecarr
Copy link
Member

can we move this under a sig-node directory? implementation of this feels largely oriented to sig-node domain.

@k8s-ci-robot k8s-ci-robot added the sig/node Categorizes an issue or PR as relevant to SIG Node. label Aug 11, 2020
@rata
Copy link
Member Author

rata commented Aug 11, 2020

@derekwaynecarr done! :)

@SergeyKanzhelev
Copy link
Member

nit: @rata good way to rename files is to use git mv command. This will preserve the history of the original file even after the move.

@rata
Copy link
Member Author

rata commented Aug 12, 2020

nit: @rata good way to rename files is to use git mv command. This will preserve the history of the original file even after the move.

I did that: git mv sig-apps/0753-sidecarcontainers.md sig-node/ (from my history). Is there something not working as expected?

rata and others added 3 commits September 2, 2020 19:11
Reviewed-by: Joseph-Irving <joseph.irving500@gmail.com>
Signed-off-by: Rodrigo Campos <rodrigo@kinvolk.io>
As suggested by derekwaynecarr here:
	kubernetes#1874 (comment)

Reviewed-by: Joseph-Irving <joseph.irving500@gmail.com>
Signed-off-by: Rodrigo Campos <rodrigo@kinvolk.io>
This was amended by Rodrigo Campos to use 80 columns and address Joseph
suggestions, while also adding some minor tweaks.
@k8s-ci-robot k8s-ci-robot removed lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Sep 2, 2020
@rata
Copy link
Member Author

rata commented Sep 2, 2020

@SergeyKanzhelev @derekwaynecarr can you pelase lgtm and approve again? Merge was not possible due to a silly conflict (this PR #1939 was merged and somehow caused a conflict), so I rebased without changes.

Thanks again for the reviews! :)

@SergeyKanzhelev
Copy link
Member

/lgtm

thank you @rata!

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 2, 2020
@k8s-ci-robot k8s-ci-robot merged commit 6928d9f into kubernetes:master Sep 2, 2020
@k8s-ci-robot k8s-ci-robot added this to the v1.20 milestone Sep 2, 2020
@rata rata deleted the rata/sidecar-kep branch September 2, 2020 17:27
rata added a commit to kinvolk/kubernetes-enhancements that referenced this pull request Sep 9, 2020
This was requested by Sergey here:
	kubernetes#1913 (comment)

Signed-off-by: Rodrigo Campos <rodrigo@kinvolk.io>
rata added a commit to kinvolk/kubernetes-enhancements that referenced this pull request Sep 9, 2020
As requested by Sergey:
	kubernetes#1913 (comment)

Signed-off-by: Rodrigo Campos <rodrigo@kinvolk.io>
rata added a commit to kinvolk/kubernetes-enhancements that referenced this pull request Sep 9, 2020
One decision to revisit was to see if really wanted to change the pod
phase. It was clearly agreed here that we won't:
	kubernetes#1913 (comment)

This commit just removes the section to discuss that (it is already
agreed) and updates the KEP to reflect that.

Signed-off-by: Rodrigo Campos <rodrigo@kinvolk.io>
rata added a commit to kinvolk/kubernetes-enhancements that referenced this pull request Sep 9, 2020
This was added due to a comment from Istio long ago[1], but they don't
need this anymore[2]. Furthermore, our use cases at Kinvolk also work
just fine without this.

[1]: kubernetes/community#2148 (comment)
[2]: kubernetes#1913 (comment)

Signed-off-by: Rodrigo Campos <rodrigo@kinvolk.io>
rata added a commit to kinvolk/kubernetes-enhancements that referenced this pull request Sep 10, 2020
This was requested by Sergey here:
	kubernetes#1913 (comment)

Signed-off-by: Rodrigo Campos <rodrigo@kinvolk.io>
rata added a commit to kinvolk/kubernetes-enhancements that referenced this pull request Sep 10, 2020
As requested by Sergey:
	kubernetes#1913 (comment)

Signed-off-by: Rodrigo Campos <rodrigo@kinvolk.io>
rata added a commit to kinvolk/kubernetes-enhancements that referenced this pull request Sep 10, 2020
One decision to revisit was to see if really wanted to change the pod
phase. It was clearly agreed here that we won't:
	kubernetes#1913 (comment)

This commit just removes the section to discuss that (it is already
agreed) and updates the KEP to reflect that.

Signed-off-by: Rodrigo Campos <rodrigo@kinvolk.io>
rata added a commit to kinvolk/kubernetes-enhancements that referenced this pull request Sep 10, 2020
This was added due to a comment from Istio long ago[1], but they don't
need this anymore[2]. Furthermore, our use cases at Kinvolk also work
just fine without this.

[1]: kubernetes/community#2148 (comment)
[2]: kubernetes#1913 (comment)

Signed-off-by: Rodrigo Campos <rodrigo@kinvolk.io>
rata added a commit to kinvolk/kubernetes-enhancements that referenced this pull request Sep 10, 2020
This was suggested by Sergey here:
	kubernetes#1913 (comment)

Sadly, that branch is merged and can't click on commit suggestion now.
Credits goes to Sergey anyways :-)

Signed-off-by: Rodrigo Campos <rodrigo@kinvolk.io>
rata added a commit to kinvolk/kubernetes-enhancements that referenced this pull request Sep 10, 2020
In the previous PR, istio devs commented that some things were not
accurate. This commit just updates the text to (hopefully) correctly
reflect it now.

Removed the paragraph about this removing the need for an initContainer
due to comment here:
	kubernetes#1913 (comment)

I thought it was an okay to insert the iptables rules within the sidecar
proxy container, but it is not okay as that requires more permissions
(capabilities) on the sidecar proxy container which is not considered
accetable by Istio devs.

Signed-off-by: Rodrigo Campos <rodrigo@kinvolk.io>
SergeyKanzhelev pushed a commit to SergeyKanzhelev/enhancements that referenced this pull request Jan 8, 2021
This was requested by Sergey here:
	kubernetes#1913 (comment)

Signed-off-by: Rodrigo Campos <rodrigo@kinvolk.io>
SergeyKanzhelev pushed a commit to SergeyKanzhelev/enhancements that referenced this pull request Jan 8, 2021
As requested by Sergey:
	kubernetes#1913 (comment)

Signed-off-by: Rodrigo Campos <rodrigo@kinvolk.io>
SergeyKanzhelev pushed a commit to SergeyKanzhelev/enhancements that referenced this pull request Jan 8, 2021
One decision to revisit was to see if really wanted to change the pod
phase. It was clearly agreed here that we won't:
	kubernetes#1913 (comment)

This commit just removes the section to discuss that (it is already
agreed) and updates the KEP to reflect that.

Signed-off-by: Rodrigo Campos <rodrigo@kinvolk.io>
SergeyKanzhelev pushed a commit to SergeyKanzhelev/enhancements that referenced this pull request Jan 8, 2021
This was added due to a comment from Istio long ago[1], but they don't
need this anymore[2]. Furthermore, our use cases at Kinvolk also work
just fine without this.

[1]: kubernetes/community#2148 (comment)
[2]: kubernetes#1913 (comment)

Signed-off-by: Rodrigo Campos <rodrigo@kinvolk.io>
SergeyKanzhelev pushed a commit to SergeyKanzhelev/enhancements that referenced this pull request Jan 8, 2021
This was suggested by Sergey here:
	kubernetes#1913 (comment)

Sadly, that branch is merged and can't click on commit suggestion now.
Credits goes to Sergey anyways :-)

Signed-off-by: Rodrigo Campos <rodrigo@kinvolk.io>
SergeyKanzhelev pushed a commit to SergeyKanzhelev/enhancements that referenced this pull request Jan 8, 2021
In the previous PR, istio devs commented that some things were not
accurate. This commit just updates the text to (hopefully) correctly
reflect it now.

Removed the paragraph about this removing the need for an initContainer
due to comment here:
	kubernetes#1913 (comment)

I thought it was an okay to insert the iptables rules within the sidecar
proxy container, but it is not okay as that requires more permissions
(capabilities) on the sidecar proxy container which is not considered
accetable by Istio devs.

Signed-off-by: Rodrigo Campos <rodrigo@kinvolk.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. sig/node Categorizes an issue or PR as relevant to SIG Node. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants