Remove s6-overlay and NMA sidecar annotation #689
Conversation
Looks good!
// This reconciler is a best effort. It only tries to surface meaningful
// error messages based on the events it sees. For this reason, no errors
// are emitted. We will log them then carry on to the next reconciler.
c.Log.Info("Failure detected in CrashLoopReconciler. Will continue on", "err", err)
Can you word this better?
if nmaStatus == nil {
    continue
}
if nmaStatus.RestartCount > 0 &&
This logical expression is quite complex. Can you add a comment breaking down what's going on?
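One way to address this comment is to break the compound condition into named clauses with a comment on each. The sketch below is illustrative only: the diff shows just the `RestartCount` check, so the remaining clauses and the simplified stand-in for `corev1.ContainerStatus` are assumptions.

```go
package main

import "fmt"

// ContainerStatus is a simplified stand-in for corev1.ContainerStatus
// (k8s.io/api/core/v1); only the fields needed here are modeled.
type ContainerStatus struct {
	RestartCount  int32
	Waiting       bool   // stand-in for State.Waiting != nil
	WaitingReason string // stand-in for State.Waiting.Reason
}

// isCrashLooping breaks the condition into named parts. Only the
// RestartCount check is visible in the diff; the rest is assumed.
func isCrashLooping(s *ContainerStatus) bool {
	if s == nil {
		return false
	}
	// The container must have restarted at least once...
	hasRestarted := s.RestartCount > 0
	// ...and currently be stuck waiting in CrashLoopBackOff.
	inBackOff := s.Waiting && s.WaitingReason == "CrashLoopBackOff"
	return hasRestarted && inBackOff
}

func main() {
	s := &ContainerStatus{RestartCount: 3, Waiting: true, WaitingReason: "CrashLoopBackOff"}
	fmt.Println(isCrashLooping(s))   // true
	fmt.Println(isCrashLooping(nil)) // false
}
```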
@@ -161,6 +161,12 @@ type PodFact struct {

	// The in-container path to the catalog. e.g. /catalog/vertdb/v_node0001_catalog
	catalogPath string

	// true if the pod's spec includes a sidecar to run the NMA
	hasNMASidecar bool
Suggested change:
- hasNMASidecar bool
+ hasNMASideCar bool
I'd prefer to keep the original name since we should only camel case on word boundaries (i.e. the word is "sidecar" not "side car")
pkg/controllers/vdb/podfacts.go
Outdated
// deployments because that will guarantee a running container. We cannot use
// the server in case the statefulset is (temporarily) setup for NMA sidecar but
// can't support that. It's a classic chicken-and-egg situation where we don't
// know the proper pod spec until we know the Vertica version, but you don't the
Suggested change:
- // know the proper pod spec until we know the Vertica version, but you don't the
+ // know the proper pod spec until we know the Vertica version, but you don't know the
pkg/events/event.go
Outdated
@@ -83,7 +83,7 @@ const (
	MgmtFailedDiskFull = "MgmtFailedDiskfull"
	LowLocalDataAvailSpace = "LowLocalDataAvailSpace"
	WrongImage = "WrongImage"
-	NMAInSidecarNotSupported = "NMAInSidecarNotSupported"
+	NMADeploymentIncompatibilty = "NMADeploymentIncompatibility"
Is it used anywhere?
No. I will remove it.
Why did you move some tests?

When you start a 24.1.0 server, we initially deploy with the assumption that there is an NMA sidecar. It isn't until the pods are running that we can check the version. So we then tear down the statefulset and rebuild it without a sidecar. There was one test (mount-certs) that was sensitive to this: it would run a kubectl exec in the server container, which may or may not be running. I think I could have added more checks to that e2e test, but at the time I opted to move it to leg 7, which runs on 24.2.0 only. Leg 3 was getting light on tests, so I then moved another test into it.
	}
}

func (c *CrashLoopReconciler) findNMAContainerStatus(pod *corev1.Pod) *corev1.ContainerStatus {
I see some k8s-related functions scattered around, so I wonder if at some point a separate package will be needed, especially to avoid code duplication.
Yes, we will have to keep an eye on it. I already moved this function out to the builder package (even before I saw this) to use the same logic somewhere else.
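For readers following along, the helper under discussion can be sketched like this. The stand-in types below are simplified versions of the `corev1` Pod types, and `nmaContainerName` is a hypothetical constant standing in for whatever name the operator's builder package actually uses.

```go
package main

import "fmt"

// Simplified stand-ins for the corev1 Pod and ContainerStatus types.
type ContainerStatus struct {
	Name string
}

type PodStatus struct {
	ContainerStatuses []ContainerStatus
}

type Pod struct {
	Status PodStatus
}

// nmaContainerName is a hypothetical constant; the real name would come
// from the operator's builder package.
const nmaContainerName = "nma"

// findNMAContainerStatus scans the pod's container statuses for the NMA
// sidecar, returning nil when the sidecar is absent.
func findNMAContainerStatus(pod *Pod) *ContainerStatus {
	for i := range pod.Status.ContainerStatuses {
		if pod.Status.ContainerStatuses[i].Name == nmaContainerName {
			return &pod.Status.ContainerStatuses[i]
		}
	}
	return nil
}

func main() {
	pod := &Pod{Status: PodStatus{ContainerStatuses: []ContainerStatus{
		{Name: "server"}, {Name: "nma"},
	}}}
	fmt.Println(findNMAContainerStatus(pod).Name) // nma
}
```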
I hit an issue upgrading from 23.4 to 24.2, so there are a few more changes to the upgrade logic to handle that. I also added a new e2e leg (8) that runs this upgrade path to verify the change.
This removes the s6-overlay init process from the v2 server container. Now that we have full support for running the NMA in a sidecar, this systemd-style process isn't needed. One advantage of this change is that it allows us to run in OpenShift with the restricted SCC, which makes it easier to deploy Vertica in OpenShift because restricted is the default SCC. This will be available for consumption in the v24.2.0 server, to be released in April 2024.
Also removed in this change is the `vertica.com/run-nma-in-sidecar` annotation. This was previously needed to correctly set up the pod spec, but the operator can now determine it automatically. It defaults to using the NMA sidecar for all vclusterOps deployments. If it finds itself on 24.1.0, it will adjust the statefulset so that the NMA and server run in a single monolithic container. This also addresses different use cases as they relate to NMA deployment and the Vertica version.
… the `--startup-conf` setting, will fail immediately. We cannot run anything in these pods to even get the version. We added a new reconciler in the vdb controller that will check if the pod is in a crash loop backoff. If so, it will log an event message under the assumption that it's because we are trying to run vclusterOps on an old image.
… the `--startup-conf` setting that is new in 24.2.0. The pod facts collection will now use the NMA container, if present, for these cases.
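The two decisions the description walks through — choosing the deployment style from the version, and preferring the NMA container for pod-facts collection — can be sketched as below. This is a minimal illustration, not the operator's actual code: the version-string comparison and container names are simplifying assumptions.

```go
package main

import "fmt"

// hasNMASidecar sketches the automatic decision described above: default to
// an NMA sidecar for vclusterOps deployments, falling back to a monolithic
// container on 24.1.0, which cannot run the NMA separately. Comparing raw
// version strings is a simplification for this sketch.
func hasNMASidecar(vclusterOps bool, version string) bool {
	if !vclusterOps {
		return false
	}
	return version != "v24.1.0"
}

// pickExecContainer sketches the pod-facts behavior: prefer the NMA sidecar
// for exec-based collection, since it keeps running even while the server
// container crash loops. Container names are assumptions for illustration.
func pickExecContainer(nmaSidecar bool) string {
	if nmaSidecar {
		return "nma"
	}
	return "server"
}

func main() {
	version := "v24.1.0"
	sidecar := hasNMASidecar(true, version)
	fmt.Println(sidecar, pickExecContainer(sidecar)) // false server
}
```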