[bitnami/redis] Redis and redis-sentinel mix hostnames between config on simultaneous STS rollout restart when running multiple redis instances on single k8s cluster #10016
Comments
Hi @JDKnobloch, is there any reason for not having the k8s namespaces isolated? I guess this should not happen if there is no mixed traffic between namespaces.
Hey @miguelaeh, correct - the nodes are mixing across any redis-sentinel replication instances on the same network. To boil it down: cross-namespace or same-namespace, as long as two sentinel replication instances exist on a cluster, they can interact (lines 1-3 in the Redis container log show a CLI command for locating the np master returning the ci master). While isolating the namespaces would solve our original nonprod occurrence, ultimately it cannot be done for some scenarios - for example, one of our production k8s clusters runs 2 redis-sentinel replication instances within the default namespace, and both communicate with various apps across the cluster. We need those instances communicating across the cluster, so swapping ports was a simpler solution for us.
I would need to take a deeper look at how sentinel locates the master. In lines 1-3 we are using the FQDN for the headless service, including the namespace, so the issue is probably in how the sentinel locates the master - maybe there is some wrong configuration.
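For context, those FQDNs follow the standard Kubernetes headless-service DNS pattern; for example (the second line uses the np-redis release names from the test later in this issue):

```
<pod>.<headless-service>.<namespace>.svc.cluster.local
np-redis-node-0.np-redis-headless.np.svc.cluster.local
```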
I created an internal task to investigate the issue; nevertheless, we cannot provide an ETA for when it will be done.
Hi @JDKnobloch, could you share the list of commands, step by step, to reproduce using Helm and kubectl? (I mean, without Argo)
Okay, I will try to get around to this in the coming days and get back to you.
This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.
Bump as I still plan to return to this.
This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.
Due to the lack of activity in the last 5 days since it was marked as "stale", we proceed to close this Issue. Do not hesitate to reopen it later if necessary.
We have the same issue. As said, it is possible to reproduce by deploying redis twice within the same namespace. Please note that we tried setting unique
What seems to help, though (still need to verify), is to always deploy with a unique port set... which is a bit unmaintainable for us, as we deploy 36 redis sets (3 redis nodes, one being master, + 3 sentinel nodes) to a single cluster.
I have just experienced this. We also have lots of redis sets across a cluster in different namespaces. It happened when I increased resource requests across all instances and they were all applied together, around the same time; now they are stuck trying to assign a sentinel node in another namespace, for which they have no configuration, and then failing and restarting. It came along with an error.
I'm facing the mismatched-master issue too. A sentinel suddenly claims the master from another deployment to be its own master.
@JDKnobloch did you find a solution to this problem? |
@Voolodimer Our solution was simply to migrate each redis instance to use different ports so they could no longer cross streams. For each instance, the following values were updated (we are running replication & sentinel):
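(The values block itself was not preserved in this text; below is a hypothetical reconstruction based on the ports named in the next sentence - the exact parameter names vary across chart versions, so check the chart's values.yaml.)

```yaml
# Hypothetical reconstruction -- exact keys depend on the chart version.
master:
  containerPorts:
    redis: 6385        # base redis moved off the default 6379
replica:
  containerPorts:
    redis: 6385
sentinel:
  containerPorts:
    sentinel: 26385    # sentinel moved off the default 26379
```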
Here we moved redis to port 6385 and sentinel to 26385. Then just assign each redis/sentinel instance unique ports.
We have the same issue.
Same issue here. We also saw this happen when we changed resource requests and updated versions for multiple redis installations in different namespaces.
I observed it would happen whenever you have a previous Redis Sentinel deployed with the option useHostnames: false, and the later ones with the option useHostnames: true.
I'm facing the same issue - was this fixed in later versions?
Could you please create a new ticket describing your specific use case and configuration? Thanks |
Name and Version
bitnami/redis 16.8.7
What steps will reproduce the bug?
1. Apply two or more redis instances w/ sentinel enabled and auth disabled (base and sentinel).
   a. GitOps is likely not necessary but was used in this case for visibility and replicability from our production environment.
2. Restart all StatefulSets at the same time with the kubectl rollout restart statefulset redis-node command (see the sketch after this list).
   a. This should be done as quickly as possible - we want them all restarting at the same time.
3. Inspect the sentinel config at opt/bitnami/redis-sentinel/etc/sentinel.config.
   a. It may take multiple restarts before configs are mixed - typically within the first 2 restarts. May be timing related.

Note: These redis instances can be installed in separate namespaces and will still experience this same issue.
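A minimal sketch of the simultaneous restart in step 2, assuming the release and namespace names used in the example test later in this report (redis, ci-redis, np-redis - adjust to your own releases):

```bash
# Restart all redis StatefulSets as close together as possible.
kubectl rollout restart statefulset redis-node -n default
kubectl rollout restart statefulset ci-redis-node -n ci
kubectl rollout restart statefulset np-redis-node -n np
```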
Are you using any custom parameters or values?
The only values required to be set are:
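(The values block was not preserved in this text; based on the reproduction steps - sentinel enabled, auth disabled for both base and sentinel - it presumably resembled this sketch.)

```yaml
# Presumed values, reconstructed from the issue description.
architecture: replication
auth:
  enabled: false     # base auth disabled
  sentinel: false    # sentinel auth disabled
sentinel:
  enabled: true
```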
What is the expected behavior?
Multiple instances should perform StatefulSet restarts as expected and as they do when restarted alone.
What do you see instead?
As the pods restart, configuration between both redis and redis-sentinel can become mixed, containing hostnames from any other replication redis clusters that are restarted at the same time w/ auth disabled. This occurs despite the configs using hostnames and the redis instances being in separate namespaces.
This may take two sts rollout restarts to occur, allowing pods to fully regenerate between restarts.
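As an illustration only (a hand-written sketch, not the captured config), a mixed sentinel.config in the np namespace can end up monitoring a master that belongs to the ci release:

```
# Sketch: an np sentinel left tracking the ci master after a mixed restart.
sentinel monitor mymaster ci-redis-node-0.ci-redis-headless.ci.svc.cluster.local 6379 2
sentinel known-replica mymaster np-redis-node-1.np-redis-headless.np.svc.cluster.local 6379
```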
Additional information
This first occurred within our nonprod env where we have 4 redis-sentinel replication clusters running. We were simultaneously bumping versions of some of the instances, triggering a restart that caused this issue. Redis nodes in separate namespaces were stuck communicating with each other - while still appearing 'healthy' from a cluster standpoint.
After discovering this issue and seeing its potential scope, we elected to migrate all redis-sentinel replication instances on our clusters to expressly define exclusive ports - necessary for both redis & redis-sentinel - to circumvent the vulnerability.
Further testing allowed me to break our values file down to the minimal one defined above - I have rigorously tested this issue and verified the results multiple times.
This issue addresses topics that were discussed in #1682 and #5418.
I ran a minikube cluster locally for testing.
Below are logs from an example test:
Argo Application(s) applied:
Two more applications were deployed - one with its names updated to ci-redis, one to np-redis - each in its own ci/np namespace.
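(The applied manifest was not captured here; a hypothetical reconstruction for the np variant, using standard Argo CD Application syntax and the chart version from this report, might look like the following.)

```yaml
# Hypothetical reconstruction -- the original Application manifest
# was not preserved in this text.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: np-redis
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://charts.bitnami.com/bitnami
    chart: redis
    targetRevision: 16.8.7
    helm:
      values: |
        architecture: replication
        auth:
          enabled: false
          sentinel: false
        sentinel:
          enabled: true
  destination:
    server: https://kubernetes.default.svc
    namespace: np
```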
After deploying the three applications into the k8s cluster, I allowed all nodes to fully launch, then quickly ran kubectl rollout restart for all STS. On this test it took two restarts before the issue occurred - all nodes fully launched prior to both restarts.
As the nodes restart, cross-pollination can be observed in the logs, and by the end the configurations become extremely mixed.
Here are logs and config from a replica node on the cluster - np-redis-node-1.
Redis container logs:
Sentinel container logs:
Redis config from opt/bitnami/redis/etc/redis.conf:
Replica config from opt/bitnami/redis/etc/replica.conf:
Sentinel config from opt/bitnami/redis-sentinel/etc/sentinel.config:
I chose these logs as I believe this issue may have been sourced from this node in this scenario - within the redis container logs, we can see that the following logs appear immediately:
which appears to indicate that the redis-cli get-master-addr-by-name function likely searches for any redis instances named mymaster on the subnet, and is not necessarily meant for k8s deployments (which would explain it returning an 'incorrect' hostname). When you rename the master using masterSet, replicas are still able to attach across instances - which I believe again indicates that however redis discovers replicas, it is not meant for a kubernetes multi-instance scenario. Updating the port allows us to fully circumvent the issue.

This script is one source (there are more within that file) that calls the redis-cli get-master-addr-by-name function and is returned the nefarious results. I believe functionality should be added within these scripts (and the replica discovery scripts) to ensure only proper hostnames within the current namespace are pulled, OR this issue should be brought down to redis itself; however, I believe redis is likely behaving as expected in this scenario.

Please let me know any questions regarding this! I am happy to provide additional information as needed.