[bitnami/redis] Redis sentinel enters tilt mode and dies after a time #9689
Comments
Finished retesting with smaller values. Exact same results. Redoing it again with more things stripped; will post in a bit.
---
image:
  debug: true
auth:
  password: "password"
replica:
  replicaCount: 3
  persistence:
    enabled: false
sentinel:
  enabled: true
  image:
    debug: true
  quorum: 2
  downAfterMilliseconds: 2000
  failoverTimeout: 1000
  livenessProbe:
    enabled: true
    initialDelaySeconds: 40
  readinessProbe:
    enabled: true
    initialDelaySeconds: 40
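For readers less familiar with these chart values, the following Python sketch shows how the `sentinel` settings above would translate into `sentinel.conf` directives. The directive syntax (`sentinel monitor`, `sentinel down-after-milliseconds`, `sentinel failover-timeout`) is standard Redis Sentinel; the one-to-one mapping from chart keys, the master set name `mymaster`, and the host name are assumptions for illustration, not taken from the chart templates.

```python
# Hypothetical helper: renders the chart's sentinel values as sentinel.conf lines.
# Assumptions: master set name "mymaster", a one-to-one mapping from chart keys
# to Sentinel directives, and an illustrative host name.


def render_sentinel_conf(master_name, host, port, values):
    """Return the sentinel.conf directives for one monitored master."""
    return [
        # sentinel monitor <master-name> <host> <port> <quorum>
        f"sentinel monitor {master_name} {host} {port} {values['quorum']}",
        # Time in ms an instance may fail to reply to PINGs (or reply with an
        # error) before this Sentinel considers it subjectively down.
        f"sentinel down-after-milliseconds {master_name} {values['downAfterMilliseconds']}",
        # Failover timeout, also in ms.
        f"sentinel failover-timeout {master_name} {values['failoverTimeout']}",
    ]


if __name__ == "__main__":
    sentinel_values = {"quorum": 2, "downAfterMilliseconds": 2000, "failoverTimeout": 1000}
    for line in render_sentinel_conf("mymaster", "redis-node-0.redis-headless", 6379, sentinel_values):
        print(line)
```

For comparison, the example `sentinel.conf` shipped with Redis uses 30000 ms for `down-after-milliseconds` and 180000 ms for `failover-timeout`, so the values tested here are considerably more aggressive.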
Finished another test, this time with a more basic values file. Exact same behavior.
---
image:
  debug: true
auth:
  password: "password"
replica:
  replicaCount: 3
  persistence:
    enabled: false
sentinel:
  enabled: true
  image:
    debug: true
  quorum: 2

sentinel log
As a sanity check, I reconfigured chrony on the hosts and pointed them all at the same time server. This did not help either.
Hi! After several attempts to reproduce the issue, it seems this is not an issue related to the Bitnami Redis Helm chart but to how the application or environment is being used/configured. For information regarding the application itself, customization of the content within the application, or questions about the use of technology or infrastructure, we highly recommend checking the forums and user guides made available by the project behind the application or the technology. That said, I will keep this ticket open until the stale bot closes it, just in case someone from the community adds some valuable info.
I am just completely baffled by this behavior; if anyone has any idea what could be causing it, I would be very grateful.
@carrodher, is Redis Sentinel 7.0+ on the roadmap? I have not seen any reference to it, nor do I see anything on the Docker Hub page (https://hub.docker.com/r/bitnami/redis-sentinel). Thanks!
Hi, at the moment Redis 7.0 is a "Release Candidate" version. Once the upstream Redis maintainers release this new major version as stable, we will include it in our catalog as usual.
Hello @carrodher, at this point I think I have narrowed it down to Rancher RKE (see redis/redis#10547 (comment)). I am wondering whether you or someone else can test this and confirm?
I had the same issue with the bitnami images. It looks like the default resource limits are too low. I was able to get rid of the issue once I provided more resources:
resources:
  requests:
    cpu: "2"
    memory: 1024Mi
  limits:
    cpu: "2"
    memory: 1024Mi
I also have a memory limit in the Redis configuration that is equal to the limits.
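Setting the Redis memory limit (presumably `maxmemory`) equal to the container limit is easy to check numerically, but Kubernetes and redis.conf use different unit conventions. The Python sketch below converts a Kubernetes memory quantity (binary suffixes such as `Mi`) and a redis.conf memory value (where `kb`/`mb`/`gb` are powers of 1024 and `k`/`m`/`g` powers of 1000) to bytes. It handles only the suffixes listed, and the `maxmemory` value `1gb` is a hypothetical stand-in, since the comment does not say which value was used.

```python
# Kubernetes binary memory suffixes (powers of 1024); decimal suffixes omitted.
K8S_SUFFIXES = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}
# redis.conf units: "kb"/"mb"/"gb" are powers of 1024, "k"/"m"/"g" powers of 1000.
REDIS_UNITS = {"kb": 1024, "mb": 1024**2, "gb": 1024**3, "k": 1000, "m": 1000**2, "g": 1000**3}


def k8s_bytes(quantity):
    """Convert a Kubernetes memory quantity such as "1024Mi" to bytes."""
    for suffix, factor in K8S_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # plain byte count


def redis_bytes(value):
    """Convert a redis.conf memory value such as "1gb" to bytes."""
    value = value.lower()
    for unit, factor in REDIS_UNITS.items():
        if value.endswith(unit):
            return int(value[: -len(unit)]) * factor
    return int(value)  # plain byte count


if __name__ == "__main__":
    limit = k8s_bytes("1024Mi")
    maxmemory = redis_bytes("1gb")  # hypothetical maxmemory setting
    print(limit, maxmemory, limit - maxmemory)  # 1073741824 1073741824 0
```

Zero headroom leaves nothing for memory that Redis uses outside its `maxmemory` accounting (for example allocator fragmentation or a forked child during persistence), so the container can reach its limit before Redis starts evicting. The sketch only makes the comparison explicit; it does not establish a link to the tilting.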
Unfortunately no. The chart does not set default resources (https://github.com/bitnami/charts/blob/master/bitnami/redis/values.yaml#L1061), and setting them does not help. I tried it out, thinking maybe RKE does something weird, but nope. The tilting can take a while to happen, but it eventually will.
@jonathon2nd yep, sorry about that. I was able to detect the issue after 8h under load; however, it has become a rarer case.
What is your setup, @OxCom? Are you recreating the issue on-prem or in the cloud?
@jonathon2nd on-prem. I also tried both keeping the data only in memory and saving it to disk, but I still see the same issue in the logs. Monitoring shows only 500k keys and no memory or CPU load.
@OxCom, do you use Rancher? If so, is your downstream k8s cluster RKE?
@jonathon2nd nope, we rolled it out the hard way: only Helm charts and custom YAMLs.
Hi @carrodher,
Yes, the code is available at https://github.com/bitnami/bitnami-docker-redis, and you can fork this repo and make any modification to the container logic. Then, you can use your own image by using it in the
I looked, and it seems the actual redis executables are downloaded by https://github.com/bitnami/bitnami-docker-redis/blob/master/6.2/debian-10/prebuildfs/opt/bitnami/scripts/libcomponent.sh, which uses variables from https://github.com/bitnami/bitnami-docker-redis/blob/master/6.2/debian-10/Dockerfile to build a URL, for example: https://downloads.bitnami.com/files/stacksmith/redis-6.2.6-12-linux-amd64-debian-10.tar.gz. I see that bitnami has 12 revisions of 6.2.6, whereas the official redis image only has one. We also see that the different 6.2.6 tarballs bitnami hosts differ from each other. Is there a way to see the pipeline bitnami uses to build these executables?
That part is not hosted on GitHub. The main versioning follows the upstream releases; that is, 6.2.6 and other versions follow the same cadence as the Redis project. The revision (-12) is an internal version used to track changes in our compilation recipes.
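The maintainer's answer splits a Bitnami package version into the upstream version (6.2.6) and an internal revision (-12). The Python sketch below encodes that split and rebuilds the download URL quoted earlier in this thread. The URL pattern is inferred from that single example, whether `redis-sentinel` tarballs follow it is an assumption, and the helper names are hypothetical, not part of any Bitnami tooling.

```python
# Hypothetical helpers: the URL pattern is inferred from the single example URL
# quoted in this thread, not from any documented Bitnami API.

BASE_URL = "https://downloads.bitnami.com/files/stacksmith/"


def split_version(full_version):
    """Split "6.2.6-12" into the upstream version and the Bitnami revision."""
    upstream, _, revision = full_version.rpartition("-")
    return upstream, int(revision)


def stacksmith_url(component, full_version, arch="amd64", distro="debian-10"):
    """Build a tarball URL such as .../redis-6.2.6-12-linux-amd64-debian-10.tar.gz."""
    upstream, revision = split_version(full_version)
    return f"{BASE_URL}{component}-{upstream}-{revision}-linux-{arch}-{distro}.tar.gz"


if __name__ == "__main__":
    print(split_version("6.2.6-12"))
    print(stacksmith_url("redis", "6.2.6-12"))
```

Keeping the revision as a separate integer makes it easy to enumerate the revisions of one upstream release (for example 6.2.6-1 through 6.2.6-12) when comparing builds.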
Thanks! It turns out it does not really make a difference. We are still trying to figure out what is happening. All tests still point to some interaction between bitnami/redis and Rancher RKE; at least, that is the conclusion that best matches everything so far. We were thinking it was due to some change bitnami makes to the redis executable, because it does not match the official redis at all.
Redis:6.2.6
bitnami/redis:6.2.6-debian-10-r192
I then noticed that there are entirely separate tarballs for redis-sentinel, so I started comparing them, and there are differences between versions. Compare redis 6.2.6-9 to sentinel 6.2.6-9. Digging into it more, it looks like the redis and redis-sentinel tarballs are mostly the same; the difference looks like some header or info change.
bitnami sentinel
So, to test, we reinstalled the chart in diagnostic mode, installed redis from source, and swapped it into the sentinel start script. Unfortunately, redis sentinel still tilted. This is most of the log from my redis sentinel container. It captures most of what I did and shows that the version number is different, meaning it is using the newly built executable. We tried this because the official redis sentinel never tilts, even 17 days later, so we are trying to figure out what the difference is. Yesterday we also tried separating the redis and sentinel containers into different pods; sentinel still tilts like crazy. We have a couple more ideas we would like to try.
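One way to check whether two tarballs really differ only in "some header or info" is to compare the contents of their members rather than the archive bytes. The Python sketch below, using only the standard library's `tarfile` and `hashlib`, hashes every regular file and reports what differs. It strips the top-level directory by default, on the unverified assumption that the redis and redis-sentinel tarballs use different top-level directory names.

```python
import hashlib
import tarfile


def member_digests(path, strip_top_level=True):
    """Map each regular file in a tarball to the SHA-256 of its contents."""
    digests = {}
    with tarfile.open(path, "r:*") as tar:  # "r:*" auto-detects the compression
        for member in tar.getmembers():
            if not member.isfile():
                continue  # skip directories, symlinks, etc.
            name = member.name
            if strip_top_level and "/" in name:
                name = name.split("/", 1)[1]
            data = tar.extractfile(member).read()
            digests[name] = hashlib.sha256(data).hexdigest()
    return digests


def diff_tarballs(path_a, path_b, strip_top_level=True):
    """Report files present in only one archive, and files whose contents differ."""
    a = member_digests(path_a, strip_top_level)
    b = member_digests(path_b, strip_top_level)
    return {
        "only_in_a": sorted(a.keys() - b.keys()),
        "only_in_b": sorted(b.keys() - a.keys()),
        "changed": sorted(n for n in a.keys() & b.keys() if a[n] != b[n]),
    }
```

Usage would look like `diff_tarballs("redis-6.2.6-9-linux-amd64-debian-10.tar.gz", "redis-sentinel-6.2.6-9-linux-amd64-debian-10.tar.gz")`, with hypothetical local file names. Comparing member contents sidesteps differences that tar and gzip headers introduce on their own, such as modification times, which can make two archives containing identical binaries hash differently.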
I have done some more tests. Since swapping the bitnami-built executable for one built from redis source did not work, we quickly tried changing the base image to We also tried changing the VM clock source from Looking at the single official redis instance I set up manually (redis/redis#10547 (comment)), I realized I needed to retest a single bitnami/redis node. I thought I had done this and documented it, but I cannot find anything I recorded. So I redid the test: it has been about three hours and no tilting yet. This got us thinking that, even though tilting should be dictated only by the VM the container is running on (through CPU/IO issues or clock issues), something must be happening network- or controller-wise, or in the communication between the sentinels, that is somehow triggering the issue. To test this, I deployed bitnami/redis with a replica count of 3 to the same VM. It still tilted, although only two nodes have tilted so far, while one has not. So now, to rule out the possibility that official redis also tilts with more than one node, I wrote up new YAMLs. So far, those have not tilted.
Can confirm I'm also facing the issue from the title with the sentinel container in
My org is facing this issue too. Would love to find a solution.
Any update on this? Thanks.
We are experiencing this as well. It seems like the charts are not production-ready. We are using Rancher k8s. Any update?
Hi all, note that we build Redis in the following way:
Apart from that, we don't do anything special. If you are finding issues using
We know it's not the actual build, since we've changed the binary (as previously documented) and it didn't solve the issue. However, using an entirely different redis container image seems to fix the issue when deploying a similar redis sentinel cluster. If you look above at #9689 (comment), you will see that I created a redis sentinel cluster manually with official redis images, and it does not have the same problem. From all my testing and other user reports (e.g. #9689 (comment)), it appears to be some interaction between the bitnami/redis scripts or config and Rancher RKE. Would it be possible for you to attempt to replicate this, @marcosbc or @carrodher?
Hi @jonathon2nd, we appreciate the detailed investigation on your side.
Have you considered deploying the Bitnami chart with the "official" image configuration to see if that helps, using YAML files similar to the ones you described above? I don't think it is related to any scripts, since in the case of Redis Sentinel they only set configuration entries, and when you mount a configuration file, these steps are skipped.
I've deployed Redis and left it running for about 12 hours, but no tilting yet. Should it happen on a specific node (e.g. node 0), or is it common to the entire cluster? I will leave it running for a few more hours/days if possible, to see if we can reproduce it.
Thanks for updating the Redis chart.
I'm really glad to hear that! 😄 Thanks for letting us know.
@mgrzeszczak we set useHostnames to false. Now there is no tilting in the logs and no "random" restarts.
@marcosbc, after disabling the useHostnames feature, connectivity to the pods from a different k8s cluster is also impacted, since the master address is now returned as a pod IP, which is only reachable inside the same cluster. How can I access the redis-master pod via sentinel from a different cluster?
@jotamartos, my issue is that after connecting to redis-sentinel, I need the address of my redis master as an IP and port (when using Jedis to connect from a Java service). When hostname usage is disabled, the IP that sentinel returns is a pod IP. As you know, a pod IP is only reachable within the same k8s cluster, so if my Java service runs outside the cluster where my redis server is running, it cannot connect to that pod IP. That is why we need something that is reachable regardless of the cluster. My usage of redis is: Java application --> sentinel --> (returns the IP of the redis master to the Java application) --> the Java application connects to the redis master. Is there any way to address this, so that I get an IP and port that are reachable from anywhere?
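To make the useHostnames trade-off concrete: a client such as Jedis asks a Sentinel for the master with `SENTINEL get-master-addr-by-name <master-name>` and then connects to whatever address comes back. The Python sketch below encodes that command in RESP and parses the two-element reply by hand, so it runs without a server. The master name `mymaster` and the address `10.42.0.5` are hypothetical.

```python
from typing import Optional, Tuple


def encode_command(*args):
    """Encode a command as a RESP array of bulk strings."""
    parts = [f"*{len(args)}\r\n".encode()]
    for arg in args:
        data = arg.encode()
        parts.append(f"${len(data)}\r\n".encode() + data + b"\r\n")
    return b"".join(parts)


def parse_master_addr(reply: bytes) -> Optional[Tuple[str, int]]:
    """Parse the reply to SENTINEL get-master-addr-by-name.

    Returns None for a null reply (the Sentinel does not know that master name).
    """
    if reply[:3] in (b"*-1", b"$-1") or reply.startswith(b"_"):
        return None
    lines = reply.split(b"\r\n")
    # Expected layout: b"*2", b"$<len>", host, b"$<len>", port, b""
    if len(lines) < 5 or lines[0] != b"*2":
        raise ValueError(f"unexpected reply: {reply!r}")
    return lines[2].decode(), int(lines[4])


if __name__ == "__main__":
    print(encode_command("SENTINEL", "get-master-addr-by-name", "mymaster"))
    print(parse_master_addr(b"*2\r\n$9\r\n10.42.0.5\r\n$4\r\n6379\r\n"))
```

Whatever address the Sentinel announces ends up in that reply, so with hostnames disabled a client outside the cluster receives an address it cannot route to. Redis 6.2 added the Sentinel options `resolve-hostnames` and `announce-hostnames`; presumably these are what the chart's `useHostnames` toggles, but that is an assumption not checked against the chart templates.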
Name and Version
bitnami/redis 16.8.2
What steps will reproduce the bug?
Install bitnami/redis in sentinel mode.
Wait a while (it typically happens within an hour) for the sentinel containers to start restarting for no apparent reason.
Are you using any custom parameters or values?
What is the expected behavior?
The expected behavior is for sentinel not to restart for no apparent reason.
What do you see instead?
After some time, the sentinel containers repeatedly go in and out of tilt mode, and then simply restart.
Entire log of the sentinel container up until the restart
Redis container log.
Additional information
It seems like it may be the probes killing the container?
![Screenshot_20220405_103651](https://user-images.githubusercontent.com/52681917/161803276-edd1331d-59dd-4dcd-8050-65f8fe104680.png)
![Screenshot_20220405_103718](https://user-images.githubusercontent.com/52681917/161803315-f7fc5c06-8379-49cf-856c-3f3b69b9c521.png)
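Whether the probes are what restarts the container can be checked against timing: a liveness probe restarts a container only after `failureThreshold` consecutive failures, spaced `periodSeconds` apart. The Python sketch below estimates that window. It uses the Kubernetes API defaults (`periodSeconds: 10`, `timeoutSeconds: 1`, `failureThreshold: 3`, `initialDelaySeconds: 0`), not the bitnami chart's probe settings, which may differ, and it assumes each failing probe runs until its timeout.

```python
from dataclasses import dataclass


@dataclass
class LivenessProbe:
    # Kubernetes API defaults; the bitnami chart may configure different values.
    initial_delay_seconds: int = 0
    period_seconds: int = 10
    timeout_seconds: int = 1
    failure_threshold: int = 3


def restart_window(probe):
    """Approximate seconds a fault must last before the kubelet restarts the container.

    Counts from the start of the first failing probe to the end of the last one,
    assuming each failing probe runs until it times out.
    """
    return (probe.failure_threshold - 1) * probe.period_seconds + probe.timeout_seconds


def earliest_restart(probe):
    """Approximate earliest time after container start at which a liveness restart can happen."""
    return probe.initial_delay_seconds + restart_window(probe)


if __name__ == "__main__":
    default = LivenessProbe()
    tuned = LivenessProbe(initial_delay_seconds=40)
    print(restart_window(default))  # 21
    print(earliest_restart(tuned))  # 61
```

With the `initialDelaySeconds: 40` used in the values earlier in this thread and otherwise default settings, the earliest liveness restart is roughly 61 seconds after start; a restart hours later implies the probe kept failing for at least roughly `restart_window` seconds at that point.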
This has also been happening for a while; we just had not noticed because the redis container is not restarting, so all data is fine.
![Screenshot_20220405_103805](https://user-images.githubusercontent.com/52681917/161803542-4ddb53a7-cbe9-4a7b-9ac8-b1cfe8c0e2de.png)
I think it might be host-related somehow. The only reason I think this is that we were testing in a multi-cluster environment, part on-prem and part in a remote managed k8s. We see this behavior on-prem, but not in the remote k8s.
I have been testing on a new cluster with Rocky Linux
But this is also happening on the older deployment from the image above
I am still testing, paring down the values file more and more until the behavior stops. This is taking a while, as it takes a random, long amount of time until sentinel enters tilt mode repeatedly and dies.
We have many things deployed onto these clusters, and have no other observed issues.
I have been poking around the hosts, trying to see whether anything was happening that would line up with the behavior explained here: https://redis.io/docs/manual/sentinel/#tilt-mode, and I have not found anything yet. Metrics for the VMs are all reasonable. No monitoring or alerting is going off for any of these clusters/workers.
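The TILT rule from the linked Redis documentation can be replayed offline. According to the docs, Sentinel's timer runs about ten times per second; if the time since the previous call is negative or unexpectedly large (roughly 2 seconds), Sentinel enters TILT mode, or postpones leaving it, and it exits once things look normal for 30 seconds. The Python sketch below models only that bookkeeping; the function name and tick lists are hypothetical, and the real Sentinel does much more on each timer call.

```python
# Constants as described in the Redis Sentinel docs (TILT mode section).
TILT_TRIGGER_MS = 2000   # a gap between timer calls larger than this is "unexpectedly big"
TILT_PERIOD_MS = 30_000  # TILT is exited after 30 s with no further trigger


def replay_timer(ticks_ms):
    """Replay Sentinel timer calls (wall-clock ms) and return ("+tilt"/"-tilt", time) events.

    Simplified model: only the TILT bookkeeping is reproduced here.
    """
    events = []
    tilt = False
    tilt_start = 0
    previous = None
    for now in ticks_ms:
        if previous is not None:
            delta = now - previous
            # Clock moved backwards, or the process was not run for too long.
            if delta < 0 or delta > TILT_TRIGGER_MS:
                if not tilt:
                    events.append(("+tilt", now))
                tilt, tilt_start = True, now  # enter TILT, or postpone its exit
        previous = now
        if tilt and now - tilt_start >= TILT_PERIOD_MS:
            tilt = False
            events.append(("-tilt", now))
    return events


if __name__ == "__main__":
    # Ticks every 100 ms, with a 3-second stall between 5000 ms and 8000 ms.
    ticks = list(range(0, 5001, 100)) + list(range(8000, 40001, 100))
    print(replay_timer(ticks))
```

In this model a single 3-second stall produces one `+tilt` and, 30 seconds later, one `-tilt`. Because the trigger is purely local timing, this is consistent with the causes the Redis docs list: the Sentinel process being blocked or not scheduled for a while, or the system clock being changed.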
I am unsure what is happening. If anyone reading this has any insight, input would be greatly appreciated.
Thank you!