-
Notifications
You must be signed in to change notification settings - Fork 57
Zabbix 6 related features / improvements of the chart #54
Zabbix 6 related features / improvements of the chart #54
Conversation
* Zabbix server now always has "HA Cluster" mode set to "on", making it possible to easily raise replicaCount and have multiple redundant server containers * added cronjob to autoclean orphaned HA nodes (pods being restarted by K8S...) * easy-to-use Pod anti affinity to make instances of Zabbix Server, Zabbix web frontend run on different cluster nodes if possible * converted zabbix server from statefulset to deployment * switched from Apache container to Nginx container for web frontend by default * added ZBX_SERVER_NAME as an explicit value, not to have to remember to add the according env variable manually as a extraEnv
* Deployment with configurable replicaCount * automatic configuration of Zabbix Server accordingly * pod anti affinity to spread multiple instances of cluster nodes if possible and replicaCount >1
When starting more than one Zabbix server containers with valid DB connection credentials at the same time on an empty database, all of them would try to connect and install the Zabbix DB schema to it at the same time, leading to errors and invalid queries. This commit implements an init container on the Zabbix Server pods that not only waits for the DB to be accessible IP-wise, but also for the schema to be available. This practically deactivates the schema auto-creation for the Zabbix Server pods. At the same time, a one-time Job has been added, which uses the Zabbix Server image with a slightly modified docker-entrypoint.sh in order to only wait for DB connection and then to install Zabbix DB schema, but not to prepare config nor start Zabbix server. With the combination of these two, we can now start off with a multinode Zabbix HA cluster in Kubernetes.
if the step to create the db schema takes too long, the wait-job that should prevent the Zabbix server container to start unless the database is "really" ready, is leaving off too early and Zabbix server crashes at start. This commit prevents this error by adding an additional step to only start the Zabbix server container if there are more than 0 users in the users table.
- certificate secret - frontend settings example (via env)
We shouldn avoid exposing any part of the application by accident/default
If we remove zabbix-proxy from here (even though it is optional today), it means that the official chart will be used as a dependency, correct? The Zabbix SIA team never contacted us to join forces and create a unique chart. For this reason, I am afraid of the official chart evolving technically to the point of being incompatible using both charts simultaneously for technical reasons, each developer think different technical solutions. That makes sense? What do you propose to avoid this kind of problem when using our chart?
|
Thanks @aeciopires for taking into account my work. I am happy being able to contribute to this project. I agree, version 3..0.0 sounds great, I will prepare that. Also, updating docs etc. is still open. Will do that within the next days. Also, I will streamline the secrets/auth part as proposed. I have read the other issue and comment by you there regarding Zabbix Proxy. Also, last time being at Zabbix HQ, I have talked to Alexei regarding this Helm chart, and he handed over the link to the integrations team, with the comment of that this might take some while. Meanwhile, I would suggest we keep zabbix proxy component in this Chart but change it to be "disabled" by default. I believe, this chart should "just deploy a minimal running Zabbix server instance" ootb without the need to change any value at all, and deploying a Proxy by default would kind of go against this idea. Nevertheless, I think it's a good idea to keep the zabbix proxy and continue maintaining it. Summarizing, I will do:
|
Great @sa-ChristianAnton! I would like to ask you for add your name and contact in Chart.yaml file as maintener. Welcome to the team! :-) Have a nice week. |
From now on, all settings related to database access credentials are being managed centrally for the entire chart: - either by using a "unified_db_secret" containing ALL aspects of the db access configuration, conforming to the format being used by the PGO Postgres operator, - explicit settings in the "db_access" section of the values.yaml. In case these are used, it is possible to use a secret just and only for the DB password, which will then be used by all components that apply. This commit removes all postgres related username/password and DB name settings on the levels of each of the components of the chart, and only keeps the central ones.
instead, there is now an example placed in the values.yaml that shows how to configure the ZBX_SERVER_NAME for the frontend by using additional env variables.
new version is 3.0.0
I think this should be all. One thing open I would like have added, but didn't yet: a "Route" object for Openshift. Probably going to add this within the next days, but if I don't manage to do so before you merge, I will add it later as a new PR. Thanks for giving me a place in the maintainers list. Cheers |
Feel free to make any desired changes and necessary testing. As this release will be a break change, there's no need to rush. I would like to lose a big favor:
Could you add in this PR, reference and document the usage in the Source: https://github.com/VictoriaMetrics/helm-charts/blob/master/Makefile PS.: I haven't used OpenShift yet. I won't be able to test. Let me know when you think it's good for code review and merge. |
Actually, regarding the automatic setting of postgresql host and port when postgresql.enabled=true: I agree this adds quite some extra lines because it is being used at many places. Do you think it would make sense to put this entire part inside _helpers.tpl and refer to just this from all these places? |
I did commit b386878 to implement one central template in _helpers.tpl that implements all the repeating logic for setting the DB access related ENV variables, with a switch of setting the right variable names, depending on whether it's a postgres container or a Zabbix component to set those at. The implementation is not yet 100% tested but works at a first pre-flight. I think this is a good way to get a lot of complexity out of the chart. |
but still, special env var names are needed for the "cleanup" cron job
Tested all scenarios successfully now:
|
From a technical point of view, single in the _helpers.tpl file and calling the others would be great to improve code reuse, but it would make the code less intuitive and more difficult for other people to maintain, especially if they are new to using helm. I think we can keep it the way it is. Do you agree? My comment had been more in the sense of using db_secrets instead of keeping lines referring to environment variables in the code. But I understand your idea of making it easier for the user. |
Great idea! We can test it out and see how it goes. I think the way you implemented it is very good. I also think it's worth adding a comment about this in README.md or CONTRIBUTING.md. What do you think? |
Absolutely. I will figure out some nice text for the README. |
with this commit it is now possible to add additional (sidecar) containers and volumes to each of the components if necessary. For example, this might be needed if an additional utility would have to run alongside Zabbix Server to pick up "real time export" data and send them to a message bus or a logging sidecar container that needs to catch the logs from the main container into an emptyDir volume and ship them to a central logging system
Thank you for the amazing job you did on this PR. I read all recent changes. I think we're very close to merging. I made one more simple comment and I will start the tests in the local environment. |
with this change, it is now possible to supply arbitrary runtime parameters to the postgresql database. The "max_connections" setting that was formerly a setting of its own inside values.yaml has been moved to use this new structure
This pull request adds quite some fundamental improvements and changes to the chart, most related to Zabbix 6 features and best practices.
This pull request implements #38.
Please let me know, in case this PR is being accepted, which version we should raise the chart to.
I would also like to suggest to implement the proposed cleanup of the user/password settings as described in my other PR:
I could do this quite quickly, but didn't want to do so yet without your agreeing, because it would be a "breaking change".
Additionally, I would also like to remove the zabbixproxy part of the chart, as this is already now covered by the official chart provided by Zabbix. What do you think?
Regards
Christian