Provide the ability to configure ruler/alertmanager through prometheus-operator k8s CRDs #2133
Dumping an idea here: this problem could potentially be solved by introducing a Mimir Operator (a Mimir equivalent of the Prometheus Operator for Prometheus) which would discover these CRDs and apply them via Mimir's API. Besides that, one problem is that…
We generally want to keep Mimir agnostic to whether it's running on Kubernetes or bare metal, so I don't think we should add Kubernetes-specific features into Mimir itself. Your suggestion of a mimir-operator to read the CRDs and apply them via Mimir's API sounds like a good idea to me.
Any ETA on this one? The whole Mimir stack is not usable for us unless we can apply GitOps principles to it.
It's not mentioned by the OP, but the alertmanagerconfigs CRD should be supported as well. I want to make use of community-made rules and alerts provided by various Helm charts and not have to maintain them myself.
@boniek83 you are right. I updated the issue title and contents to reflect that.
I think the two (AM config and rules) are separate feature requests. @johannaratliff is working on adding rules support as part of #2609.
@dimitarvdimitrov I'm unsure what you mean by "adding rules support". Do you mean Mimir packaging and shipping its own recording/alerting rules, OR Mimir discovering PrometheusRule resources in the cluster and applying them to the ruler?
I believe the scope of this issue is more focused on the latter. It's a request for a Prometheus Operator equivalent for Mimir rather than wanting Mimir to package its own rules config within a PrometheusRule. If you think it's better to differentiate the two, I will happily update this issue and open a separate one for the former.
For me it's definitely the latter. Since PrometheusRule is an existing concept that is commonly used throughout many Helm charts (it's basically a standard way to package and deliver Prometheus rules), Mimir should not replace it with its own. Same with alertmanagerconfigs. You can have additional CRDs that add more functionality, but you have to support these IMO.
Sorry @boniek83, my latest message was a bit unclear. When I said "adding rules support", I meant its own instance of a…
Yeah, sorry, I was conflating the two. I meant…
Is there any work on making this a reality that can be tracked? AFAICT #2134 is just about deploying the mixins into the cluster via the Helm chart, but this will still require a Prometheus instance to do all the work, and seems largely unrelated to this issue of providing a mechanism to configure Mimir Ruler/AlertManager via CRDs? The lack of ability to configure these Mimir components sensibly in the cluster makes it prohibitive to actually use them, resulting in an awkward architecture where pretty much everything has to be pushed through Prometheus instances to get rules applied, and making the Mimir components redundant at best.
As far as I'm aware, there is no work in progress on this yet; I recently scouted the open PRs in search of one. I raised the issue during a Mimir community call in June, and the team indicated it was something they had talked about but that we may not see it implemented before a "couple releases" due to the complicated nature of this feature request (which is totally understandable). It is also my understanding that, without strong Kubernetes integration, some components are unusable and dependence on Prometheus is inevitable.
@pdf @rojas-diego @boniek83 @yevhen-harmonizehr I think there are a couple of ways this could work, and I'm curious what you think. This is not meant to be a commitment that we will build this feature, but I do think we can take a step closer by pinning down a design. Also, I refer often to rules below, but I mean for this to apply equally to rules and alertmanager configuration.

Option 1: An Operator

This is the most common request I've seen: build a Mimir operator that can automatically configure one or more Mimir instances based on CRDs. This is a large undertaking and feels a little strange considering Mimir is meant to be a centralized, multi-tenant time series database. Feel free to disagree on this point, it's just my personal opinion. If instead the operator is only responsible for creating a ConfigMap, then it feels like overkill to build a new operator for this IMO. The Prometheus operator already includes all the code for this, but instead of leaving a ConfigMap, it also deploys Prometheus servers with the ConfigMap mounted already. I would prefer to leave that code in the Prometheus operator if we can; taking a quick look, it already splits rules to avoid hitting the 1MB size limit, cleans up old resources, etc.

Option 2: A K8s backend for Ruler / Alertmanager storage

Basically, add a Kubernetes backend option for ruler / alertmanager storage, so that Mimir reads its rule and alertmanager configuration directly from resources in the cluster.

Other considerations

With either approach, it's clear from this issue that 100% API compatibility with the Prometheus operator is a design goal, and there are two missing pieces as far as I can tell: mapping resources to Mimir tenants, and authentication. For the tenant mapping, one idea is a CRD along these lines:
```yaml
apiVersion: grafana.com/v1alpha1
kind: MimirTenant
metadata:
  name: <name>
  namespace: <namespace>
spec:
  tenant: <tenant id>
  rules:
    matchLabels:
      <label>: <value>
  alertmanager:
    matchLabels:
      <label>: <value>
```
The idea is that this would tell Mimir which PrometheusRule and AlertmanagerConfig resources apply to which tenant. The idea would be that you configure RBAC so that only Mimir operators can create MimirTenant resources. I'm curious what your thoughts are!
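As an illustration of how the matchLabels selection above could work, here is a sketch of a PrometheusRule that such a hypothetical MimirTenant could select, assuming its rules.matchLabels were set to team: squad-1 (all names and the rule itself are made up for this example):

```yaml
# Hypothetical PrometheusRule that a MimirTenant with rules.matchLabels
# `team: squad-1` would pick up; names and the rule are illustrative only.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: squad-1-recording-rules
  namespace: squad-1
  labels:
    team: squad-1
spec:
  groups:
    - name: squad-1-recording
      rules:
        - record: job:http_requests:rate5m
          expr: sum by (job) (rate(http_requests_total[5m]))
```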
When I was thinking about how I might solve this in the absence of an official solution, my thoughts ran along the lines that this might just end up being a small translation layer that splats out translated CRD content in mimirtool format and calls mimirtool to sync the rules/alerts into the cluster.
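For reference, a minimal sketch of the kind of file such a translation layer would emit: mimirtool's rule files are essentially a Prometheus rule file wrapped with a namespace key, so a PrometheusRule's spec.groups maps onto it almost directly (the namespace and rule below are made up for illustration):

```yaml
# Sketch of a mimirtool-format rule file that a PrometheusRule's spec.groups
# could be translated into; the namespace name and rule are examples only.
namespace: monitoring-example-rules
groups:
  - name: example
    rules:
      - record: job:http_requests:rate5m
        expr: sum by (job) (rate(http_requests_total[5m]))
```

A file like this could then be pushed with mimirtool rules sync, pointing --address at the ruler endpoint and --id at the target tenant.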
I'm (pleasantly) surprised to hear that you think this may be a smaller effort than an operator. Native integration as a backend sounds like a rather nice approach, and I think a single data source is a fair requirement - certainly all of our usage would preferably be deployed as k8s resources. I think most people request/suggest an operator simply because that's the most common method for solving these sorts of tasks, not necessarily because it's optimal.
1.a. The Prometheus operator uses a similar mechanism to restrict the search space for a particular Prometheus instance. We'd probably want the option to restrict lookups by namespace in addition to labels here, though.
2. I'm not familiar with how backends are implemented, but if Option 2 above is selected, authentication considerations largely disappear, since the backend would just communicate with the k8s API and update configuration in-process, right? For the operator option, I think secretRefs in the MimirTenant CRD would likely be the way to go.
For our use-case, we're not super-interested in multi-tenant. That said, restricting the creation of MimirTenant resources via RBAC seems fine, but I don't think there's any way to stop the creation of a PrometheusRule with labels that would attach it to an arbitrary tenant without an admission controller, which is why I'd suggest the option of restricting lookups to particular namespaces for a tenant. I suspect namespace segregation is likely to be adequate for the majority of multi-tenant deployments, but that's hard for me to judge. If users want more fine-grained policies, they'll need to deploy some sort of policy agent.
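To make the RBAC point concrete, here is a minimal sketch, assuming the hypothetical MimirTenant CRD above lives in the grafana.com API group: cluster admins manage MimirTenant objects cluster-wide, while an application team only gets namespace-scoped rights to PrometheusRule objects. The role names and namespace are made up.

```yaml
# Hypothetical RBAC sketch; assumes a MimirTenant CRD in the grafana.com group.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: mimir-tenant-admin            # hypothetical role name
rules:
  - apiGroups: ["grafana.com"]
    resources: ["mimirtenants"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: prometheus-rule-editor        # hypothetical role name
  namespace: team-a                   # an application team's namespace
rules:
  - apiGroups: ["monitoring.coreos.com"]
    resources: ["prometheusrules"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
```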
Thanks @pdf! Another idea that came up internally is this: allow the Grafana Agent to read those CRDs and configure Mimir via the existing Mimir Ruler API.
It's hackathon week here at Grafana Labs, so I'm looking into this a bit more. Specifically, I'm exploring the possibility of making the Grafana Agent capable of synchronizing the CRDs with Mimir's Ruler. One benefit of this is that it could work for users who don't control the k8s cluster where Mimir is running, for example Grafana Cloud customers who want to configure their cloud ruler via CRDs on their "home" cluster. Here's an interesting design issue: Mimir really has 3 levels of organization for rules: tenant, rule namespace, rule group. The Prometheus operator has 3 as well: k8s namespace, PrometheusRule CRD, rule group. I don't think it's appropriate to just map these 1:1 (other than rule group), so I'm trying to find a bit more flexibility to support more use cases. The plan so far has been:
The issue with a static rule namespace: …
The issue with mapping rule namespace to …
Looking into the Prometheus operator, it does a few things: …
One option is to use the same mapping as the Prometheus operator, and create Mimir rule namespaces named …
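To make the three-level mapping concrete, here is one purely hypothetical scheme for illustration (the rule-namespace format below is made up, not the one that was ultimately implemented): derive the Mimir rule namespace from the source k8s namespace, PrometheusRule name, and UID, so that deleted-and-recreated resources don't collide.

```yaml
# Hypothetical mapping sketch; names and the rule-namespace format are
# illustrative only.
source:
  k8sNamespace: monitoring
  prometheusRule: node-alerts
  uid: 5f2c0c1e            # example UID fragment
  group: node.rules
mimir:
  tenant: team-a           # comes from separate configuration, not the CRD itself
  ruleNamespace: monitoring/node-alerts/5f2c0c1e
  ruleGroup: node.rules
```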
If supporting Grafana Cloud is a goal (and it probably should be, but I didn't want to muddy this issue with that possibility in earlier discussions) then the Agent is probably the right place for this. If the uid is not included in the mapping, are we certain to do the right thing if a resource is deleted and re-created under the same name? As far as distinguishing between operator-created namespaces and not, could some sort of hash or static value be appended/prepended to the namespace, perhaps?
Hey everyone, here's an update on my hackathon project: I managed to get a lot of this working for PrometheusRule resources. I've submitted a PR for the agent here: grafana/agent#2604. I've spoken with @rfratto on the agent squad and we'll both be out for the holidays for a few weeks, but we do plan to continue work on that issue once we're available again in January. Follow that PR for updates 😄
Hey everyone, ~final update here: we've added support to the Grafana Agent for configuring Mimir's ruler via PrometheusRule resources.
@Logiraptor can you share a full example of a Grafana Agent CRD to be used with grafana-agent-operator with this component enabled?
@Logiraptor Is this added to the Mimir Helm chart? I checked the new Helm chart and cannot see any option for mimir.rules in the Grafana Agent.
I'm also unable to set/enable this new feature via the Mimir Helm chart 😞.
@zakariais @Rohlik this is implemented as a component for Grafana Agent (flow mode); configure the mimir.rules.kubernetes component.
@Logiraptor Given that Grafana Agent has already been deprecated and given an EOL date for next year, plus the fact that its replacement is simply an OTel Collector distribution, which does not and is unlikely to support this feature, I think it's worth reopening this?
@MXfive that doesn't appear to be accurate - see the components of Alloy (the replacement you reference), where the following component appears to include the same functionality as the Grafana Agent component that handled this previously: https://grafana.com/docs/alloy/latest/reference/components/mimir.rules.kubernetes/
Oh nice, I wasn't able to find it yesterday. Thanks!
Is your feature request related to a problem? Please describe.
It's not uncommon for Prometheus configuration to be described as Custom Resource Definitions (CRDs) within Kubernetes. The Prometheus Operator defines multiple CRDs, such as PrometheusRule and AlertmanagerConfig, to configure the recording/alerting rules and the alertmanager configuration within Prometheus. Many people rely on those CRDs to configure their Prometheus instances in Kubernetes environments.
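For readers unfamiliar with these CRDs, here is a small example of the kind of PrometheusRule resource involved; the names, label, and rule are illustrative only:

```yaml
# Illustrative PrometheusRule; names, labels, and the alert itself are examples.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-alerts
  namespace: monitoring
  labels:
    role: recording-and-alerting   # labels like this are commonly used for selection
spec:
  groups:
    - name: example
      rules:
        - alert: HighErrorRate
          expr: sum(rate(http_requests_total{code=~"5.."}[5m])) > 0.1
          for: 10m
          labels:
            severity: warning
```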
At the moment, I'm unable to configure Mimir using Prometheus Operator CRDs. I have to resort to configuring components manually using the HTTP API or mimirtool.
Describe the solution you'd like
I would like Mimir to automatically discover Prometheus Operator CRDs (PrometheusRule and AlertmanagerConfig) within my cluster and to apply the configuration they hold to the different components, potentially through a Mimir Operator for Kubernetes.
I would also like Grafana Mimir to distribute its own PrometheusRule resources and for that to be included within the Helm chart as an option (much like the mimir-distributed chart has a serviceMonitor.enable option for ServiceMonitor CRDs), so that upon installing the Helm release you automatically have Mimir's rules and alerts set up without any additional configuration. This would also mean that the rules could be upgraded automatically, without users having to manually copy the rules.yml and alerts.yml config files Mimir provides.
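As a sketch of what such a chart toggle could look like, under the assumption that it mirrors the existing serviceMonitor-style option (these key names are hypothetical, not actual mimir-distributed values):

```yaml
# Hypothetical values.yaml sketch of the requested option; key names are
# illustrative, not existing mimir-distributed chart options.
prometheusRule:
  enabled: true          # deploy Mimir's bundled rules and alerts as PrometheusRule resources
  labels:
    release: prometheus-operator   # example label so an existing selector picks them up
```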
Describe alternatives you've considered
I've considered configuring recording/alerting rules manually using mimirtool, but this approach fails to satisfy our requirements fully.
I've also considered configuring recording/alerting rules using the local storage approach as described here, but this means I would have to copy the configuration stored inside each PrometheusRule resource and put it in a k8s ConfigMap, potentially missing out on future upgrades or changes to these configs (given they are sometimes created by third-party Helm charts).
Additional context
This issue is related to this slack thread.