
kube-state-metrics breaking release aka 2.0 #569

Closed · 12 tasks done
brancz opened this issue Oct 24, 2018 · 48 comments
Labels: lifecycle/stale, v2

@brancz (Member) commented Oct 24, 2018

We have accumulated a number of deprecated metrics and odd behaviors that I believe may justify a 2.0 release. I'd like to use this issue to discuss whether people think this is a good idea and to collect what we would potentially like to break, should we do a breaking release.

Off the top of my head, the breaking changes I would like to make: [task list elided; tracked as the 12 tasks in the issue header]

I would see a breaking release at least 3 months out, as I would like to validate the performance optimizations independently first. Further thoughts?

@andyxning @zouyee @mxinden

@andyxning (Member)

I agree with doing a breaking release to clean up kube-state-metrics. Actually, the performance optimizations are a feature; I am fine with a 2.1 release adding them.

rename black-/whitelist to allow/deny-list

I am fine with this change, but the black-/whitelist names also seem OK. What is the motivation for moving to allow/deny-list? And this is apparently an additional breaking change.

use same ports in all cases (currently the flag defaults to 80/81, but the dockerfile specifies 8080 and 8081)

Fine with this. I also think that if we do this, we need to deprecate --telemetry-port and --telemetry-host as well. But the reason we added the telemetry port and host was mainly to split the metrics for kube-state-metrics itself from the metrics for Kubernetes; merging them onto one port does not seem optimal.

@brancz (Member Author) commented Oct 25, 2018

I would prefer to release the performance improvements in a 1.x release, and once we're comfortable everything works correctly, we go ahead and make these breaking changes.

Renaming black-/whitelist is for ethical reasons. I know it has been a commonly used term, but it doesn't even describe well what it does, and it seems the rename isn't contentious :).

I think keeping the telemetry host and port is fine, just the default port should be something other than 81.

@andyxning (Member)

Renaming black-/whitelist is for ethical reasons. I know it has been a commonly used term, but it doesn't even describe well what it does, and it seems the rename isn't contentious :).

Seems reasonable to me. allow/deny is more accurate.

I think keeping the telemetry host and port is fine, just the default port should be something other than 81.

Any suggestions for candidates? 82, or some other value? IMHO, 81 is good enough for this job. :)

@andyxning (Member)

I would prefer to release the performance improvements in a 1.x release, and once we're comfortable everything works correctly, we go ahead and make these breaking changes.

Agreed.

@brancz (Member Author) commented Oct 26, 2018

Sorry, I should have been clearer about why I think we should change the ports. Binding to anything lower than 1024 requires root on Linux (or at least the CAP_NET_BIND_SERVICE capability). That's why the default ports of kube-state-metrics should be higher than that. Beyond that, whatever we use should be consistent between the plain binary and the container.
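
As a sketch (flag names as used by kube-state-metrics; the 8080/8081 values are the ones the Dockerfile already uses), consistent unprivileged defaults would look roughly like this:

# --port serves the Kubernetes object metrics, --telemetry-port serves
# kube-state-metrics' own metrics; both are >= 1024, so neither root nor
# extra capabilities are needed.
kube-state-metrics --port=8080 --telemetry-port=8081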

@andyxning (Member)

Understood clearly. @brancz

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Jan 26, 2019
@brancz (Member Author) commented Jan 26, 2019

/remove-lifecycle stale

@k8s-ci-robot removed the lifecycle/stale label on Jan 26, 2019
@tariq1890 (Contributor)

I would also like to propose that we update the k8s dependencies to 1.13 at the very least. Given that it has the potential of introducing backwards-incompatible/breaking changes, it would be a good candidate for the 2.0 release.

@brancz (Member Author) commented Feb 21, 2019

Do we have known incompatibilities? So far we only have the policy of supporting the past 4 versions of Kubernetes; as long as we stick with that, it's not so much a "breaking" release.

@tariq1890 (Contributor)

Nothing off the top of my head. The client-go version support matrix is unfortunately not very clear about what is and is not supported. I can experiment with the latest k8s client-go dependency and see how that turns out against the previous releases.

By "past 4 versions of Kubernetes", do you mean with respect to the latest Kubernetes version out there, or the latest Kubernetes version vendored into kube-state-metrics?

@brancz (Member Author) commented Feb 21, 2019

By "past 4 versions of Kubernetes", do you mean with respect to the latest Kubernetes version out there, or the latest Kubernetes version vendored into kube-state-metrics?

The latter. Forward compatibility should work, but we can't guarantee it due to client-go.

@zuzzas (Contributor) commented Feb 25, 2019

@brancz, you've raised a motion to remove kube_[object]_owner metrics with <none> labels for Kubernetes objects that lack any owners.

I'd like to object!

We've got a use case for grouping Pods and other objects that lack any controller. This logic only works at ingestion time with recording rules if we have a metric that clearly indicates the lack of owners on an object. We can't solve this riddle with absent(), since it requires explicitly specifying an instant vector with the required labels.

Please, do not think ill of me after looking at a screenshot of this abomination:

[screenshot of the recording rule]
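
To make the use case concrete, a minimal sketch of the kind of query the <none> owner values enable (metric and label names as exposed by kube-state-metrics; the actual recording rule from the screenshot is not reproduced here):

# Count pods without an owning controller, per namespace. This only works
# because kube_pod_owner is emitted with owner_kind="<none>" for ownerless
# pods; absent() cannot produce such per-namespace series.
count by (namespace) (kube_pod_owner{owner_kind="<none>"})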

@dohnto commented Feb 25, 2019

I might sound silly, but maybe we could unify the singular/plural form of the --namespace/--collectors options to be more consistent?

@brancz (Member Author) commented Feb 26, 2019

@zuzzas

Please, do not think ill of me after looking at a screenshot of this abomination

Haha, if anything I have more respect! That's a pretty nice recording rule; would you mind contributing it to the kubernetes-mixin?

I think you have a fair point; let's keep it. In theory this could be done with a join as well, but that feels too difficult to get right. Thanks for bringing this up!

@dohnto:

I might sound silly, but maybe we could unify the singular/plural form of the --namespace/--collectors options to be more consistent?

Not at all silly. The --namespace flag is internally already called "namespaces"; it is only singular for backward compatibility. I'll add making it plural to the list. Thanks for the suggestion, keep them coming! 🙂

zuzzas added a commit to zuzzas/kube-state-metrics that referenced this issue Feb 27, 2019
…empty"

This reverts commit 15b93c4.

Based on Brancz's decision here:
kubernetes#569 (comment)

Signed-off-by: Andrey Klimentyev <andrey.klimentyev@flant.com>
@brancz (Member Author) commented Apr 4, 2019

@sylr brought to my attention that a number of pod metrics carry the node label. For consistency, this label should not be on those metrics directly, but instead be joined onto the metrics at query time. Example:

// from the pod metrics generator: the node name is baked into the series as a label value
LabelValues: []string{c.Name, p.Spec.NodeName, sanitizeLabelName(string(resourceName)), string(constant.UnitCore)},
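
As a sketch of the query-time alternative (metric and label names as exposed by kube-state-metrics; kube_pod_info already carries the node label and always has the value 1, so multiplying leaves the left-hand values unchanged):

# Attach the node label via a many-to-one join, once the pod metric itself
# no longer carries node directly.
kube_pod_container_resource_requests{resource="cpu"} * on (namespace, pod) group_left (node) kube_pod_info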

@brancz (Member Author) commented May 20, 2019

Added "consider renaming"

@andyxning (Member)

consider renaming kube-state-metrics to kubernetes-exporter

LGTM.

@brancz (Member Author) commented Jun 5, 2019

Added

kube_secret_metadata_resource_version, kube_configmap_metadata_resource_version and kube_ingress_metadata_resource_version expose the resource version as a string in their label sets. This value can change often and therefore creates huge cardinality. It should be a number, or not exist at all.

Thanks to @xieyanker for noticing here: #777 (comment)
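
To make the cardinality issue concrete, this is roughly what the exposed series look like (label values are illustrative):

# current form: the resource version is a label, so every change creates a new series
kube_configmap_metadata_resource_version{namespace="default",configmap="my-config",resource_version="54321"} 1
# proposed form: the resource version is the sample value, so the series stays stable
kube_configmap_metadata_resource_version{namespace="default",configmap="my-config"} 54321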

@lilic (Member) commented Jun 20, 2019

I would add to the list renaming all the leftover user-facing occurrences of "collectors" to "resources", as we recently removed the collectors package. That would also mean renaming collector to resource in the options. Overall, a --resources=pods flag would be more self-descriptive.

@brancz (Member Author) commented Jun 20, 2019

Great suggestion! Added.

@tariq1890 (Contributor)

Let's add the Sharding feature to the list as well.

@brancz (Member Author) commented Jun 22, 2019

Sharding isn't breaking, so I feel it can be added in a backward-compatible way in 1.x or 2.x. It's fairly close to being ready, I would say, so I'd like to see it go into a 1.x release.

@bboreham (Contributor)

Came here to +1 removal of high-cardinality kube_configmap_metadata_resource_version.
This one metric occupies 3% of all the data in our service.

I'm intrigued: what does anyone use this metric for, in its current form?

@lilic (Member) commented Jul 31, 2019

@bboreham you could also just blacklist those metrics for now, until they are removed; --metric-blacklist="kube_configmap_metadata_resource_version", for example, should work. :)

@brancz (Member Author) commented Oct 16, 2019

Doesn’t a new major release show that there are breaking changes? Of course we need to properly document these changes.

@lilic added the v2 label on Dec 4, 2019
@lilic (Member) commented Dec 4, 2019

All corresponding issues from above were created under the v2 label.

Here is the Google doc for the v2 release that we discussed during KubeCon: https://docs.google.com/document/d/1lCvbvOAVFai7ciP_heZrJ_QXLOEaFu8ZBmwVVeCwf54/edit?usp=sharing

@brancz (Member Author) commented Dec 4, 2019

One more thing that came up during KubeCon: before we do the v2 release, we probably want to do another round of scalability tests. I believe Google volunteered to do this.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Apr 21, 2020
@olivierlemasle (Member)

/remove-lifecycle stale

@k8s-ci-robot removed the lifecycle/stale label on Apr 21, 2020
@lilic (Member) commented Apr 22, 2020

@brancz is there anyone to ping regarding the scalability tests? Or should we maybe bring it up in the SIG call? @tariq1890

@brancz (Member Author) commented Apr 22, 2020

SIG call sounds good

@brancz (Member Author) commented Sep 10, 2020

Looks like we've done everything on the list; let's get a pre-release started then! :)

@QuentinBisson (Contributor)

Shouldn't the kube-state-metrics module path become module k8s.io/kube-state-metrics/v2 in go.mod as well for v2?
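
For context, Go modules require a /vN suffix in the module path for major versions 2 and above, so this is essentially a one-line change to go.mod plus updating import paths; a rough sketch (the go directive version is illustrative only):

// go.mod
module k8s.io/kube-state-metrics/v2

go 1.15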

@brancz (Member Author) commented Sep 21, 2020

You're right, we forgot about that. @omegas27 do you want to take care of that?

@QuentinBisson (Contributor)

Sure 👍

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Dec 20, 2020
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on Jan 19, 2021
@lilic (Member) commented Jan 19, 2021

/remove-lifecycle rotten

We are almost there 🎉

@k8s-ci-robot removed the lifecycle/rotten label on Jan 19, 2021
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Apr 19, 2021
@lilic (Member) commented Apr 19, 2021

We finished and the release is cut 🎉 Thank you all!!

@lilic closed this as completed on Apr 19, 2021