Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade to klogv2 #89961

Closed
13 tasks done
serathius opened this issue Apr 8, 2020 · 37 comments · Fixed by #90183
Closed
13 tasks done

Upgrade to klogv2 #89961

serathius opened this issue Apr 8, 2020 · 37 comments · Fixed by #90183
Labels
area/code-organization Issues or PRs related to kubernetes code organization kind/feature Categorizes issue or PR as related to a new feature. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation.
Milestone

Comments

@serathius
Copy link
Contributor

serathius commented Apr 8, 2020

As part of Structured Logging effort we will be upgrading klog to version 2. This issue will propose plan for upgrading all dependencies and be used to maintain progress.

Part of kubernetes/enhancements#1602

Plan:

/cc @dims @liggitt @smarterclayton @wojtek-t
/milestone v1.19

@serathius serathius added the kind/feature Categorizes issue or PR as related to a new feature. label Apr 8, 2020
@k8s-ci-robot k8s-ci-robot added this to the v1.19 milestone Apr 8, 2020
@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Apr 8, 2020
@dims
Copy link
Member

dims commented Apr 8, 2020

@serathius please add me to utils repo as well.

@dims
Copy link
Member

dims commented Apr 8, 2020

/sig instrumentation
/area code-organization

@k8s-ci-robot k8s-ci-robot added sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. area/code-organization Issues or PRs related to kubernetes code organization and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Apr 8, 2020
@serathius
Copy link
Contributor Author

/cc @44past4

@serathius
Copy link
Contributor Author

/cc @yuzhiquan

@serathius
Copy link
Contributor Author

/cc @mm4tt

@zhijianli88
Copy link
Contributor

/cc @zhijianli88

@hase1128
Copy link
Contributor

hase1128 commented Apr 9, 2020

/cc

@tanjunchen
Copy link
Member

tanjunchen commented Apr 9, 2020

/cc
pick up sigs.k8s.io/apiserver-network-proxy
k8s.io/kube-openapi
github.com/GoogleCloudPlatform/k8s-cloud-provider

@fenggw-fnst
Copy link
Contributor

/cc

@dims
Copy link
Member

dims commented Apr 10, 2020

@serathius please mark klog release as DONE ( https://github.com/kubernetes/klog/releases/tag/v2.0.0 )

@serathius
Copy link
Contributor Author

@mm4tt We are ready to run scalability tests on #89955

@dims
Copy link
Member

dims commented Apr 10, 2020

@serathius please mark k8s.io/utils and k8s.io/klog as done.

@dims
Copy link
Member

dims commented Apr 10, 2020

@serathius filed kubernetes/gengo#173 for gengo

@dims
Copy link
Member

dims commented Apr 10, 2020

@serathius file kubernetes/repo-infra#185 for repo-infra

@dims
Copy link
Member

dims commented Apr 10, 2020

@tanjunchen can you please file the PRs that you picked up and throw us a link?

@tanjunchen
Copy link
Member

@dims i have open these prs. As seen above.

@serathius
Copy link
Contributor Author

Linked all PRs to issue description.

I was planning to find some time and pick one repo on weekend, but you beat me up to it. Thanks for all the help!

@dims
Copy link
Member

dims commented Apr 13, 2020

@serathius kubernetes/gengo#173 has merged.

@tanjunchen
Copy link
Member

tanjunchen commented Apr 14, 2020

kubernetes/kube-openapi#192 has been merged .
GoogleCloudPlatform/k8s-cloud-provider#37 has been merged . @serathius

@serathius
Copy link
Contributor Author

Talked with @mm4tt, he proposed to run 5k release blocking performance tests and do A/B performance analysis.
I will try to find someone from SIG-scalability to help but it might take time due Holidays.

@mm4tt
Copy link
Contributor

mm4tt commented Apr 15, 2020

SIG Scalability will run the tests by the end of the next week. Will have a more precise estimate today.

@dims dims mentioned this issue Apr 15, 2020
14 tasks
@mm4tt
Copy link
Contributor

mm4tt commented Apr 16, 2020

@tosi3k from SIG Scalability will run the tests

@tosi3k
Copy link
Member

tosi3k commented Apr 16, 2020

Follow-up: I will run the tests tomorrow.

@dims
Copy link
Member

dims commented Apr 16, 2020

@tosi3k @mm4tt @serathius Let's please use #90183 for the tests.

@dims
Copy link
Member

dims commented Apr 17, 2020

@tosi3k @mm4tt @serathius did the scalability CI jobs get started? i have updated #90183 and i haven't heard from y'all

@tosi3k
Copy link
Member

tosi3k commented Apr 17, 2020

@dims sorry for the late update. The 5k CI job based on commit 2f0e334 of #90183 is currently running and will take a few more hours. I will give you more update on Monday.

@tosi3k
Copy link
Member

tosi3k commented Apr 20, 2020

@dims the test succeeded but I'm concerned about the actual commit I run the test on top of. It was the freshest one when I left work on Friday but I saw one with description switch over k/k to use klog v2 being pushed afterwards, hence I believe the test I started on Friday didn't really make sense :(. Shall I rerun the tests on top of the most recent change (which is cd1edd0 I believe but please correct me if I'm wrong)?

@serathius
Copy link
Contributor Author

@tosi3k Thanks for running it.
Could you provide link to dashboard with results?

I think intention was to do A/B tests. So run tests twice, one before the update and one after update. @mm4tt can you confirm?

@mm4tt
Copy link
Contributor

mm4tt commented Apr 20, 2020

@serathius, you're right, but we don't need to run the test twice manually. We can use a daily 5k CI/CD run as a baseline.
That being said, before we spin up the dashboards, I'd like to get answer to @tosi3k's question, i.e. whether the manual run he did actually had klogv2 enabled.

@dims
Copy link
Member

dims commented Apr 20, 2020

@serathius my bad, i had not pushed everything needed before i added that note :(

@mm4tt
Copy link
Contributor

mm4tt commented Apr 20, 2020

No worries, @tosi3k can confirm, but I think he reserved a slot for re-running the tests tomorrow.

@dims, could you please double check and give us an explicit green light confirming that the PR is ready to be tested?

@dims
Copy link
Member

dims commented Apr 20, 2020

@mm4tt yes, what is in #90183 is good. specifically cd1edd0c0e06af7c4dd02838d55ccc84586c89fe is the last commit

For next time best way to cross check is ensuring that "k8s.io/klog" is removed from root go.mod/sum and "k8s.io/klog/v2" is added there instead.

-	k8s.io/klog v1.0.0
+	k8s.io/klog/v2 v2.0.0

@tosi3k
Copy link
Member

tosi3k commented Apr 20, 2020

Okay, I will run the test using cd1edd0 commit tomorrow and compare the results with our daily 5k CI/CD test run the day after tomorrow then.

@tosi3k
Copy link
Member

tosi3k commented Apr 22, 2020

I ran a test job with the experiment and the test passed. Looking at the dashboards I noticed a few changes:

  • Clusterloader
    LIST calls for pods in the namespace scope take more time on average than before change (1.63s vs. 3.59s),
  • kube-apiserver
    slightly higher variance of memory allocation for kube-apiserver (but the memory usage stays the same on average),

Overall, the results look good (at least from my novice perspective).

Dashboard with metrics before the change: link (source: grafana-prom).
Dashboard with metrics after the change: link (source: klogv2-experiment).

@serathius
Copy link
Contributor Author

@mm4tt does this look ok for you?

@mm4tt
Copy link
Contributor

mm4tt commented Apr 22, 2020

Thanks, @tosi3k!

The latency of the pod lists seems suspicious, but my intuition says it's a red herring, most likely coming from the different setup of the test runs. The CI/CD test uses configuration than the manual test. For manual test, we used our scalability prow instance, it's likely that the cl2 was run in different region than the cluster, which could affect the pod list (IIRC, cl2 is the main issuer of these calls).

So yes, to summarize it looks ok. We don't see any obvious scalability blockers for klogv2.

@serathius
Copy link
Contributor Author

Thanks @dims for all great work!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/code-organization Issues or PRs related to kubernetes code organization kind/feature Categorizes issue or PR as related to a new feature. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants