AppCreate diagnostic #16658

sosiouxme · 2017-10-03T14:56:02Z

Implements https://trello.com/c/Zv4hVlyQ/130-diagnostic-to-recreate-app-create-loop-script as a diagnostic.

https://trello.com/c/Zv4hVlyQ/27-3-continue-appcreate-diagnostic-work
https://trello.com/c/aNWlMtMk/61-demo-merge-appcreate-diagnostic
https://trello.com/c/H0jsgQwu/63-3-complete-appcreate-diagnostic-functionality

Status:

Create and cleanup project
Deploy and cleanup app
Wait for app to start
Test ability to connect to app via service
Test that app responds correctly
Test ability to connect via route
Write stats/results to file as json

Not yet addressed in this PR (depending on how reviews progress vs development):

Run a build to completion
Test ability to attach storage
Gather and write useful information (logs, status) on failure

Builds on top of #17773, which handles parameters to the diagnostic, and on #17857, which is a refactor on top of that.

sosiouxme · 2018-01-08T12:26:31Z

@openshift/sig-master I would like online ops to start trying this out with 3.9 so we can get feedback on actual usage; for that to happen, I will need some reviews this week.

liggitt · 2018-01-08T18:08:39Z

pkg/oc/admin/diagnostics/config.go

+		switch index {
+		case 0:
+			errmsg = fmt.Sprintf("--%s specified that client config should be at %s\n", confFlagName, path)
+		case len(paths) - 1: // config in ~/.kube


this is really fragile...

liggitt · 2018-01-08T18:09:21Z

pkg/oc/admin/diagnostics/config.go

+		errmsg := ""
+		switch index {
+		case 0:
+			errmsg = fmt.Sprintf("--%s specified that client config should be at %s\n", confFlagName, path)


if no explicit config file is passed in, does this check even make sense? won't confFlagValue be ""?

liggitt · 2018-01-08T18:09:43Z

pkg/oc/admin/diagnostics/config.go

+		case len(paths) - 1: // config in ~/.kube
+		// no error message indicated if it is not there... user didn't say it would be
+		default: // can be multiple paths from the env var in theory; all cases should go here
+			if len(os.Getenv(config.OpenShiftConfigPathEnvVar)) != 0 {


trying to craft specific messages for specific indices in the loading order seems weird

I agree. This is very old code... I couldn't think of anything cleaner at the time; perhaps I can do better now.
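
One way to avoid keying messages to positions in the loading order is to tag each candidate path with where it came from and switch on that tag instead. The following is only a sketch of that idea, not the code in this PR: `source`, `candidate`, and `missingMessage` are hypothetical names, and the flag and env var names passed in `main` are placeholders.

```go
package main

import "fmt"

// source records why a path is in the search list (hypothetical type).
type source int

const (
	fromFlag source = iota
	fromEnv
	fromDefault
)

// candidate pairs a config path with the reason it is being checked.
type candidate struct {
	path string
	src  source
}

// missingMessage returns the text to show when the file at c.path is absent.
// It depends only on the candidate's source, never on its index in a slice.
func missingMessage(c candidate, flagName, envName string) string {
	switch c.src {
	case fromFlag:
		return fmt.Sprintf("--%s specified that client config should be at %s\n", flagName, c.path)
	case fromEnv:
		return fmt.Sprintf("Env var %s specified that client config could be at %s\n", envName, c.path)
	default:
		// Default location: the user never claimed a file was there.
		return ""
	}
}

func main() {
	// Placeholder paths and names, for illustration only.
	candidates := []candidate{
		{"/tmp/explicit.kubeconfig", fromFlag},
		{"/tmp/from-env.kubeconfig", fromEnv},
		{"/home/user/.kube/config", fromDefault},
	}
	for _, c := range candidates {
		fmt.Print(missingMessage(c, "config", "EXAMPLE_CONFIG_ENV"))
	}
}
```

With this shape, a flag candidate is only added to the list when the flag was actually set, which also sidesteps the empty-flag question raised above.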

liggitt · 2018-01-08T18:14:13Z

pkg/oc/admin/diagnostics/diagnostics/cluster/app_create/main.go

+	signal.Notify(sig, os.Interrupt, syscall.SIGTERM)
+	go func() {
+		<-sig
+		d.out.Warn("DCluAC001", nil, "Received interrupt; aborting diagnostic")


does this actually abort the other gofunc?

No, it doesn't. There is no way to abort a goroutine from outside: cancellation is cooperative, so the goroutine has to choose to stop. The main one keeps running after an interrupt; we just stop paying attention to it. The only alternative I could think of is to set up another channel and check it at various points to see whether an interrupt occurred, but that seemed even messier. Do you have a different suggestion?

This is not new at all BTW, just code moved around.

missed it was a move

Most of the second commit is stuff moving around

liggitt · 2018-01-08T18:15:56Z

pkg/oc/admin/diagnostics/diagnostics/cluster/app_create/main.go

+
+	<-done // wait until either finishes
+	signal.Stop(sig)
+	d.logResult()


this can run in parallel with the assignment on line 282 or 268-269 if interrupt is received, and crash with a data race error

If you hit interrupt, how concerned are you about a race condition crashing diagnostics? I'm not sure a crash is even possible here. Yes, data could conceivably be written into the object while the result is being logged, but would you get anything worse than bad output? Either way, it doesn't seem like an important edge case. I'd be happy to use a better pattern for handling interrupts if one is known, though. I think the rest of the product... probably just exits?

I wasn't sure if an individual diagnostic could get interrupted and the overall process was expected to keep going

Yes, that's the idea... it moves on to run the next diagnostic, if any. But at that point nothing is going to use the result from the previous diagnostic. Part of the reason you want it to keep going is to give it a chance to clean up the resources it created, so there's an actual benefit...
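
If the overlap between logging and the still-running goroutine ever needs to be made safe, one common pattern is to guard the shared result with a mutex and log from a copy. The sketch below assumes a simplified `result` type; its fields and method names are hypothetical and do not mirror the diagnostic's actual struct.

```go
package main

import (
	"fmt"
	"sync"
)

// result is a hypothetical stand-in for the diagnostic's result fields.
type result struct {
	mu      sync.Mutex
	success bool
	steps   []string
}

// record is what the worker goroutine would call as it makes progress.
func (r *result) record(step string, ok bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.steps = append(r.steps, step)
	r.success = ok
}

// snapshot returns a copy taken under the lock, so the caller can log it
// while the worker keeps writing to the original.
func (r *result) snapshot() (bool, []string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.success, append([]string(nil), r.steps...)
}

func main() {
	r := &result{}
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		for i := 0; i < 100; i++ {
			r.record(fmt.Sprintf("step-%d", i), true)
		}
	}()

	_, _ = r.snapshot() // safe to call even while the worker is still writing

	wg.Wait()
	ok, steps := r.snapshot()
	fmt.Println(ok, len(steps))
}
```

Running a program like this with `go run -race` is a quick way to check that the reads and writes no longer overlap unsynchronized.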

liggitt · 2018-01-12T20:57:37Z

pkg/oc/admin/project/new_project.go

@@ -105,6 +108,9 @@ func (o *NewProjectOptions) complete(f *clientcmd.Factory, args []string) error
 }

 func (o *NewProjectOptions) Run(useNodeSelector bool) error {
+	if o.Output == nil {


either move this to complete(), or compute a local var defaulting to os.Stdout... don't generally want to mutate options in the Run() method

👍 that makes sense. Hopefully I can rely on complete() being called.

Of course, it turns out there are a bunch of tests that simply construct the options directly and don't run complete() on them (indeed, they can't since it's private). So I can either change every test, or use a local var with default like you said.

sosiouxme · 2018-01-26T21:40:14Z

/retest

soltysh

/lgtm
/approve

soltysh · 2018-01-29T11:28:18Z

/retest

sosiouxme · 2018-01-29T15:59:53Z

looks like about 50 things went wrong...
/retest

openshift-bot · 2018-01-29T20:52:41Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2018-01-30T02:52:42Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2018-01-30T08:52:42Z

/retest

Please review the full test history for this PR and help us cut down flakes.

soltysh · 2018-01-30T11:09:54Z

More strange errors...
/retest

deads2k · 2018-01-30T13:46:32Z

un lgtm-ing to calm down the retest bot. Those test integration failures are real and caused by this pull.

sosiouxme · 2018-02-05T14:18:40Z

/test origin-verify
(results missing?)
Integration test failures seem related to project creation which I touched here... will work to fix.

sosiouxme · 2018-02-08T03:31:27Z

looks like we're back to normal flakes.
/refresh
/retest

sosiouxme · 2018-02-08T15:27:12Z

https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/16658/test_pull_request_origin_end_to_end/9231/ was #18522 (already resolved)
verify has already passed, but for some reason the bot is still reporting the old failure.
🤷‍♂️
/retest

sosiouxme · 2018-02-12T19:41:25Z

updated and rebased last week, @soltysh can I get a re-lgtm now that the merge window is reopened?

the bit that I needed to change was in new-project... using a local variable to default the output writer, because the tests neither set it nor call complete()

sosiouxme · 2018-02-14T14:08:27Z

https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/16658/test_pull_request_origin_unit/9523/ is flake #18497
/test unit

sosiouxme · 2018-02-14T16:46:40Z

ready for re-review

soltysh

/lgtm

openshift-ci-robot · 2018-02-16T22:03:15Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: soltysh, sosiouxme

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

~~contrib/completions/OWNERS~~ [soltysh]
~~docs/man/OWNERS~~ [soltysh]
~~pkg/oc/OWNERS~~ [soltysh]

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

openshift-merge-robot · 2018-02-16T23:17:27Z

/test all [submit-queue is verifying that this PR is safe to merge]

openshift-ci-robot · 2018-02-17T00:25:48Z

@sosiouxme: The following tests failed, say /retest to rerun them all:

Test name	Commit	Details	Rerun command
ci/openshift-jenkins/origin/verify	`ae25bd4`	link	`/test origin-verify`
ci/openshift-jenkins/gcp	`6c78e37`	link	`/test gcp`

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

openshift-merge-robot · 2018-02-17T01:12:14Z

Automatic merge from submit-queue (batch tested with PRs 16658, 18643).

openshift-ci-robot added the do-not-merge/work-in-progress (a PR that should not merge because it is a work in progress) and size/XL (a PR that changes 500-999 lines, ignoring generated files) labels Oct 3, 2017

openshift-merge-robot added the needs-rebase label (the PR cannot be merged because it has merge conflicts with HEAD) Oct 3, 2017

openshift-merge-robot assigned pravisankar and liggitt Oct 3, 2017

sosiouxme force-pushed the 20170928-app-loop-diagnostic branch from 75397b5 to 969d4e8 Compare October 3, 2017 15:28

openshift-merge-robot removed the needs-rebase label Oct 3, 2017

sosiouxme force-pushed the 20170928-app-loop-diagnostic branch from ae25bd4 to 01994df Compare October 6, 2017 13:52

openshift-merge-robot added the needs-rebase label Nov 22, 2017

sosiouxme force-pushed the 20170928-app-loop-diagnostic branch from 01994df to 353f7ec Compare December 20, 2017 20:42

openshift deleted a comment from openshift-merge-robot Dec 20, 2017

openshift-ci-robot added the size/XXL label (a PR that changes 1000+ lines, ignoring generated files) and removed the size/XL label Dec 20, 2017

openshift-merge-robot removed the needs-rebase label Dec 20, 2017

sosiouxme force-pushed the 20170928-app-loop-diagnostic branch 4 times, most recently from 14ade1a to c3c7591 Compare January 8, 2018 03:07

sosiouxme changed the title ~~[WIP] app-create loop diagnostic~~ AppCreate diagnostic Jan 8, 2018

openshift-ci-robot removed the do-not-merge/work-in-progress label Jan 8, 2018

liggitt reviewed Jan 8, 2018

sosiouxme force-pushed the 20170928-app-loop-diagnostic branch 2 times, most recently from 6d82a45 to a2d934d Compare January 10, 2018 03:28

liggitt reviewed Jan 12, 2018

liggitt assigned juanvallejo Jan 12, 2018

sosiouxme force-pushed the 20170928-app-loop-diagnostic branch from 672f41c to d115033 Compare January 24, 2018 21:28

openshift-ci-robot assigned soltysh Jan 29, 2018

soltysh approved these changes Jan 29, 2018

openshift-ci-robot added the lgtm (the PR is ready to be merged) and approved (the PR has been approved by an approver from all required OWNERS files) labels Jan 29, 2018

deads2k removed the lgtm label Jan 30, 2018

sosiouxme force-pushed the 20170928-app-loop-diagnostic branch from d115033 to 6a22b82 Compare February 8, 2018 01:38

diagnostics: introduce AppCreate

6c78e37

sosiouxme force-pushed the 20170928-app-loop-diagnostic branch from 6a22b82 to 6c78e37 Compare February 13, 2018 21:47

soltysh approved these changes Feb 16, 2018

openshift-ci-robot added the lgtm label Feb 16, 2018

openshift-merge-robot merged commit b26e530 into openshift:master Feb 17, 2018
