Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Netdb debug tool #35677

Merged
merged 3 commits into from
Dec 7, 2017
Merged

Netdb debug tool #35677

merged 3 commits into from
Dec 7, 2017

Conversation

fcrisciani
Copy link
Contributor

@fcrisciani fcrisciani commented Dec 1, 2017

- What I did
New daemon config for networking diagnose

-How I tested it
Added a unit test in the reload_test section

- Description for the changelog

Add a new daemon config to enable the diagnose server for the networking layer

The libnetwork vendoring also bring:
Fix for #34515
Fix for watchMiss leak and ipvs deadlock

Netlink revendoring for netlink timeout: vishvananda/netlink@bd6d5de...b2de5d1
Libnetwork revendoring: moby/libnetwork@64ae588...9bca9a4

- A picture of a cute animal (not mandatory but encouraged)
cute-animals-cover

vendor.conf Outdated
@@ -30,7 +30,7 @@ github.com/moby/buildkit aaff9d591ef128560018433fe61beb802e149de8
github.com/tonistiigi/fsutil dea3a0da73aee887fc02142d995be764106ac5e2

#get libnetwork packages
github.com/docker/libnetwork 72fd7e5495eba86e28012e39b5ed63ef9ca9a97b
github.com/docker/libnetwork fd95ac779f2b9fc3cbd2c256a22887e26f479653 https://github.com/fcrisciani/libnetwork
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

waiting for libnetwork side to be in

@fcrisciani fcrisciani force-pushed the netdb-debug-tool branch 2 times, most recently from e1fa350 to 677789d Compare December 2, 2017 00:21
@thaJeztah
Copy link
Member

libnetwork side; moby/libnetwork#2027

@fcrisciani fcrisciani force-pushed the netdb-debug-tool branch 7 times, most recently from d3feb7e to 6a64fef Compare December 5, 2017 18:28
@fcrisciani
Copy link
Contributor Author

Windows CI failed:

19:16:19 ----------------------------------------------------------------------
19:16:19 PANIC: docker_api_logs_test.go:152: DockerSuite.TestLogsAPIUntil
19:16:19 
19:16:19 ... Panic: runtime error: index out of range (PC=0x45AC01)
19:16:19 
19:16:19 d:/CI/CI-d8f0b6be0/go/src/runtime/asm_amd64.s:509
19:16:19   in call32
19:16:19 d:/CI/CI-d8f0b6be0/go/src/runtime/panic.go:491
19:16:19   in gopanic
19:16:19 d:/CI/CI-d8f0b6be0/go/src/runtime/panic.go:28
19:16:19   in panicindex
19:16:19 docker_api_logs_test.go:175
19:16:19   in DockerSuite.TestLogsAPIUntil
19:16:19 d:/CI/CI-d8f0b6be0/go/src/runtime/asm_amd64.s:509
19:16:19   in call32
19:16:19 d:/CI/CI-d8f0b6be0/go/src/reflect/value.go:434
19:16:19   in Value.call
19:16:19 d:/CI/CI-d8f0b6be0/go/src/reflect/value.go:302
19:16:19   in Value.Call
19:16:19 c:/gopath/src/github.com/docker/docker/vendor/github.com/go-check/check/check.go:816
19:16:19   in suiteRunner.forkTest.func1
19:16:19 c:/gopath/src/github.com/docker/docker/vendor/github.com/go-check/check/check.go:672
19:16:19   in suiteRunner.forkCall.func1
19:16:19 d:/CI/CI-d8f0b6be0/go/src/runtime/asm_amd64.s:2337
19:16:19   in goexit
19:16:23 
19:16:23 ----------------------------------------------------------------------

@fcrisciani fcrisciani force-pushed the netdb-debug-tool branch 2 times, most recently from afda59f to d5c084a Compare December 6, 2017 01:46
Copy link
Contributor

@mavenugo mavenugo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments, but feel free to discuss

daemon/reload.go Outdated
}
// Enable the network diagnose if the flag is set with a valid port withing the range
if conf.IsValueSet("network-diagnose") && conf.NetworkDiagnose > 0 && conf.NetworkDiagnose < 65536 {
logrus.Infof("Calling the diagnose start with %d", conf.NetworkDiagnose)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering if we should make this a warning; having this option enabled without knowing it is enabled may be dangerous, correct?

@@ -59,6 +59,7 @@ func installCommonConfigFlags(conf *config.Config, flags *pflag.FlagSet) {
flags.IntVar(&maxConcurrentDownloads, "max-concurrent-downloads", config.DefaultMaxConcurrentDownloads, "Set the max concurrent downloads for each pull")
flags.IntVar(&maxConcurrentUploads, "max-concurrent-uploads", config.DefaultMaxConcurrentUploads, "Set the max concurrent uploads for each push")
flags.IntVar(&conf.ShutdownTimeout, "shutdown-timeout", defaultShutdownTimeout, "Set the default shutdown timeout")
flags.IntVar(&conf.NetworkDiagnose, "network-diagnose", -1, "TCP port number of the network diagnose server")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make the default 0 or is that a valid port?

Also, per our discussion; I think we wanted to make this a config-file only feature (i.e., only support setting this on daemon.json, but not through a flag. Not sure if a flag is required to make the config work (I think we can do without), but if it's required, we can hide the flag perhaps?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also s/diagnose server/diagnostics server/ ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@thaJeztah 0 means pick an available port automatically for me

let me add some bikeshed too :D

could we maybe use port in the flag name ? such as "metrics-addr" instead of just "metrics" ? When I look at the current flag it looks like a bool to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

like network-diagnostic-port ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My reason for 0 was that it's the default value for an integer; while we do -1 for other options, most we use it if 0 itself is a valid value.

Agree on adding -port (was thinking about that, but thought it may be too verbose)

Flavio Crisciani added 3 commits December 6, 2017 13:19
Add a new configuration option to allow the enabling
of the networkDB debug. The option is only parsed using the
reload event. This will protect the daemon on start or restart
if the option is left behind in the config file

Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
Signed-off-by: Flavio Crisciani <flavio.crisciani@docker.com>
Copy link
Contributor

@vieux vieux left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@thaJeztah
Copy link
Member

Experimental failure is not related;

22:20:05 ----------------------------------------------------------------------
22:20:05 FAIL: check_test.go:366: DockerSwarmSuite.TearDownTest
22:20:05 
22:20:05 check_test.go:371:
22:20:05     d.Stop(c)
22:20:05 daemon/daemon.go:395:
22:20:05     t.Fatalf("Error while stopping the daemon %s : %v", d.id, err)
22:20:05 ... Error: Error while stopping the daemon dcc3900ad90b2 : exit status 130
22:20:05 
22:20:05 
22:20:05 ----------------------------------------------------------------------
22:20:05 PANIC: docker_api_swarm_service_test.go:201: DockerSwarmSuite.TestAPISwarmServicesUpdateStartFirst
22:20:05 
22:20:05 [dcc3900ad90b2] waiting for daemon to start
22:20:05 [dcc3900ad90b2] daemon started
22:20:05 
22:20:05 [dcc3900ad90b2] daemon started
22:20:05 Attempt #2: daemon is still running with pid 10039
22:20:05 Attempt #3: daemon is still running with pid 10039
22:20:05 Attempt #4: daemon is still running with pid 10039
22:20:05 [dcc3900ad90b2] exiting daemon
22:20:05 ... Panic: Fixture has panicked (see related PANIC)
22:20:05 
22:20:05 ----------------------------------------------------------------------

@thaJeztah
Copy link
Member

See #33805, #33041 for the failing tests

@thaJeztah
Copy link
Member

Merging, after discussion with @vieux since the failures are not related

@thaJeztah
Copy link
Member

This also includes moby/libnetwork#1976, which fixes #35261

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants