Technical Report on Nomad 0.4.1-dev compatibility to Windows TP5 environment #1488

Closed
sitano opened this issue Jul 29, 2016 · 26 comments

@sitano

sitano commented Jul 29, 2016

This is a report on an investigation of Nomad 0.4.1-dev compatibility with Docker on Windows TP5, using the https://github.com/StefanScherer/docker-windows-box environment. The investigation shows critical incompatibilities with the Windows/Docker platform, which must be fixed before Nomad is able to run jobs there.

Some of them were already fixed by @mwieczorek.

Issues

[+] Nomad does not support Windows volume bind paths: fixed by the "Volume binds for windows containers" patch, #1321 (51149e4), in 0.4.1-dev (by @mwieczorek).
[-] Windows Docker does not support Nomad's default networking type, bridging. It supports only NAT and overlay, and according to the docs port mapping works with NAT only.
[?] Bridging is bad practice anyway. It should not be the default.
[-] Windows Docker NAT networking does not support binding to a local IP address in port bindings (#1475), i.e. the -p ip:port:port form.
[-] Windows Docker does not have the syslog logging driver. The default MUST be a supported driver, which is json-file. (#688)
[-] Windows 2016 TP5 has a bug in virtual NAT switch port binding, so bound ports are not accessible from the host machine.
[-] UDP port bindings do not work: "Failed to create endpoint on network nat: HNS failed with error" (MicrosoftDocs/Virtualization-Documentation#273, moby/moby#22084).
[-] Calling GET /containers/4f2e29f66db28b629009c21dda19317bab7a28e9a916983907add24133862a01/stats?stream=true returns an error: Windows does not support stats.
[-] Nomad panics after container start with a nil pointer dereference in SyncServices:

panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xc0000005 code=0x0 addr=0x10 pc=0xa58aae]

goroutine 10 [running]:
panic(0x105e480, 0xc082002080)
     C:/Go/src/runtime/panic.go:481 +0x3f4
github.com/hashicorp/nomad/client/driver/executor.(*UniversalExecutor).SyncServices(0xc082198400, 0xc082010ba0, 0x0, 0x0)
     C:/GoCode/src/github.com/hashicorp/nomad/client/driver/executor/executor.go:511 +0x17e
github.com/hashicorp/nomad/client/driver.(*ExecutorRPCServer).SyncServices(0xc082151720, 0xc082010ba0, 0xc082002970, 0x0, 0x0)
     C:/GoCode/src/github.com/hashicorp/nomad/client/driver/executor_plugin.go:144 +0x56
reflect.Value.call(0xfd3be0, 0x11cf638, 0x13, 0x12375d0, 0x4, 0xc082203ed8, 0x3, 0x3, 0x0, 0x0, ...)
     C:/Go/src/reflect/value.go:435 +0x1214
reflect.Value.Call(0xfd3be0, 0x11cf638, 0x13, 0xc082203ed8, 0x3, 0x3, 0x0, 0x0, 0x0)
     C:/Go/src/reflect/value.go:303 +0xb8
net/rpc.(*service).call(0xc0821fa100, 0xc0821fa0c0, 0xc08218d0d8, 0xc082055a80, 0xc08222c260, 0x10405e0, 0xc082020070, 0x199, 0xe39e20, 0xc082002970, ...)
     C:/Go/src/net/rpc/server.go:383 +0x1c9
created by net/rpc.(*Server).ServeCodec
     C:/Go/src/net/rpc/server.go:477 +0x4a4

[-] Can't pull and build an image if it is missing from the cache (it just does not work; still trying).
[-] Can't delete the image on failure, because there is a stopped container.
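
The port-binding item above (no `-p ip:port:port` on the Windows nat network) could be handled with something like the following. This is only a sketch: `portBinding` is a hypothetical helper, not a function from Nomad or Docker, and the address `10.0.2.15` is made up for illustration.

```go
package main

import (
	"fmt"
	"runtime"
)

// portBinding formats a value for docker's -p flag.
// Hypothetical helper: the host IP is dropped when the daemon runs on
// Windows, because the nat network rejects the ip:port:port form.
func portBinding(daemonOS, hostIP string, hostPort, containerPort int) string {
	if daemonOS == "windows" || hostIP == "" {
		return fmt.Sprintf("%d:%d", hostPort, containerPort)
	}
	return fmt.Sprintf("%s:%d:%d", hostIP, hostPort, containerPort)
}

func main() {
	// 10.0.2.15 is an example address, not taken from the report.
	fmt.Println(portBinding(runtime.GOOS, "10.0.2.15", 80, 80))
}
```

The OS is passed in as an argument rather than read inside the helper so that both branches can be exercised on any build machine.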

Nomad version

Output from nomad version

Nomad v0.4.1-dev, at least commit 5018972

Operating system and Environment details

docker info:

Containers: 2
 Running: 0
 Paused: 0
 Stopped: 2
Images: 6
Server Version: 1.12.0-dev
Storage Driver: windowsfilter
 Windows:
Logging Driver: json-file
Plugins:
 Volume: local
 Network: null overlay nat
Swarm: inactive
Security Options:
Kernel Version: 10.0 14300 (14300.1045.amd64fre.rs1_release_svc.160705-1059)
Operating System: Windows Server 2016 Datacenter Technical Preview 5
OSType: windows
Architecture: x86_64
CPUs: 2
Total Memory: 4 GiB
Name: vagrant-2016
ID: D62W:NRXN:UCL2:KTLR:5QEI:Z5JJ:LTBM:6OQD:2HCK:IWJJ:GIPM:EOYZ
Docker Root Dir: C:\ProgramData\docker
Debug Mode (client): false
Debug Mode (server): true
 File Descriptors: -1
 Goroutines: 17
 System Time: 2016-07-29T08:21:05.2349578-07:00
 EventsListeners: 0
Registry: https://index.docker.io/v1/
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

Reproduction steps

Simple container used for tests:
docker run -it -p 80:80 microsoft/iis:windowsservercore cmd

Run attached iis.nomad example with nomad run iis.nomad.

Nomad logs

nomad.log

Docker logs

dockerd.log

Job file

iis.nomad

Links

@sitano sitano changed the title Technical Report on Nomad 0.4-dev compatibility to Windows TP5 environment Technical Report on Nomad 0.4.1-dev compatibility to Windows TP5 environment Jul 29, 2016
@dadgar
Contributor

dadgar commented Jul 29, 2016

Thanks for putting this together! This will be a huge help in getting docker/windows support to where it should be!

@mwieczorek
Contributor

@sitano Great report!

Windows docker does not support Nomad's default networking type: bridging. It supports only NAT

I can provide a PR that sets nat as the default network mode on Windows (but maybe after #1475 gets merged).
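
For illustration, an OS-dependent default could look roughly like this. It is a sketch under assumptions: `defaultNetworkMode` is a hypothetical name, and where such a default would actually live in the Nomad docker driver is not stated in this thread.

```go
package main

import (
	"fmt"
	"runtime"
)

// defaultNetworkMode returns the docker network mode to use when a job
// does not set one. Hypothetical helper, not Nomad's actual code.
func defaultNetworkMode(goos string) string {
	if goos == "windows" {
		// docker info in the report lists only: null, overlay, nat.
		return "nat"
	}
	return "bridge"
}

func main() {
	fmt.Printf("%s -> %s\n", runtime.GOOS, defaultNetworkMode(runtime.GOOS))
}
```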

[-] Windows 2016 TP 5 has a bug in virtual nat switch port binding
UDP ports binding do not work

I think it will be fixed in Windows Server 2016 RTM, so we just have to wait (MS says September 2016).

Nomad panics after container start with Null Ptr Deref in SyncServices
[-] Can't pull and build image if its missing in cache (just not works, yet trying)
[-] Can't delete image on failure, there is stopped container

I think these can be fixed in Nomad. Maybe I'll take a look next week.

For me the main problem is syslog, and how to replace it for Windows containers. I assume #1469 (plugin system for logging) won't land anytime soon.
@dadgar @diptanu Any thoughts on how to work around it?

@sitano
Author

sitano commented Jul 29, 2016

I can write a log plugin if you give me a spec.

@diptanu
Contributor

diptanu commented Jul 30, 2016

This is really good! Thanks @sitano

@sitano
Author

sitano commented Jul 30, 2016

Another option could be to add support for the syslog logging driver to dockerd itself, which is even better from my point of view. I have not researched why they do not provide it out of the box, and I don't think it should be a big change. I can research that, ask the Docker maintainers about it, and/or help them get syslog working on Windows. What do you think?

In this case Nomad can keep syslog.

@mwieczorek
Contributor

Another option: the etwlogs driver as the default for Windows.
With this approach a Windows version of 'universal_collector_unix.go' would have to be implemented (using WinAPI).
But ETW would require no external dependencies (a third-party syslog server); it's native to the Windows platform.

Both options need research...

@sitano
Author

sitano commented Aug 3, 2016

Update on how to run the latest Docker on the latest Windows: moby/moby#25336.

docker rmi  --force *windowsservercore*
docker pull microsoft/windowsservercore:10.0.14300.1030

@sitano
Author

sitano commented Aug 3, 2016

I did a little research. It seems syslog/journald logging support has simply been cut out of Docker since May 14, 2015 (moby/moby@655a58e). I can look into adding Windows-specific support if you think syslog will do fine for Nomad. Currently they use a custom syslog driver with TLS support (moby/moby@4b98193). I'm not sure it can work on Windows, though that is easy to check. Maybe nothing is preventing us from adding it.

@mwieczorek
Contributor

@sitano
Did you check whether it is possible to use the syslog driver on Windows?

@sitano
Author

sitano commented Aug 9, 2016

@mwieczorek not yet. But it seems like it should work. The people in the #docker-dev IRC channel said they don't mind adding syslog if we prove it works ;) I don't have time to do that right now. You can try if you want. I think it would be good to have some automated tests as a proof of concept for the Docker maintainers to accompany the PR.

@sitano
Author

sitano commented Aug 9, 2016

What do you think about #stats support missing on Windows?

@mwieczorek
Contributor

@sitano
About stats: I think it will come soon. See microsoft/hcsshim#59: they added stats to hcsshim, so the next step will be in Docker (I think so; I haven't confirmed it).

About syslog: maybe I can try. Do you know of any good syslog server implementation for Windows?

@sitano
Author

sitano commented Aug 9, 2016

@mwieczorek It's nice news about microsoft/hcsshim#59 - didn't see that.

About Windows logging: I was surprised to find that the Nomad client doesn't support syslog server mode on Windows: https://github.com/hashicorp/nomad/blob/master/client/driver/logging/syslog_server_windows.go. Why is that?

About a working server: I don't know, I haven't used any. Googling doesn't turn up much about compiling a native Unix syslog directly with MinGW, for example. But for Windows there may be good options among existing servers:

Nice C++ version: https://github.com/MaxBelkov/visualsyslog
C# version: https://github.com/mchudinov/Syslog
Syslog-win32: https://sourceforge.net/projects/syslog-win32/

Maybe we could also just use a Go syslog server for the automated tests, which is more native to the language of choice.

I don't know whether we need to provide working syslog TLS tests to get syslog support into Docker.
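
As a rough idea of what a Go-based syslog receiver for tests might involve, here is a minimal standard-library sketch. It is not Nomad's or Docker's code: `parsePriority` is a hypothetical helper, the sample message is invented, and only the leading `<PRI>` field (facility*8 + severity) is parsed.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
	"strings"
	"time"
)

// parsePriority extracts facility and severity from the "<PRI>" prefix
// of a syslog message, where PRI = facility*8 + severity.
func parsePriority(msg string) (facility, severity int, err error) {
	end := strings.IndexByte(msg, '>')
	if !strings.HasPrefix(msg, "<") || end < 2 {
		return 0, 0, fmt.Errorf("missing <PRI> prefix in %q", msg)
	}
	pri, err := strconv.Atoi(msg[1:end])
	if err != nil {
		return 0, 0, err
	}
	if pri < 0 || pri > 191 {
		return 0, 0, fmt.Errorf("priority %d out of range", pri)
	}
	return pri / 8, pri % 8, nil
}

func main() {
	// Listen on an ephemeral loopback UDP port, as a test fixture would.
	conn, err := net.ListenPacket("udp", "127.0.0.1:0")
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	// Stand-in for a logging driver: send one message to the receiver.
	client, err := net.Dial("udp", conn.LocalAddr().String())
	if err != nil {
		panic(err)
	}
	defer client.Close()
	fmt.Fprint(client, "<34>Oct 11 22:14:15 host app: hello")

	// Read the datagram back, with a deadline so a lost packet cannot hang.
	conn.SetReadDeadline(time.Now().Add(2 * time.Second))
	buf := make([]byte, 2048)
	n, _, err := conn.ReadFrom(buf)
	if err != nil {
		panic(err)
	}
	facility, severity, err := parsePriority(string(buf[:n]))
	if err != nil {
		panic(err)
	}
	fmt.Printf("facility=%d severity=%d\n", facility, severity)
}
```

Because it uses only `net`, such a fixture builds the same way on Windows and Linux, which is the property the automated tests would need. TLS is left out entirely.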

@mwieczorek
Contributor

@sitano
I've done some tests.
First I enabled syslog in Docker (no code changes except build tags, etc.), and it compiled without errors. I can also run containers with syslog as the log driver.
Then I changed the build tags in Nomad (executor/syslog_parser/collector) and replaced log/syslog with https://github.com/RackSec/srslog (log/syslog is not supported on Windows).
Results: I'm able to run a job and create and run containers, and logs are stored in AllocDir/logs/.

@dadgar @diptanu
Suppose Docker enables the syslog driver for Windows...
Would you consider changing the default 'log/syslog' package to https://github.com/RackSec/srslog (in universal_collector/syslog_parser/executor)? BTW, RackSec/srslog is used by Docker.

@dadgar
Contributor

dadgar commented Aug 12, 2016

@mwieczorek Nice work. Yeah that would be acceptable

@mwieczorek
Contributor

I created issue moby/moby#25689 to enable syslog logging driver for windows.

@sitano
Author

sitano commented Aug 15, 2016

Nice, it seems we are getting closer. I am a little bit sidetracked fixing some things in the Consul cross-DC Serf interconnect (Docker / Windows / Azure). Glad we are moving forward with this. We are seriously thinking of running a big cluster based on this.

I think it's worth updating the status of the subtasks in the description...

@mwieczorek
Contributor

about stats: moby/moby#25737
looks like it will be in docker v1.13.0

@mwieczorek
Contributor

Update:

@sitano You can check (Windows server 2016 RTM, nomad built from master, docker built from master) and let us know if you get any new issues

@dadgar
Contributor

dadgar commented Sep 27, 2016

@mwieczorek At this point, are Docker containers launched by Nomad working on Windows Server 2016?

Does any of this work also apply to Docker on Mac? Curious if you all have tested that, as I am assuming there is a decent cross-over with the work you all have done.

@mwieczorek
Contributor

@dadgar Yes, with Windows Server 2016 RTM, and unreleased Nomad/Docker versions (built from master). Also: I tested only running/stopping simple containers, on one host. (but will do some more advanced scenarios)

About Docker on Mac: I don't have a Mac, so I cannot say anything.

@diptanu
Contributor

diptanu commented Sep 27, 2016

@mwieczorek Can you also please test if nomad is reporting the correct stats with the docker driver?

@mwieczorek
Contributor

@diptanu Sure, I'll let you know here.

@justenwalker
Contributor

justenwalker commented Jan 30, 2017

Windows + Docker: Consul TCP/HTTP Health checks will not function due to limitations of the Windows NAT configuration.

Published ports on Windows containers do not loop back. This is due to a limitation in the default nat network stack.

If this is to work seamlessly, nomad needs to discover the Docker IP address that was assigned to the container and use that to perform health checks instead of the Host IP.

As a workaround, we're trying to see whether a PowerShell script could check health, using NOMAD_ALLOC_ID + NOMAD_TASK_NAME to get the container IP and performing the actual check within PowerShell. In theory this should work. I'll hopefully have more to report when we start trying this.

The reason why we suspect this could work, is that we can use docker inspect --format '{{ .NetworkSettings.Networks.nat.IPAddress }}' "${NOMAD_TASK_NAME}-${NOMAD_ALLOC_ID}" to get the ip, and then follow up with an HTTP ping inside the PowerShell script.
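
The lookup step described above can be sketched in Go as well. This is only an illustration under assumptions: the helper names are hypothetical, the `docker` CLI is assumed to be on PATH, and the `<task>-<alloc>` container name is taken from the command in this comment rather than from Nomad's source.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// containerName mirrors the "<task>-<alloc>" naming used in the
// docker inspect command above.
func containerName(taskName, allocID string) string {
	return taskName + "-" + allocID
}

// inspectArgs builds the docker arguments that print the container's
// address on the Windows "nat" network.
func inspectArgs(name string) []string {
	return []string{"inspect", "--format", "{{ .NetworkSettings.Networks.nat.IPAddress }}", name}
}

// containerIP runs docker inspect and returns the trimmed address.
func containerIP(name string) (string, error) {
	out, err := exec.Command("docker", inspectArgs(name)...).Output()
	if err != nil {
		return "", err
	}
	return strings.TrimSpace(string(out)), nil
}

func main() {
	name := containerName(os.Getenv("NOMAD_TASK_NAME"), os.Getenv("NOMAD_ALLOC_ID"))
	ip, err := containerIP(name)
	if err != nil {
		// Without docker or a matching container, just report the failure.
		fmt.Fprintln(os.Stderr, "docker inspect failed:", err)
		return
	}
	fmt.Println(ip)
}
```

A real health check would follow the lookup with an HTTP request against the returned address and exit non-zero on failure; that part is omitted here.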

Update 1

We seem to be able to use localhost for script checks as well, since they are actually run from within the context of the container - not from the host.

check {
  name = "Args Test"
  type = "script"
  command = "C:/Windows/System32/WindowsPowerShell/v1.0/powershell.exe"
  args = ["-Command", "If ((Invoke-WebRequest -UseBasicParsing -Uri http://localhost:${Env:IISPort}${Env:HealthPath}).StatusCode -eq 200) {Exit 0} else {Exit 1}"]
  interval = "10s"
  timeout = "60s"
}

@tgross
Member

tgross commented Aug 24, 2020

I'm going to close this issue in favor of the remaining test items in #2633.

@tgross tgross closed this as completed Aug 24, 2020
@github-actions

github-actions bot commented Nov 2, 2022

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 2, 2022