-
-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Watchdog endpoints for health #583
Conversation
Thank you for your contribution. I've just checked and your commit doesn't appear to be signed-off. |
watchdog/main.go
Outdated
@@ -250,6 +250,34 @@ func makeRequestHandler(config *WatchdogConfig) func(http.ResponseWriter, *http. | |||
} | |||
} | |||
|
|||
func isLockFileExists() bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am ok with lockFilePresent()
for instance.
watchdog/main.go
Outdated
return true | ||
} | ||
|
||
func health() func(http.ResponseWriter, *http.Request) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makeHealthHandler()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will change that.
watchdog/main.go
Outdated
func health() func(http.ResponseWriter, *http.Request) { | ||
return func(w http.ResponseWriter, r *http.Request) { | ||
switch r.Method { | ||
case "GET": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stefanprodan does Kubernetes care about the method for a health endpoint? should it accept anything and still respond the same way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be GET with 200 or 500 response code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But is the validation necessary?
watchdog/main.go
Outdated
@@ -271,6 +299,7 @@ func main() { | |||
} | |||
|
|||
http.HandleFunc("/", makeRequestHandler(&config)) | |||
http.HandleFunc("/health", health()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should have been with a prefix that cannot clash with the function i.e. /_/
- I think this was mentioned in the issue. Please can you change it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My bad .. I will add that.
The commits here are a bit messy now. Please can you try to clean that up? Also - the commit message could be better i.e. Subject should be under 50 chars in length for each commit Here's some more context. I haven't asked you to do this before but it will help with the history - let's see if we can improve it? |
0a95c23
to
3a38bf9
Compare
Commit message looks a bit better now - thank you. Should start with a capital letter and we need a new line then a body talking about why the change is needed and what it solves. |
3a38bf9
to
b811d20
Compare
I made the changes requested and updated the commit message as well. |
b811d20
to
9d232cf
Compare
Calling for a review from @johnmccabe - once the PR is completed we will need to replicate it here too - https://github.com/openfaas-incubator/of-watchdog |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Been unable to get this to work locally, could do with some assistance from @viveksyngh to pinpoint whats wrong here.
@@ -232,6 +232,34 @@ func getAdditionalEnvs(config *WatchdogConfig, r *http.Request, method string) [ | |||
return envs | |||
} | |||
|
|||
func lockFilePresent() bool { | |||
path := filepath.Join(os.TempDir(), ".lock") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alexellis this jumped out at me and having a look at the rest of the Watchdog code it doesn't appear to check/ensure that the temp directory actually exists before attempting to write a lock..
If the user attempts to use the watchdog within a Scratch container then it'll bomb out.
2018/03/17 18:03:15 Writing lock-file to: /tmp/.lock
2018/03/17 18:03:15 Cannot write /tmp/.lock. To disable lock-file set env suppress_lock=true.
Or am I missing something here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@johnmccabe @alexellis does this behaviour needs to be changed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will crash. Having a temporary directory is a requirement.
watchdog/main.go
Outdated
@@ -271,6 +299,7 @@ func main() { | |||
} | |||
|
|||
http.HandleFunc("/", makeRequestHandler(&config)) | |||
http.HandleFunc("/_/health", makeHealthHandler()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@viveksyngh how did you verify this working?
I've built the watchdog from this branch but I've been unable to hit the health endpoint?
$ export fprocess="echo hello"
$ ./watchdog
2018/03/17 18:16:38 Writing lock-file to: /var/folders/2z/d4szj605431693r_4pj3sfwr0000gq/T/.lock
Curling the function root works as expected:
$ curl -vvv http://localhost:8080/
* Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 8080 (#0)
> GET / HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/7.54.0
> Accept: */*
>
< HTTP/1.1 200 OK
< X-Duration-Seconds: 0.004187
< Date: Sat, 17 Mar 2018 18:17:30 GMT
< Content-Length: 6
< Content-Type: text/plain; charset=utf-8
<
hello
* Connection #0 to host localhost left intact
But curling the health endpoint doesn't return the expected OK, instead invoking the function again:
$ curl -vvv http://localhost:8080/_/health
* Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 8080 (#0)
> GET /_/health HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/7.54.0
> Accept: */*
>
< HTTP/1.1 200 OK
< X-Duration-Seconds: 0.003916
< Date: Sat, 17 Mar 2018 18:18:11 GMT
< Content-Length: 6
< Content-Type: text/plain; charset=utf-8
<
hello
* Connection #0 to host localhost left intact
Does this actually work for you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok so your route ordering is wrong here, the health route needs to occur before the /
route, currently you're never reaching the /_/health
route as /
is handling it.
I've fixed the ordering locally and its behaving as expected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I verified it on my local. I can share the result.
curl -i http://localhost:8081/
HTTP/1.1 200 OK
X-Duration-Seconds: 0.005071
Date: Sat, 17 Mar 2018 18:25:38 GMT
Content-Length: 1329
Content-Type: text/plain; charset=utf-8
TERM_SESSION_ID=w0t1p0:633DD086-384A-4778-B803-2F9EA6073601
SSH_AUTH_SOCK=/private/tmp/com.apple.launchd.J8jlwZ3nBI/Listeners
Apple_PubSub_Socket_Render=/private/tmp/com.apple.launchd.fuASYugNfM/Render
COLORFGBG=7;0
ITERM_PROFILE=Default
XPC_FLAGS=0x0
LANG=en_US.UTF-8
PWD=/Users/viveks/Repos/go-projects/src/github.com/openfaas/faas/watchdog
SHELL=/bin/zsh
TERM_PROGRAM_VERSION=3.1.5
TERM_PROGRAM=iTerm.app
PATH=/Library/Frameworks/Python.framework/Versions/3.6/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/go/bin
COLORTERM=truecolor
TERM=xterm-256color
HOME=/Users/viveks
TMPDIR=/var/folders/tw/6ww2mtms12ndnypdycqfy_q40000gy/T/
USER=viveks
XPC_SERVICE_NAME=0
LOGNAME=viveks
__CF_USER_TEXT_ENCODING=0x1FE:0:0
ITERM_SESSION_ID=w0t1p0:633DD086-384A-4778-B803-2F9EA6073601
SHLVL=1
OLDPWD=/Users/viveks/Repos/go-projects/src/github.com/openfaas/faas
ZSH=/Users/viveks/.oh-my-zsh
PAGER=less
LESS=-R
LC_CTYPE=en_US.UTF-8
LSCOLORS=Gxfxcxdxbxegedabagacad
GOPATH=/Users/viveks/Repos/go-projects/
VISUAL=vim
fprocess=env
_=/Users/viveks/Repos/go-projects/src/github.com/openfaas/faas/watchdog/./watchdog
Http_User_Agent=curl/7.54.0
Http_Accept=*/*
Http_Method=GET
Http_ContentLength=0
Http_Path=/
curl -i http://localhost:8081/_/health
HTTP/1.1 200 OK
Date: Sat, 17 Mar 2018 18:25:43 GMT
Content-Length: 2
Content-Type: text/plain; charset=utf-8
OK%
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just changed the port in my local because I have something else running on port 8080
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
curl -i -XPOST http://localhost:8081/_/health
HTTP/1.1 405 Method Not Allowed
Date: Sat, 17 Mar 2018 18:30:29 GMT
Content-Length: 0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2018/03/17 23:55:29 Writing lock-file to: /var/folders/tw/6ww2mtms12ndnypdycqfy_q40000gy/T/.lock
2018/03/17 23:55:38 Forking fprocess.
2018/03/17 23:55:38 Wrote 1329 Bytes - Duration: 0.005071 seconds
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@johnmccabe I will do that ordering.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you rebase with master you will find the watchdog as of 0.7.6 now allows port
to be overridden temporarily for this kind of purpose.
watchdog/main.go
Outdated
return true | ||
} | ||
|
||
func makeHealthHandler() func(http.ResponseWriter, *http.Request) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain what the health check is considering as healthy here? Can the presence of the lock file alone be considered sufficient to say that the function is healthy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to issue , the /_/health endpoint need to check for /tmp/.lock
file.
#547
I think this will also make sure that function container has started successfully and functions can be invoked. Am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The lock file can be deleted at any time by the function to indicate an underlying problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah ok.
Speaking of which do we have any tests we can extend to prove the behaviour works? |
There are no tests right now, I will add tests for |
We do have tests (I have been using them tonight to push a change in 0.7.6) - please look at the file |
Introduce new endpoint `/_/health` to watchdog for health status of functions which check for `/tmp/.lock` file Fixes first part of openfaas#547 issue. Signed-off-by: Vivek Singh <vivekkmr45@yahoo.in>
9d232cf
to
29dd04a
Compare
@alexellis I have added tests to |
I'll check out the tests |
This work looks good to me including the tests. I'd just be concerned about the ordering of the tests - if they ran out of sync they may not work - therefore I'd consider making sure the lock file is created/present in the positive test. You could raise a new PR for this. |
Introduces new endpoint
/health
in watchdog.Description
This change covers the first part of this issue #547
Motivation and Context
How Has This Been Tested?
It has been tested on a local system (Mac) by building the watchdog and doing curl for the
/health
endpoint.Types of changes
Checklist:
git commit -s