Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Watchdog endpoints for health #583

Merged
merged 1 commit into from
Mar 20, 2018
Merged

Conversation

viveksyngh
Copy link
Contributor

@viveksyngh viveksyngh commented Mar 15, 2018

Introduces new endpoint /health in watchdog.

Description

This change covers the first part of this issue #547

Motivation and Context

  • I have raised an issue to propose this change (required)

How Has This Been Tested?

It has been tested on a local system (Mac) by building the watchdog and doing curl for the /health endpoint.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I've read the CONTRIBUTION guide
  • I have signed-off my commits with git commit -s
  • I have added tests to cover my changes.
  • All new and existing tests passed.

@derek
Copy link

derek bot commented Mar 15, 2018

Thank you for your contribution. I've just checked and your commit doesn't appear to be signed-off.
That's something we need before your Pull Request can be merged. Please see our contributing guide.

@derek derek bot added the no-dco label Mar 15, 2018
watchdog/main.go Outdated
@@ -250,6 +250,34 @@ func makeRequestHandler(config *WatchdogConfig) func(http.ResponseWriter, *http.
}
}

func isLockFileExists() bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am ok with lockFilePresent() for instance.

watchdog/main.go Outdated
return true
}

func health() func(http.ResponseWriter, *http.Request) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makeHealthHandler()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will change that.

watchdog/main.go Outdated
func health() func(http.ResponseWriter, *http.Request) {
return func(w http.ResponseWriter, r *http.Request) {
switch r.Method {
case "GET":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stefanprodan does Kubernetes care about the method for a health endpoint? should it accept anything and still respond the same way?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be GET with 200 or 500 response code

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But is the validation necessary?

watchdog/main.go Outdated
@@ -271,6 +299,7 @@ func main() {
}

http.HandleFunc("/", makeRequestHandler(&config))
http.HandleFunc("/health", health())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should have been with a prefix that cannot clash with the function i.e. /_/ - I think this was mentioned in the issue. Please can you change it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad .. I will add that.

@alexellis
Copy link
Member

The commits here are a bit messy now. Please can you try to clean that up?

Also - the commit message could be better i.e.

Subject should be under 50 chars in length for each commit
Then give a new-line and write a body message saying a bit about why the change is needed, but don't exceed 72 chars per line.

Here's some more context. I haven't asked you to do this before but it will help with the history - let's see if we can improve it?

https://chris.beams.io/posts/git-commit/

@derek derek bot removed the no-dco label Mar 16, 2018
@alexellis
Copy link
Member

Commit message looks a bit better now - thank you. Should start with a capital letter and we need a new line then a body talking about why the change is needed and what it solves.

@viveksyngh
Copy link
Contributor Author

I made the changes requested and updated the commit message as well.

@alexellis
Copy link
Member

Calling for a review from @johnmccabe - once the PR is completed we will need to replicate it here too - https://github.com/openfaas-incubator/of-watchdog

@viveksyngh viveksyngh changed the title [WIP] Watchdog endpoints for health Watchdog endpoints for health Mar 17, 2018
Copy link
Contributor

@johnmccabe johnmccabe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Been unable to get this to work locally, could do with some assistance from @viveksyngh to pinpoint whats wrong here.

@@ -232,6 +232,34 @@ func getAdditionalEnvs(config *WatchdogConfig, r *http.Request, method string) [
return envs
}

func lockFilePresent() bool {
path := filepath.Join(os.TempDir(), ".lock")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alexellis this jumped out at me and having a look at the rest of the Watchdog code it doesn't appear to check/ensure that the temp directory actually exists before attempting to write a lock..

If the user attempts to use the watchdog within a Scratch container then it'll bomb out.

2018/03/17 18:03:15 Writing lock-file to: /tmp/.lock
2018/03/17 18:03:15 Cannot write /tmp/.lock. To disable lock-file set env suppress_lock=true.

Or am I missing something here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@johnmccabe @alexellis does this behaviour needs to be changed?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will crash. Having a temporary directory is a requirement.

watchdog/main.go Outdated
@@ -271,6 +299,7 @@ func main() {
}

http.HandleFunc("/", makeRequestHandler(&config))
http.HandleFunc("/_/health", makeHealthHandler())
Copy link
Contributor

@johnmccabe johnmccabe Mar 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@viveksyngh how did you verify this working?

I've built the watchdog from this branch but I've been unable to hit the health endpoint?

$ export fprocess="echo hello"
$ ./watchdog
2018/03/17 18:16:38 Writing lock-file to: /var/folders/2z/d4szj605431693r_4pj3sfwr0000gq/T/.lock

Curling the function root works as expected:

$ curl -vvv http://localhost:8080/
*   Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 8080 (#0)
> GET / HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/7.54.0
> Accept: */*
>
< HTTP/1.1 200 OK
< X-Duration-Seconds: 0.004187
< Date: Sat, 17 Mar 2018 18:17:30 GMT
< Content-Length: 6
< Content-Type: text/plain; charset=utf-8
<
hello
* Connection #0 to host localhost left intact

But curling the health endpoint doesn't return the expected OK, instead invoking the function again:

$ curl -vvv http://localhost:8080/_/health
*   Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 8080 (#0)
> GET /_/health HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/7.54.0
> Accept: */*
>
< HTTP/1.1 200 OK
< X-Duration-Seconds: 0.003916
< Date: Sat, 17 Mar 2018 18:18:11 GMT
< Content-Length: 6
< Content-Type: text/plain; charset=utf-8
<
hello
* Connection #0 to host localhost left intact

Does this actually work for you?

Copy link
Contributor

@johnmccabe johnmccabe Mar 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok so your route ordering is wrong here, the health route needs to occur before the / route, currently you're never reaching the /_/health route as / is handling it.

I've fixed the ordering locally and its behaving as expected.

Copy link
Contributor Author

@viveksyngh viveksyngh Mar 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I verified it on my local. I can share the result.

curl -i http://localhost:8081/
HTTP/1.1 200 OK
X-Duration-Seconds: 0.005071
Date: Sat, 17 Mar 2018 18:25:38 GMT
Content-Length: 1329
Content-Type: text/plain; charset=utf-8

TERM_SESSION_ID=w0t1p0:633DD086-384A-4778-B803-2F9EA6073601
SSH_AUTH_SOCK=/private/tmp/com.apple.launchd.J8jlwZ3nBI/Listeners
Apple_PubSub_Socket_Render=/private/tmp/com.apple.launchd.fuASYugNfM/Render
COLORFGBG=7;0
ITERM_PROFILE=Default
XPC_FLAGS=0x0
LANG=en_US.UTF-8
PWD=/Users/viveks/Repos/go-projects/src/github.com/openfaas/faas/watchdog
SHELL=/bin/zsh
TERM_PROGRAM_VERSION=3.1.5
TERM_PROGRAM=iTerm.app
PATH=/Library/Frameworks/Python.framework/Versions/3.6/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/go/bin
COLORTERM=truecolor
TERM=xterm-256color
HOME=/Users/viveks
TMPDIR=/var/folders/tw/6ww2mtms12ndnypdycqfy_q40000gy/T/
USER=viveks
XPC_SERVICE_NAME=0
LOGNAME=viveks
__CF_USER_TEXT_ENCODING=0x1FE:0:0
ITERM_SESSION_ID=w0t1p0:633DD086-384A-4778-B803-2F9EA6073601
SHLVL=1
OLDPWD=/Users/viveks/Repos/go-projects/src/github.com/openfaas/faas
ZSH=/Users/viveks/.oh-my-zsh
PAGER=less
LESS=-R
LC_CTYPE=en_US.UTF-8
LSCOLORS=Gxfxcxdxbxegedabagacad
GOPATH=/Users/viveks/Repos/go-projects/
VISUAL=vim
fprocess=env
_=/Users/viveks/Repos/go-projects/src/github.com/openfaas/faas/watchdog/./watchdog
Http_User_Agent=curl/7.54.0
Http_Accept=*/*
Http_Method=GET
Http_ContentLength=0
Http_Path=/
curl -i http://localhost:8081/_/health
HTTP/1.1 200 OK
Date: Sat, 17 Mar 2018 18:25:43 GMT
Content-Length: 2
Content-Type: text/plain; charset=utf-8

OK%

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just changed the port in my local because I have something else running on port 8080.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curl -i -XPOST http://localhost:8081/_/health
HTTP/1.1 405 Method Not Allowed
Date: Sat, 17 Mar 2018 18:30:29 GMT
Content-Length: 0

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2018/03/17 23:55:29 Writing lock-file to: /var/folders/tw/6ww2mtms12ndnypdycqfy_q40000gy/T/.lock
2018/03/17 23:55:38 Forking fprocess.
2018/03/17 23:55:38 Wrote 1329 Bytes - Duration: 0.005071 seconds

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@johnmccabe I will do that ordering.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you rebase with master you will find the watchdog as of 0.7.6 now allows port to be overridden temporarily for this kind of purpose.

watchdog/main.go Outdated
return true
}

func makeHealthHandler() func(http.ResponseWriter, *http.Request) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain what the health check is considering as healthy here? Can the presence of the lock file alone be considered sufficient to say that the function is healthy?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to issue , the /_/health endpoint need to check for /tmp/.lock file.
#547

I think this will also make sure that function container has started successfully and functions can be invoked. Am I missing something?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lock file can be deleted at any time by the function to indicate an underlying problem.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok.

@alexellis
Copy link
Member

Speaking of which do we have any tests we can extend to prove the behaviour works?

@viveksyngh
Copy link
Contributor Author

viveksyngh commented Mar 17, 2018

There are no tests right now, I will add tests for makeHealthHandler with following testcases -
1.) /tmp/.lock file present
2.) /tmp/.lock file not present
3.) Invoke /_/health with the method other than GET

@alexellis
Copy link
Member

We do have tests (I have been using them tonight to push a change in 0.7.6) - please look at the file requesthandler_test.go

Introduce new endpoint `/_/health` to watchdog for health status of
functions  which check for `/tmp/.lock` file

Fixes first part of openfaas#547 issue.

Signed-off-by: Vivek Singh <vivekkmr45@yahoo.in>
@viveksyngh
Copy link
Contributor Author

@alexellis I have added tests to requesthandler_test.go file.

@alexellis
Copy link
Member

I'll check out the tests

@alexellis alexellis merged commit c484eee into openfaas:master Mar 20, 2018
@alexellis
Copy link
Member

alexellis commented Mar 20, 2018

This work looks good to me including the tests. I'd just be concerned about the ordering of the tests - if they ran out of sync they may not work - therefore I'd consider making sure the lock file is created/present in the positive test. You could raise a new PR for this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants