- the shutdown sequence meant that the kubelet was still passing
work to the watchdog after the HTTP socket was closed. This change
means that the kubelet has a chance to run its check before we
finally stop accepting new connections. It will require some
basic co-ordination between the kubelet's checking period and the
"write_timeout" value in the container.
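The ordering described above can be sketched roughly as follows. This is an illustration only, not the watchdog's code: `drainThenShutdown` and `demo` are hypothetical names, the 50ms wait stands in for "write_timeout", and it assumes the health check fails once a lock file is removed.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"os"
	"time"
)

// drainThenShutdown fails the health check first, keeps serving for wait so
// that the kubelet has a chance to observe the failure, and only then stops
// accepting new connections.
func drainThenShutdown(srv *http.Server, lockFile string, wait time.Duration) error {
	// Assumption: removing the lock file makes the health check fail.
	if err := os.Remove(lockFile); err != nil && !os.IsNotExist(err) {
		return err
	}

	// The server is still accepting connections during this window.
	time.Sleep(wait)

	// Finally close the listener and let in-flight requests finish.
	ctx, cancel := context.WithTimeout(context.Background(), wait)
	defer cancel()
	return srv.Shutdown(ctx)
}

// demo runs the sequence against a throwaway lock file and an idle server,
// and reports whether the lock file was removed.
func demo() (bool, error) {
	f, err := os.CreateTemp("", "lock")
	if err != nil {
		return false, err
	}
	f.Close()

	srv := &http.Server{Addr: "127.0.0.1:0"}
	if err := drainThenShutdown(srv, f.Name(), 50*time.Millisecond); err != nil {
		return false, err
	}

	_, statErr := os.Stat(f.Name())
	return os.IsNotExist(statErr), nil
}

func main() {
	removed, err := demo()
	fmt.Println("lock file removed:", removed, "error:", err)
}
```

The wait only helps if it is long enough for the kubelet's check to run at least once, which is the co-ordination mentioned above.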
Tested with Kubernetes on GKE - before the change some Pods were
returning a connection refused error because they had not been
detected as unhealthy. Now I see a 0% error rate even at 20 qps.
The issue was reproduced by scaling to 20 replicas, starting a test
with hey and then scaling to 1 replica while tailing the logs from
the gateway. Before I saw some 502s, now I see only 200s.
Signed-off-by: Alex Ellis (VMware) <alexellis2@gmail.com>
This test takes inspiration from the PR from @telackey with changes
to make it more maintainable. Since the test does not require
changes to the code, I wanted to add it before merging changes.
Ref: https://github.com/openfaas/faas/pull/789
Signed-off-by: Alex Ellis (VMware) <alexellis2@gmail.com>
If the watchdog is sent SIGTERM from an external process then it
should stop accepting new connections and attempt to finish the
work in progress. This change makes use of the ability in Go
1.9 and onwards to shut down an HTTP server gracefully.
The write_timeout duration is used as a grace period to allow all
in-flight requests to complete. The pattern is taken directly from
the official example in the Go documentation. [1]
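A minimal sketch of the pattern, assuming a 5 second write_timeout. `newServer`, `shutdownOn` and the `writeTimeout` constant are hypothetical names used for illustration; the demo listens on a random local port and sends itself SIGTERM so that it terminates on its own (Unix-like systems only).

```go
package main

import (
	"context"
	"log"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"
)

// writeTimeout stands in for the parsed write_timeout setting (assumed value).
const writeTimeout = 5 * time.Second

// newServer builds a server for addr with the write timeout applied.
func newServer(addr string) *http.Server {
	return &http.Server{Addr: addr, WriteTimeout: writeTimeout}
}

// shutdownOn blocks until stop is closed, then stops srv from accepting new
// connections and waits up to grace for in-flight requests to complete.
func shutdownOn(stop <-chan struct{}, srv *http.Server, grace time.Duration) error {
	<-stop
	ctx, cancel := context.WithTimeout(context.Background(), grace)
	defer cancel()
	return srv.Shutdown(ctx)
}

func main() {
	srv := newServer("127.0.0.1:0") // port 0: any free port, for the demo

	sig := make(chan os.Signal, 1)
	signal.Notify(sig, syscall.SIGTERM)

	// Close stop when SIGTERM arrives.
	stop := make(chan struct{})
	go func() {
		<-sig
		log.Printf("SIGTERM received, shutting down within %s", writeTimeout)
		close(stop)
	}()

	done := make(chan error, 1)
	go func() { done <- shutdownOn(stop, srv, writeTimeout) }()

	// Demo only: simulate an external process sending SIGTERM.
	time.AfterFunc(200*time.Millisecond, func() {
		if p, err := os.FindProcess(os.Getpid()); err == nil {
			p.Signal(syscall.SIGTERM)
		}
	})

	// ListenAndServe returns http.ErrServerClosed once Shutdown is called.
	if err := srv.ListenAndServe(); err != http.ErrServerClosed {
		log.Fatalf("listen: %v", err)
	}
	if err := <-done; err != nil {
		log.Printf("graceful shutdown failed: %v", err)
	}
}
```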
Further tuning and testing may be needed for Windows containers which
have a different set of signals for closing work. This change aims
to cover the majority use-case for Linux containers.
The HTTP health-check is also made to fail during shutdown by
combining the shutdown state and the existing lock file check in a
logical AND expression.
Tested with Kubernetes by deploying a custom watchdog with an
fprocess of `env`. The log message was observed when scaling down and
the terminating replica stopped accepting connections.
Also corrects some typos from the previous PR.
[1] https://golang.org/pkg/net/http/#Server.Shutdown
Signed-off-by: Alex Ellis (VMware) <alexellis2@gmail.com>
This commit replaces occurrences of HTTP method strings with the
corresponding constants from the http package.
*Note* UPDATE is not, strictly speaking, a valid method and as such
isn't part of the http package (should it be a PUT or PATCH?)
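As a small illustration of the replacement, the sketch below switches on the net/http constants and keeps "UPDATE" as a string literal because no constant exists for it. `isSupported` and its list of methods are hypothetical and do not reflect the watchdog's actual allow-list.

```go
package main

import (
	"fmt"
	"net/http"
)

// isSupported reports whether method is in a hypothetical allow-list.
func isSupported(method string) bool {
	switch method {
	case http.MethodGet, http.MethodPost, http.MethodPut, http.MethodDelete:
		return true
	case "UPDATE":
		// No http.MethodUpdate constant exists, so this stays a literal.
		return true
	default:
		return false
	}
}

func main() {
	// The constants are plain strings, so behaviour is unchanged.
	fmt.Println(http.MethodPost == "POST")
	fmt.Println(isSupported("UPDATE"), isSupported("TRACE"))
}
```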
Signed-off-by: John McCabe <john@johnmccabe.net>
Introduce a new `/_/health` endpoint in the watchdog which reports
the health status of a function by checking for the `/tmp/.lock` file.
Fixes the first part of issue #547.
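A rough sketch of such a handler is shown below. The function names, the 503 status for a missing lock file and the use of a temporary directory instead of `/tmp/.lock` are assumptions made for the example; the handler is exercised in-process with net/http/httptest.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"os"
	"path/filepath"
)

// lockFilePresent reports whether the lock file exists at path.
func lockFilePresent(path string) bool {
	_, err := os.Stat(path)
	return err == nil
}

// makeHealthHandler returns a handler for GET /_/health.
func makeHealthHandler(lockPath string) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodGet {
			w.WriteHeader(http.StatusMethodNotAllowed)
			return
		}
		if !lockFilePresent(lockPath) {
			// Assumption: the exact failure status is not given in the text.
			w.WriteHeader(http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
		w.Write([]byte("OK"))
	}
}

// probe issues a GET against the handler in-process and returns the status.
func probe(h http.Handler) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/_/health", nil))
	return rec.Code
}

// demo returns the status codes seen with and without the lock file.
func demo() (withLock, withoutLock int, err error) {
	dir, err := os.MkdirTemp("", "health")
	if err != nil {
		return 0, 0, err
	}
	defer os.RemoveAll(dir)

	lockPath := filepath.Join(dir, ".lock")
	h := makeHealthHandler(lockPath)

	if err := os.WriteFile(lockPath, nil, 0o644); err != nil {
		return 0, 0, err
	}
	withLock = probe(h)

	if err := os.Remove(lockPath); err != nil {
		return 0, 0, err
	}
	withoutLock = probe(h)
	return withLock, withoutLock, nil
}

func main() {
	withLock, withoutLock, err := demo()
	fmt.Println(withLock, withoutLock, err)
}
```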
Signed-off-by: Vivek Singh <vivekkmr45@yahoo.in>
Integration test for combine_output should use stat instead of
man as man is not installed in the CI system.
Signed-off-by: Alex Ellis (VMware) <alexellis2@gmail.com>
This enables an often-requested feature to separate stderr
from stdout within function responses. The new combine_output flag
is on by default to match existing behaviour. When combine_output is
set to false, stderr is redirected to the container logs rather than
being combined into the function response.
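The two behaviours can be sketched with os/exec as below. `invoke` and `demo` are hypothetical names, a bytes.Buffer stands in for the container logs (os.Stderr would be the natural choice in a real process), and the demo assumes a POSIX `sh` is available.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"os/exec"
)

// invoke runs the process and returns the bytes destined for the function
// response. When combine is false, stderr is written to logs instead.
func invoke(combine bool, logs io.Writer, name string, args ...string) ([]byte, error) {
	cmd := exec.Command(name, args...)
	if combine {
		// stdout and stderr share one buffer, as in the existing behaviour.
		return cmd.CombinedOutput()
	}
	cmd.Stderr = logs
	return cmd.Output()
}

// demo runs a shell snippet that writes one line to each stream.
func demo(combine bool) (response, logs string, err error) {
	var logBuf bytes.Buffer
	out, err := invoke(combine, &logBuf, "sh", "-c", "echo out; echo err 1>&2")
	return string(out), logBuf.String(), err
}

func main() {
	for _, combine := range []bool{true, false} {
		response, logs, err := demo(combine)
		fmt.Printf("combine_output=%v response=%q logs=%q err=%v\n",
			combine, response, logs, err)
	}
}
```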
Tested with unit tests for default behaviour and new behaviour.
Signed-off-by: Alex Ellis (VMware) <alexellis2@gmail.com>
The HTTP port can now be overridden through the "port" environment
variable.
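One way to read such an override, as a sketch: `resolvePort` is a hypothetical helper, and falling back to 8080 for a missing or invalid value is an assumption for the example.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// defaultPort is the fallback used by this sketch (assumed value).
const defaultPort = 8080

// resolvePort parses value and falls back to defaultPort when it is empty,
// not a number, or outside the valid TCP port range.
func resolvePort(value string) int {
	port, err := strconv.Atoi(value)
	if err != nil || port < 1 || port > 65535 {
		return defaultPort
	}
	return port
}

func main() {
	port := resolvePort(os.Getenv("port"))
	fmt.Printf("listening on :%d\n", port)
}
```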
Prefer "want" over "wanted" in error messages; this is more
idiomatic Go.
Move away from Go ARMv6 (RPi Zero) and make ARMv7 the minimum
version for release binaries.
Signed-off-by: Alex Ellis (VMware) <alexellis2@gmail.com>
- Watchdog - allow new methods with and without body.
- Enforce hard-timeout via exec_timeout variable.
- Correct bug in HTTP read/write timeouts.
- Documentation for new verbs and hard timeout.
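A hard timeout on the forked process can be sketched as follows. This version uses exec.CommandContext, which is not necessarily the mechanism the watchdog uses; `runWithTimeout` is a hypothetical name, the 1 second `execTimeout` is an assumed value standing in for exec_timeout, and the demo assumes `echo` and `sleep` are on the PATH.

```go
package main

import (
	"context"
	"fmt"
	"os/exec"
	"time"
)

// execTimeout stands in for the exec_timeout setting (assumed value).
const execTimeout = 1 * time.Second

// runWithTimeout runs the process and kills it if it is still running when
// timeout elapses. It returns stdout, whether the deadline was hit, and any
// error from running the process.
func runWithTimeout(timeout time.Duration, name string, args ...string) (string, bool, error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	// CommandContext kills the process once ctx is done.
	out, err := exec.CommandContext(ctx, name, args...).Output()
	timedOut := ctx.Err() == context.DeadlineExceeded
	return string(out), timedOut, err
}

func main() {
	out, timedOut, err := runWithTimeout(execTimeout, "echo", "done")
	fmt.Printf("output=%q timedOut=%v err=%v\n", out, timedOut, err)

	_, timedOut, err = runWithTimeout(execTimeout, "sleep", "5")
	fmt.Printf("timedOut=%v err=%v\n", timedOut, err)
}
```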
Signed-off-by: Alex Ellis <alexellis2@gmail.com>