- max idle connections and max idle connections per host are now
read from env-vars, with both values defaulting to 1024
- logging / metrics are collected in the client transaction
rather than via defer (this may impact throughput)
- function cache moved to use RWMutex to try to improve latency
around locking when updating cache
- logging message added to show latency in running GetReplicas
because this was observed to increase in a linear fashion under
high concurrency
- changes tested against 3-node bare-metal 1.13 K8s cluster
with kubeadm
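A minimal sketch of how the connection limits in the first bullet could be read, assuming env-var names `max_idle_conns` and `max_idle_conns_per_host`. Those names and the helpers `intOrDefault` / `newProxyTransport` are illustrative assumptions; only the 1024 default comes from this change.

```go
package main

import (
	"fmt"
	"net/http"
	"os"
	"strconv"
)

// lookupFunc matches os.LookupEnv so that a fake environment can be supplied.
type lookupFunc func(key string) (string, bool)

// intOrDefault returns the positive integer stored under key, or fallback
// when the key is unset or does not hold a positive integer.
func intOrDefault(lookup lookupFunc, key string, fallback int) int {
	if raw, ok := lookup(key); ok {
		if n, err := strconv.Atoi(raw); err == nil && n > 0 {
			return n
		}
	}
	return fallback
}

// newProxyTransport builds a transport whose idle-connection limits come
// from the (assumed) env-var names, both defaulting to 1024.
func newProxyTransport(lookup lookupFunc) *http.Transport {
	return &http.Transport{
		MaxIdleConns:        intOrDefault(lookup, "max_idle_conns", 1024),
		MaxIdleConnsPerHost: intOrDefault(lookup, "max_idle_conns_per_host", 1024),
	}
}

func main() {
	t := newProxyTransport(os.LookupEnv)
	fmt.Println(t.MaxIdleConns, t.MaxIdleConnsPerHost)
}
```

Passing the lookup function in, rather than calling `os.LookupEnv` directly, keeps the defaults easy to exercise without touching the real environment.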
Signed-off-by: Alex Ellis (VMware) <alexellis2@gmail.com>
- due to what appears to be a frequent issue with the Go HTTP
client some tweaks were needed to the HTTP client used for
reverse proxying to prevent CoreDNS from rejecting connections.
The following PRs / commits implement similar changes in
Prometheus and Minio.
https://github.com/prometheus/prometheus/pull/3592
https://github.com/minio/minio/pull/5860
Under a 3-node (1-master) kubeadm cluster running on bare
metal with Ubuntu 18.04 I was able to send 100k requests
with 1000 being concurrent with no errors being returned
by hey.
```
hey -n 100000 -c 1000 -m=POST -d="hi" \
http://192.168.0.26:31112/function/go-echo
```
The go-echo function is based upon the golang-http
template in the function store using the of-watchdog.
Signed-off-by: Alex Ellis (VMware) <alexellis2@gmail.com>
- re-vendor queue-worker for publisher via 0.6.0
- bump queue-worker version to 0.6.0 in docker-compose.yml for
AMD64
- rename nats -> NATS in variable names where
required
- add default reconnect of 60 times, 2 seconds apart.
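The reconnect default can be pictured as a bounded retry loop. This is a generic stdlib sketch, not the NATS client API: `retryConnect`, `errUnavailable` and the `wait` callback are hypothetical names, and a real caller would pass 60 attempts with a 2 second wait.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errUnavailable is a placeholder error for a failed connection attempt.
var errUnavailable = errors.New("server unavailable")

// retryConnect calls connect until it succeeds or maxAttempts is reached,
// calling wait between failed attempts. It returns the attempts made.
func retryConnect(connect func() error, maxAttempts int, wait func()) (int, error) {
	if maxAttempts < 1 {
		maxAttempts = 1
	}
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if err = connect(); err == nil {
			return attempt, nil
		}
		if attempt < maxAttempts {
			wait()
		}
	}
	return maxAttempts, fmt.Errorf("gave up after %d attempts: %w", maxAttempts, err)
}

func main() {
	calls := 0
	connect := func() error {
		calls++
		if calls < 3 {
			return errUnavailable
		}
		return nil
	}
	// The wait is shortened here so that the demo finishes quickly.
	attempts, err := retryConnect(connect, 60, func() { time.Sleep(10 * time.Millisecond) })
	fmt.Println(attempts, err)
}
```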
Signed-off-by: Alex Ellis (VMware) <alexellis2@gmail.com>
**What**
- Protect the `/system/alert` endpoint when basic auth is enabled
- Update the alert manager config to send the basic auth credentials
- Bump the gateway version
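For illustration, protecting an endpoint with basic auth could look like the sketch below. The names `withBasicAuth`, `newProtectedAlert` and `alertStatus` are hypothetical and the stub handler stands in for the real alert handler; this is not the gateway's actual code.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// withBasicAuth wraps next so that it only runs when the request carries
// the expected basic auth credentials.
func withBasicAuth(next http.Handler, user, password string) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		u, p, ok := r.BasicAuth()
		userOK := subtle.ConstantTimeCompare([]byte(u), []byte(user)) == 1
		passOK := subtle.ConstantTimeCompare([]byte(p), []byte(password)) == 1
		if !ok || !userOK || !passOK {
			w.Header().Set("WWW-Authenticate", `Basic realm="Restricted"`)
			http.Error(w, "invalid credentials", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// newProtectedAlert returns a stub alert handler behind basic auth.
func newProtectedAlert(user, password string) http.Handler {
	alert := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	return withBasicAuth(alert, user, password)
}

// alertStatus posts to /system/alert through h and reports the status code.
// Empty credentials mean that no Authorization header is sent.
func alertStatus(h http.Handler, user, password string) int {
	req := httptest.NewRequest(http.MethodPost, "/system/alert", nil)
	if user != "" || password != "" {
		req.SetBasicAuth(user, password)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	protected := newProtectedAlert("admin", "s3cret")
	fmt.Println(alertStatus(protected, "", ""))            // no credentials
	fmt.Println(alertStatus(protected, "admin", "s3cret")) // correct credentials
}
```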
Signed-off-by: Lucas Roesler <roesler.lucas@gmail.com>
- the order of http_requests_total was shown to be incorrect in
testing. This fixes the order as per
http_request_duration_seconds.
Signed-off-by: Alex Ellis (VMware) <alexellis2@gmail.com>
- according to discussion in #1013 all unicode characters are
valid label values - this commit allows the original path to be
retained.
Signed-off-by: Alex Ellis (VMware) <alexellis2@gmail.com>
- instruments async handler for report and for queueing async
requests
- make MustRegister only ever run once to prevent sync issues
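One way to guarantee that registration runs a single time is `sync.Once`. The sketch below uses only the standard library; the `exporter` type and the `mustRegister` callback are hypothetical stand-ins for the Prometheus registration call.

```go
package main

import (
	"fmt"
	"sync"
)

// exporter guards metric registration so that it happens at most once.
type exporter struct {
	once sync.Once
}

// register runs mustRegister on the first call only, however many
// goroutines call it and however often.
func (e *exporter) register(mustRegister func()) {
	e.once.Do(mustRegister)
}

func main() {
	e := &exporter{}
	registrations := 0

	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Only one goroutine ever executes the callback.
			e.register(func() { registrations++ })
		}()
	}
	wg.Wait()
	fmt.Println(registrations)
}
```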
Signed-off-by: Alex Ellis (VMware) <alexellis2@gmail.com>
- updates the Prometheus go client version and switches to the
promhttp handler to avoid conflicts with the new system-level
metrics.
Tested with Docker Swarm locally - no conflicts and new metrics
were gathered.
Signed-off-by: Alex Ellis (VMware) <alexellis2@gmail.com>
Partially fixes #532 by introducing two metrics that are
supported by Kubernetes HPAv2 and RED metrics-style
dashboards.
Signed-off-by: Alex Ellis (VMware) <alexellis2@gmail.com>
When the /system/info endpoint was expanded to include information about the gateway a number of build-args were added to the main Dockerfile. These changes were not mirrored in Dockerfile.armhf, which resulted in nil attributes and an ugly error when running `faas version` against an armhf gateway.
This change carries the changes made to Dockerfile through to Dockerfile.armhf. As well as the build-args, which fix the identified issue, the license check has also been added at the latest release, 0.2.3, as an armhf build has been made available. Further changes introduce the app user and move the binary location from /root/ to /home/app/
Signed-off-by: Richard Gee <richard@technologee.co.uk>
This change validates manual input to the gateway UI when deploying
new functions. This is to prevent poor user experience when attempting
to deploy a function manually from the UI.
The validation check on the function name is the same pattern that
is used in the CLI to ensure that when the deploy button is pressed,
the function will not fail validation.
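As an illustration of this kind of name check, here is a sketch in Go. The pattern shown is an assumed DNS-label style expression and may not match the CLI's exact pattern; `validFunctionName` is a hypothetical helper, and the UI itself performs the check in the browser.

```go
package main

import (
	"fmt"
	"regexp"
)

// namePattern is an assumed pattern: lowercase alphanumerics and dashes,
// starting and ending with an alphanumeric character.
var namePattern = regexp.MustCompile(`^[a-z0-9]([-a-z0-9]*[a-z0-9])?$`)

// validFunctionName reports whether name satisfies the assumed pattern.
func validFunctionName(name string) bool {
	return namePattern.MatchString(name)
}

func main() {
	for _, name := range []string{"go-echo", "Go_Echo", "-echo", ""} {
		fmt.Printf("%q valid: %v\n", name, validFunctionName(name))
	}
}
```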
Signed-off-by: Burton Rheutan <rheutan7@gmail.com>
- added secret definition and removed types used previously
Remove structs for secrets
- after discussion on PR the core contributors decided we just
want simple CRUD with the Secret type.
Signed-off-by: Alex Ellis (VMware) <alexellis2@gmail.com>
This change set downloads the CDN resources for the gateway
and bundles them with the other static resources for the UI.
This is needed for situations where a user does not have access
to the CDN either because of firewall rules or network policy.
The files and versions remain the same, only now loaded locally
with directory paths matching the CDN paths.
Signed-off-by: Burton Rheutan <rheutan7@gmail.com>
- this reinstates the cache to reduce the count of lookups to the
provider when checking if scaling is needed.
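A minimal sketch of such a cache, guarded by an RWMutex and with entries that expire. The type and method names here are hypothetical and the gateway's own cache may differ.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

type cacheEntry struct {
	replicas uint64
	added    time.Time
}

// functionCache remembers replica counts so that repeated scaling checks
// do not each require a lookup against the provider.
type functionCache struct {
	mu      sync.RWMutex
	expiry  time.Duration
	entries map[string]cacheEntry
}

func newFunctionCache(expiry time.Duration) *functionCache {
	return &functionCache{expiry: expiry, entries: make(map[string]cacheEntry)}
}

// Set takes the write lock, which excludes all readers and other writers.
func (c *functionCache) Set(name string, replicas uint64) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.entries[name] = cacheEntry{replicas: replicas, added: time.Now()}
}

// Get takes the read lock only, so concurrent lookups do not block each
// other. An entry whose age has reached the expiry counts as a miss.
func (c *functionCache) Get(name string) (uint64, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	e, ok := c.entries[name]
	if !ok || time.Since(e.added) >= c.expiry {
		return 0, false
	}
	return e.replicas, true
}

func main() {
	cache := newFunctionCache(5 * time.Second)
	cache.Set("figlet", 2)
	replicas, hit := cache.Get("figlet")
	fmt.Println(replicas, hit)
}
```

With an expiry of zero every lookup is a miss, which effectively disables the cache.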
Signed-off-by: Alex Ellis (VMware) <alexellis2@gmail.com>
- this change is needed for Docker Swarm which may give an error
when several concurrent requests come in to scale a deployment.
Tested on Docker Swarm before/after with the hey tool and figlet
scaled down to zero replicas.
Signed-off-by: Alex Ellis (VMware) <alexellis2@gmail.com>
- extracting this package means it can be used in other components
such as the asynchronous nats-queue-worker which may need to
invoke functions which are scaled down to zero replicas.
Ref: https://github.com/openfaas/nats-queue-worker/issues/32
Tested on Docker Swarm for the scaling up, already scaled and
not found error cases.
Signed-off-by: Alex Ellis (VMware) <alexellis2@gmail.com>
- as reported on Slack and in issue #931 the gateway scaling code
was scaling to zero replicas as a result of the "proportional
scaling" added by @Templum's PR. This commit added a failing test
which was fixed by adding boundary checking - now if the scaling
amount is "0" we keep the current amount of replicas.
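The boundary check can be sketched as follows, assuming the step is computed as a percentage of the maximum replica count using integer arithmetic. `nextReplicas` and its parameters are hypothetical names and the gateway's real calculation may differ in detail.

```go
package main

import "fmt"

// nextReplicas returns the replica count after one scale-up step.
// The step is factorPercent of maxReplicas; when that rounds down to 0
// the current count is kept rather than applying a zero-sized change.
func nextReplicas(current, maxReplicas, factorPercent uint64) uint64 {
	step := maxReplicas * factorPercent / 100 // integer division may yield 0
	if step == 0 {
		return current
	}
	next := current + step
	if next > maxReplicas {
		return maxReplicas
	}
	return next
}

func main() {
	fmt.Println(nextReplicas(1, 4, 20))   // step rounds down to 0, count is kept
	fmt.Println(nextReplicas(1, 20, 20))  // step of 4
	fmt.Println(nextReplicas(18, 20, 20)) // capped at the maximum
}
```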
Signed-off-by: Alex Ellis (VMware) <alexellis2@gmail.com>
Trivial change to add logging around scale from zero events in scaling.go.
Previously scale from zero events were not logged in the same way that normal
scaling events are. This change adds log writes to show when a scale from zero
was requested and when a function successfully moved to > 0 replicas.
Signed-off-by: Richard Gee <richard@technologee.co.uk>
- Covers part of 919 by making the HTTP client used for proxying
stop following redirects. Tested with a stateless microservice,
but additional code changes may be required in the queue-worker,
the watchdogs and other areas.
Tested on Swarm with stateless microservice (Node.js) issuing
a redirect via Location header.
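With Go's standard HTTP client, redirects can be left to the caller by returning `http.ErrUseLastResponse` from `CheckRedirect`. The sketch below shows that technique against a throwaway test server; `newProxyClient` and `probeRedirect` are hypothetical names, not the gateway's own.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// newProxyClient returns a client that hands 3xx responses back to the
// caller instead of following the Location header.
func newProxyClient() *http.Client {
	return &http.Client{
		CheckRedirect: func(req *http.Request, via []*http.Request) error {
			return http.ErrUseLastResponse
		},
	}
}

// probeRedirect starts a local server that always redirects, calls it with
// the proxy client and returns the status code and Location it received.
func probeRedirect() (int, string, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		http.Redirect(w, r, "/elsewhere", http.StatusFound)
	}))
	defer srv.Close()

	res, err := newProxyClient().Get(srv.URL)
	if err != nil {
		return 0, "", err
	}
	defer res.Body.Close()
	return res.StatusCode, res.Header.Get("Location"), nil
}

func main() {
	code, location, err := probeRedirect()
	fmt.Println(code, location, err)
}
```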
Signed-off-by: Alex Ellis (VMware) <alexellis2@gmail.com>
- Removes use of "our" from CONTRIBUTING guide
- Updates/adds README.md files
- Comments and typo fix in watchdog
- Adds good/bad examples of commit messages
Signed-off-by: Alex Ellis (VMware) <alexellis2@gmail.com>
- updates comments and adds where missing
- updates locks so that unlock is done via defer instead of
at the end of the statement
- extracts timeout variable in two places
- remove makeClient() unused method from metrics package
No-harm changes tested via go build.
Signed-off-by: Alex Ellis (VMware) <alexellis2@gmail.com>
Within MakeScalingHandler() there is a call to GetReplicas() which was not returning an error when a non-200 http response was received from /system/function/. The call would also return a populated struct, so the perception was that a function existed and had been scaled to zero. This meant that the function would be added to the function cache and the code would continue into SetReplicas() where an attempt would be made to scale up a non-existent function.
This change amends GetReplicas() so that it will return an error if the gateway returns anything other than a 200 response code from the /system/function/ endpoint. This causes MakeScalingHandler() to return earlier with an error indicating that the function could not be found. The cache.Set call is also moved to after the error check so that the cache is only updated to include existent functions.
During investigations as to the cause of #876 tests were added to function_cache to check that Get() is behaving as intended when function exists and when not. Tests are also added to plugin/external to test that GetReplicas() and SetReplicas() are following their intended modes of operation when 200 and non-200 responses are received from the gateway.
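The intended behaviour can be sketched as below. The names `parseReplicas`, `getReplicas` and `functionStatus`, and the `replicas` JSON field, are assumptions for illustration rather than the code in plugin/external.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

// functionStatus holds the only field this sketch reads from the response.
type functionStatus struct {
	Replicas uint64 `json:"replicas"`
}

// parseReplicas treats any status other than 200 as an error, so that a
// missing function is never mistaken for one scaled to zero.
func parseReplicas(statusCode int, body []byte) (uint64, error) {
	if statusCode != http.StatusOK {
		return 0, fmt.Errorf("unexpected status %d looking up function", statusCode)
	}
	var status functionStatus
	if err := json.Unmarshal(body, &status); err != nil {
		return 0, err
	}
	return status.Replicas, nil
}

// getReplicas queries /system/function/<name> and defers to parseReplicas.
func getReplicas(client *http.Client, gatewayURL, name string) (uint64, error) {
	res, err := client.Get(gatewayURL + "/system/function/" + name)
	if err != nil {
		return 0, err
	}
	defer res.Body.Close()
	body, err := io.ReadAll(res.Body)
	if err != nil {
		return 0, err
	}
	return parseReplicas(res.StatusCode, body)
}

func main() {
	_, err := parseReplicas(http.StatusNotFound, nil)
	fmt.Println(err)
}
```

A caller would update its cache only after checking the returned error, so that non-existent functions are never cached.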
Signed-off-by: Richard Gee <richard@technologee.co.uk>
- The path clipping / transforming behaviour must be turned-off
when we are not using direct_functions as is used in
faas-nomad and faas-ecs. This will need a change in each provider
to strip paths, but fixes a 404 error these users will see if they
upgrade to 0.9.2 or newer. 0.9.3 will have this fix, meaning
the whole un-edited path is passed to the provider when
direct_functions is set to false.
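A rough sketch of the rule described above, assuming the transformation strips a `/function/<name>` prefix. `upstreamPath` is a hypothetical helper and the exact transformation in the gateway may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// upstreamPath returns the path to send upstream. The whole path is kept
// when direct_functions is false; otherwise the assumed /function/<name>
// prefix is stripped.
func upstreamPath(requestPath, functionName string, directFunctions bool) string {
	prefix := "/function/" + functionName
	if !directFunctions || !strings.HasPrefix(requestPath, prefix) {
		return requestPath
	}
	rest := requestPath[len(prefix):]
	switch {
	case rest == "":
		return "/"
	case strings.HasPrefix(rest, "/"):
		return rest
	default:
		// The prefix matched part of a different, longer function name.
		return requestPath
	}
}

func main() {
	fmt.Println(upstreamPath("/function/echo/api/v1", "echo", true))
	fmt.Println(upstreamPath("/function/echo/api/v1", "echo", false))
}
```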
Signed-off-by: Alex Ellis (VMware) <alexellis2@gmail.com>