322 Commits

Author SHA1 Message Date
299e5a5933 Read config values from environment for max_conns tuning
- max_conns / idle / per host are now read from env-vars and
default to 1024
- logging / metrics are collected in the client transaction
rather than via defer (this may impact throughput)
- function cache moved to use RWMutex to try to improve latency
around locking when updating cache
- logging message added to show latency in running GetReplicas
because this was observed to increase in a linear fashion under
high concurrency
- changes tested against a 3-node bare-metal Kubernetes 1.13
cluster set up with kubeadm

Signed-off-by: Alex Ellis (VMware) <alexellis2@gmail.com>
2019-02-04 11:50:25 +00:00
52c27e227a Tune HTTP client for concurrency
- due to what appears to be a frequent issue with the Go HTTP
client, some tweaks were needed to the HTTP client used for
reverse proxying to prevent CoreDNS from rejecting connections.

The following PRs / commits implement similar changes in
Prometheus and Minio.

https://github.com/prometheus/prometheus/pull/3592
https://github.com/minio/minio/pull/5860

On a 3-node (1-master) kubeadm cluster running on bare
metal with Ubuntu 18.04, I was able to send 100k requests,
1000 of them concurrently, with no errors returned
by hey.

```
hey -n 100000 -c 1000 -m=POST -d="hi" \
  http://192.168.0.26:31112/function/go-echo
```

The go-echo function is based upon the golang-http
template in the function store using the of-watchdog.

Signed-off-by: Alex Ellis (VMware) <alexellis2@gmail.com>
2019-02-04 11:50:25 +00:00
c5122279c9 Fix unit test fail due to race condition #1063
Signed-off-by: Radoslav Dimitrov <dimitrovr@vmware.com>
2019-01-30 10:28:13 +00:00
b4a550327d Re-vendor queue-worker publisher for reconnect
- re-vendor queue-worker for publisher via 0.6.0
- bump queue-worker version to 0.6.0 in docker-compose.yml for
AMD64
- rename nats -> NATS in variable names where
required
- add a default of 60 reconnect attempts, 2 seconds apart.

Signed-off-by: Alex Ellis (VMware) <alexellis2@gmail.com>
2019-01-29 15:15:48 +00:00
f61735b155 Add basic auth to the system alert endpoint
**What**
- Protect the `/system/alert` endpoint when basic auth is enabled
- Update the alert manager config to send the basic auth credentials
- Bump the gateway version

Signed-off-by: Lucas Roesler <roesler.lucas@gmail.com>
2019-01-24 17:45:33 +00:00
ec185bad67 Fix label order for http_requests_total
- the label order of http_requests_total was shown to be incorrect
in testing. This fixes the order to match
http_request_duration_seconds.

Signed-off-by: Alex Ellis (VMware) <alexellis2@gmail.com>
2019-01-24 09:46:14 +00:00
a26d350376 Allow unicode in service paths
- according to the discussion in #1013, all unicode characters are
valid label values, so this commit allows the original path to be
retained.

Signed-off-by: Alex Ellis (VMware) <alexellis2@gmail.com>
2019-01-24 09:12:46 +00:00
67c9a71686 Add unit tests for MakeNotifierWrapper
- fixes issue where result was assigned to value rather than
to pointer reference.
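
A generic illustration of the value-versus-pointer bug class described above, with hypothetical type and function names: assigning to a struct passed by value changes only the callee's copy, while writing through a pointer updates the caller's value.

```go
package main

// notifierResult is a hypothetical stand-in for whatever a notifier records,
// such as the status code it observed.
type notifierResult struct {
	statusCode int
}

// recordByValue shows the bug: res is a copy, so the assignment is lost
// when the function returns.
func recordByValue(res notifierResult, code int) {
	res.statusCode = code
}

// recordByPointer shows the fix: writing through the pointer updates the
// caller's struct.
func recordByPointer(res *notifierResult, code int) {
	res.statusCode = code
}
```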

Signed-off-by: Alex Ellis (VMware) <alexellis2@gmail.com>
2019-01-24 09:12:46 +00:00
f7cf7a6496 Split out notifiers
- splits out notifiers and writes status for async handler

Signed-off-by: Alex Ellis (VMware) <alexellis2@gmail.com>
2019-01-24 09:12:46 +00:00
fca32a0e79 Instrument async handlers
- instruments async handler for report and for queueing async
requests
- make MustRegister only ever run once to prevent sync issues

Signed-off-by: Alex Ellis (VMware) <alexellis2@gmail.com>
2019-01-24 09:12:46 +00:00
5a1bdcdb91 Add instrumentation to the alert handler
Signed-off-by: Alex Ellis (VMware) <alexellis2@gmail.com>
2019-01-24 09:12:46 +00:00
e9cf708cb5 Bump Prometheus client version
- updates the Prometheus go client version and switches to the
promhttp handler to avoid conflicts with the new system-level
metrics.

Tested with Docker Swarm locally - no conflicts and new metrics
were gathered.

Signed-off-by: Alex Ellis (VMware) <alexellis2@gmail.com>
2019-01-24 09:12:46 +00:00
64a3f4e495 Instrument system calls
Signed-off-by: Alex Ellis (VMware) <alexellis2@gmail.com>
2019-01-24 09:12:46 +00:00
1cc767e898 Add service RED metrics definitions
Partially fixes #532 by introducing two metrics that are
supported by Kubernetes HPAv2 and RED metrics-style
dashboards.

Signed-off-by: Alex Ellis (VMware) <alexellis2@gmail.com>
2019-01-24 09:12:46 +00:00
0f5ca96bbe Add build-args to Dockerfile.armhf
When the /system/info endpoint was expanded to include information about the gateway, a number of build-args were added to the main Dockerfile. These changes were not mirrored in Dockerfile.armhf, which resulted in nil attributes and an ugly error when running `faas version` against an armhf gateway.

This change carries the changes made to Dockerfile through to Dockerfile.armhf. As well as the build-args which fix the identified issue, the license check has also been added at the latest release, 0.2.3, as an armhf build has been made available. Further changes introduce the app user and move the binary location from /root/ to /home/app/.

Signed-off-by: Richard Gee <richard@technologee.co.uk>
2019-01-20 10:00:02 +00:00
988c855163 Gateway UI - validate manual input
This change validates manual input to the gateway UI when deploying
new functions. This is to prevent a poor user experience when attempting
to deploy a function manually from the UI.

The validation check on the function name is the same pattern that
is used in the CLI to ensure that when the deploy button is pressed,
the function will not fail validation.
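
The commit does not reproduce the CLI's pattern. As an assumption, the sketch below uses a DNS-1123-label-style rule (lowercase letters, digits and hyphens, starting and ending with an alphanumeric character), written in Go for consistency even though the UI check itself runs in the browser.

```go
package main

import "regexp"

// validFunctionName is an assumed pattern, not copied from faas-cli.
var validFunctionName = regexp.MustCompile(`^[a-z0-9]([-a-z0-9]*[a-z0-9])?$`)

// isValidFunctionName reports whether name matches the assumed pattern.
func isValidFunctionName(name string) bool {
	return validFunctionName.MatchString(name)
}
```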

Signed-off-by: Burton Rheutan <rheutan7@gmail.com>
2019-01-19 11:00:59 +00:00
41b452849c Add a consistent ARM64 image build process
Signed-off-by: Radoslav Dimitrov <rdimitrow@gmail.com>
2019-01-16 09:09:41 +00:00
a65df4795b Update swagger for missing secret definitions
- added secret definition and removed types used previously

Remove structs for secrets

- after discussion on PR the core contributors decided we just
want simple CRUD with the Secret type.

Signed-off-by: Alex Ellis (VMware) <alexellis2@gmail.com>
2019-01-04 16:51:01 +00:00
b206cb829a Updated secret types based on PR feedback:
- SecretInfo type
- ListSecretsResponse
- Move Annotations to SecretInfo
- update swagger api docs

Signed-off-by: Andrew Cornies <acornies@gmail.com>
2019-01-04 16:51:01 +00:00
b49dded3b3 Updates from PR comments:
- moved Vagrantfile to contrib dir
- gave secret request type more thought

Signed-off-by: Andrew Cornies <acornies@gmail.com>
2019-01-04 16:51:01 +00:00
a9238f5631 Secrets iteration:
- added delete http verb to system/secrets
- added secrets request type
- added vagrant env provisioned by existing deploy_stack.sh

Signed-off-by: Andrew Cornies <acornies@gmail.com>
2019-01-04 16:51:01 +00:00
d2ef8b9207 Initial support for secrets in gw:
- added SecretHandler type
- added discussed system/secret endpoint with appropriate http verbs

Signed-off-by: Andrew Cornies <acornies@gmail.com>
2019-01-04 16:51:01 +00:00
c9befd78e7 Bump gorilla mux to 1.6.2
**What**
- Update Gopkg.toml

Signed-off-by: Lucas Roesler <roesler.lucas@gmail.com>
2018-12-29 19:34:21 +00:00
09736be293 When firing, newReplicas should be greater than currentReplicas
Signed-off-by: Gede Wahyu <tokekbesi@gmail.com>
2018-12-29 19:31:04 +00:00
3bdb194e71 Round up value of newReplicas
Signed-off-by: Gede Wahyu <tokekbesi@gmail.com>
2018-12-29 19:31:04 +00:00
191629151e Remove the differentiation between currentReplicas==1 and not
Signed-off-by: Gede Wahyu <tokekbesi@gmail.com>
2018-12-29 19:31:04 +00:00
81db6514f7 Fix TestInitialScale expectation
Signed-off-by: Gede Wahyu <tokekbesi@gmail.com>
2018-12-29 19:31:04 +00:00
058d1e481a Test scaled up from 1
Signed-off-by: Gede Wahyu <tokekbesi@gmail.com>
2018-12-29 19:31:04 +00:00
334288b130 Undo early return in updateData callback
**What**
- Revert to original if-block structure to reduce the size of the diff

Signed-off-by: Lucas Roesler <roesler.lucas@gmail.com>
2018-12-19 21:10:41 +00:00
cb367096ae Refresh function image during ui update loop
**What**
- Update the function image value during `refreshData`

Signed-off-by: Lucas Roesler <roesler.lucas@gmail.com>
2018-12-19 21:10:41 +00:00
a51d42c983 Download vendor cdn files for gateway
This change set downloads the CDN resources for the gateway
and bundles them with the other static resources for the UI.

This is needed for situations where a user does not have access
to the CDN either because of firewall rules or network policy.

The files and versions remain the same, only now loaded locally
with directory paths matching the CDN paths.

Signed-off-by: Burton Rheutan <rheutan7@gmail.com>
2018-12-05 20:20:15 +00:00
350907aacd Update gateway to golang:1.10.4
Signed-off-by: Radoslav Dimitrov <rdimitrow@gmail.com>
2018-12-05 19:59:18 +00:00
6ef5ef73cc Update arm64 build of gateway
- adds version folder - tested on Rock64

Signed-off-by: Alex Ellis (VMware) <alexellis2@gmail.com>
2018-12-04 11:15:38 +00:00
ade5f32513 Bump files to use alpine:3.8
Bump Dockerfiles and mentions of alpine 3.7 to
3.8

Signed-off-by: Martin Dekov (VMware) <mdekov@vmware.com>
2018-11-16 20:27:05 +00:00
b4c12f824b Make use of cache in scaling
- this reinstates the cache to reduce the number of lookups to the
provider when checking if scaling is needed.

Signed-off-by: Alex Ellis (VMware) <alexellis2@gmail.com>
2018-11-07 13:49:56 +00:00
117707df14 Enable backoff/retries on scaling up
- this change is needed for Docker Swarm which may give an error
when several concurrent requests come in to scale a deployment.

Tested on Docker Swarm before/after with the hey tool and figlet
scaled down to zero replicas.

Signed-off-by: Alex Ellis (VMware) <alexellis2@gmail.com>
2018-11-07 13:49:56 +00:00
9cea08c728 Extract scaling from zero
- extracting this package means it can be used in other components
such as the asynchronous nats-queue-worker which may need to
invoke functions which are scaled down to zero replicas.

Ref: https://github.com/openfaas/nats-queue-worker/issues/32

Tested on Docker Swarm for the scaling-up, already-scaled and
not-found cases.

Signed-off-by: Alex Ellis (VMware) <alexellis2@gmail.com>
2018-11-01 15:10:08 +00:00
101b06243b Add documentation for scaling handler
- documents ScalingConfig and MakeScalingHandler

Signed-off-by: Alex Ellis (VMware) <alexellis2@gmail.com>
2018-10-28 12:24:25 +00:00
f5939c9a60 Update for scaling edge-case
- as reported on Slack and in issue #931 the gateway scaling code
was scaling to zero replicas as a result of the "proportional
scaling" added by @Templum's PR. This commit adds a failing test,
which is fixed by adding boundary checking: now if the scaling
amount is "0", the current number of replicas is kept.
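
A sketch of proportional scaling with the boundary check described above, also folding in the later commits 3bdb194e71 (round up the value of newReplicas) and 09736be293 (when firing, newReplicas should be greater than currentReplicas, here subject to the maximum). The function name and the formula, a step of `ceil(maxReplicas * scalingFactor / 100)` capped at `maxReplicas`, are assumptions rather than the gateway's code; behaviour for resolved alerts is not covered.

```go
package main

import "math"

// nextReplicas returns the replica count to request for a firing alert.
// scalingFactor is a percentage of maxReplicas to add per alert.
func nextReplicas(current, maxReplicas, scalingFactor uint64) uint64 {
	// Round up so that a small percentage still adds at least one replica.
	step := uint64(math.Ceil(float64(maxReplicas) * float64(scalingFactor) / 100))
	if step == 0 {
		// Boundary check: a zero step keeps the current count instead of
		// scaling the function to zero.
		return current
	}
	if current+step > maxReplicas {
		return maxReplicas
	}
	return current + step
}
```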

Signed-off-by: Alex Ellis (VMware) <alexellis2@gmail.com>
2018-10-25 11:47:47 +01:00
7df548668f Add logging to scale from zero requests
Trivial change to add logging around scale from zero events in scaling.go.
Previously scale from zero events were not logged in the same way that normal
scaling events are.  This change adds log writes to show when a scale from zero
was requested and when a function successfully moved to > 0 replicas.

Signed-off-by: Richard Gee <richard@technologee.co.uk>
2018-10-20 08:56:14 +01:00
70a5e343c5 code formatter
Signed-off-by: qinpengfei <qinpengfei@jd.com>
2018-10-19 21:24:15 +01:00
476b652c26 Update integration test
We now return 202 Accepted rather than 200 OK when creating functions.

Signed-off-by: Alex Ellis (VMware) <alexellis2@gmail.com>
2018-10-19 21:19:21 +01:00
62525f6570 Don't follow redirects from functions
- Covers part of 919 by making the HTTP client used for proxying
stop following redirects. Tested with a stateless microservice,
but additional code changes may be required in the queue-worker,
the watchdogs and other areas.

Tested on Swarm with stateless microservice (Node.js) issuing
a redirect via Location header.

Signed-off-by: Alex Ellis (VMware) <alexellis2@gmail.com>
2018-10-19 21:19:21 +01:00
7db8ad1bda Update README files
- Removes use of "our" from CONTRIBUTING guide
- Updates/adds README.md files
- Comments and typo fix in watchdog
- Adds good/bad examples of commit messages

Signed-off-by: Alex Ellis (VMware) <alexellis2@gmail.com>
2018-10-03 14:07:41 +01:00
bd39b9267a Update comments
- updates comments and adds where missing
- updates locks so that unlock is done via defer instead of
at the end of the statement
- extracts timeout variable in two places
- remove makeClient() unused method from metrics package

No-harm changes tested via go build.

Signed-off-by: Alex Ellis (VMware) <alexellis2@gmail.com>
2018-10-03 13:16:28 +01:00
e33061702a Change the http status code on unfound function error to 404
Signed-off-by: Richard Gee <richard@technologee.co.uk>
2018-09-23 12:39:13 +01:00
df6f4c49f2 Add checking for existent function in GetReplicas
Within MakeScalingHandler() there is a call to GetReplicas() which was not returning an error when a non-200 HTTP response was received from /system/function/. The call would also return a populated struct, so the perception was that a function existed and had been scaled to zero. This meant that the function would be added to the function cache and the code would continue into SetReplicas(), where an attempt would be made to scale up a non-existent function.

This change amends GetReplicas() so that it will return an error if the gateway returns anything other than a 200 response code from the /system/function/ endpoint. This causes MakeScalingHandler() to return earlier with an error indicating that the function could not be found. The cache.Set call is also moved to after the error check so that the cache is only updated to include existent functions.

During investigations into the cause of #876, tests were added to function_cache to check that Get() is behaving as intended when a function exists and when it does not. Tests are also added to plugin/external to test that GetReplicas() and SetReplicas() are following their intended modes of operation when 200 and non-200 responses are received from the gateway.

Signed-off-by: Richard Gee <richard@technologee.co.uk>
2018-09-23 12:39:13 +01:00
3598da2e51 Enable basic auth for service query / scaling on provider
- this is a blocking issue for auth with Docker Swarm
fixes #879

Signed-off-by: Alex Ellis (VMware) <alexellis2@gmail.com>
2018-09-19 20:52:14 +01:00
c67c9f2b30 Fix issue with direct_functions and path behaviour
- The path clipping / transforming behaviour must be turned off
when we are not using direct_functions, as is the case in
faas-nomad and faas-ecs. This will need a change in each provider
to strip paths, but fixes a 404 error these users will see if they
upgrade to 0.9.2 or newer. 0.9.3 will include this fix, meaning
the whole unedited path is passed to the provider when
direct_functions is set to false.
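
A sketch of the behaviour described above, assuming the clipping removes a `/function/<name>` prefix when direct_functions is enabled; the exact transform is not shown in the commit, and the function name is hypothetical. With direct_functions set to false, the path is passed through unedited for the provider to strip.

```go
package main

import "strings"

// upstreamPath decides which path to forward upstream.
func upstreamPath(path string, directFunctions bool) string {
	if !directFunctions {
		// Provider mode: pass the whole path through unedited.
		return path
	}
	const prefix = "/function/"
	if !strings.HasPrefix(path, prefix) {
		return path
	}
	rest := strings.TrimPrefix(path, prefix)
	// Drop the function name, keeping any sub-path after it.
	if i := strings.Index(rest, "/"); i >= 0 {
		return rest[i:]
	}
	return "/"
}
```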

Signed-off-by: Alex Ellis (VMware) <alexellis2@gmail.com>
2018-09-15 14:40:22 +01:00
3e0ed5edd7 Add X-Forwarded-Host test when already present
Signed-off-by: Stefan Prodan <stefan.prodan@gmail.com>
2018-09-10 09:57:03 +01:00