323 Commits

Author SHA1 Message Date
Alex Ellis
c394b09ae6 Bump to nats-queue-worker 0.7.0
Includes fix for reconnection bug to NATS Streaming

Signed-off-by: Alex Ellis <alexellis2@gmail.com>
2019-02-12 17:46:00 +00:00
Alex Ellis (VMware)
299e5a5933 Read config values from environment for max_conns tuning
- max_conns / idle / per host are now read from env-vars and have
defaults set to 1024 for both values
- logging / metrics are collected in the client transaction
rather than via defer (this may impact throughput)
- function cache moved to use RWMutex to try to improve latency
around locking when updating cache
- logging message added to show latency in running GetReplicas
because this was observed to increase in a linear fashion under
high concurrency
- changes tested against 3-node bare-metal 1.13 K8s cluster
with kubeadm

Signed-off-by: Alex Ellis (VMware) <alexellis2@gmail.com>
2019-02-04 11:50:25 +00:00
Alex Ellis (VMware)
52c27e227a Tune HTTP client for concurrency
- due to what appears to be a frequent issue with the Go HTTP
client some tweaks were needed to the HTTP client used for
reverse proxying to prevent CoreDNS from rejecting connections.

The following PRs / commits implement similar changes in
Prometheus and Minio.

https://github.com/prometheus/prometheus/pull/3592
https://github.com/minio/minio/pull/5860

Under a 3-node (1-master) kubeadm cluster running on bare
metal with Ubuntu 18.04 I was able to send 100k requests
with 1000 being concurrent with no errors being returned
by hey.

```
hey -n 100000 -c 1000 -m=POST -d="hi" \
  http://192.168.0.26:31112/function/go-echo
```

The go-echo function is based upon the golang-http
template in the function store using the of-watchdog.

Signed-off-by: Alex Ellis (VMware) <alexellis2@gmail.com>
2019-02-04 11:50:25 +00:00
Radoslav Dimitrov
c5122279c9 Fix unit test fail due to race condition #1063
Signed-off-by: Radoslav Dimitrov <dimitrovr@vmware.com>
2019-01-30 10:28:13 +00:00
Alex Ellis (VMware)
b4a550327d Re-vendor queue-worker publisher for reconnect
- re-vendor queue-worker for publisher via 0.6.0
- bump queue-worker version to 0.6.0 in docker-compose.yml for
AMD64
- use new naming for NATS of nats -> NATS in variables where
required
- add default reconnect of 60 times, 2 seconds apart.

Signed-off-by: Alex Ellis (VMware) <alexellis2@gmail.com>
2019-01-29 15:15:48 +00:00
Lucas Roesler
f61735b155 Add basic auth to the system alert endpoint
**What**
- Protect the `/system/alert` endpoint when basic auth is enabled
- Update the alert manager config to send the basic auth credentials
- Bump the gateway version

Signed-off-by: Lucas Roesler <roesler.lucas@gmail.com>
2019-01-24 17:45:33 +00:00
Alex Ellis (VMware)
ec185bad67 Fix label order for http_requests_total
- the order of http_requests_total was shown to be incorrect in
testing. This fixes the order as per
http_request_duration_seconds.

Signed-off-by: Alex Ellis (VMware) <alexellis2@gmail.com>
2019-01-24 09:46:14 +00:00
Alex Ellis (VMware)
a26d350376 Allow unicode in service paths
- according to discussion in #1013 all unicode characters are
valid label values - this commit allows the original path to be
retained.

Signed-off-by: Alex Ellis (VMware) <alexellis2@gmail.com>
2019-01-24 09:12:46 +00:00
Alex Ellis (VMware)
67c9a71686 Add unit tests for MakeNotifierWrapper
- fixes issue where result was assigned to value rather than
to pointer reference.

Signed-off-by: Alex Ellis (VMware) <alexellis2@gmail.com>
2019-01-24 09:12:46 +00:00
Alex Ellis (VMware)
f7cf7a6496 Split out notifiers
- splits out notifiers and writes status for async handler

Signed-off-by: Alex Ellis (VMware) <alexellis2@gmail.com>
2019-01-24 09:12:46 +00:00
Alex Ellis (VMware)
fca32a0e79 Instrument async handlers
- instruments async handler for report and for queueing async
requests
- make MustRegister only ever run once to prevent sync issues

Signed-off-by: Alex Ellis (VMware) <alexellis2@gmail.com>
2019-01-24 09:12:46 +00:00
Alex Ellis (VMware)
5a1bdcdb91 Add instrumentation to the alert handler
Signed-off-by: Alex Ellis (VMware) <alexellis2@gmail.com>
2019-01-24 09:12:46 +00:00
Alex Ellis (VMware)
e9cf708cb5 Bump Prometheus client version
- updates the Prometheus go client version and switches to the
promhttp handler to avoid conflicts with the new system-level
metrics.

Tested with Docker Swarm locally - no conflicts and new metrics
were gathered.

Signed-off-by: Alex Ellis (VMware) <alexellis2@gmail.com>
2019-01-24 09:12:46 +00:00
Alex Ellis (VMware)
64a3f4e495 Instrument system calls
Signed-off-by: Alex Ellis (VMware) <alexellis2@gmail.com>
2019-01-24 09:12:46 +00:00
Alex Ellis (VMware)
1cc767e898 Add service RED metrics definitions
Partially fixes #532 by introducing two metrics that are
supported by Kubernetes HPAv2 and RED metrics-style
dashboards.

Signed-off-by: Alex Ellis (VMware) <alexellis2@gmail.com>
2019-01-24 09:12:46 +00:00
Richard Gee
0f5ca96bbe Add build-args to Dockerfile.armhf
When the /system/info endpoint was expanded to include information about the gateway a number of build-args were added to the main Dockerfile.  These changes were not mirrored in Dockerfile.armhf, which resulted in nil attributes and an ugly error when running `faas version` against an armhf gateway.

This change carries the changes made to Dockerfile through to Dockerfile.armhf.  As well as the build-args which fix the identified issue the license check has also been added at the latest release 0.2.3, as a armhf build has been made available.  Further changes are to introduce the app user and moving the binary location from /root/ to /home/app/

Signed-off-by: Richard Gee <richard@technologee.co.uk>
2019-01-20 10:00:02 +00:00
Burton Rheutan
988c855163 Gateway UI - validate manual input
This change validates manual input to the gateway UI when deploying
new functions. This is to prevent poor user experience when attempting
to deploy a function manually from the UI.

The validation check on the function name is the same pattern that
is used in the CLI to ensure that when the deploy button is pressed,
the function will not fail validation.

Signed-off-by: Burton Rheutan <rheutan7@gmail.com>
2019-01-19 11:00:59 +00:00
Radoslav Dimitrov
41b452849c Add a consistent ARM64 image build process
Signed-off-by: Radoslav Dimitrov <rdimitrow@gmail.com>
2019-01-16 09:09:41 +00:00
Alex Ellis (VMware)
a65df4795b Update swagger for missing secret definitions
- added secret definition and removed types used previously

Remove structs for secrets

- after discussion on PR the core contributors decided we just
want simple CRUD with the Secret type.

Signed-off-by: Alex Ellis (VMware) <alexellis2@gmail.com>
2019-01-04 16:51:01 +00:00
Andrew Cornies
b206cb829a Updated secret types based on PR feedback:
- SecretInfo type
- ListSecretsResponse
- Move Annotations to SecretInfo
- update swagger api docs

Signed-off-by: Andrew Cornies <acornies@gmail.com>
2019-01-04 16:51:01 +00:00
Andrew Cornies
b49dded3b3 Updates from PR comments:
- moved Vagrantfile to contrib dir
- gave secret request type more thought

Signed-off-by: Andrew Cornies <acornies@gmail.com>
2019-01-04 16:51:01 +00:00
Andrew Cornies
a9238f5631 Secrets iteration:
- added delete http verb to system/secrets
- added secrets request type
- added vagrant env provisioned by existing deploy_stack.sh

Signed-off-by: Andrew Cornies <acornies@gmail.com>
2019-01-04 16:51:01 +00:00
Andrew Cornies
d2ef8b9207 Initial support for secrets in gw:
- added SecretHandler type
- added discussed system/secret endpoint with appropriate http verbs

Signed-off-by: Andrew Cornies <acornies@gmail.com>
2019-01-04 16:51:01 +00:00
Lucas Roesler
c9befd78e7 Bump gorilla mux to 1.6.2
**What**
- Update the gopkg.toml

Signed-off-by: Lucas Roesler <roesler.lucas@gmail.com>
2018-12-29 19:34:21 +00:00
Gede Wahyu
09736be293 When firing, newReplicas should be greater than currentReplicas
Signed-off-by: Gede Wahyu <tokekbesi@gmail.com>
2018-12-29 19:31:04 +00:00
Gede Wahyu
3bdb194e71 Round up value of newReplicas
Signed-off-by: Gede Wahyu <tokekbesi@gmail.com>
2018-12-29 19:31:04 +00:00
Gede Wahyu
191629151e Remove the differentiation between currentReplicas==1 and not
Signed-off-by: Gede Wahyu <tokekbesi@gmail.com>
2018-12-29 19:31:04 +00:00
Gede Wahyu
81db6514f7 Fix TestInitialScale expectation
Signed-off-by: Gede Wahyu <tokekbesi@gmail.com>
2018-12-29 19:31:04 +00:00
Gede Wahyu
058d1e481a Test scaled up from 1
Signed-off-by: Gede Wahyu <tokekbesi@gmail.com>
2018-12-29 19:31:04 +00:00
Lucas Roesler
334288b130 Undo early return in updateData callback
**What**
- Revert to original if-block structure to reduce the size of the diff

Signed-off-by: Lucas Roesler <roesler.lucas@gmail.com>
2018-12-19 21:10:41 +00:00
Lucas Roesler
cb367096ae Refresh function image during ui update loop
**What**
- Update the function image value during the `refreshData`

Signed-off-by: Lucas Roesler <roesler.lucas@gmail.com>
2018-12-19 21:10:41 +00:00
Burton Rheutan
a51d42c983 Download vendor cdn files for gateway
This change set downloads the CDN resources for the gateway
and bundles them with the other static resources for the UI.

This is needed for situations where a user does not have access
to the CDN either because of firewall rules or network policy.

The files and versions remain the same, only now loaded locally
with directory paths matching the CDN paths.

Signed-off-by: Burton Rheutan <rheutan7@gmail.com>
2018-12-05 20:20:15 +00:00
Radoslav Dimitrov
350907aacd Update gateway to golang:1.10.4
Signed-off-by: Radoslav Dimitrov <rdimitrow@gmail.com>
2018-12-05 19:59:18 +00:00
Alex Ellis (VMware)
6ef5ef73cc Update arm64 build of gateway
- adds version folder - tested on Rock64

Signed-off-by: Alex Ellis (VMware) <alexellis2@gmail.com>
2018-12-04 11:15:38 +00:00
Martin Dekov (VMware)
ade5f32513 Bump files to use alpine:3.8
Bump Dockerfiles and mentions of alpine 3.7 to be now
3.8

Signed-off-by: Martin Dekov (VMware) <mdekov@vmware.com>
2018-11-16 20:27:05 +00:00
Alex Ellis (VMware)
b4c12f824b Make use of cache in scaling
- this reinstates the cache to reduce the count of lookups to the
provider when checking if scaling is needed.

Signed-off-by: Alex Ellis (VMware) <alexellis2@gmail.com>
2018-11-07 13:49:56 +00:00
Alex Ellis (VMware)
117707df14 Enable backoff/retries on scaling up
- this change is needed for Docker Swarm which may give an error
when several concurrent requests come in to scale a deployment.

Tested on Docker Swarm before/after with the hey tool and figlet
scaled down to zero replicas.

Signed-off-by: Alex Ellis (VMware) <alexellis2@gmail.com>
2018-11-07 13:49:56 +00:00
Alex Ellis (VMware)
9cea08c728 Extract scaling from zero
- extracting this package means it can be used in other components
such as the asynchronous nats-queue-worker which may need to
invoke functions which are scaled down to zero replicas.

Ref: https://github.com/openfaas/nats-queue-worker/issues/32

Tested on Docker Swarm for scaling up, already scaled and not
found error.

Signed-off-by: Alex Ellis (VMware) <alexellis2@gmail.com>
2018-11-01 15:10:08 +00:00
Alex Ellis (VMware)
101b06243b Add documentation for scaling handler
- documents ScalingConfig and MakeScalingHandler

Signed-off-by: Alex Ellis (VMware) <alexellis2@gmail.com>
2018-10-28 12:24:25 +00:00
Alex Ellis (VMware)
f5939c9a60 Update for scaling edge-case
- as reported on Slack and in issue #931 the gateway scaling code
was scaling to zero replicas as a result of the "proportional
scaling" added by @Templum's PR. This commit added a failing test
which was fixed by adding boundary checking - now if the scaling
amount is "0" we keep the current amount of replicas.

Signed-off-by: Alex Ellis (VMware) <alexellis2@gmail.com>
2018-10-25 11:47:47 +01:00
Richard Gee
7df548668f Add logging to scale from zero requests
Trivial change to add logging around scale from zero events in scaling.go.
Previously scale from zero events were not logged in the same way that normal
scaling events are.  This change adds log writes to show when a scale from zero
was requested and when a function successfully moved to > 0 replicas.

Signed-off-by: Richard Gee <richard@technologee.co.uk>
2018-10-20 08:56:14 +01:00
qinpengfei
70a5e343c5 code formatter
Signed-off-by: qinpengfei <qinpengfei@jd.com>
2018-10-19 21:24:15 +01:00
Alex Ellis (VMware)
476b652c26 Update integration test
We now send Accepted not OK for creating functions.

Signed-off-by: Alex Ellis (VMware) <alexellis2@gmail.com>
2018-10-19 21:19:21 +01:00
Alex Ellis (VMware)
62525f6570 Don't follow redirects from functions
- Covers part of 919 by making the HTTP client used for proxying
stop following redirects. Tested with a stateless microservice,
but additional code changes may be requierd in the queue-worker,
the watchdogs and other areas.

Tested on Swarm with stateless microservice (Node.js) issuing
a redirect via Location header.

Signed-off-by: Alex Ellis (VMware) <alexellis2@gmail.com>
2018-10-19 21:19:21 +01:00
Alex Ellis
7db8ad1bda Update README files
- Removes use of "our" from CONTRIBUTING guide
- Updates/adds README.md files
- Commnents and typo fix in watchdog
- Adds good/bad examples of commit messages

Signed-off-by: Alex Ellis (VMware) <alexellis2@gmail.com>
2018-10-03 14:07:41 +01:00
Alex Ellis
bd39b9267a Update comments
- updates comments and adds where missing
- updates locks so that unlock is done via defer instead of
at the end of the statement
- extracts timeout variable in two places
- remove makeClient() unused method from metrics package

No-harm changes tested via go build.

Signed-off-by: Alex Ellis (VMware) <alexellis2@gmail.com>
2018-10-03 13:16:28 +01:00
Richard Gee
e33061702a Change the http status code on unfound function error to 404
Signed-off-by: Richard Gee <richard@technologee.co.uk>
2018-09-23 12:39:13 +01:00
Richard Gee
df6f4c49f2 Add checking for existent function in GetReplicas
Within MakeScalingHandler() there is a call to GetReplicas() which was not returning an error when a non-200 http response was received from /system/function/.  The call would also return a populated struct, so the perception was that a function existed an had been scaled to zero.  This meant that the function would be added to the function cache and the code would continue into SetReplicas() where an attempt would be made to scale up a non-existent function.

This change amends GetReplicas() so that it will return an error if the gateway returns anything other than a 200 reponse code from the /system/function/ endpoint.  This causes MakeScalingHandler() to return earlier with an error indicating that the function could not be found.  The cache.Set call is also moved to after the error check so that the cache is only updated to include existent functions.

During investigations as to the cause of #876 tests were added to function_cache to check that Get() is behaving as intended when function exists and when not.  Tests are also added to plugin/external to test that GetReplicas() and SetReplicas() are following their intended modes of operation when 200 and non-200 responses are received from the gateway.

Signed-off-by: Richard Gee <richard@technologee.co.uk>
2018-09-23 12:39:13 +01:00
Alex Ellis (VMware)
3598da2e51 Enable basic auth for service query / scaling on provider
- this is a blocking issue for auth with Docker Swarm
fixes #879

Signed-off-by: Alex Ellis (VMware) <alexellis2@gmail.com>
2018-09-19 20:52:14 +01:00
Alex Ellis (VMware)
c67c9f2b30 Fix issue with direct_functions and path behaviour
- The path clipping / transforming behaviour must be turned-off
when we are not using direct_functions as is used in
faas-nomad and faas-ecs. This will need a change in each provider
to strip paths, but fixes a 404 error these users will see if they
upgrade to 0.9.2 or newer. 0.9.3 will have a this fix meaning
the whole un-edited path is passed to the provider when
direct_functions is set to false.

Signed-off-by: Alex Ellis (VMware) <alexellis2@gmail.com>
2018-09-15 14:40:22 +01:00