During some exploratory testing, I ran into an issue where
the gateway would attempt to scale a deployment from zero
replicas to min, despite there already being min replicas.
Why?
The scaling logic was looking for Available replicas when
it should have looked for Desired replicas. So when a
deployment had zero ready replicas due to readiness checks
failing, the gateway was attempting to scale from zero
to min.
This logic has been corrected and separated from the
holding pattern where the gateway waits for a ready
replica.
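
The distinction can be sketched in Go as follows. This is an illustration only: the replicaStatus type and both function names are hypothetical and are not taken from the gateway source.

```go
package main

import "fmt"

// replicaStatus is a hypothetical view of a deployment's replica counts.
type replicaStatus struct {
	Desired   uint64 // replicas requested of the provider
	Available uint64 // replicas currently passing readiness checks
}

// needsScaleFromZero reports whether a scale-up request should be made.
// Only the desired count is consulted: a deployment whose readiness checks
// are failing has Available == 0 but must not be scaled again.
func needsScaleFromZero(s replicaStatus) bool {
	return s.Desired == 0
}

// needsWaitForReady reports whether the caller should hold until a replica
// becomes ready. This is kept separate from the scaling decision.
func needsWaitForReady(s replicaStatus) bool {
	return s.Available == 0
}

func main() {
	failing := replicaStatus{Desired: 1, Available: 0}
	fmt.Println(needsScaleFromZero(failing), needsWaitForReady(failing))

	scaledDown := replicaStatus{Desired: 0, Available: 0}
	fmt.Println(needsScaleFromZero(scaledDown), needsWaitForReady(scaledDown))
}
```
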
Tested with KinD and an edited function whose readiness
probe was failing, so it had no ready replicas. As desired,
the gateway did not scale to min.
However, when setting desired replicas to zero, the
gateway did scale up as expected.
This change also modifies all print statements for
"seconds" so that they use 4 decimal places instead of
the default, which produced a longer, more verbose string
in the logs.
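
A rough sketch of that formatting in Go. The formatSeconds helper and the trailing "s" suffix are assumptions made for illustration, not the gateway's actual log format.

```go
package main

import (
	"fmt"
	"time"
)

// formatSeconds renders a number of seconds with exactly 4 decimal places.
// The "s" suffix is an assumption for this example.
func formatSeconds(seconds float64) string {
	return fmt.Sprintf("%.4fs", seconds)
}

func main() {
	elapsed := 1234567 * time.Microsecond

	// Default float formatting prints every significant digit.
	fmt.Println(elapsed.Seconds())

	// Fixed precision keeps log lines short and aligned.
	fmt.Println(formatSeconds(elapsed.Seconds()))
}
```
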
Signed-off-by: Alex Ellis (OpenFaaS Ltd) <alexellis2@gmail.com>
Introduces a single-flight call to a function's health
endpoint to verify that it is registered with an Istio
sidecar (Envoy) before letting the invocation through.
Results are cached for 5 seconds, before a probe is
required again.
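
The caching behaviour could look roughly like the Go sketch below. The probeCache type and its probe callback are hypothetical, and holding one lock while probing is a coarse stand-in for a real single-flight group rather than the approach the gateway takes.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// probeCache remembers successful probes for a fixed TTL.
type probeCache struct {
	mu      sync.Mutex
	ttl     time.Duration
	expires map[string]time.Time
	probe   func(name string) bool // hypothetical health check
}

func newProbeCache(ttl time.Duration, probe func(string) bool) *probeCache {
	return &probeCache{ttl: ttl, expires: map[string]time.Time{}, probe: probe}
}

// Ready returns true if the function has a fresh cached result or passes a
// new probe. Because the lock is held while probing, concurrent callers
// wait for the first probe and then reuse its cached result.
func (c *probeCache) Ready(name string) bool {
	c.mu.Lock()
	defer c.mu.Unlock()

	if exp, ok := c.expires[name]; ok && time.Now().Before(exp) {
		return true // cache hit, no probe needed
	}
	if !c.probe(name) {
		return false // failures are not cached, so the next call probes again
	}
	c.expires[name] = time.Now().Add(c.ttl)
	return true
}

func main() {
	calls := 0
	c := newProbeCache(5*time.Second, func(string) bool { calls++; return true })

	c.Ready("figlet")
	c.Ready("figlet") // served from the cache
	fmt.Println("probes executed:", calls)
}
```
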
Tested without Istio: with the probe_functions environment
variable set to true, I saw a probe execute in the logs.
Fixes: #1721 for Istio users.
Signed-off-by: Alex Ellis (OpenFaaS Ltd) <alex@openfaas.com>
When querying for replicas during a scale-up event, the
gateway can overwhelm the provider with requests. This is
especially true under high concurrent load.
The changes in this PR limit the inflight requests.
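
The commit does not say how the limit is enforced. One common way to cap in-flight calls in Go is a buffered channel used as a semaphore, sketched here with hypothetical names (limiter, peakConcurrency).

```go
package main

import (
	"fmt"
	"sync"
)

// limiter caps concurrent calls using a buffered channel as a semaphore.
type limiter struct {
	slots chan struct{}
}

func newLimiter(max int) *limiter {
	return &limiter{slots: make(chan struct{}, max)}
}

// Do blocks until a slot is free, runs fn, then releases the slot.
func (l *limiter) Do(fn func()) {
	l.slots <- struct{}{}
	defer func() { <-l.slots }()
	fn()
}

// peakConcurrency pushes n calls through a limiter of size max and returns
// the highest number of calls observed running at the same time.
func peakConcurrency(max, n int) int {
	l := newLimiter(max)
	var mu sync.Mutex
	var wg sync.WaitGroup
	inflight, peak := 0, 0

	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			l.Do(func() {
				mu.Lock()
				inflight++
				if inflight > peak {
					peak = inflight
				}
				mu.Unlock()

				// A query to the provider would run here.

				mu.Lock()
				inflight--
				mu.Unlock()
			})
		}()
	}
	wg.Wait()
	return peak
}

func main() {
	fmt.Println("peak in-flight calls:", peakConcurrency(2, 50))
}
```
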
Signed-off-by: Alex Ellis (OpenFaaS Ltd) <alex@openfaas.com>
* Add service target metric
* Add service min replicas metric
* Add scale type metric
These combined allow new auto-scaling modes and parameters
for OpenFaaS Pro customers.
Signed-off-by: Alex Ellis (OpenFaaS Ltd) <alexellis2@gmail.com>
The code was calling into the cache twice, even if the first
call was a cache hit and not a miss.
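
A sketch of the intended shape in Go: the cache is read once, and the provider is only queried on a miss. The countingCache type and the replicas function are hypothetical and exist only to make the number of reads visible.

```go
package main

import "fmt"

// countingCache is a hypothetical cache that records how often it is read.
type countingCache struct {
	data  map[string]uint64
	reads int
}

func (c *countingCache) Get(name string) (uint64, bool) {
	c.reads++
	v, ok := c.data[name]
	return v, ok
}

func (c *countingCache) Set(name string, v uint64) {
	c.data[name] = v
}

// replicas reads the cache exactly once. On a hit the cached value is
// returned directly; on a miss the provider is queried and the result stored.
func replicas(c *countingCache, name string, query func(string) uint64) uint64 {
	if v, hit := c.Get(name); hit {
		return v
	}
	v := query(name)
	c.Set(name, v)
	return v
}

func main() {
	c := &countingCache{data: map[string]uint64{"figlet": 2}}
	v := replicas(c, "figlet", func(string) uint64 { return 0 })
	fmt.Println("replicas:", v, "cache reads:", c.reads)
}
```
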
Signed-off-by: Alex Ellis (OpenFaaS Ltd) <alexellis2@gmail.com>
This type abstracts the function_query type and introduces an
interface for testing and substitution.
Signed-off-by: Alex Ellis (OpenFaaS Ltd) <alexellis2@gmail.com>
Enables publishing to various topics according to annotations
on the functions. The function cache is moved up one level so
that it can be shared between the scale from zero code and the
queue proxy.
Unit tests added for new internal methods.
Tested e2e with arkade and the newest queue-worker and RC
gateway image with two queues and an annotation on one of the
functions of com.openfaas.queue. It worked as expected including
with multiple namespace support.
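
Choosing a topic from the annotation might look like the Go sketch below. Only the com.openfaas.queue annotation name comes from this commit; the topicFor function and the fallback topic name are assumptions for illustration.

```go
package main

import "fmt"

// queueAnnotation is the annotation named in this commit.
const queueAnnotation = "com.openfaas.queue"

// topicFor returns the topic named by the function's annotation, or the
// fallback when the annotation is missing or empty.
func topicFor(annotations map[string]string, fallback string) string {
	if topic, ok := annotations[queueAnnotation]; ok && topic != "" {
		return topic
	}
	return fallback
}

func main() {
	// "default-topic" and "slow-queue" are placeholder names for this example.
	annotated := map[string]string{queueAnnotation: "slow-queue"}
	fmt.Println(topicFor(annotated, "default-topic"))
	fmt.Println(topicFor(nil, "default-topic"))
}
```
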
Signed-off-by: Alex Ellis (OpenFaaS Ltd) <alexellis2@gmail.com>
Allows alerts to trigger functions to scale when they
also have an optional namespace set.
Tested e2e with Kubernetes 1.15 and a non-default namespace.
Signed-off-by: Alex Ellis (OpenFaaS Ltd) <alexellis2@gmail.com>
- max_conns / idle / per host are now read from env-vars and have
defaults set to 1024 for both values
- logging / metrics are collected in the client transaction
rather than via defer (this may impact throughput)
- function cache moved to use RWMutex to try to improve latency
around locking when updating cache
- logging message added to show latency in running GetReplicas
because this was observed to increase in a linear fashion under
high concurrency
- changes tested against 3-node bare-metal 1.13 K8s cluster
with kubeadm
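
The move to RWMutex for the function cache can be sketched in Go as below. The replicaCache type and its fields are hypothetical; the point is that readers share the lock and only writers take it exclusively.

```go
package main

import (
	"fmt"
	"sync"
)

// replicaCache is a hypothetical function cache guarded by an RWMutex.
type replicaCache struct {
	mu    sync.RWMutex
	items map[string]uint64
}

func newReplicaCache() *replicaCache {
	return &replicaCache{items: map[string]uint64{}}
}

// Get takes the read lock, so many concurrent readers do not block each other.
func (c *replicaCache) Get(name string) (uint64, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	v, ok := c.items[name]
	return v, ok
}

// Set takes the write lock, which excludes readers only while updating.
func (c *replicaCache) Set(name string, replicas uint64) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.items[name] = replicas
}

func main() {
	c := newReplicaCache()
	c.Set("figlet", 1)

	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.Get("figlet") // concurrent reads proceed in parallel
		}()
	}
	wg.Wait()

	v, ok := c.Get("figlet")
	fmt.Println(v, ok)
}
```
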
Signed-off-by: Alex Ellis (VMware) <alexellis2@gmail.com>
- this reinstates the cache to reduce the count of lookups to the
provider when checking if scaling is needed.
Signed-off-by: Alex Ellis (VMware) <alexellis2@gmail.com>
- this change is needed for Docker Swarm which may give an error
when several concurrent requests come in to scale a deployment.
Tested on Docker Swarm before/after with the hey tool and figlet
scaled down to zero replicas.
Signed-off-by: Alex Ellis (VMware) <alexellis2@gmail.com>
- extracting this package means it can be used in other components
such as the asynchronous nats-queue-worker which may need to
invoke functions which are scaled down to zero replicas.
Ref: https://github.com/openfaas/nats-queue-worker/issues/32
Tested on Docker Swarm for three cases: scaling up, already
scaled, and the not-found error.
Signed-off-by: Alex Ellis (VMware) <alexellis2@gmail.com>