Any request to a Target can produce a TCP error, a timeout, or an HTTP status code.
The health checker uses the outcome of the request to determine whether a Target is healthy or unhealthy.
- For active checks, this information is gathered by an active probe
- For passive checks, this information is gathered from a proxied request
Based on the gathered data, the health checker updates a series of internal counters:
- If the returned status code is configured as healthy, it increments the Successes counter for the Target and clears all its other counters
- If it fails to connect, it increments the TCP failures counter for the Target and clears the Successes counter
- If it times out, it increments the Timeouts counter for the Target and clears the Successes counter
- If the returned status code is one configured as unhealthy, it increments the HTTP failures counter for the Target and clears the Successes counter
If any of the TCP failures, HTTP failures, or Timeouts counters reaches its configured threshold, the Target will be marked as unhealthy.
If the Successes counter reaches its configured threshold, the Target will be marked as healthy.
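The counter rules above can be sketched as a small state machine. This is a minimal Python model, not Kong Gateway's implementation: the `TargetHealth` class, its method names, and its threshold fields are hypothetical. Two behaviors are assumptions of the sketch: a threshold of 0 is treated as "counter disabled", and a status code that is in neither the healthy nor the unhealthy list leaves all counters untouched.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TargetHealth:
    """Hypothetical model of one Target's health counters (not Kong's code)."""

    healthy_statuses: set[int]
    unhealthy_statuses: set[int]
    successes_threshold: int
    http_failures_threshold: int
    tcp_failures_threshold: int
    timeouts_threshold: int
    healthy: bool = True
    successes: int = 0
    http_failures: int = 0
    tcp_failures: int = 0
    timeouts: int = 0

    def record_status(self, status: int) -> None:
        if status in self.healthy_statuses:
            # Healthy status: bump Successes, clear every other counter.
            self.successes += 1
            self.http_failures = self.tcp_failures = self.timeouts = 0
        elif status in self.unhealthy_statuses:
            # Unhealthy status: bump HTTP failures, clear Successes.
            self.http_failures += 1
            self.successes = 0
        # Assumption: a status in neither list changes no counter.
        self._evaluate()

    def record_tcp_failure(self) -> None:
        # Connection failure: bump TCP failures, clear Successes.
        self.tcp_failures += 1
        self.successes = 0
        self._evaluate()

    def record_timeout(self) -> None:
        # Timeout: bump Timeouts, clear Successes.
        self.timeouts += 1
        self.successes = 0
        self._evaluate()

    def _evaluate(self) -> None:
        def reached(count: int, threshold: int) -> bool:
            # Assumption: a threshold of 0 disables that counter.
            return threshold > 0 and count >= threshold

        if (
            reached(self.tcp_failures, self.tcp_failures_threshold)
            or reached(self.http_failures, self.http_failures_threshold)
            or reached(self.timeouts, self.timeouts_threshold)
        ):
            self.healthy = False
        elif reached(self.successes, self.successes_threshold):
            self.healthy = True


if __name__ == "__main__":
    target = TargetHealth(
        healthy_statuses={200, 302},
        unhealthy_statuses={500, 503},
        successes_threshold=2,
        http_failures_threshold=3,
        tcp_failures_threshold=2,
        timeouts_threshold=3,
    )
    target.record_tcp_failure()
    target.record_tcp_failure()
    print(target.healthy)  # False: TCP failures reached 2
    target.record_status(200)
    target.record_status(200)
    print(target.healthy)  # True: Successes reached 2
```

Because every failure clears Successes and every healthy status clears the failure counters, a Target has to string together consecutive results of one kind before its state flips.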
The HTTP status codes that count as healthy or unhealthy, and the individual thresholds for each of these counters, are configurable for each individual Upstream.
You can find all of the default values for an Upstream in the Upstream schema.
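As an illustration, the per-Upstream settings could be expressed as the `healthchecks` object below, for example as the JSON body of an Admin API request that updates an Upstream. The field names follow the `healthchecks.active`, `healthchecks.passive`, and `healthchecks.threshold` layout of the Upstream schema as we understand it; the values are arbitrary examples, not the defaults, so check the Upstream schema before relying on either.

```python
import json

# Illustrative values only; these are not Kong Gateway's defaults.
healthchecks = {
    "active": {
        # Active probes run on an interval (seconds).
        "healthy": {"interval": 5, "successes": 2, "http_statuses": [200, 302]},
        "unhealthy": {
            "interval": 5,
            "http_failures": 3,
            "tcp_failures": 2,
            "timeouts": 3,
            "http_statuses": [500, 502, 503, 504],
        },
    },
    "passive": {
        # Passive checks observe proxied traffic, so they have no interval.
        "healthy": {"successes": 5, "http_statuses": [200, 201, 204]},
        "unhealthy": {
            "http_failures": 5,
            "tcp_failures": 2,
            "timeouts": 3,
            "http_statuses": [500, 503],
        },
    },
    # Minimum percentage of available Target weight for a healthy Upstream.
    "threshold": 55,
}

print(json.dumps({"healthchecks": healthchecks}, indent=2))
```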
Notes:
- Unhealthy Targets won’t be removed from the load balancer, and won’t have any impact on the balancer layout when using a hashing algorithm. Instead, they will just be skipped.
- Health checks operate only on enabled Targets and don’t modify the status of a Target in the Kong Gateway database.
- The DNS caveats also apply to health checks. If you use hostnames for the Targets, make sure the DNS server always returns the full set of IP addresses for a name and does not limit the response.
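To illustrate why skipping unhealthy Targets keeps a hashing layout stable, here is a deliberately simplified hash ring in Python. It is not Kong Gateway's ring balancer: the `SimpleRing` class, the round-robin slot assignment, and SHA-256 hashing are assumptions made for brevity. Unhealthy Targets keep their slots; a lookup that lands on one simply walks forward to the next healthy slot.

```python
from __future__ import annotations

import hashlib


class SimpleRing:
    """Hypothetical hash ring with a fixed slot layout (not Kong's balancer)."""

    def __init__(self, targets: list[str], slots: int = 32) -> None:
        # The layout is built once; health changes never rebuild it.
        self.slots = [targets[i % len(targets)] for i in range(slots)]
        self.unhealthy: set[str] = set()

    def _slot_for(self, key: str) -> int:
        digest = hashlib.sha256(key.encode()).digest()
        return int.from_bytes(digest[:8], "big") % len(self.slots)

    def pick(self, key: str) -> str | None:
        start = self._slot_for(key)
        # Walk forward from the hashed slot, skipping unhealthy Targets.
        for offset in range(len(self.slots)):
            target = self.slots[(start + offset) % len(self.slots)]
            if target not in self.unhealthy:
                return target
        return None  # every Target is unhealthy


ring = SimpleRing(["10.0.0.1:80", "10.0.0.2:80", "10.0.0.3:80"])
keys = [f"user-{n}" for n in range(20)]
before = {k: ring.pick(k) for k in keys}
ring.unhealthy.add("10.0.0.1:80")
after = {k: ring.pick(k) for k in keys}
# Only keys that hashed to the unhealthy Target move; the rest keep theirs.
moved = [k for k in keys if before[k] != after[k]]
print(moved)
```

When the Target recovers and is removed from `unhealthy`, every key maps back to its original Target, because the slot layout never changed.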
The health of an Upstream is determined based on the status of its Targets.
You can configure the threshold for a healthy Upstream using its healthchecks.threshold parameter.
This sets the minimum percentage of available Target weight (capacity) required for the Upstream to be considered healthy.
If the available capacity percentage of an Upstream is less than the configured threshold, the Upstream is considered unhealthy and Kong Gateway will respond to requests to the Upstream with 503 Service Unavailable.
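The capacity rule reduces to a weighted percentage: sum the weights of the healthy Targets, divide by the total weight, and compare the result against healthchecks.threshold. A minimal Python sketch, where the `available_capacity_pct` and `upstream_is_healthy` functions and their `(weight, healthy)` input format are hypothetical, and treating an Upstream with zero total weight as having no capacity is an assumption:

```python
from __future__ import annotations


def available_capacity_pct(targets: list[tuple[int, bool]]) -> float:
    """Percentage of total Target weight that belongs to healthy Targets."""
    total = sum(weight for weight, _ in targets)
    if total == 0:
        return 0.0  # assumption: no weight means no capacity
    healthy = sum(weight for weight, ok in targets if ok)
    return healthy * 100 / total


def upstream_is_healthy(targets: list[tuple[int, bool]], threshold_pct: float) -> bool:
    # Unhealthy only when available capacity falls below the threshold.
    return available_capacity_pct(targets) >= threshold_pct


# Five Targets of weight 100 with healthchecks.threshold=55.
two_down = [(100, True)] * 3 + [(100, False)] * 2
three_down = [(100, True)] * 2 + [(100, False)] * 3
print(available_capacity_pct(two_down), upstream_is_healthy(two_down, 55))  # 60.0 True
print(available_capacity_pct(three_down), upstream_is_healthy(three_down, 55))  # 40.0 False
```

Because the comparison uses weight rather than the number of Targets, a single heavily weighted Target can move the Upstream across the threshold on its own.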
Here is a simple example:
- You have an Upstream configured with healthchecks.threshold=55
- The Upstream has 5 Targets, each with weight=100, so the total weight in the ring balancer is 500
- Each Target represents 20% of the total available capacity
In this scenario, the Upstream can handle losing 2 of its 5 Targets, as it will then be working at 60% capacity, which is still higher than the configured threshold of 55%.
Once a third Target becomes unhealthy, the capacity drops to 40%, and the Upstream itself becomes unhealthy as well.
Once it enters an unhealthy state, the Upstream will only return errors.
This lets the Targets recover from the cascading failures they were experiencing.
When the Targets start recovering and the Upstream’s available capacity passes the threshold again, the health status of the ring balancer is automatically updated and the Upstream is reactivated.