18 minute read

Monitoring containerized workloads in Kubernetes is key to maintaining application performance and reliability. As an integral part of container health management, Kubernetes health checks help detect unhealthy containers, minimize downtime, and optimize resource utilization.

Implementing health checks is important, but it is equally critical to choose the appropriate Kubernetes probe type:

  • Kubernetes liveness probes: Monitor container health and restart containers as needed to maintain application availability.
  • Kubernetes readiness probes: Assess whether a container is ready to serve traffic by checking that it is up and responding to requests, as opposed to still initializing or otherwise unable to handle incoming traffic; readiness also informs load balancing and scaling decisions.
  • Kubernetes startup probes: Evaluate container initialization status, providing additional control during the startup phase of an application.

While each probe is important, Kubernetes liveness probes provide the essential foundation for health checks. They should be among the first topics cluster admins explore to ensure application reliability and fault tolerance.

This article explores the key Kubernetes liveness probe concepts, when to use different settings, detailed example configurations, and essential tips and tricks for working with liveness probes.

Summary of key Kubernetes liveness probe concepts

The summary below groups the configuration concepts that can help you optimize liveness probes by complexity.

Beginner concepts should be accessible to all Kubernetes administrators. Intermediate concepts are more complex than beginner topics. Advanced concepts may require deep Kubernetes knowledge to get right.

Beginner

  • Choosing probe design patterns: Selecting the most appropriate liveness probe type and configuration for your application's requirements, choosing from HTTP GET, TCP socket, gRPC, and command probes.
  • Container failure detection: Detecting container failures ensures problematic containers are restarted automatically for optimum application availability.

Intermediate

  • Probe retries and backoff: Tuning the timing parameters that control probe behavior:
      • initialDelaySeconds: Time to wait before starting the first probe after the container has started
      • periodSeconds: Time between subsequent probe attempts
      • timeoutSeconds: Time in seconds that the liveness probe waits for a response before considering the probe as failed
      • failureThreshold: Number of consecutive failed probe results before considering the container as unhealthy
  • Container restart policies: Understand how liveness probe failures trigger container restarts and how restart policies interact with probe configurations.

Advanced

  • Probes in stateful applications: Configure probes for stateful applications considering the stateful aspects of the application, such as data replication.
  • Container state transitions: Adjusting probe configurations to match the container's lifecycle stages.

Kubernetes liveness probe design pattern types and use cases

The purpose of configuring a basic liveness probe is to ensure that your containerized application is healthy and responsive. When configuring liveness probes, the first step is choosing the appropriate probe type.

Kubernetes offers HTTP GET, TCP socket, gRPC, and liveness command probe types, each catering to specific use cases and application requirements. If the liveness probe fails, Kubernetes will automatically restart the container, allowing the application to recover from a faulty state.

HTTP GET Kubernetes liveness probes

An HTTP GET liveness probe is a common choice to determine container health when your container exposes a web service or an HTTP endpoint. The probe sends an HTTP GET request to the specified endpoint, and the container is considered healthy if the response has a successful HTTP status code (between 200 and 399).

When to use HTTP GET Kubernetes liveness probes

Use HTTP GET probes for applications that expose an HTTP(S) service and can be considered healthy based on the status code returned by a GET request to a specific URL path.

HTTP GET probes are typically used for:

  • Web applications
  • Microservices
  • RESTful APIs
  • Other HTTP-based services

How to configure an HTTP GET Kubernetes liveness probe

To configure an HTTP GET liveness probe, add the livenessProbe configuration to your container spec in the pod definition. Specify the httpGet probe type and provide the path and port for your HTTP service.

apiVersion: v1
kind: Pod
metadata:
  name: darwin-app
spec:
  containers:
  - name: darwin-container
    image: darwin-image
    livenessProbe:
      httpGet:
        path: /darwin-path
        port: 8080
      initialDelaySeconds: 60
      periodSeconds: 10
      timeoutSeconds: 5
      failureThreshold: 3

In this example, the liveness probe sends an HTTP GET request to the /darwin-path endpoint on port 8080. The probe will start after an initial delay of 60 seconds and will be performed every 10 seconds. Each probe attempt has a timeout of 5 seconds to receive a successful response. If an attempt does not receive a successful response within this timeframe, it is marked as a failure. The container is restarted if the probe encounters 3 consecutive failed attempts.
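
To confirm that the probe is in place and observe its behavior, you can apply the manifest and inspect the pod with kubectl. The file name below is an assumption; the pod name comes from the example above:

kubectl apply -f darwin-app.yaml   # file name assumed for this example
kubectl describe pod darwin-app

The describe output typically includes the container's Liveness settings and, under Events, any probe failures and resulting restarts.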

TCP socket Kubernetes liveness probes

A TCP socket liveness probe attempts to open a TCP connection to a specified port on the container; if the connection can be established, the container is considered healthy. TCP socket probes are relatively lightweight, as they only check for the ability to establish a connection without needing an HTTP server or custom logic.

When to use TCP socket Kubernetes liveness probes

Use a TCP Socket liveness probe for applications that don’t expose an HTTP endpoint but rely on TCP connections. TCP socket probes are useful when your workload’s health can be determined by establishing a connection to a specific port, indicating that the corresponding service is running and accepting connections.

TCP probes are typically used for:

  • Databases
  • Message brokers
  • Custom TCP servers

How to configure a TCP socket Kubernetes liveness probe

Identify the port on which your application exposes its TCP-based service. The TCP socket probe will attempt to connect to this port to check the container’s health.

In your pod configuration YAML file, add or modify the livenessProbe section for the container you want to monitor as shown below:

livenessProbe:
  tcpSocket:
    port: 3306
  initialDelaySeconds: 15
  periodSeconds: 10
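
For reference, here is a minimal sketch of a complete pod manifest using this probe; the pod name, container name, and MySQL image are illustrative assumptions, and any TCP-based service would work the same way:

apiVersion: v1
kind: Pod
metadata:
  name: darwin-db
spec:
  containers:
  - name: darwin-db-container
    image: mysql:8.0   # illustrative image; any TCP-based service applies
    ports:
    - containerPort: 3306
    livenessProbe:
      tcpSocket:
        port: 3306
      initialDelaySeconds: 15
      periodSeconds: 10

If the kubelet cannot establish a connection to port 3306, the probe attempt fails, and the container is restarted once the failure threshold is reached.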

Kubernetes liveness command probes

Much like the kubectl exec command lets you run a command or open a shell session inside a container, a Kubernetes liveness command probe executes a command or script within the container and uses its exit code to determine health: an exit code of 0 means the container is healthy, and any non-zero exit code marks it as unhealthy.

When to use Kubernetes liveness command probes

Consider using command probes when your workload requires a complex or tailored health check that involves executing a custom command or script inside the container.

Command probes are typically used for applications that require custom logic to determine health, such as:

  • Querying internal application metrics
  • Checking process status
  • Validating the response from an API endpoint

How to configure a Kubernetes liveness command probe

In the example configuration below, the /app/health-check.sh script is the custom health check script created to be used with the liveness command probe.

livenessProbe:
  exec:
    command:
    - /app/health-check.sh
  initialDelaySeconds: 15
  periodSeconds: 10

You can customize this script to suit your use case by including the logic needed to determine the health of your application or container as shown below:

#!/bin/sh
# Report healthy (exit 0) if the application process is running; otherwise report unhealthy (exit 1)
if pgrep -x "darwin-app-process-name" > /dev/null; then
    exit 0
else
    exit 1
fi

The above script checks if a process with the name darwin-app-process-name is running in the container. If the process is running, the script signals that the application is healthy. If not, it signals that the application is unhealthy.

To use this script in your Kubernetes deployment:

1. Create a new file named health-check.sh on your local machine and add the above script contents.

2. Make the script executable by running:

chmod +x health-check.sh

3. In your application’s Dockerfile, copy the script to the appropriate location within the container:

COPY health-check.sh /app/health-check.sh
RUN chmod +x /app/health-check.sh

This will add the script to the /app directory within the container image. The script is now ready to be referenced in your Kubernetes pod definition.

gRPC Kubernetes liveness probes

With Kubernetes 1.24, gRPC probe functionality advanced to beta and is enabled by default. This lets you set up startup, liveness, and readiness probes for your gRPC-based applications without exposing an HTTP endpoint or relying on an extra executable. Kubernetes can interface directly with your workload over gRPC to inspect its status, a significant improvement over the earlier approach of using command probes.

When to use gRPC Kubernetes liveness probes

Use a gRPC liveness probe if your application provides a gRPC endpoint for health checks.

gRPC probes are especially beneficial for:

  • Applications utilizing the gRPC framework for inter-service communication
  • Microservices architecture where services need to communicate reliably and efficiently
  • Any application where native gRPC support improves health check performance and removes the need for an additional 10MB executable

How to configure a gRPC Kubernetes liveness probe

First, identify the port on which your gRPC service is exposed. The gRPC probe will use this port to connect and evaluate the health of your container.

In the following configuration, the gRPC liveness probe is set up to connect to port 50051. The probe will wait 15 seconds before its first check to allow your application time to start, and it will continue to run checks every 10 seconds thereafter.

livenessProbe:
  grpc:
    port: 50051
  initialDelaySeconds: 15
  periodSeconds: 10

This allows Kubernetes to natively assess the health status of your gRPC-based application. Be sure to adjust the port, initialDelaySeconds, and periodSeconds values to best suit the behavior and requirements of your application.
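
Kubernetes gRPC probes use the standard gRPC Health Checking Protocol (grpc.health.v1.Health). If your application registers multiple logical services with its health server, you can optionally specify which service name the kubelet should pass in the health check request. A minimal sketch, assuming a hypothetical service name of darwin.liveness registered by the application:

livenessProbe:
  grpc:
    port: 50051
    service: darwin.liveness   # assumed service name registered with the gRPC health server
  initialDelaySeconds: 15
  periodSeconds: 10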

How to choose the right Kubernetes liveness probe types

Picking the right Kubernetes liveness probe can be tricky. Here are a few tips that can help Kubernetes admins get it right.

  • When using an HTTP GET liveness probe, ensure the health check endpoint is lightweight and doesn’t consume excessive resources.
  • Before using TCP Socket liveness probes, ensure that establishing a TCP connection can be treated as a reliable indicator of your application’s health.
  • If you are using a gRPC liveness probe, ensure that the gRPC service and the port it communicates with are correctly configured. Also, make sure your application’s health can be accurately determined by the gRPC service’s response.
  • If your application is deployed as part of a Deployment, StatefulSet, or another higher-level resource, update the corresponding template within the resource configuration after configuring the pod definition, as shown in the sketch after this list.
  • Ensure custom exec scripts explicitly return a zero exit code for success and a non-zero exit code for failure.
  • Regardless of the design pattern chosen, configure the initialDelaySeconds, periodSeconds, and other liveness probe parameters to match your application’s startup time and desired health check frequency.
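
For illustration, the following sketch shows where the probe lives inside a Deployment's pod template; the Deployment name, labels, image, and path are assumptions carried over from the earlier examples:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: darwin-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: darwin
  template:
    metadata:
      labels:
        app: darwin
    spec:
      containers:
      - name: darwin-container
        image: darwin-image
        livenessProbe:
          httpGet:
            path: /darwin-path
            port: 8080
          initialDelaySeconds: 60
          periodSeconds: 10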

Container failure detection for availability monitoring

Monitoring and restarting unhealthy containers ensures that resources are utilized more efficiently than if non-functioning processes consumed them indefinitely. Proactive detection of container failures also contributes to efficient workload management by enabling Kubernetes orchestration systems to maintain desired application states, balance workloads across clusters automatically, and ultimately deliver a better user experience.

When to use container failure detection

  • To detect applications becoming unresponsive or entering a deadlock, infinite loop, or crash without terminating the container.
  • When the container process cannot recover by itself and requires a restart to return to a healthy state.
  • When you want to ensure that your application can recover from failures automatically, without manual intervention.

Administrators can choose HTTP GET, TCP socket, gRPC, or command probes to monitor container failures. To detect container failure using liveness probes, you need to define the probe settings in your Kubernetes deployment or pod configuration as shown below for a container named darwin-container:

    spec:
      containers:
        - name: darwin-container
          image: darwin-image
          ports:
            - containerPort: 8080
          livenessProbe:
            httpGet:
              path: /darwin-path
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
            timeoutSeconds: 5
            successThreshold: 1
            failureThreshold: 3

In the above case, the HTTP GET liveness probe is configured to send a GET request to the /darwin-path endpoint on port 8080. A single successful probe response (successThreshold: 1) confirms a healthy container while the container is considered unhealthy if it fails three consecutive probe attempts (failureThreshold: 3).

Tips for container failure detection

  • Configure the initialDelaySeconds parameter to allow enough time for your application to start and stabilize before the probe begins checking its health.
  • Avoid setting the periodSeconds parameter value too low (causing unnecessary container restarts) or too high (delaying the detection of container failures).
  • If your application is resource-intensive or slow to respond under load, consider increasing the timeoutSeconds parameter to prevent premature liveness probe timeouts.

Probe retries and backoff

When you move to an intermediate level of cluster administration, configuring probes requires a thorough understanding of Kubernetes and the nuances of application behavior. For example, understanding how to fine-tune probe retries and backoff requires a deep knowledge of application behavior, startup times, resource usage, and the implications of different probe configurations on the application’s overall health.

Configuring retries and backoff helps control the frequency and timing of liveness probe attempts in Kubernetes. This also helps optimize the probe behavior by avoiding excessive probe attempts while providing reliable container health monitoring.

When to use probe retries and backoff

  • For customizing the frequency and timing of probe attempts based on your application’s specific needs.
  • When you want to avoid placing unnecessary load on your application due to excessive probe attempts.
  • When a workload requires a more gradual approach to monitor container health, such as those with a long startup time or those sensitive to frequent health check requests.

Probe configuration parameters

  • initialDelaySeconds: Seconds to wait before the first probe is initiated, allowing time for your application to start up
  • periodSeconds: Interval between probe attempts
  • timeoutSeconds: Time after which the probe times out, and the attempt is considered a failure
  • successThreshold: Number of consecutive successful probe attempts required for the container to be considered healthy after a failure (must be 1 for liveness probes)
  • failureThreshold: Number of consecutive failed probe attempts before the container is considered unhealthy
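
Putting these parameters together, the following sketch shows a probe tuned toward a more gradual, lower-frequency pattern; all values are illustrative and should be adjusted to your application:

livenessProbe:
  httpGet:
    path: /darwin-path
    port: 8080
  initialDelaySeconds: 30   # give the application time to start
  periodSeconds: 20         # probe less frequently to reduce load
  timeoutSeconds: 5         # each attempt must respond within 5 seconds
  successThreshold: 1       # must be 1 for liveness probes
  failureThreshold: 5       # tolerate 5 consecutive failures (roughly 100 seconds) before a restart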

Tips for probe retries and backoff

  • Set initialDelaySeconds according to your application’s startup time to avoid triggering unnecessary container restarts during initialization. For applications with unpredictable or longer startup times, consider using startup probes with a high failureThreshold for more accuracy and reliability compared to merely adjusting the initialDelaySeconds.
  • A shorter periodSeconds interval allows for quicker detection of failures but may place more load on the application. On the other hand, a longer interval reduces the load but may result in slower failure detection.
  • Use an appropriate value for timeoutSeconds to prevent probes from taking too long and delaying container recovery. Also, avoid setting it too low, which could lead to false positives.
  • Customize failureThreshold and successThreshold to suit your application’s tolerance for failures and recovery time. Higher thresholds provide more room for recovery, while lower thresholds trigger quicker actions in response to unhealthy states.

Container restart policies

Container restart policies determine how Kubernetes should handle container failures within a pod. This helps you control the conditions under which a container should be restarted, ultimately helping you optimize the recovery process and ensure your application remains available even in the event of container failures.

When to use container restart policies

  • For defining custom behavior for handling container failures within a pod.
  • To ensure containers are restarted automatically upon failure to maintain application availability.
  • To fine-tune container recovery strategies based on the specific requirements of your application.

Configuring restartPolicy fields

Container restart policies are configured using the restartPolicy field in the pod specification. Supported values for restartPolicy include:

  • Always: The container is always restarted if it stops, regardless of the exit status.
  • OnFailure: The container is only restarted if it exits with a non-zero status (i.e., it has failed).
  • Never: The container is never restarted, regardless of the exit status.

In the following example, the pod’s restart policy is set to OnFailure, implying that the container will only be restarted if it exits with a non-zero status (indicating a failure).

apiVersion: v1
kind: Pod
metadata:
  name: darwin-app-restart
spec:
  restartPolicy: OnFailure
  containers:
  - name: darwin-container
    image: darwin-image

Tips for container restart policies

  • The default restart policy for Kubernetes is Always, which ensures that containers are automatically restarted upon failure.
  • Use the OnFailure policy when you want containers restarted only when they have failed, such as after a crash or other error. This policy suits workloads, such as batch jobs, that exit cleanly after completing their tasks and only need a restart if they fail.
  • As the Never policy can lead to extended periods of application unavailability (a container killed after a failed liveness probe is not restarted under this policy), use it only for debugging purposes when you need to inspect the state of a container after it has stopped.
  • When creating your pod specifications, note that restart policies are applied at the pod level and affect all containers within the pod. You can observe how restarts accumulate with the commands sketched below.
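
A quick way to confirm the policy's effect is to watch the pod's restart count and last termination state; the pod name comes from the example above:

# Watch the STATUS and RESTARTS columns as the kubelet applies the restart policy
kubectl get pod darwin-app-restart -w

# Inspect the restart count and the reason for the last termination
kubectl get pod darwin-app-restart -o jsonpath='{.status.containerStatuses[0].restartCount}'
kubectl describe pod darwin-app-restart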

Probes in stateful applications

Stateful applications, by nature, maintain state across multiple components and often have complex interdependencies. When it comes to probing stateful applications, it is important to consider the unique characteristics, such as maintaining state consistency and handling interdependencies between stateful components.

It is also crucial to be cautious about restarting stateful components, as it might lead to state inconsistencies or data loss. Sophisticated health checks through liveness probes in such instances help detect issues early, initiate recovery actions faster, and minimize the impact of failures on the application’s state.

When to use probes in stateful applications

  • When the application relies on maintaining state consistency across different components or replicas.
  • For identifying and addressing issues like component crashes, deadlocks, or resource exhaustion that can impact the application’s state.
  • To optimize the recovery process for minimal risk of data loss, state inconsistencies, or other issues that can arise from component failures.

How to configure a probe for database replication

In a stateful application with database replication, you may want to ensure that the replicas are in sync and up-to-date before considering the application healthy. You can configure the liveness probe to check the replication lag to achieve this.

livenessProbe:
  exec:
    command:
    - /usr/local/bin/check-replication-lag.sh
  initialDelaySeconds: 60
  periodSeconds: 30
  timeoutSeconds: 10
  failureThreshold: 3

In this case, the check-replication-lag.sh script is defined to query the database and determine if the replication lag is within an acceptable range. Depending on your database system, the script can query the appropriate system tables or use the provided APIs to obtain replication lag information.

For example, in a PostgreSQL setup, the script could look similar to the following:

#!/bin/bash
# Maximum acceptable replication lag in seconds
max_allowed_lag=60
psql_command="psql"

# Query the replica for its current replication lag in seconds
lag_seconds=$($psql_command -h localhost -U postgres -d your_database -t -c "SELECT EXTRACT(EPOCH FROM (NOW() - pg_last_xact_replay_timestamp())) AS replication_lag;")

# Exit 0 (healthy) if the lag is within the allowed range, 1 (unhealthy) otherwise
if (( $(echo "$lag_seconds < $max_allowed_lag" | bc -l) )); then
    exit 0
else
    exit 1
fi

The above script connects to a PostgreSQL database, queries the pg_last_xact_replay_timestamp() function to get the timestamp of the last transaction replayed, and calculates the replication lag in seconds. If the lag is less than max_allowed_lag, the script exits with status code 0 (success); otherwise, it exits with 1 (failure).

Tips for probes in stateful applications

  • Fine-tune probe parameters and container restart policies to minimize the risk of restarting stateful components, as it may lead to state inconsistencies or data loss.
  • For applications that rely on data replication, ensure that liveness probes account for replication lag or temporary unavailability of replicas.
  • In multi-node deployments, optimize probe configurations to handle the added complexity of maintaining state and communication between nodes.

Container state transitions

From creation to termination, a container’s state goes through various stages during its lifecycle. Understanding and managing these transitions is crucial for ensuring the smooth operation of containerized workloads and maintaining overall system stability.

When to use container state transitions

  • Ensure that the container is not marked “ready” by Kubernetes before the application is fully initialized and able to serve requests.
  • Improve the application’s overall reliability and fault tolerance by allowing it to start and stop gracefully.
  • If the application takes longer than expected to start, optimize resource usage by preventing Kubernetes from repeatedly restarting the container.

How to configure a probe for a slow-starting application

Startup probes are designed as dedicated health checks that solely monitor an application during its startup phase and are particularly helpful in handling slow-starting applications.

While startup probes offer a more reliable and advanced approach for managing the health of slow-starting applications in Kubernetes, adjusting liveness probes for slow-starting applications is another workaround.
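
For reference, a dedicated startup probe for such an application might look like the following sketch; the path and port carry over from the earlier examples. While the startup probe has not yet succeeded, liveness and readiness checks are held back, giving the application up to failureThreshold x periodSeconds (here, 300 seconds) to start:

startupProbe:
  httpGet:
    path: /darwin-path/health-check
    port: 8080
  periodSeconds: 10
  failureThreshold: 30   # allows up to 30 x 10 = 300 seconds for startup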

To implement the liveness-probe-only workaround, adjust the initialDelaySeconds parameter to give the application enough time to start before the probe begins checking its health, as in the pod specification below:

spec:
  containers:
  - name: darwin-container
    image: darwin-image
    ports:
    - containerPort: 8080
    livenessProbe:
      httpGet:
        path: /darwin-path/health-check
        port: 8080
      initialDelaySeconds: 90
      periodSeconds: 10
      timeoutSeconds: 5
      successThreshold: 1
      failureThreshold: 3

Adjusting initialDelaySeconds based on your workload’s specific startup time can help prevent premature probe failures and unnecessary container restarts. In the example above, we configure an HTTP GET liveness probe for darwin-container with an initialDelaySeconds value of 90 seconds, giving the slow-starting application enough time to initialize before the liveness probe begins monitoring its health.

Kubernetes also provides a set of lifecycle hooks to execute specific scripts during container startup and termination.

For instance, you can use the postStart lifecycle hook to execute a custom script and perform any necessary initialization tasks or delay the application startup to accommodate the slow-starting nature of an application. Alternatively, you can also use the preStop hook to allow the application to gracefully terminate before the container is stopped:

spec:
  containers:
  - name: darwin-container
    image: darwin-image
    ports:
    - containerPort: 8080
    lifecycle:
      postStart:
        exec:
          command: ["/bin/sh", "-c", "sleep 90; ./slow-startup-script.sh"]
      preStop:
        exec:
          command: ["/bin/sh", "-c", "sleep 10; ./graceful-termination-script.sh"]

Tips for container state transitions

  • Keep an eye on container events, such as OOMKilled or CrashLoopBackOff, to understand and troubleshoot issues affecting container state transitions; the commands sketched after this list are one way to surface them.
  • Be aware of application dependencies when managing container state transitions, and use readiness probes to manage the startup order of interdependent components.
  • Fine-tune your container restart policies (Always, OnFailure, Never) to control the container state transitions and recovery process based on your application’s requirements.
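
As a starting point for troubleshooting, you can surface these events and the container's last termination reason with kubectl; the pod name is an assumption carried over from the earlier examples:

# Show container state, last termination reason (e.g., OOMKilled), and recent events
kubectl describe pod darwin-app

# List recent events associated with the pod
kubectl get events --field-selector involvedObject.name=darwin-app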

Conclusion

In this article, we delved into key touchpoints of liveness probe configuration, exploring how factors like initial delays, timeouts, failure thresholds, and probe types can be fine-tuned to align with your specific application requirements. Implementing liveness probes involves making informed decisions about probe type selection and fine-tuning configurations. However, enhancing the efficiency of your Kubernetes ecosystem extends beyond the realm of liveness probes.

Take the next step in your Kubernetes journey by adopting monitoring best practices and utilizing Kubecost. Kubecost empowers you to optimize resource utilization and uncover hidden savings in your containerized applications. Beyond tracking resource usage and costs, Kubecost enables you to establish cluster health alerts based on specific thresholds, including low cluster memory, insufficient cluster CPU, and pods running out of memory.