Fault injection

This chapter shows how to inject faults into a service mesh to test application resiliency against service failures.

Outline

In this chapter you will learn:

  • What fault injection is.
  • How to inject delay and abort faults into a service mesh.

Walkthrough

Fault injection is a testing method that introduces errors into a system to ensure that it can withstand and recover from error conditions.

Unlike other mechanisms for introducing errors, such as delaying packets or killing pods at the network layer, the Istio proxy injects faults at the application layer. This lets operators inject more relevant failures, such as HTTP error codes.

Istio supports two categories of faults, both configured using a VirtualService policy:

  • delays - timing failures that mimic increased network latency or an overloaded upstream service,
  • aborts - crash failures that mimic failures in upstream services (HTTP error codes or TCP connection failures).

In this chapter, we will inject both types of faults into the test application and observe how it handles the failure.

Inject HTTP delay fault

In this scenario, we will inject a delay fault into the payment service.

Start by applying the fault policy:

$ kubectl -n default apply -f ./release/istio/paymentservice-fault-vs-1.yaml
virtualservice.networking.istio.io/paymentservice configured

Inspect its details:

$ kubectl -n default describe vs paymentservice
Name:         paymentservice
Namespace:    default
...
Spec:
  Hosts:
    paymentservice
  Http:
    Fault:
      Delay:
        Fixed Delay:  7s
        Percentage:
          Value:  100
    Route:
      Destination:
        Host:    paymentservice
        Subset:  v1

The above configuration injects a 7s delay into 100% of the traffic to the payment service.
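For reference, a manifest that would produce the output above might look like the following. This is a minimal sketch reconstructed from the `kubectl describe` output, not the actual contents of `paymentservice-fault-vs-1.yaml`; the `apiVersion` in particular is an assumption, since the chapter does not show it.

```yaml
# Sketch inferred from the describe output; the real file may differ.
apiVersion: networking.istio.io/v1alpha3   # assumption: version not shown in the chapter
kind: VirtualService
metadata:
  name: paymentservice
  namespace: default
spec:
  hosts:
  - paymentservice
  http:
  - fault:
      delay:
        fixedDelay: 7s      # every matching request is held for 7 seconds
        percentage:
          value: 100        # percentage of requests that receive the delay
    route:
    - destination:
        host: paymentservice
        subset: v1          # requires a DestinationRule that defines subset v1
```

Lowering `percentage.value` would delay only a fraction of the requests, which is closer to how intermittent latency tends to appear in practice.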

Now, visit the test application in your web browser. Order one item and proceed to the checkout stage.

Before placing the order, open the network analyzer tool provided by your web browser.

Filter the requests so that only those sent to the checkout service are shown.

Place the order. The operation should complete after a delay of about 7s.

The delay occurs because the checkout service calls the payment service while processing the order.

The application handled the 7s delay gracefully. Let's raise the delay to 120s to emulate severe network congestion:

$ kubectl -n default apply -f ./release/istio/paymentservice-fault-vs-2.yaml
virtualservice.networking.istio.io/paymentservice configured

Add another item and place the order. This time the application should fail and expose a 500 Internal Server Error to the user.

In a production environment, applications must always hide internal errors from the end user. In addition, the application should configure a request timeout.
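A timeout can also be enforced at the mesh level, as a complement to the application's own timeout. The sketch below assumes that no other VirtualService already exists for `checkoutservice`; the resource name and the 10s value are illustrative and do not come from this chapter. According to the Istio VirtualService reference, timeouts are not enabled on a route that also has a fault configured, so the timeout is placed on the route to the checkout service rather than on the payment service route that carries the delay.

```yaml
# Hypothetical example: resource name and timeout value are assumptions.
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: checkoutservice
  namespace: default
spec:
  hosts:
  - checkoutservice
  http:
  - timeout: 10s            # callers stop waiting after 10 seconds
    route:
    - destination:
        host: checkoutservice
```

With such a policy in place, a caller of the checkout service would receive an error after roughly 10s instead of waiting for the full injected delay. The application still needs to turn that error into a user-friendly message.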

Inject HTTP abort fault

In this scenario, we will inject an abort fault into the productcatalog service.

Apply the following fault policy:

$ kubectl -n default apply -f ./release/istio/productcatalogservice-fault-vs-1.yaml
virtualservice.networking.istio.io/productcatalogservice configured

Inspect its details:

$ kubectl -n default describe vs productcatalogservice
Name:         productcatalogservice
Namespace:    default
...
Spec:
  Hosts:
    productcatalogservice
  Http:
    Fault:
      Abort:
        Http Status:  500
        Percentage:
          Value:  30
    Route:
      Destination:
        Host:    productcatalogservice
        Subset:  v1

The above configuration injects an HTTP 500 error into 30% of the traffic to the productcatalog service.

Refresh the main page of the test application several times to observe the error message exposed to the user.
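A manifest matching this output might look like the sketch below. It is reconstructed from the `kubectl describe` output rather than copied from `productcatalogservice-fault-vs-1.yaml`, and the `apiVersion` is an assumption.

```yaml
# Sketch inferred from the describe output; the real file may differ.
apiVersion: networking.istio.io/v1alpha3   # assumption: version not shown in the chapter
kind: VirtualService
metadata:
  name: productcatalogservice
  namespace: default
spec:
  hosts:
  - productcatalogservice
  http:
  - fault:
      abort:
        httpStatus: 500     # status code returned instead of forwarding the request
        percentage:
          value: 30         # percentage of requests that are aborted
    route:
    - destination:
        host: productcatalogservice
        subset: v1          # requires a DestinationRule that defines subset v1
```

Because only a fraction of the requests is aborted, a single page load may succeed, which is why several refreshes may be needed before the error appears.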

Again, the application should hide the internal error from the user and instead display a gentle warning, for instance:

Product Catalog is temporarily unavailable. Sorry for the inconvenience.
