Version: release-0.8

Simulate GCP faults

By creating a GCPChaos experiment, you can simulate faults on a specified GCP instance. Currently, GCPChaos supports the following fault types:

  • Node Stop: stops the specified GCP instance.
  • Node Restart: reboots the specified GCP instance.
  • Disk Loss: detaches the storage volume from the specified instance.

Before you start

  • This guide assumes that the GCP authentication information for your local environment has already been imported. If you have not imported it yet, follow the steps in Prerequisite.

  • To connect to the GCP cluster easily, you can create a Kubernetes Secret in advance to store the authentication information. A sample Secret is as follows:

    apiVersion: v1
    kind: Secret
    metadata:
      name: cloud-key-secret-gcp
      namespace: default
    type: Opaque
    stringData:
      service_account: your-gcp-service-account-base64-encode

    • name specifies the name of the Kubernetes Secret object.
    • namespace specifies the namespace of the Kubernetes Secret object.
    • service_account stores your GCP service account key, which must be Base64-encoded. To learn more about service account keys, see Creating and managing service account keys.
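You can script the Base64 step. The following POSIX shell sketch shows one way to do it. It assumes the service account key has been downloaded as a JSON file. The function name make_gcp_secret and the key path are hypothetical and are not part of kbcli or Chaos Mesh. The function prints a Secret with the same name, namespace, and field as the sample above.

```shell
#!/bin/sh
# Hypothetical helper (not part of kbcli or Chaos Mesh):
# Base64-encode a GCP service account key file and print a Secret manifest
# that matches the cloud-key-secret-gcp sample.
make_gcp_secret() {
  key_file="$1"
  if [ ! -r "$key_file" ]; then
    echo "cannot read key file: $key_file" >&2
    return 1
  fi
  # base64 reads stdin on both GNU and BSD/macOS. GNU base64 wraps output at
  # 76 characters, so strip newlines to keep the value on one YAML line.
  encoded=$(base64 < "$key_file" | tr -d '\n')
  cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: cloud-key-secret-gcp
  namespace: default
type: Opaque
stringData:
  service_account: ${encoded}
EOF
}
```

For example, make_gcp_secret ./gcp-key.json | kubectl apply -f - would create or update the Secret in the current cluster context.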

Simulate fault injections with kbcli

Stop

The command below injects the node-stop fault into the specified GCP instances so that they are unavailable for 3 minutes.

kbcli fault node stop [node1] [node2] -c=gcp --region=us-central1-c --duration=3m

Running the command above creates two resources: the Secret cloud-key-secret-gcp and a GCPChaos object, node-chaos-w98j5 in this example. The suffix of the GCPChaos name is generated, so your name will differ. Run kubectl describe gcpchaos node-chaos-w98j5 to verify whether the node-stop fault has been injected successfully.

caution

If you change the cluster permissions, update the key, or switch the cluster context, delete cloud-key-secret-gcp first. The next node-stop injection then creates a new cloud-key-secret-gcp from the new key.

Restart

The command below injects a node-restart fault into the specified GCP instances so that they are restarted.

kbcli fault node restart [node1] [node2] -c=gcp --region=us-central1-c

Detach volume

The command below injects a detach-volume fault into the specified GCP instance so that the specified storage volume is detached from this instance for 3 minutes.

kbcli fault node detach-volume [node1] -c=gcp --region=us-central1-c --device-name=/dev/sdb --duration=3m

Simulate fault injections with a YAML file

GCP-stop example

  1. Write the experiment configuration to the gcp-stop.yaml file.

    In the following example, Chaos Mesh injects the node-stop fault into the specified GCP instance so that the instance is unavailable for 30 seconds.

    apiVersion: chaos-mesh.org/v1alpha1
    kind: GCPChaos
    metadata:
      creationTimestamp: null
      generateName: node-chaos-
      namespace: default
    spec:
      action: node-stop
      duration: 30s
      instance: gke-yjtest-default-pool-c2ee710b-fs5q
      project: apecloud-platform-engineering
      secretName: cloud-key-secret-gcp
      zone: us-central1-c
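
  2. Run kubectl to start the experiment. The manifest uses generateName instead of a fixed name, so use kubectl create. kubectl apply does not support generateName.

    kubectl create -f ./gcp-stop.yaml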

GCP-restart example

  1. Write the experiment configuration to the gcp-restart.yaml file.

    In the following example, Chaos Mesh injects a node-reset fault into the specified GCP instance so that this instance is restarted.

    apiVersion: chaos-mesh.org/v1alpha1
    kind: GCPChaos
    metadata:
      creationTimestamp: null
      generateName: node-chaos-
      namespace: default
    spec:
      action: node-reset
      duration: 30s
      instance: gke-yjtest-default-pool-c2ee710b-fs5q
      project: apecloud-platform-engineering
      secretName: cloud-key-secret-gcp
      zone: us-central1-c
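
  2. Run kubectl to start the experiment. The manifest uses generateName instead of a fixed name, so use kubectl create. kubectl apply does not support generateName.

    kubectl create -f ./gcp-restart.yaml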

GCP-detach-volume example

  1. Write the experiment configuration to the gcp-detach-volume.yaml file.

    In the following example, Chaos Mesh injects a disk-loss fault into the specified GCP instance so that the specified storage volume is detached from this instance for 30 seconds.

    apiVersion: chaos-mesh.org/v1alpha1
    kind: GCPChaos
    metadata:
      creationTimestamp: null
      generateName: node-chaos-
      namespace: default
    spec:
      action: disk-loss
      deviceNames:
        - /dev/sdb
      duration: 30s
      instance: gke-yjtest-default-pool-c2ee710b-fs5q
      project: apecloud-platform-engineering
      secretName: cloud-key-secret-gcp
      zone: us-central1-c
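
  2. Run kubectl to start the experiment. The manifest uses generateName instead of a fixed name, so use kubectl create. kubectl apply does not support generateName.

    kubectl create -f ./gcp-detach-volume.yaml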

Field description

The following table shows the fields in the YAML configuration file.

| Parameter | Type | Description | Default value | Required |
| --- | --- | --- | --- | --- |
| action | string | The type of fault. Available values are node-stop, node-reset, and disk-loss. | node-stop | Yes |
| mode | string | The mode of the experiment. Options are one (selecting a Pod at random), all (selecting all eligible Pods), fixed (selecting a specified number of eligible Pods), fixed-percent (selecting a specified percentage of the eligible Pods), and random-max-percent (selecting the maximum percentage of the eligible Pods). | None | Yes |
| value | string | The parameter for the mode configuration, depending on mode. For example, when mode is fixed-percent, value specifies the percentage of Pods. | None | No |
| secretName | string | The name of the Kubernetes Secret that stores the GCP authentication information. | None | No |
| project | string | The ID of the GCP project. | None | Yes |
| zone | string | The zone of the GCP instance. | None | Yes |
| instance | string | The name of the GCP instance. | None | Yes |
| deviceNames | []string | The machine disk IDs. Required when action is disk-loss. | None | No |
| duration | string | The duration of the experiment. | None | Yes |
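The three example manifests differ only in action and deviceNames. You could therefore generate them from the fields in the table. The following is a minimal POSIX shell sketch of that idea. The function render_gcpchaos and its argument order are hypothetical and are not part of kbcli or Chaos Mesh. The sketch hardcodes the namespace and the secretName used in the examples above. Because the output uses generateName, submit it with kubectl create rather than kubectl apply.

```shell
#!/bin/sh
# Hypothetical helper (not part of kbcli or Chaos Mesh): render a GCPChaos
# manifest from the fields described in the table.
# Usage: render_gcpchaos ACTION DURATION PROJECT ZONE INSTANCE [DEVICE...]
render_gcpchaos() {
  if [ "$#" -lt 5 ]; then
    echo "usage: render_gcpchaos ACTION DURATION PROJECT ZONE INSTANCE [DEVICE...]" >&2
    return 1
  fi
  action="$1"; duration="$2"; project="$3"; zone="$4"; instance="$5"
  shift 5
  # Only the three fault types listed in the table are accepted.
  case "$action" in
    node-stop|node-reset|disk-loss) ;;
    *) echo "unsupported action: $action" >&2; return 1 ;;
  esac
  # deviceNames is required when the action is disk-loss.
  if [ "$action" = "disk-loss" ] && [ "$#" -eq 0 ]; then
    echo "disk-loss requires at least one device name" >&2
    return 1
  fi
  cat <<EOF
apiVersion: chaos-mesh.org/v1alpha1
kind: GCPChaos
metadata:
  generateName: node-chaos-
  namespace: default
spec:
  action: ${action}
EOF
  # Emit deviceNames only when device names were passed.
  if [ "$#" -gt 0 ]; then
    echo "  deviceNames:"
    for dev in "$@"; do
      echo "    - ${dev}"
    done
  fi
  cat <<EOF
  duration: ${duration}
  instance: ${instance}
  project: ${project}
  secretName: cloud-key-secret-gcp
  zone: ${zone}
EOF
}
```

For example, render_gcpchaos disk-loss 30s apecloud-platform-engineering us-central1-c gke-yjtest-default-pool-c2ee710b-fs5q /dev/sdb | kubectl create -f - would submit a manifest with the same spec as the GCP-detach-volume example.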