
Database High Availability (HA) Test Report v0.9.3

Produced by Hangzhou ApeCloud Co., Ltd.

Created by

Test Engineer: Huang Zhangshu

Reviewed by

Test Manager: Zhang Mingjing

Approved by

Product Owner: Wang Ruijun


ChaosMesh Introduction

ChaosMesh is a chaos engineering experimentation platform for Kubernetes that helps users test system stability and fault tolerance by simulating a wide range of failure scenarios.

Chaos engineering is an advanced system testing methodology that intentionally injects various faults into distributed systems, such as network errors, disk failures, CPU load, etc., to verify the system's behavior and recovery capabilities when encountering failures. This proactive testing approach helps improve system reliability and resilience.

ChaosMesh provides a set of fault injection toolkits based on Kubernetes, including various failure scenarios such as network, kernel, disk, and container, capable of simulating various abnormal situations that may occur in real environments. Users can define the types of faults to be injected, scope, duration, and other parameters through YAML configuration files or a Web UI, and then ChaosMesh will execute the corresponding fault operations on the target Pods or nodes.

During the chaos experiment, ChaosMesh continuously monitors the system status and records various metrics and log information during the system recovery process. After the experiment is completed, the collected data can be analyzed to evaluate the system's resilience and fault tolerance capabilities, thereby optimizing system design and fixing potential vulnerabilities. Furthermore, ChaosMesh supports extending new fault scenarios to meet customized requirements in different scenarios.

The emergence of ChaosMesh has lowered the barrier to chaos experiments and can help development and operation teams more efficiently and systematically test and improve the reliability of distributed systems. Through the various chaos experiment scenarios provided by ChaosMesh, it becomes easier to identify the weaknesses in the system and enhance application reliability.

In summary, as an open-source chaos engineering platform, ChaosMesh brings better system reliability assurance for Kubernetes applications. In the increasingly complex cloud-native environment, continuously conducting chaos experiments is crucial for ensuring system high availability.

Pod Failure Introduction

The Pod Failure experiment in ChaosMesh simulates a scenario where Pods in a Kubernetes cluster experience failures. This chaos engineering test is crucial for validating the high availability and fault tolerance of distributed systems.

Working Principle:

In a Kubernetes cluster, a Pod is the smallest deployable unit that encapsulates application containers. When a Pod fails, Kubernetes automatically detects the failure and takes corrective actions, such as recreating the Pod on another node or restarting it if possible. The Pod Failure experiment simulates the unavailability of Pods to evaluate how the system handles unexpected disruptions and recovers from them.

Implementation Method:

ChaosMesh uses the PodChaos resource to simulate Pod failures. Users can configure the Pod Failure experiment in a YAML file by setting the following parameters (a sample manifest follows the list):

  • mode: Specifies the experiment object selection mode (one/all/fixed/fixed-percent, etc.)
  • selector: Specifies the label selector for the target Pods
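For illustration, a minimal PodChaos manifest for a pod-failure experiment might look like the following sketch. The namespace and label selector are hypothetical placeholders; the 2m duration matches the Durations=2m used in the result tables below.

```yaml
apiVersion: chaos-mesh.org/v1alpha1
kind: PodChaos
metadata:
  name: pod-failure-example
  namespace: chaos-mesh
spec:
  action: pod-failure        # make the selected Pod unavailable without deleting it
  mode: one                  # pick one Pod from the matching set
  duration: "2m"
  selector:
    namespaces:
      - default
    labelSelectors:
      app.kubernetes.io/instance: mycluster   # hypothetical cluster label
```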

Significance:

- Verifying High Availability

Simulating Pod failures helps verify whether the system can automatically detect and recover from Pod failures, ensuring service availability even under adverse conditions.

- Testing Failover Mechanisms

This experiment tests the system's ability to handle Pod failures, including rescheduling Pods on other nodes and ensuring data consistency during recovery.

- Identifying Potential Issues

Real-world Pod failures may expose hidden issues such as configuration errors, resource contention, or bugs in application code. Simulating these failures helps identify and resolve these problems in advance.

- Enhancing System Robustness

By exposing the system to controlled Pod failure scenarios, developers and operations teams can gain deeper insights into system behavior and improve its robustness and resilience.

In summary, using ChaosMesh to simulate Pod failures is an effective method to ensure the reliability and fault tolerance of Kubernetes-based applications. It enables teams to identify weaknesses in the system before they become problematic in production environments and take corrective actions.

Delete Pod Introduction

Simulating a database instance failover scenario by deleting a Pod involves the following principles and significance:

Principles:

In a Kubernetes cluster, stateful database components are typically deployed as StatefulSets, with each Pod corresponding to a database instance. When a running database Pod is deleted, Kubernetes automatically recreates a new Pod using the same persistent data volume. The newly created Pod needs to perform database recovery operations based on the existing data and rejoin the database cluster. This process simulates a database instance failover.
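ChaosMesh can drive this scenario directly: the PodChaos pod-kill action deletes the selected Pod, equivalent to running kubectl delete pod against it. A minimal sketch, assuming a hypothetical KubeBlocks role label to target the primary instance:

```yaml
apiVersion: chaos-mesh.org/v1alpha1
kind: PodChaos
metadata:
  name: pod-kill-primary
spec:
  action: pod-kill           # delete the selected Pod; its controller recreates it
  mode: one
  selector:
    namespaces:
      - default
    labelSelectors:
      kubeblocks.io/role: primary   # hypothetical role label for the primary replica
```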

Significance:

- Verifying the high availability of the database cluster

By deleting a Pod to trigger a failover, you can check whether the database cluster recovers automatically when an instance fails, and whether the new instance properly rejoins the cluster and provides services.

- Testing the performance of database failover

During the failover process, you can observe various metrics of the database cluster, such as failure detection time, new instance rebuild time, and data recovery time, and evaluate whether the failover performance meets requirements.

- Discovering potential issues with database failover

Real-world failover scenarios may expose failures and edge cases such as data inconsistency and split-brain. Simulation helps discover and fix these potential issues.

- Enhancing the robustness of applications

Applications need to handle database failovers gracefully. Failover anomalies discovered during experiments can be used to improve the design and implementation of applications.

In summary, simulating database failover by deleting Kubernetes Pods is an efficient and controlled chaos engineering practice that helps improve the reliability and resilience of distributed systems.

Kill Process 1 Introduction

When testing the database failover scenario, we kill process 1 (PID 1) in the Pod's main container and observe whether the database instance managed by KubeBlocks fails over normally. The principle and significance are as follows:

Principle:

In a Linux system, every running process has a unique process ID (PID), and PID 1 is the init process of its PID namespace. Inside a container, PID 1 is the container's entrypoint process; the database server either runs as PID 1 itself or as a child process forked from it. When PID 1 is killed, the container exits and all of its child processes, including the database process, are terminated with it. Kubernetes detects the exit of the Pod's main container and restarts it, while KubeBlocks detects the instance failure and drives the failover: the restarted database instance performs data recovery based on the existing data and rejoins the cluster.
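Killing PID 1 by hand amounts to kubectl exec into the container and sending a signal to process 1. The closest built-in ChaosMesh equivalent is the PodChaos container-kill action, which kills the main process of a named container. A minimal sketch with hypothetical names:

```yaml
apiVersion: chaos-mesh.org/v1alpha1
kind: PodChaos
metadata:
  name: kill-main-process
spec:
  action: container-kill     # kill the main process of the named container
  mode: one
  containerNames:
    - postgresql             # hypothetical main container name
  selector:
    namespaces:
      - default
    labelSelectors:
      app.kubernetes.io/instance: mycluster   # hypothetical cluster label
```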

Significance:

- Simulating actual process crash scenarios

In production environments, processes may crash due to code errors, resource exhaustion, and similar causes. Killing PID 1 simulates this extreme situation.

- Testing edge cases for instance failover

Deleting a Pod to trigger failover is the common case; killing PID 1 is an abnormal edge case that tests the robustness of the failover mechanism more comprehensively.

- Verifying the automatic recovery capability of the orchestration layer

When the main process exits abnormally, KubeBlocks needs to detect the failure quickly and rebuild the Pod to ensure service continuity. This is a direct test of its automatic recovery capability.

- Discovering potential system vulnerabilities and defects

Extreme exceptional situations may expose latent vulnerabilities and defects, such as resource leaks and deadlocks, helping to discover and fix problems in advance.

In summary, killing PID 1 is an extreme way to induce database failover, subjecting KubeBlocks' database high availability and reliability to more stringent verification and thereby enhancing the robustness of the system.

OOM Introduction

Testing the Failover operation of database instance Pods under OOM (Out of Memory) situations in Kubernetes is crucial, as it can verify the high availability and fault tolerance capabilities of the database cluster.

OOM refers to the situation where memory is insufficient and processes within the Pod can no longer allocate the memory they need. In such cases, the Linux kernel's OOM killer selects one or more victim processes based on their OOM scores and kills them to release memory and keep the node stable. A process's likelihood of being chosen can be tuned through its oom_score_adj value, which ranges from -1000 (never kill) to 1000 (kill first); setting a process's oom_score_adj to 1000 makes it the OOM killer's first target, which allows a memory-shortage scenario to be triggered deterministically for that process.
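One way to apply this kind of memory pressure with ChaosMesh is a StressChaos memory stressor, which allocates memory inside the target Pod until the container's limit is reached and the OOM killer fires. A minimal sketch, with hypothetical selector values:

```yaml
apiVersion: chaos-mesh.org/v1alpha1
kind: StressChaos
metadata:
  name: memory-stress-example
spec:
  mode: one
  selector:
    labelSelectors:
      app.kubernetes.io/instance: mycluster   # hypothetical cluster label
  stressors:
    memory:
      workers: 1
      size: "100%"        # keep allocating up to the container's memory limit
  duration: "2m"
```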

In the Failover test, a specific replica Pod of the database instance is selected as the OOM target. A database instance typically consists of a Primary node, which handles write operations, and several Secondary nodes, which handle replication and read operations. When the Primary Pod encounters OOM, the test must therefore verify that the cluster executes the Failover correctly: one Secondary Pod is promoted to the new Primary, the roles of the remaining Pods are adjusted, and data replication is re-established, preserving the availability and data consistency of the cluster.

During the testing process, the test program will trigger OOM and closely monitor the entire Failover process. Once the Failover is detected as completed, it will verify the role labels of each Pod to confirm whether the new Primary node and Secondary nodes have been properly switched. Additionally, it will check whether the new Primary node can provide write services normally, and whether the Secondary nodes can correctly replicate data and provide read services.

Through this approach, the test system can comprehensively test the fault tolerance capability of the database instance under extreme memory pressure, ensuring that even in the event of node failure, the entire cluster can quickly recover and continue to provide services. This is crucial for ensuring the high availability and data consistency of the database, especially when running critical business in production environments.

In summary, the role of the OOM mechanism in Failover testing is to simulate extreme resource pressure situations, verify whether the database instance can correctly execute high availability policies in the event of failure, and evaluate the fault tolerance capabilities and reliability of the entire cluster through the verification of Failover results.

Full CPU Introduction

ChaosMesh's CPU Stress fault experiment aims to simulate scenarios where the system encounters CPU resource stress. Its working principle and implementation method are as follows:

Working Principle:

ChaosMesh injects a process that consumes a large amount of CPU resources into the target Pod, thereby reducing the available CPU resources of the Pod and triggering the system's capacity planning or auto-scaling mechanisms.

Implementation Method:

ChaosMesh's chaos daemon injects a stress-ng process into the target Pod, running inside the Pod's namespaces and cgroups so the load is attributed to the Pod itself. The stress-ng workers continuously consume CPU with compute-intensive methods such as prime number calculation.

Users can set the following parameters in the configuration file for the CPU Stress experiment (a sample manifest follows the list):

  • workers: Specifies the number of threads consuming CPU resources
  • load: Specifies the percentage of CPU resources to be consumed (0-100)
  • mode: Specifies the experiment object selection mode (one/all/fixed/fixed-percent, etc.)
  • selector: Specifies the label selector for the target Pods
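For example, a StressChaos manifest saturating the CPU of one Pod might look like this minimal sketch (selector values are hypothetical):

```yaml
apiVersion: chaos-mesh.org/v1alpha1
kind: StressChaos
metadata:
  name: cpu-stress-example
spec:
  mode: one
  selector:
    labelSelectors:
      app.kubernetes.io/instance: mycluster   # hypothetical cluster label
  stressors:
    cpu:
      workers: 2       # two busy threads
      load: 100        # each thread targets 100% of a core
  duration: "2m"
```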

During the experiment, ChaosMesh continuously monitors the CPU usage of the target Pods. After the experiment ends, ChaosMesh automatically stops the injected stress-ng process and releases the occupied CPU resources.

Through the CPU Stress fault experiment, users can evaluate the system's response to CPU resource stress scenarios, such as scaling or priority preemption, thereby optimizing resource scheduling and improving system robustness.

Network Delay Introduction

ChaosMesh's Network Delay fault experiment aims to simulate network latency scenarios. Its working principle and implementation method are as follows:

Working Principle:

ChaosMesh configures the corresponding network rules on the node where the target Pod resides, artificially introducing network latency, thereby simulating the situation where network communication between the Pod and other resources experiences delay.

Implementation Method:

ChaosMesh utilizes the NetEm (Network Emulator) module in the Linux kernel and injects network delay rules on the target Pod's network interface. Specifically, the chaos daemon on the Pod's node runs the tc (traffic control) command inside the Pod's network namespace to invoke NetEm's delay queue mechanism.

Users can set the following key parameters in the configuration file for the Network Delay experiment:

  • latency: Specifies the network delay to inject, e.g., 10ms
  • correlation: Specifies the correlation between the current delay and the previous delay (0-100)
  • jitter: Specifies the range of delay jitter

By setting these parameters, ChaosMesh can simulate various complex network delay scenarios, such as fixed delay, random delay, and correlated delay.
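A minimal NetworkChaos manifest for such a delay experiment might look like this (selector values are hypothetical):

```yaml
apiVersion: chaos-mesh.org/v1alpha1
kind: NetworkChaos
metadata:
  name: network-delay-example
spec:
  action: delay
  mode: one
  selector:
    labelSelectors:
      app.kubernetes.io/instance: mycluster   # hypothetical cluster label
  delay:
    latency: "10ms"
    correlation: "50"
    jitter: "5ms"
  duration: "2m"
```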

During the experiment, ChaosMesh continuously monitors the network status. After the experiment ends, ChaosMesh automatically cleans up the previously injected network rules and restores the normal network.

Through the Network Delay experiment, users can evaluate the stability and availability of distributed systems under network delay conditions, thereby optimizing network policies and improving system robustness.

Network Loss Introduction

ChaosMesh's Network Loss fault experiment aims to simulate packet loss scenarios. Its working principle and implementation method are as follows:

Working Principle:

ChaosMesh configures the corresponding network rules on the node where the target Pod resides, artificially introducing packet loss, thereby simulating the situation where network communication between the Pod and other resources experiences packet loss.

Implementation Method:

ChaosMesh utilizes the NetEm (Network Emulator) module in the Linux kernel and injects packet loss rules on the target Pod's network interface. Specifically, the chaos daemon on the Pod's node runs the tc (traffic control) command inside the Pod's network namespace to invoke NetEm's packet loss mechanism.

Users can set the following key parameters in the configuration file for the Network Loss experiment:

  • loss: Specifies the packet loss rate to be injected (0-100%)
  • correlation: Specifies the correlation between the current packet loss and the previous packet loss (0-100)

By setting these parameters, ChaosMesh can simulate various complex packet loss scenarios, such as fixed loss, random loss, and correlated loss.
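Expressed as a NetworkChaos manifest (selector values are hypothetical), a 25% loss experiment looks like this:

```yaml
apiVersion: chaos-mesh.org/v1alpha1
kind: NetworkChaos
metadata:
  name: network-loss-example
spec:
  action: loss
  mode: one
  selector:
    labelSelectors:
      app.kubernetes.io/instance: mycluster   # hypothetical cluster label
  loss:
    loss: "25"          # drop 25% of packets
    correlation: "50"
  duration: "2m"
```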

During the experiment, ChaosMesh continuously monitors the network status. After the experiment ends, ChaosMesh automatically cleans up the previously injected network rules and restores the normal network.

Through the Network Loss experiment, users can evaluate the stability and availability of distributed systems under packet loss conditions, thereby optimizing network policies and improving system robustness.

Network Duplicate Introduction

ChaosMesh's Network Duplicate fault experiment aims to simulate packet duplication scenarios.

Working Principle:

ChaosMesh configures the corresponding network rules on the node where the target Pod resides, artificially introducing packet duplication, thereby simulating the situation where network communication between the Pod and other resources experiences duplicate packets.

Implementation Method:

ChaosMesh utilizes the NetEm (Network Emulator) module in the Linux kernel and injects packet duplication rules on the target Pod's network interface. Specifically, the chaos daemon on the Pod's node runs the tc command inside the Pod's network namespace to invoke NetEm's packet duplication mechanism.

Users can set the following key parameters in the configuration file for the Network Duplicate experiment:

  • duplicate: Specifies the probability of packet duplication (0-100%)
  • correlation: Specifies the correlation between consecutive duplication events (0-100)

By setting these parameters, ChaosMesh can simulate various complex packet duplication scenarios.
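The corresponding NetworkChaos manifest mirrors the delay and loss examples above, swapping in the duplicate action (selector values are hypothetical):

```yaml
apiVersion: chaos-mesh.org/v1alpha1
kind: NetworkChaos
metadata:
  name: network-duplicate-example
spec:
  action: duplicate
  mode: one
  selector:
    labelSelectors:
      app.kubernetes.io/instance: mycluster   # hypothetical cluster label
  duplicate:
    duplicate: "40"     # duplicate 40% of packets
    correlation: "50"
  duration: "2m"
```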

During the experiment, ChaosMesh continuously monitors the network status. After the experiment ends, ChaosMesh automatically cleans up the previously injected network rules and restores the normal network.

Through the Network Duplicate experiment, users can evaluate the stability and availability of distributed systems under packet duplication conditions, thereby optimizing network policies and improving system robustness.

Network Corrupt Introduction

ChaosMesh's Network Corrupt fault experiment aims to simulate packet corruption scenarios.

Working Principle:

ChaosMesh configures the corresponding network rules on the node where the target Pod resides, artificially introducing packet corruption, thereby simulating the situation where network communication between the Pod and other resources experiences corrupted packets.

Implementation Method:

ChaosMesh utilizes the NetEm (Network Emulator) module in the Linux kernel and injects packet corruption rules on the target Pod's network interface. Specifically, the chaos daemon on the Pod's node runs the tc command inside the Pod's network namespace to invoke NetEm's packet corruption mechanism.

Users can set the following key parameters in the configuration file for the Network Corrupt experiment:

  • corrupt: Specifies the probability of packet corruption (0-100%)
  • correlation: Specifies the correlation between consecutive corruption events (0-100)

By setting these parameters, ChaosMesh can simulate various complex packet corruption scenarios.
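Again the NetworkChaos manifest follows the same shape as the previous network examples, with the corrupt action (selector values are hypothetical):

```yaml
apiVersion: chaos-mesh.org/v1alpha1
kind: NetworkChaos
metadata:
  name: network-corrupt-example
spec:
  action: corrupt
  mode: one
  selector:
    labelSelectors:
      app.kubernetes.io/instance: mycluster   # hypothetical cluster label
  corrupt:
    corrupt: "40"       # corrupt 40% of packets
    correlation: "50"
  duration: "2m"
```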

During the experiment, ChaosMesh continuously monitors the network status. After the experiment ends, ChaosMesh automatically cleans up the previously injected network rules and restores the normal network.

Through the Network Corrupt experiment, users can evaluate the stability and availability of distributed systems under packet corruption conditions, thereby optimizing network policies and improving system robustness.

Network Partition Introduction

ChaosMesh's Network Partition fault experiment aims to simulate network partitioning scenarios.

Working Principle:

ChaosMesh isolates the target Pods from selected peers, creating independent network partitions, to test the system's fault tolerance and consistency.

Implementation Method:

ChaosMesh uses iptables or nftables rules to cut off traffic between the target Pods and the partitioned peers. These rules block traffic to specific IP addresses or ports.

Users can set the following key parameters in the configuration file for the Network Partition experiment (a sample manifest follows the list):

  • direction: Specifies which traffic to cut off (to, from, or both)
  • target: Specifies the peer Pods to partition from, using its own mode and selector
  • duration: Specifies the duration of the partition
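A minimal sketch of a partition between primary and secondary replicas, assuming hypothetical KubeBlocks role labels:

```yaml
apiVersion: chaos-mesh.org/v1alpha1
kind: NetworkChaos
metadata:
  name: network-partition-example
spec:
  action: partition
  mode: all
  selector:
    labelSelectors:
      kubeblocks.io/role: primary     # hypothetical role label
  direction: both
  target:
    mode: all
    selector:
      labelSelectors:
        kubeblocks.io/role: secondary # hypothetical role label
  duration: "2m"
```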

During the experiment, ChaosMesh continuously monitors the network status. After the experiment ends, ChaosMesh automatically cleans up the previously injected network rules and restores the normal network.

Through the Network Partition experiment, users can evaluate the stability and availability of distributed systems under network partition conditions, thereby optimizing network policies and improving system robustness.

Network Bandwidth Introduction

ChaosMesh's Network Bandwidth fault experiment aims to simulate bandwidth limitation scenarios.

Working Principle:

ChaosMesh restricts the network bandwidth available to the target Pod, thereby simulating low-bandwidth network environments.

Implementation Method:

ChaosMesh utilizes the NetEm (Network Emulator) module and the HTB (Hierarchical Token Bucket) queue discipline in the Linux kernel to inject bandwidth limits on the target Pod's network interface. Specifically, the chaos daemon on the Pod's node runs the tc command inside the Pod's network namespace to invoke the bandwidth control mechanisms.

Users can set the following key parameters in the configuration file for the Network Bandwidth experiment:

  • rate: Specifies the maximum bandwidth limit (e.g., 1Mbps, 10Mbps)
  • limit: Specifies the queue length
  • buffer: Specifies the buffer size

By setting these parameters, ChaosMesh can simulate various complex bandwidth limitation scenarios.
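As a NetworkChaos manifest (selector values are hypothetical; limit and buffer are in bytes):

```yaml
apiVersion: chaos-mesh.org/v1alpha1
kind: NetworkChaos
metadata:
  name: network-bandwidth-example
spec:
  action: bandwidth
  mode: one
  selector:
    labelSelectors:
      app.kubernetes.io/instance: mycluster   # hypothetical cluster label
  bandwidth:
    rate: "1mbps"
    limit: 20971520     # bytes that may queue before packets are dropped
    buffer: 10000       # token bucket size in bytes
  duration: "2m"
```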

During the experiment, ChaosMesh continuously monitors the network status. After the experiment ends, ChaosMesh automatically cleans up the previously injected network rules and restores the normal network.

Through the Network Bandwidth experiment, users can evaluate the stability and availability of distributed systems under bandwidth limitation conditions, thereby optimizing network policies and improving system robustness.

TimeOffset Introduction

ChaosMesh's Time Chaos feature aims to simulate time offset scenarios. Its working principle and implementation method are as follows:

Working Principle:

ChaosMesh modifies the target process's perception of time, causing it to mistakenly believe that the system time has shifted, thereby simulating a time error scenario. Such time errors may cause abnormal process behavior, such as timer triggering errors, inaccurate scheduled task execution times, etc.

Implementation Method:

ChaosMesh controls time through the time-related system calls that Linux provides (such as clock_gettime). When initiating a Time Chaos attack, ChaosMesh attaches to the target process, intercepts its time-related system calls, and returns artificially modified time values, thereby deceiving the process.

Specifically, ChaosMesh attaches to the target process using the ptrace system call, modifies the system call parameters and return values in its memory, and injects the desired time offset. Users can specify the following in the attack command:

  • timeOffset: The time offset to inject, e.g., +30m shifts time forward by 30 minutes
  • clockIds: The clock types to affect, e.g., CLOCK_REALTIME affects the real-time clock
  • pid: The ID of the target process to attack

By setting these parameters, ChaosMesh can simulate various time error scenarios, such as time advancing, time lagging, and errors in specific clock types.
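In Kubernetes mode the same attack is expressed as a TimeChaos resource, where the Pod's processes are targeted via a selector rather than a pid. A minimal sketch (selector values are hypothetical):

```yaml
apiVersion: chaos-mesh.org/v1alpha1
kind: TimeChaos
metadata:
  name: time-offset-example
spec:
  mode: one
  selector:
    labelSelectors:
      app.kubernetes.io/instance: mycluster   # hypothetical cluster label
  timeOffset: "+30m"    # shift perceived time 30 minutes forward
  clockIds:
    - CLOCK_REALTIME
  duration: "2m"
```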

During the attack process, ChaosMesh continuously modifies the target process's perception of time. After the attack ends, ChaosMesh automatically removes the previously injected time modifications and restores the normal time.

Through the Time Chaos experiment, users can evaluate the robustness of distributed systems under clock error conditions, identify potential time-related bugs, optimize time handling strategies, and improve system reliability.

Evicting Pod Introduction

Evicting a Pod is an operation that forcibly removes a Pod from its current node, which can be used to simulate scenarios where nodes become unavailable or need maintenance. This operation helps verify the system's high availability and fault tolerance capabilities.

Working Principle:

In a Kubernetes cluster, Pods are scheduled on nodes based on resource availability and scheduling policies. When a node becomes unhealthy or requires maintenance, it may be necessary to evict all Pods running on that node. Evicting a Pod triggers the following actions:

  • The eviction command sends a signal to the kubelet on the node to terminate the Pod gracefully.
  • The kubelet then starts terminating the Pod by sending termination signals to the containers within the Pod.
  • Once the Pod is terminated, Kubernetes reschedules the Pod on another healthy node in the cluster.
  • The new node initializes the Pod using the same persistent data volume (if applicable) and performs any necessary recovery operations.

Implementation Method:

To evict Pods from a node, you can use the kubectl drain command, which safely evicts all Pods from the specified node, ensuring minimal disruption to services.

The kubectl drain command performs the following steps (a sample invocation follows the list):

  1. Mark the node as unschedulable: prevents new Pods from being scheduled on the node.
  2. Evict all Pods: gracefully evicts the Pods on the node; DaemonSet-managed Pods are not evicted, and the command refuses to proceed while they are present unless --ignore-daemonsets is specified.
  3. Reschedule Pods: Kubernetes automatically reschedules the evicted Pods onto other healthy nodes in the cluster.
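A typical invocation looks like the following; the node name is a placeholder:

```bash
# Cordon the node and gracefully evict its Pods.
# DaemonSet Pods are skipped; emptyDir data is discarded on eviction.
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data

# After maintenance, allow Pods to be scheduled on the node again.
kubectl uncordon <node-name>
```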

Significance:

- Verifying High Availability

By evicting Pods using kubectl drain, you can check whether the system can automatically recover when a node becomes unavailable, ensuring that services remain available and resilient.

- Testing Failover Mechanisms

Eviction simulates real-world scenarios where nodes may fail or require maintenance, allowing you to test the failover mechanisms and ensure that the system can handle such situations without significant downtime.

- Discovering Potential Issues

Real-world eviction scenarios may expose issues such as slow Pod termination, resource allocation problems, or service disruptions. Simulating these scenarios helps identify and address potential issues.

- Enhancing System Robustness

Regularly testing eviction scenarios improves the overall robustness of the system by identifying and fixing vulnerabilities, ensuring that the system can handle unexpected node failures or maintenance activities.

In summary, evicting Pods using kubectl drain is a critical chaos engineering practice that helps improve the reliability and resilience of distributed systems by simulating node unavailability and verifying failover mechanisms. This method ensures that the eviction process is handled gracefully, minimizing service disruptions and enhancing system stability.

Connection Stress Introduction

The Connection Stress fault experiment aims to simulate scenarios where the system encounters a high volume of connection requests, testing the system's ability to handle connection stress. This is achieved by generating a large number of concurrent connections directly to the target service or Pod.

Working Principle:

The experiment involves creating a large number of client connections to the target service, exhausting available connection resources and triggering the system's capacity planning or auto-scaling mechanisms. This approach mimics real-world traffic spikes and evaluates how the system handles high connection loads.

Implementation Method:

To generate a large number of connections, a custom script or tool is used that runs on a separate machine or within a dedicated container. This tool continuously establishes connections to the target service, simulating high connection stress.

Users can set the following parameters for the Connection Stress experiment:

  • connections: Specifies the total number of connections to establish.
  • concurrent: Specifies the number of concurrent connections to maintain.
  • duration: Specifies the duration of the connection stress test.
  • target_service: Specifies the endpoint or service to which connections are made.
  • interval: Specifies the interval between connection attempts (if applicable).

During the experiment, the system continuously monitors the connection status and resource usage of the target Pods. After the experiment ends, the connection generation tool stops, and the system returns to normal operation.

Through the Connection Stress fault experiment, users can evaluate the system's response to high connection load scenarios, such as scaling or connection handling limits, thereby optimizing resource scheduling and improving system robustness.

Significance:

- Testing Connection Handling Limits

By injecting a large number of connections, you can verify whether the system can handle the maximum number of connections it is designed for and identify any bottlenecks or limitations.

- Evaluating Auto-scaling Mechanisms

Under high connection stress, the system may trigger auto-scaling policies to add more resources. This experiment helps evaluate the effectiveness and responsiveness of these mechanisms.

- Identifying Potential Issues

High connection loads can expose issues such as connection leaks, timeouts, or degraded performance. Identifying these issues early can help improve system stability and reliability.

- Optimizing Resource Allocation

Understanding how the system behaves under connection stress allows for better resource allocation and capacity planning, ensuring optimal performance even under heavy loads.

In summary, the Connection Stress experiment is an essential tool for evaluating the system's ability to handle high connection loads, ensuring that it can maintain performance and availability even under extreme conditions. By directly generating a large number of connections, this method provides a realistic simulation of high-traffic scenarios, helping to identify and address potential issues in advance.

DNS Error Introduction

The DNS Error experiment in ChaosMesh is designed to simulate DNS resolution failure scenarios. Below are the working principles and implementation methods:

Working Principle:

ChaosMesh configures network rules on the nodes where target Pods reside, artificially introducing DNS resolution failures. This simulates situations where Pods encounter DNS resolution errors when communicating with external resources.

Implementation Method:

ChaosMesh deploys a dedicated chaos DNS server in the cluster and redirects DNS queries from the target Pods to it. For queries that match the configured domain patterns, the server returns erroneous responses or no response at all, causing name resolution in those Pods to fail.

Users can define a DNS Error experiment by setting the following key parameters in the configuration file (a sample manifest follows the list):

  • action: Specifies the action to perform (error injects resolution errors)
  • patterns: Specifies the domain name templates that select which queries are affected (wildcards such as ? and * are supported)
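A minimal DNSChaos manifest for this experiment (the domain pattern and selector values are hypothetical):

```yaml
apiVersion: chaos-mesh.org/v1alpha1
kind: DNSChaos
metadata:
  name: dns-error-example
spec:
  action: error          # matching queries fail instead of resolving
  mode: all
  patterns:
    - "*.example.com"    # hypothetical domain pattern
  selector:
    namespaces:
      - default
    labelSelectors:
      app.kubernetes.io/instance: mycluster   # hypothetical cluster label
  duration: "2m"
```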

During the experiment, ChaosMesh continuously monitors the network state. After the experiment ends, ChaosMesh automatically removes the injected DNS fault rules and restores normal name resolution.

By conducting a DNS Error experiment, users can evaluate the stability and availability of distributed systems under DNS resolution failure conditions, thereby optimizing DNS configurations and enhancing system robustness.

DNS Random Introduction

The DNS Random experiment in ChaosMesh is designed to simulate scenarios where DNS resolution results are randomized. Below are the working principles and implementation methods:

Working Principle:

ChaosMesh configures network rules on the nodes where target Pods reside, artificially introducing randomization in DNS resolution results. This simulates situations where Pods encounter inconsistent DNS resolution results when communicating with external resources.

Implementation Method:

ChaosMesh deploys a dedicated chaos DNS server in the cluster and redirects DNS queries from the target Pods to it. For queries that match the configured domain patterns, the server returns randomly generated IP addresses instead of the real resolution results.

Users can define a DNS Random experiment by setting the following key parameters in the configuration file (a sample manifest follows the list):

  • action: Specifies the action to perform (random makes matching queries resolve to random IP addresses)
  • patterns: Specifies the domain name templates that select which queries are affected (wildcards such as ? and * are supported)
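The manifest mirrors the DNS Error example, with action set to random (the domain pattern and selector values are hypothetical):

```yaml
apiVersion: chaos-mesh.org/v1alpha1
kind: DNSChaos
metadata:
  name: dns-random-example
spec:
  action: random         # matching queries resolve to random IP addresses
  mode: all
  patterns:
    - "*.example.com"    # hypothetical domain pattern
  selector:
    namespaces:
      - default
    labelSelectors:
      app.kubernetes.io/instance: mycluster   # hypothetical cluster label
  duration: "2m"
```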

During the experiment, ChaosMesh continuously monitors the network state. After the experiment ends, ChaosMesh automatically removes the injected DNS fault rules and restores normal name resolution.

By conducting a DNS Random experiment, users can evaluate the stability and availability of distributed systems under randomized DNS resolution results, thereby optimizing DNS configurations and enhancing system robustness.

PostgreSQL (Topology = replication) Failover

| FailoverOps | State | Props | Description |
| --- | --- | --- | --- |
| DNS Random | PASSED | HA=DNS Random; Durations=2m; ComponentName=postgresql | Simulates the DNS service returning random IP addresses for a period of time, testing the application's resilience to slowness or unavailability of replicas caused by random DNS resolution results. |
| DNS Error | PASSED | HA=DNS Error; Durations=2m; ComponentName=postgresql | Simulates DNS service errors for a period of time, testing the application's resilience to slowness or unavailability of replicas caused by DNS failures. |
| OOM | PASSED | HA=OOM; Durations=2m; ComponentName=postgresql | Simulates pods running out of memory, testing the application's resilience to slowness or unavailability of replicas under high memory load. |
| Network Loss | PASSED | HA=Network Loss; Durations=2m; ComponentName=postgresql | Simulates a packet-loss fault, testing the application's resilience to slowness or unavailability of replicas on a lossy network. |
| Full CPU | PASSED | HA=Full CPU; Durations=2m; ComponentName=postgresql | Simulates pods saturating their CPU, testing the application's resilience to slowness or unavailability of replicas under high CPU load. |
| Time Offset | PASSED | HA=Time Offset; Durations=2m; ComponentName=postgresql | Simulates a time offset, testing the application's resilience to slowness or unavailability of replicas caused by clock skew. |
| Evicting Pod | PASSED | HA=Evicting Pod; ComponentName=postgresql | Simulates pods being evicted when their node is drained, testing the application's resilience to the unavailability of evicted replicas. |
| Network Bandwidth | PASSED | HA=Network Bandwidth; Durations=2m; ComponentName=postgresql | Simulates a bandwidth-limit fault, testing the application's resilience to slowness or unavailability of replicas on a constrained network. |
| Network Partition | PASSED | HA=Network Partition; Durations=2m; ComponentName=postgresql | Simulates a network-partition fault, testing the application's resilience to slowness or unavailability of replicas isolated by the partition. |
| Kill 1 | PASSED | HA=Kill 1; ComponentName=postgresql | Simulates process 1 being killed, testing the application's resilience to the unavailability of replicas terminated by abnormal signals. |
| Network Corrupt | PASSED | HA=Network Corrupt; Durations=2m; ComponentName=postgresql | Simulates a packet-corruption fault, testing the application's resilience to slowness or unavailability of replicas on a corrupting network. |
| Network Duplicate | PASSED | HA=Network Duplicate; Durations=2m; ComponentName=postgresql | Simulates a packet-duplication fault, testing the application's resilience to slowness or unavailability of replicas on a duplicating network. |
| Pod Failure | PASSED | HA=Pod Failure; Durations=2m; ComponentName=postgresql | Simulates pod failures for a period of time, testing the application's resilience to slowness or unavailability of failed replicas. |
| Connection Stress | PASSED | HA=Connection Stress; ComponentName=postgresql | Simulates pods under connection stress, testing the application's resilience to slowness or unavailability of replicas under high connection load. |
| Network Delay | PASSED | HA=Network Delay; Durations=2m; ComponentName=postgresql | Simulates a network-delay fault, testing the application's resilience to slowness or unavailability of replicas on a high-latency network. |

Redis (Topology = replication) Failover

| FailoverOps | State | Props | Description |
| --- | --- | --- | --- |
| Evicting Pod | PASSED | HA=Evicting Pod; ComponentName=redis | Simulates pods being evicted when their node is drained, testing the application's resilience to the unavailability of evicted replicas. |
| Network Loss | PASSED | HA=Network Loss; Durations=2m; ComponentName=redis | Simulates a packet-loss fault, testing the application's resilience to slowness or unavailability of replicas on a lossy network. |
| Network Delay Failover | PASSED | HA=Network Delay Failover; Durations=2m; ComponentName=redis | Simulates a network-delay fault, testing the application's resilience to slowness or unavailability of replicas on a high-latency network. |
| Pod Failure | PASSED | HA=Pod Failure; Durations=2m; ComponentName=redis | Simulates pod failures for a period of time, testing the application's resilience to slowness or unavailability of failed replicas. |
| Network Corrupt | PASSED | HA=Network Corrupt; Durations=2m; ComponentName=redis | Simulates a packet-corruption fault, testing the application's resilience to slowness or unavailability of replicas on a corrupting network. |
| Time Offset | PASSED | HA=Time Offset; Durations=2m; ComponentName=redis | Simulates a time offset, testing the application's resilience to slowness or unavailability of replicas caused by clock skew. |
| Network Bandwidth | PASSED | HA=Network Bandwidth; Durations=2m; ComponentName=redis | Simulates a bandwidth-limit fault, testing the application's resilience to slowness or unavailability of replicas on a constrained network. |
| DNS Error | PASSED | HA=DNS Error; Durations=2m; ComponentName=redis | Simulates DNS service errors for a period of time, testing the application's resilience to slowness or unavailability of replicas caused by DNS failures. |
| DNS Random | PASSED | HA=DNS Random; Durations=2m; ComponentName=redis | Simulates the DNS service returning random IP addresses for a period of time, testing the application's resilience to slowness or unavailability of replicas caused by random DNS resolution results. |
| Network Duplicate | PASSED | HA=Network Duplicate; Durations=2m; ComponentName=redis | Simulates a packet-duplication fault, testing the application's resilience to slowness or unavailability of replicas on a duplicating network. |
| Connection Stress | PASSED | HA=Connection Stress; ComponentName=redis | Simulates pods under connection stress, testing the application's resilience to slowness or unavailability of replicas under high connection load. |
| Full CPU Failover | PASSED | HA=Full CPU Failover; Durations=2m; ComponentName=redis | Simulates pods saturating their CPU, testing the application's resilience to slowness or unavailability of replicas under high CPU load. |
| Kill 1 | PASSED | HA=Kill 1; ComponentName=redis | Simulates process 1 being killed, testing the application's resilience to the unavailability of replicas terminated by abnormal signals. |
| OOM | PASSED | HA=OOM; Durations=2m; ComponentName=redis | Simulates pods running out of memory, testing the application's resilience to slowness or unavailability of replicas under high memory load. |
| Network Partition | PASSED | HA=Network Partition; Durations=2m; ComponentName=redis | Simulates a network-partition fault, testing the application's resilience to slowness or unavailability of replicas isolated by the partition. |

Kafka Failover

| FailoverOps | State | Props | Description |
| --- | --- | --- | --- |
| Network Bandwidth | PASSED | HA=Network Bandwidth; Durations=2m; ComponentName=kafka-combine | Simulates a bandwidth-limit fault, testing the application's resilience to slowness or unavailability of replicas on a constrained network. |
| Network Corrupt | PASSED | HA=Network Corrupt; Durations=2m; ComponentName=kafka-combine | Simulates a packet-corruption fault, testing the application's resilience to slowness or unavailability of replicas on a corrupting network. |
| DNS Error | PASSED | HA=DNS Error; Durations=2m; ComponentName=kafka-combine | Simulates DNS service errors for a period of time, testing the application's resilience to slowness or unavailability of replicas caused by DNS failures. |
| Network Loss | PASSED | HA=Network Loss; Durations=2m; ComponentName=kafka-combine | Simulates a packet-loss fault, testing the application's resilience to slowness or unavailability of replicas on a lossy network. |
| Evicting Pod | PASSED | HA=Evicting Pod; ComponentName=kafka-combine | Simulates pods being evicted when their node is drained, testing the application's resilience to the unavailability of evicted replicas. |
| OOM | PASSED | HA=OOM; Durations=2m; ComponentName=kafka-combine | Simulates pods running out of memory, testing the application's resilience to slowness or unavailability of replicas under high memory load. |
| DNS Random | PASSED | HA=DNS Random; Durations=2m; ComponentName=kafka-combine | Simulates the DNS service returning random IP addresses for a period of time, testing the application's resilience to slowness or unavailability of replicas caused by random DNS resolution results. |
| Network Duplicate | PASSED | HA=Network Duplicate; Durations=2m; ComponentName=kafka-combine | Simulates a packet-duplication fault, testing the application's resilience to slowness or unavailability of replicas on a duplicating network. |
| Kill 1 | PASSED | HA=Kill 1; ComponentName=kafka-combine | Simulates process 1 being killed, testing the application's resilience to the unavailability of replicas terminated by abnormal signals. |
| Time Offset | PASSED | HA=Time Offset; Durations=2m; ComponentName=kafka-combine | Simulates a time offset, testing the application's resilience to slowness or unavailability of replicas caused by clock skew. |
| Network Partition | PASSED | HA=Network Partition; Durations=2m; ComponentName=kafka-combine | Simulates a network-partition fault, testing the application's resilience to slowness or unavailability of replicas isolated by the partition. |
| Network Delay | PASSED | HA=Network Delay; Durations=2m; ComponentName=kafka-combine | Simulates a network-delay fault, testing the application's resilience to slowness or unavailability of replicas on a high-latency network. |
| Full CPU | PASSED | HA=Full CPU; Durations=2m; ComponentName=kafka-combine | Simulates pods saturating their CPU, testing the application's resilience to slowness or unavailability of replicas under high CPU load. |
| Pod Failure | PASSED | HA=Pod Failure; Durations=2m; ComponentName=kafka-combine | Simulates pod failures for a period of time, testing the application's resilience to slowness or unavailability of failed replicas. |
| Connection Stress | PASSED | HA=Connection Stress; ComponentName=kafka-combine | Simulates pods under connection stress, testing the application's resilience to slowness or unavailability of replicas under high connection load. |

Qdrant (Topology = cluster) Failover

| FailoverOps | State | Props | Description |
| --- | --- | --- | --- |
| Connection Stress | PASSED | HA=Connection Stress; ComponentName=qdrant | Simulates pods under connection stress, testing the application's resilience to slowness or unavailability of replicas under high connection load. |

MySQL (Topology = replication) Failover

| FailoverOps | State | Props | Description |
| --- | --- | --- | --- |
| Network Loss | PASSED | HA=Network Loss; Durations=2m; ComponentName=mysql | Simulates a packet-loss fault, testing the application's resilience to slowness or unavailability of replicas on a lossy network. |
| Evicting Pod | FAILED | HA=Evicting Pod; ComponentName=mysql | Simulates pods being evicted when their node is drained, testing the application's resilience to the unavailability of evicted replicas. |
| Kill 1 | PASSED | HA=Kill 1; ComponentName=mysql | Simulates process 1 being killed, testing the application's resilience to the unavailability of replicas terminated by abnormal signals. |
| Network Bandwidth | PASSED | HA=Network Bandwidth; Durations=2m; ComponentName=mysql | Simulates a bandwidth-limit fault, testing the application's resilience to slowness or unavailability of replicas on a constrained network. |
| Network Delay | PASSED | HA=Network Delay; Durations=2m; ComponentName=mysql | Simulates a network-delay fault, testing the application's resilience to slowness or unavailability of replicas on a high-latency network. |
| Connection Stress | PASSED | HA=Connection Stress; ComponentName=mysql | Simulates pods under connection stress, testing the application's resilience to slowness or unavailability of replicas under high connection load. |
| Pod Failure | PASSED | HA=Pod Failure; Durations=2m; ComponentName=mysql | Simulates pod failures for a period of time, testing the application's resilience to slowness or unavailability of failed replicas. |
| Network Corrupt | PASSED | HA=Network Corrupt; Durations=2m; ComponentName=mysql | Simulates a packet-corruption fault, testing the application's resilience to slowness or unavailability of replicas on a corrupting network. |
| OOM | PASSED | HA=OOM; Durations=2m; ComponentName=mysql | Simulates pods running out of memory, testing the application's resilience to slowness or unavailability of replicas under high memory load. |
| DNS Error | PASSED | HA=DNS Error; Durations=2m; ComponentName=mysql | Simulates DNS service errors for a period of time, testing the application's resilience to slowness or unavailability of replicas caused by DNS failures. |
| Network Partition | PASSED | HA=Network Partition; Durations=2m; ComponentName=mysql | Simulates a network-partition fault, testing the application's resilience to slowness or unavailability of replicas isolated by the partition. |
| Network Duplicate | PASSED | HA=Network Duplicate; Durations=2m; ComponentName=mysql | Simulates a packet-duplication fault, testing the application's resilience to slowness or unavailability of replicas on a duplicating network. |
| DNS Random | PASSED | HA=DNS Random; Durations=2m; ComponentName=mysql | Simulates the DNS service returning random IP addresses for a period of time, testing the application's resilience to slowness or unavailability of replicas caused by random DNS resolution results. |
| Full CPU | FAILED | HA=Full CPU; Durations=2m; ComponentName=mysql | Simulates pods saturating their CPU, testing the application's resilience to slowness or unavailability of replicas under high CPU load. |
| Time Offset | PASSED | HA=Time Offset; Durations=2m; ComponentName=mysql | Simulates a time offset, testing the application's resilience to slowness or unavailability of replicas caused by clock skew. |

ClickHouse (Topology = cluster) Failover

| FailoverOps | State | Props | Description |
| --- | --- | --- | --- |
| Kill 1 | PASSED | HA=Kill 1; ComponentName=clickhouse | Simulates process 1 being killed, testing the application's resilience to the unavailability of replicas terminated by abnormal signals. |
| DNS Random | PASSED | HA=DNS Random; Durations=2m; ComponentName=clickhouse | Simulates the DNS service returning random IP addresses for a period of time, testing the application's resilience to slowness or unavailability of replicas caused by random DNS resolution results. |
| DNS Error | PASSED | HA=DNS Error; Durations=2m; ComponentName=clickhouse | Simulates DNS service errors for a period of time, testing the application's resilience to slowness or unavailability of replicas caused by DNS failures. |
| Pod Failure | PASSED | HA=Pod Failure; Durations=2m; ComponentName=clickhouse | Simulates pod failures for a period of time, testing the application's resilience to slowness or unavailability of failed replicas. |
| Network Delay | PASSED | HA=Network Delay; Durations=2m; ComponentName=clickhouse | Simulates a network-delay fault, testing the application's resilience to slowness or unavailability of replicas on a high-latency network. |
| Network Partition | PASSED | HA=Network Partition; Durations=2m; ComponentName=clickhouse | Simulates a network-partition fault, testing the application's resilience to slowness or unavailability of replicas isolated by the partition. |
| Evicting Pod | PASSED | HA=Evicting Pod; ComponentName=clickhouse | Simulates pods being evicted when their node is drained, testing the application's resilience to the unavailability of evicted replicas. |
| Network Loss | PASSED | HA=Network Loss; Durations=2m; ComponentName=clickhouse | Simulates a packet-loss fault, testing the application's resilience to slowness or unavailability of replicas on a lossy network. |
| Time Offset | PASSED | HA=Time Offset; Durations=2m; ComponentName=clickhouse | Simulates a time offset, testing the application's resilience to slowness or unavailability of replicas caused by clock skew. |
| Connection Stress | PASSED | HA=Connection Stress; ComponentName=clickhouse | Simulates pods under connection stress, testing the application's resilience to slowness or unavailability of replicas under high connection load. |
| Network Bandwidth | PASSED | HA=Network Bandwidth; Durations=2m; ComponentName=clickhouse | Simulates a bandwidth-limit fault, testing the application's resilience to slowness or unavailability of replicas on a constrained network. |
| Full CPU | PASSED | HA=Full CPU; Durations=2m; ComponentName=clickhouse | Simulates pods saturating their CPU, testing the application's resilience to slowness or unavailability of replicas under high CPU load. |
| Network Corrupt | PASSED | HA=Network Corrupt; Durations=2m; ComponentName=clickhouse | Simulates a packet-corruption fault, testing the application's resilience to slowness or unavailability of replicas on a corrupting network. |
| Network Duplicate | PASSED | HA=Network Duplicate; Durations=2m; ComponentName=clickhouse | Simulates a packet-duplication fault, testing the application's resilience to slowness or unavailability of replicas on a duplicating network. |
| OOM | PASSED | HA=OOM; Durations=2m; ComponentName=clickhouse | Simulates pods running out of memory, testing the application's resilience to slowness or unavailability of replicas under high memory load. |

Elasticsearch (Topology = multi-node) Failover

| FailoverOps | State | Props | Description |
| --- | --- | --- | --- |
| Connection Stress | PASSED | HA=Connection Stress; ComponentName=master | Simulates pods under connection stress, testing the application's resilience to slowness or unavailability of replicas under high connection load. |

OceanBase Ent (Topology = distribution) Failover

| FailoverOps | State | Props | Description |
| --- | --- | --- | --- |
| Connection Stress | PASSED | HA=Connection Stress; ComponentName=oceanbase | Simulates pods under connection stress, testing the application's resilience to slowness or unavailability of replicas under high connection load. |

StarRocks Ent (Topology = shared-nothing) Failover

| FailoverOps | State | Props | Description |
| --- | --- | --- | --- |
| Connection Stress | PASSED | HA=Connection Stress; ComponentName=be | Simulates pods under connection stress, testing the application's resilience to slowness or unavailability of replicas under high connection load. |

MinIO Failover

| FailoverOps | State | Props | Description |
| --- | --- | --- | --- |
| Connection Stress | PASSED | HA=Connection Stress; ComponentName=minio | Simulates pods under connection stress, testing the application's resilience to slowness or unavailability of replicas under high connection load. |

Damengdb ( Topology = realtime-replication ) failover

| FailoverOps | State | Props | Description |
| --- | --- | --- | --- |
| Kill 1 | PASSED | HA=Kill 1; ComponentName=dmdb | Simulates process 1 in the container being killed, whether by expected or unwanted processes, testing the application's resilience to replicas becoming unavailable after abnormal termination signals. |
| Pod Failure | PASSED | HA=Pod Failure; Durations=2m; ComponentName=dmdb | Simulates pods failing for a period of time, whether caused by expected or unwanted processes, testing the application's resilience to potential slowness or unavailability of some replicas during pod failure. |
| Network Bandwidth | PASSED | HA=Network Bandwidth; Durations=2m; ComponentName=dmdb | Simulates a network bandwidth-limit fault, testing the application's resilience to potential slowness or unavailability of some replicas caused by constrained bandwidth. |
| Network Corrupt | PASSED | HA=Network Corrupt; Durations=2m; ComponentName=dmdb | Simulates a network packet-corruption fault, testing the application's resilience to potential slowness or unavailability of some replicas caused by corrupted packets. |
| Network Partition | PASSED | HA=Network Partition; Durations=2m; ComponentName=dmdb | Simulates a network partition fault, testing the application's resilience to potential slowness or unavailability of some replicas caused by a network partition. |
| Network Delay | PASSED | HA=Network Delay; Durations=2m; ComponentName=dmdb | Simulates a network delay fault, testing the application's resilience to potential slowness or unavailability of some replicas caused by network latency. |
| OOM | PASSED | HA=OOM; Durations=2m; ComponentName=dmdb | Simulates pods running out of memory (OOM), whether caused by expected or unwanted processes, testing the application's resilience to potential slowness or unavailability of some replicas under high memory load. |
| Network Duplicate | PASSED | HA=Network Duplicate; Durations=2m; ComponentName=dmdb | Simulates a network packet-duplication fault, testing the application's resilience to potential slowness or unavailability of some replicas caused by duplicated packets. |
| DNS Random | PASSED | HA=DNS Random; Durations=2m; ComponentName=dmdb | Simulates the DNS service returning random IP addresses for a period of time, testing the application's resilience to potential slowness or unavailability of some replicas caused by unreliable DNS resolution. |
| Connection Stress | PASSED | HA=Connection Stress; ComponentName=dmdb | Simulates pods under connection stress, whether caused by expected or unwanted processes, testing the application's resilience to potential slowness or unavailability of some replicas under high connection load. |
| Time Offset | PASSED | HA=Time Offset; Durations=2m; ComponentName=dmdb | Simulates a time-offset (clock skew) scenario, testing the application's resilience to potential slowness or unavailability of some replicas caused by clock skew. |
| Evicting Pod | FAILED | HA=Evicting Pod; ComponentName=dmdb | Simulates pods being evicted, as when a node is drained, testing the application's resilience to replicas becoming unavailable during eviction. |
| DNS Error | PASSED | HA=DNS Error; Durations=2m; ComponentName=dmdb | Simulates the DNS service returning errors for a period of time, testing the application's resilience to potential slowness or unavailability of some replicas caused by DNS failures. |
| Full CPU | PASSED | HA=Full CPU; Durations=2m; ComponentName=dmdb | Simulates pods with fully saturated CPU, whether caused by expected or unwanted processes, testing the application's resilience to potential slowness or unavailability of some replicas under high CPU load. |
| Network Loss | PASSED | HA=Network Loss; Durations=2m; ComponentName=dmdb | Simulates a network packet-loss fault, testing the application's resilience to potential slowness or unavailability of some replicas caused by packet loss. |
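
The network faults above (partition, delay, loss, corrupt, duplicate, bandwidth) correspond to actions of ChaosMesh's NetworkChaos experiment. As a rough example, a partition between one dmdb replica and its peers might look like the sketch below; the namespace and labels are assumptions, not values from this run.

```yaml
# Hypothetical NetworkChaos sketch: isolates one dmdb replica from its
# peers in both directions for two minutes. Labels are assumptions.
apiVersion: chaos-mesh.org/v1alpha1
kind: NetworkChaos
metadata:
  name: dmdb-partition
  namespace: demo                 # assumed namespace
spec:
  action: partition               # other actions: delay, loss, corrupt, duplicate, bandwidth
  mode: one                       # pick a single source pod
  selector:
    labelSelectors:
      apps.kubeblocks.io/component-name: dmdb   # assumed label
  direction: both                 # drop traffic to and from the target
  target:                         # the other side of the partition
    mode: all
    selector:
      labelSelectors:
        apps.kubeblocks.io/component-name: dmdb
  duration: 2m
```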

Kingbase ( Topology = kingbase-cluster ) failover

| FailoverOps | State | Props | Description |
| --- | --- | --- | --- |
| Network Partition | PASSED | HA=Network Partition; Durations=2m; ComponentName=kingbase | Simulates a network partition fault, testing the application's resilience to potential slowness or unavailability of some replicas caused by a network partition. |
| OOM | PASSED | HA=OOM; Durations=2m; ComponentName=kingbase | Simulates pods running out of memory (OOM), whether caused by expected or unwanted processes, testing the application's resilience to potential slowness or unavailability of some replicas under high memory load. |
| Time Offset | PASSED | HA=Time Offset; Durations=2m; ComponentName=kingbase | Simulates a time-offset (clock skew) scenario, testing the application's resilience to potential slowness or unavailability of some replicas caused by clock skew. |
| Network Loss | PASSED | HA=Network Loss; Durations=2m; ComponentName=kingbase | Simulates a network packet-loss fault, testing the application's resilience to potential slowness or unavailability of some replicas caused by packet loss. |
| Network Delay | PASSED | HA=Network Delay; Durations=2m; ComponentName=kingbase | Simulates a network delay fault, testing the application's resilience to potential slowness or unavailability of some replicas caused by network latency. |
| Evicting Pod | PASSED | HA=Evicting Pod; ComponentName=kingbase | Simulates pods being evicted, as when a node is drained, testing the application's resilience to replicas becoming unavailable during eviction. |
| Kill 1 | FAILED | HA=Kill 1; ComponentName=kingbase | Simulates process 1 in the container being killed, whether by expected or unwanted processes, testing the application's resilience to replicas becoming unavailable after abnormal termination signals. |
| Network Duplicate | PASSED | HA=Network Duplicate; Durations=2m; ComponentName=kingbase | Simulates a network packet-duplication fault, testing the application's resilience to potential slowness or unavailability of some replicas caused by duplicated packets. |
| Pod Failure | PASSED | HA=Pod Failure; Durations=2m; ComponentName=kingbase | Simulates pods failing for a period of time, whether caused by expected or unwanted processes, testing the application's resilience to potential slowness or unavailability of some replicas during pod failure. |
| Connection Stress | PASSED | HA=Connection Stress; ComponentName=kingbase | Simulates pods under connection stress, whether caused by expected or unwanted processes, testing the application's resilience to potential slowness or unavailability of some replicas under high connection load. |
| Full CPU | PASSED | HA=Full CPU; Durations=2m; ComponentName=kingbase | Simulates pods with fully saturated CPU, whether caused by expected or unwanted processes, testing the application's resilience to potential slowness or unavailability of some replicas under high CPU load. |
| Network Bandwidth | PASSED | HA=Network Bandwidth; Durations=2m; ComponentName=kingbase | Simulates a network bandwidth-limit fault, testing the application's resilience to potential slowness or unavailability of some replicas caused by constrained bandwidth. |
| DNS Error | FAILED | HA=DNS Error; Durations=2m; ComponentName=kingbase | Simulates the DNS service returning errors for a period of time, testing the application's resilience to potential slowness or unavailability of some replicas caused by DNS failures. |
| DNS Random | PASSED | HA=DNS Random; Durations=2m; ComponentName=kingbase | Simulates the DNS service returning random IP addresses for a period of time, testing the application's resilience to potential slowness or unavailability of some replicas caused by unreliable DNS resolution. |
| Network Corrupt | FAILED | HA=Network Corrupt; Durations=2m; ComponentName=kingbase | Simulates a network packet-corruption fault, testing the application's resilience to potential slowness or unavailability of some replicas caused by corrupted packets. |
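
The two DNS scenarios (DNS Error and DNS Random) are expressed in ChaosMesh as DNSChaos experiments, which require ChaosMesh's DNS server component to be enabled at installation time. The sketch below is an assumption-laden illustration, not the manifest used here: the namespace, labels, and pattern scope are all placeholders.

```yaml
# Hypothetical DNSChaos sketch: makes DNS lookups from the kingbase pods
# fail for two minutes. Namespace, labels, and patterns are assumptions.
apiVersion: chaos-mesh.org/v1alpha1
kind: DNSChaos
metadata:
  name: kingbase-dns-error
  namespace: demo                 # assumed namespace
spec:
  action: error                   # `random` would return random IPs instead
  mode: all
  selector:
    labelSelectors:
      apps.kubeblocks.io/component-name: kingbase   # assumed label
  patterns:
    - "*"                         # match every hostname (assumed scope; trailing wildcards are supported)
  duration: 2m
```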

Vastbase ( Topology = replication ) failover

| FailoverOps | State | Props | Description |
| --- | --- | --- | --- |
| Pod Failure | PASSED | HA=Pod Failure; Durations=2m; ComponentName=vastbase | Simulates pods failing for a period of time, whether caused by expected or unwanted processes, testing the application's resilience to potential slowness or unavailability of some replicas during pod failure. |
| Connection Stress | PASSED | HA=Connection Stress; ComponentName=vastbase | Simulates pods under connection stress, whether caused by expected or unwanted processes, testing the application's resilience to potential slowness or unavailability of some replicas under high connection load. |
| OOM | PASSED | HA=OOM; Durations=2m; ComponentName=vastbase | Simulates pods running out of memory (OOM), whether caused by expected or unwanted processes, testing the application's resilience to potential slowness or unavailability of some replicas under high memory load. |
| Full CPU | PASSED | HA=Full CPU; Durations=2m; ComponentName=vastbase | Simulates pods with fully saturated CPU, whether caused by expected or unwanted processes, testing the application's resilience to potential slowness or unavailability of some replicas under high CPU load. |
| DNS Random | PASSED | HA=DNS Random; Durations=2m; ComponentName=vastbase | Simulates the DNS service returning random IP addresses for a period of time, testing the application's resilience to potential slowness or unavailability of some replicas caused by unreliable DNS resolution. |
| Evicting Pod | FAILED | HA=Evicting Pod; ComponentName=vastbase | Simulates pods being evicted, as when a node is drained, testing the application's resilience to replicas becoming unavailable during eviction. |
| Network Corrupt | FAILED | HA=Network Corrupt; Durations=2m; ComponentName=vastbase | Simulates a network packet-corruption fault, testing the application's resilience to potential slowness or unavailability of some replicas caused by corrupted packets. |
| Network Partition | PASSED | HA=Network Partition; Durations=2m; ComponentName=vastbase | Simulates a network partition fault, testing the application's resilience to potential slowness or unavailability of some replicas caused by a network partition. |
| Network Duplicate | PASSED | HA=Network Duplicate; Durations=2m; ComponentName=vastbase | Simulates a network packet-duplication fault, testing the application's resilience to potential slowness or unavailability of some replicas caused by duplicated packets. |
| Time Offset | PASSED | HA=Time Offset; Durations=2m; ComponentName=vastbase | Simulates a time-offset (clock skew) scenario, testing the application's resilience to potential slowness or unavailability of some replicas caused by clock skew. |
| Network Delay | PASSED | HA=Network Delay; Durations=2m; ComponentName=vastbase | Simulates a network delay fault, testing the application's resilience to potential slowness or unavailability of some replicas caused by network latency. |
| Network Loss | FAILED | HA=Network Loss; Durations=2m; ComponentName=vastbase | Simulates a network packet-loss fault, testing the application's resilience to potential slowness or unavailability of some replicas caused by packet loss. |
| Network Bandwidth | FAILED | HA=Network Bandwidth; Durations=2m; ComponentName=vastbase | Simulates a network bandwidth-limit fault, testing the application's resilience to potential slowness or unavailability of some replicas caused by constrained bandwidth. |
| DNS Error | PASSED | HA=DNS Error; Durations=2m; ComponentName=vastbase | Simulates the DNS service returning errors for a period of time, testing the application's resilience to potential slowness or unavailability of some replicas caused by DNS failures. |
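
Time Offset cases like the one above correspond to ChaosMesh's TimeChaos experiment, which shifts the clock observed by the target processes. A minimal sketch follows; the offset value, namespace, and labels are assumptions, not values from this run.

```yaml
# Hypothetical TimeChaos sketch: skews the clock seen by one vastbase
# replica. The offset, namespace, and labels are assumptions.
apiVersion: chaos-mesh.org/v1alpha1
kind: TimeChaos
metadata:
  name: vastbase-time-offset
  namespace: demo                 # assumed namespace
spec:
  mode: one
  selector:
    labelSelectors:
      apps.kubeblocks.io/component-name: vastbase   # assumed label
  timeOffset: "-10m"              # shift the observed clock back 10 minutes (assumed value)
  duration: 2m
```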

Mssql ( Topology = cluster ) failover

| FailoverOps | State | Props | Description |
| --- | --- | --- | --- |
| OOM | PASSED | HA=OOM; Durations=2m; ComponentName=mssql | Simulates pods running out of memory (OOM), whether caused by expected or unwanted processes, testing the application's resilience to potential slowness or unavailability of some replicas under high memory load. |
| Network Bandwidth | PASSED | HA=Network Bandwidth; Durations=2m; ComponentName=mssql | Simulates a network bandwidth-limit fault, testing the application's resilience to potential slowness or unavailability of some replicas caused by constrained bandwidth. |
| Time Offset | PASSED | HA=Time Offset; Durations=2m; ComponentName=mssql | Simulates a time-offset (clock skew) scenario, testing the application's resilience to potential slowness or unavailability of some replicas caused by clock skew. |
| Network Duplicate | PASSED | HA=Network Duplicate; Durations=2m; ComponentName=mssql | Simulates a network packet-duplication fault, testing the application's resilience to potential slowness or unavailability of some replicas caused by duplicated packets. |
| Pod Failure | PASSED | HA=Pod Failure; Durations=2m; ComponentName=mssql | Simulates pods failing for a period of time, whether caused by expected or unwanted processes, testing the application's resilience to potential slowness or unavailability of some replicas during pod failure. |
| Network Loss | PASSED | HA=Network Loss; Durations=2m; ComponentName=mssql | Simulates a network packet-loss fault, testing the application's resilience to potential slowness or unavailability of some replicas caused by packet loss. |
| DNS Random | PASSED | HA=DNS Random; Durations=2m; ComponentName=mssql | Simulates the DNS service returning random IP addresses for a period of time, testing the application's resilience to potential slowness or unavailability of some replicas caused by unreliable DNS resolution. |
| Network Corrupt | PASSED | HA=Network Corrupt; Durations=2m; ComponentName=mssql | Simulates a network packet-corruption fault, testing the application's resilience to potential slowness or unavailability of some replicas caused by corrupted packets. |
| Full CPU | PASSED | HA=Full CPU; Durations=2m; ComponentName=mssql | Simulates pods with fully saturated CPU, whether caused by expected or unwanted processes, testing the application's resilience to potential slowness or unavailability of some replicas under high CPU load. |
| Evicting Pod | PASSED | HA=Evicting Pod; ComponentName=mssql | Simulates pods being evicted, as when a node is drained, testing the application's resilience to replicas becoming unavailable during eviction. |
| DNS Error | PASSED | HA=DNS Error; Durations=2m; ComponentName=mssql | Simulates the DNS service returning errors for a period of time, testing the application's resilience to potential slowness or unavailability of some replicas caused by DNS failures. |
| Network Partition | PASSED | HA=Network Partition; Durations=2m; ComponentName=mssql | Simulates a network partition fault, testing the application's resilience to potential slowness or unavailability of some replicas caused by a network partition. |
| Network Delay | PASSED | HA=Network Delay; Durations=2m; ComponentName=mssql | Simulates a network delay fault, testing the application's resilience to potential slowness or unavailability of some replicas caused by network latency. |
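
Finally, the recurring Pod Failure scenario maps onto ChaosMesh's PodChaos experiment with the pod-failure action. The sketch below is illustrative only, under an assumed namespace and label selector:

```yaml
# Hypothetical PodChaos sketch: marks one mssql pod as failed for two
# minutes, after which it is restored. Labels are assumptions.
apiVersion: chaos-mesh.org/v1alpha1
kind: PodChaos
metadata:
  name: mssql-pod-failure
  namespace: demo                 # assumed namespace
spec:
  action: pod-failure             # pod-kill and container-kill are related actions
  mode: one                       # fail a single matching pod
  selector:
    labelSelectors:
      apps.kubeblocks.io/component-name: mssql   # assumed label
  duration: 2m                    # matches the Durations=2m in the table
```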

Test Period: Feb 06, 2025 - Feb 21, 2025

© 2025 ApeCloud PTE. Ltd.