Rebuilding PostgreSQL Replicas in KubeBlocks
This guide demonstrates how to rebuild replicas using both in-place and non-in-place methods.
What is Replica Rebuilding?
Replica rebuilding is the process of recreating a PostgreSQL replica from scratch or from a backup while maintaining:
- Data Consistency: Ensures the replica has an exact copy of the primary's data
- High Availability: Minimizes downtime during the rebuild process
During this process:
- The problematic replica is identified and isolated
- A new base backup is taken from the primary
- WAL (Write-Ahead Log) segments are streamed to catch up
- The replica rejoins the replication cluster
When to Rebuild a PostgreSQL Instance?
Rebuilding becomes necessary in these common scenarios:
- Replication Issues: the replica falls too far behind the primary (irrecoverable lag), or a replication slot becomes corrupted
- WAL file gaps that cannot be automatically resolved
- Data Corruption: storage-level corruption (disk or volume issues), inconsistent data between primary and replica, etc.
- Infrastructure Issues: node failure, storage device failure, or cross-zone/region migration
Prerequisites
Before proceeding, ensure the following:
- Environment Setup:
- A Kubernetes cluster is up and running.
- The kubectl CLI tool is configured to communicate with your cluster.
- KubeBlocks CLI and KubeBlocks Operator are installed. Follow the installation instructions here.
- Namespace Preparation: To keep resources isolated, create a dedicated namespace for this tutorial:
kubectl create ns demo
namespace/demo created
Deploy a PostgreSQL Cluster
KubeBlocks uses a declarative approach for managing PostgreSQL clusters. Below is an example configuration for deploying a PostgreSQL cluster with 2 replicas (1 primary, 1 replica).
Apply the following YAML configuration to deploy the cluster:
apiVersion: apps.kubeblocks.io/v1
kind: Cluster
metadata:
  name: pg-cluster
  namespace: demo
spec:
  terminationPolicy: Delete
  clusterDef: postgresql
  topology: replication
  componentSpecs:
    - name: postgresql
      serviceVersion: 16.4.0
      labels:
        apps.kubeblocks.postgres.patroni/scope: pg-cluster-postgresql
      disableExporter: true
      replicas: 2
      resources:
        limits:
          cpu: "0.5"
          memory: "0.5Gi"
        requests:
          cpu: "0.5"
          memory: "0.5Gi"
      volumeClaimTemplates:
        - name: data
          spec:
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 20Gi
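Save the manifest to a file and apply it. The filename below is only an example; any name works:
kubectl apply -f pg-cluster.yaml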
Verifying the Deployment
Monitor the cluster status until it transitions to the Running state:
kubectl get cluster pg-cluster -n demo -w
Expected Output:
NAME CLUSTER-DEFINITION TERMINATION-POLICY STATUS AGE
pg-cluster postgresql Delete Creating 50s
pg-cluster postgresql Delete Running 4m2s
Once the cluster status becomes Running, your PostgreSQL cluster is ready for use.
If you are creating the cluster for the first time, it may take a while to pull the container images before the Pods start running.
Connect to the Primary PostgreSQL Replica and Write Mock Data
Check replica roles with command:
kubectl get pods -n demo -l app.kubernetes.io/instance=pg-cluster -L kubeblocks.io/role
Example Output:
NAME READY STATUS RESTARTS AGE ROLE
pg-cluster-postgresql-0 4/4 Running 0 13m secondary
pg-cluster-postgresql-1 4/4 Running 0 12m primary
Step 1: Connect to the Primary Instance
KubeBlocks automatically creates a Secret containing the credentials for the PostgreSQL postgres user. Retrieve them with:
NAME=`kubectl get secrets -n demo pg-cluster-postgresql-account-postgres -o jsonpath='{.data.username}' | base64 -d`
PASSWD=`kubectl get secrets -n demo pg-cluster-postgresql-account-postgres -o jsonpath='{.data.password}' | base64 -d`
Connect to the primary replica through the service pg-cluster-postgresql-postgresql, which routes connections to the primary replica:
kubectl exec -ti -n demo pg-cluster-postgresql-0 -- env PGUSER=${NAME} PGPASSWORD=${PASSWD} psql -h pg-cluster-postgresql-postgresql
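Optionally, confirm that the service routed the session to the primary: pg_is_in_recovery() returns f on a primary and t on a replica. This check reuses the credentials and service name from above:
kubectl exec -ti -n demo pg-cluster-postgresql-0 -- env PGUSER=${NAME} PGPASSWORD=${PASSWD} psql -h pg-cluster-postgresql-postgresql -c "SELECT pg_is_in_recovery();"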
Step 2: Write Data to the Primary Instance
Connect to the primary instance and write a record to the database:
postgres=# CREATE DATABASE test;
postgres=# \c test
test=# CREATE TABLE t1 (id INT PRIMARY KEY, name VARCHAR(255));
test=# INSERT INTO t1 VALUES (1, 'John Doe');
Step 3: Verify Data Replication
Connect to the replica instance (e.g. pg-cluster-postgresql-0) to verify that the data has been replicated:
kubectl exec -ti -n demo pg-cluster-postgresql-0 -- env PGUSER=${NAME} PGPASSWORD=${PASSWD} psql -h 127.0.0.1
If the primary instance is 'pg-cluster-postgresql-0', you should connect to 'pg-cluster-postgresql-1' instead. Make sure to check the role of each instance before connecting.
postgres=# \c test
test=# SELECT * FROM t1;
Example Output:
id | name
----+----------
1 | John Doe
(1 row)
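You can also check streaming replication health from the primary itself. The command below assumes pg-cluster-postgresql-1 is the current primary (as in the example output above); the replay_lag column is available on PostgreSQL 10 and later:
kubectl exec -ti -n demo pg-cluster-postgresql-1 -- env PGUSER=${NAME} PGPASSWORD=${PASSWD} psql -h 127.0.0.1 -c "SELECT application_name, state, sync_state, replay_lag FROM pg_stat_replication;"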
Rebuild the Replica
KubeBlocks provides two approaches for rebuilding replicas: in-place and non-in-place.
In-Place Rebuild
Workflow:
- The original Pod (e.g. 'pg-cluster-postgresql-0') is terminated
- A new Pod with the same name is created, and a new PVC is provisioned
- Data is synchronized from the primary
Rebuild the replica in-place using the following configuration:
apiVersion: operations.kubeblocks.io/v1alpha1
kind: OpsRequest
metadata:
  name: pg-rebuild-replica-inplace
  namespace: demo
spec:
  clusterName: pg-cluster
  force: true
  preConditionDeadlineSeconds: 0
  rebuildFrom:
    - componentName: postgresql
      inPlace: true # set inPlace to true
      instances:
        - name: pg-cluster-postgresql-0
  type: RebuildInstance
In this configuration, "pg-cluster-postgresql-0" refers to the instance name (Pod name) that will be rebuilt.
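Save the OpsRequest to a file and apply it (the filename is only an example):
kubectl apply -f pg-rebuild-replica-inplace.yaml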
Monitor the rebuild operation:
kubectl get ops pg-rebuild-replica-inplace -n demo -w
Example Output:
NAME TYPE CLUSTER STATUS PROGRESS AGE
pg-rebuild-replica-inplace RebuildInstance pg-cluster Running 0/1 5s
pg-rebuild-replica-inplace RebuildInstance pg-cluster Running 0/1 5s
pg-rebuild-replica-inplace RebuildInstance pg-cluster Running 0/1 46s
pg-rebuild-replica-inplace RebuildInstance pg-cluster Running 1/1 46s
pg-rebuild-replica-inplace RebuildInstance pg-cluster Succeed 1/1 47s
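If the operation stays in Running longer than expected or reports a failure, inspecting its events usually reveals the cause:
kubectl describe ops pg-rebuild-replica-inplace -n demo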
Verify that the replica ("pg-cluster-postgresql-0"), its PVC, and its PV have been recreated:
kubectl get po,pvc,pv -l app.kubernetes.io/instance=pg-cluster -n demo
Example Output:
NAME READY STATUS RESTARTS AGE
pod/pg-cluster-postgresql-0 4/4 Running 0 5m6s
pod/pg-cluster-postgresql-1 4/4 Running 0 14m
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS VOLUMEATTRIBUTESCLASS AGE
persistentvolumeclaim/data-pg-cluster-postgresql-0 Bound pvc-xxx 20Gi RWO <SC> <unset> 5m6s
persistentvolumeclaim/data-pg-cluster-postgresql-1 Bound pvc-yyy 20Gi RWO <SC> <unset> 14m
Connect to the replica and check if the data has been restored:
kubectl exec -ti -n demo pg-cluster-postgresql-0 -- env PGUSER=${NAME} PGPASSWORD=${PASSWD} psql -h 127.0.0.1
postgres=# \c test
test=# SELECT * FROM t1;
id | name
----+----------
1 | John Doe
(1 row)
Non-In-Place Rebuild
Workflow:
- A new Pod (e.g. 'pg-cluster-postgresql-2') is created
- Data is synchronized from the primary
- The original Pod is terminated once the new replica is ready
Rebuild the replica by creating a new instance:
apiVersion: operations.kubeblocks.io/v1alpha1
kind: OpsRequest
metadata:
  name: pg-rebuild-replica-non-inplace
  namespace: demo
spec:
  clusterName: pg-cluster
  force: true
  preConditionDeadlineSeconds: 0
  rebuildFrom:
    - componentName: postgresql
      inPlace: false
      instances:
        - name: pg-cluster-postgresql-0
  type: RebuildInstance
In this configuration, "pg-cluster-postgresql-0" refers to the instance (Pod) to be rebuilt; with inPlace set to false, it is replaced by a newly created instance.
Monitor the rebuild operation:
kubectl get ops pg-rebuild-replica-non-inplace -n demo -w
Example Output:
NAME TYPE CLUSTER STATUS PROGRESS AGE
pg-rebuild-replica-non-inplace RebuildInstance pg-cluster Running 0/1 5s
pg-rebuild-replica-non-inplace RebuildInstance pg-cluster Running 0/1 5s
pg-rebuild-replica-non-inplace RebuildInstance pg-cluster Running 0/1 46s
pg-rebuild-replica-non-inplace RebuildInstance pg-cluster Running 1/1 46s
pg-rebuild-replica-non-inplace RebuildInstance pg-cluster Succeed 1/1 47s
Watch the Pods to see the new replica being created and the original Pod being terminated once the new replica is ready:
kubectl get pods -l app.kubernetes.io/instance=pg-cluster -n demo -w
Example Output:
NAME READY STATUS RESTARTS AGE
pg-cluster-postgresql-0 4/4 Running 0 53m
pg-cluster-postgresql-1 4/4 Running 0 2m52s
pg-cluster-postgresql-2 0/4 Pending 0 0s
pg-cluster-postgresql-2 0/4 Pending 0 4s
pg-cluster-postgresql-2 0/4 Init:0/4 0 4s
pg-cluster-postgresql-2 0/4 Init:1/4 0 5s
pg-cluster-postgresql-2 0/4 Init:2/4 0 6s
pg-cluster-postgresql-2 0/4 Init:3/4 0 7s
pg-cluster-postgresql-2 0/4 PodInitializing 0 8s
pg-cluster-postgresql-2 2/4 Running 0 9s
pg-cluster-postgresql-2 2/4 Running 0 12s
pg-cluster-postgresql-2 2/4 Running 0 14s
pg-cluster-postgresql-2 3/4 Running 0 14s
pg-cluster-postgresql-2 3/4 Running 0 16s
pg-cluster-postgresql-2 4/4 Running 0 3m30s
pg-cluster-postgresql-0 4/4 Terminating 0 4m3s
pg-cluster-postgresql-0 4/4 Terminating 0 4m3s
pg-cluster-postgresql-0 4/4 Terminating 0 4m3s
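To confirm that the new instance joined the cluster as a secondary, re-run the role check used earlier:
kubectl get pods -n demo -l app.kubernetes.io/instance=pg-cluster -L kubeblocks.io/role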
Connect to the new replica instance ('pg-cluster-postgresql-2') and verify the data:
kubectl exec -ti -n demo pg-cluster-postgresql-2 -- env PGUSER=${NAME} PGPASSWORD=${PASSWD} psql -h 127.0.0.1
postgres=# \c test
test=# SELECT * FROM t1;
id | name
----+----------
1 | John Doe
(1 row)
Rebuild from Backups
The configuration below recovers a failed replica by restoring it from a known backup, referenced via backupName:
apiVersion: operations.kubeblocks.io/v1alpha1
kind: OpsRequest
metadata:
  name: pg-rebuild-from-backup
  namespace: demo
spec:
  clusterName: pg-cluster
  force: true
  rebuildFrom:
    - backupName: <PG_BACKUP_NAME>
      componentName: postgresql
      inPlace: true
      instances:
        - name: pg-cluster-postgresql-1
  type: RebuildInstance
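To find a value for <PG_BACKUP_NAME>, list the existing Backup objects in the namespace (this assumes a backup has already been created for the cluster):
kubectl get backups -n demo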
Rebuild to a Specific Node
To schedule the rebuilt replica on a specific node, set targetNodeName:
apiVersion: operations.kubeblocks.io/v1alpha1
kind: OpsRequest
metadata:
  name: pg-rebuild-to-specific-node
  namespace: demo
spec:
  clusterName: pg-cluster
  force: true
  rebuildFrom:
    - backupName: <PG_BACKUP_NAME>
      componentName: postgresql
      inPlace: true
      instances:
        - name: pg-cluster-postgresql-1
          targetNodeName: <NODE_NAME> # the new Pod will be scheduled to the specified node
  type: RebuildInstance
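To find a value for <NODE_NAME>, list the nodes in your Kubernetes cluster:
kubectl get nodes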
Summary
Key takeaways:
- In-Place Rebuild: Recreated the replica Pod and its PVC under the same name and restored the data.
- Non-In-Place Rebuild: Created a new replica instance, restored the data, and removed the original Pod.
Both methods effectively recover the replica and ensure data consistency.