KubeBlocks PostgreSQL uses the Kubernetes API itself as the DCS (Distributed Config Store) by default. However, when the control plane is under extremely high load, this can lead to unexpected demotion of the primary replica. In such cases it is recommended to use etcd as the DCS instead.
```yaml
apiVersion: apps.kubeblocks.io/v1
kind: Cluster
metadata:
  name: pg-cluster-etcd
  namespace: demo
spec:
  terminationPolicy: Delete
  clusterDef: postgresql
  topology: replication
  componentSpecs:
    - name: postgresql
      serviceVersion: "16.4.0"
      env:
        - name: DCS_ENABLE_KUBERNETES_API
          value: "" # disable Kubernetes API DCS; required when using etcd or ZooKeeper with Patroni/Spilo
        - name: ETCD3_HOST
          value: 'etcd-cluster-etcd-headless.demo.svc.cluster.local:2379' # Spilo/Patroni etcd v3 endpoint(s); adjust to your etcd Service
        # - name: ZOOKEEPER_HOSTS
        #   value: 'myzk-zookeeper-0.myzk-zookeeper-headless.demo.svc.cluster.local:2181' # your ZooKeeper endpoint(s)
      replicas: 2
      resources:
        limits:
          cpu: "0.5"
          memory: "0.5Gi"
        requests:
          cpu: "0.5"
          memory: "0.5Gi"
      volumeClaimTemplates:
        - name: data
          spec:
            storageClassName: ""
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 20Gi
```
The key fields are:

- `DCS_ENABLE_KUBERNETES_API`: set to `""` (empty string) so Patroni does not use the Kubernetes API as DCS.
- `ETCD3_HOST` or `ETCD3_HOSTS`: etcd endpoint(s) for Spilo/Patroni when using etcd as DCS.

Preferred on KubeBlocks: declare a `serviceRefs` entry named `etcd` on the postgresql component that points at your etcd Cluster. The PostgreSQL ComponentDefinition maps that to `PATRONI_DCS_ETCD_SERVICE_ENDPOINT`, which the startup script uses to switch off Kubernetes DCS and configure etcd (see the postgresql addon `cmpd.yaml`). The env-only snippet above follows the same pattern as `examples/postgresql/cluster-with-etcd.yaml` in kubeblocks-addons.
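A minimal sketch of the `serviceRefs` approach is shown below. The etcd cluster name `etcd-cluster` and the service/port names are assumptions; adjust them to match your etcd deployment:

```yaml
componentSpecs:
  - name: postgresql
    serviceVersion: "16.4.0"
    serviceRefs:
      - name: etcd                # must be "etcd" to match the serviceRef declared in the ComponentDefinition
        namespace: demo
        clusterServiceSelector:
          cluster: etcd-cluster   # assumed name of your etcd Cluster
          service:
            component: etcd
            service: headless     # assumed service exposing the client port
            port: client          # assumed port name for 2379
```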
You can also use ZooKeeper as DCS by setting DCS_ENABLE_KUBERNETES_API to "" and setting ZOOKEEPER_HOSTS to your ZooKeeper endpoints (per Spilo environment variables).
KubeBlocks provides etcd and ZooKeeper addons in the kubeblocks-addons repository; refer to that repository for deployment details.

To verify that Patroni is using etcd, you can shell into one of the etcd containers and inspect the Patroni state with `etcdctl`:

```bash
etcdctl get /service --prefix
```
PostgreSQL log files can accumulate and consume significant disk space over time. Here are several approaches to manage log file storage:
First, check the disk usage of your PostgreSQL pod:
```bash
kubectl exec -it <pod-name> -n <namespace> -- df -h /home/postgres/pgdata/pgroot/data/log
```
You can adjust PostgreSQL's built-in log filename pattern and log verbosity settings by modifying the cluster configuration.
The addon's shipped postgresql.conf templates (addons/postgresql/config/pg*-config.tpl) set log_directory = 'log' (relative to PGDATA under /home/postgres/pgdata/pgroot/data) and log_filename = 'postgresql-%u.log' by default. The % sequences follow PostgreSQL's strftime(3)-style rules; %u is the ISO-8601 weekday (1–7, Monday = 1), so filenames cycle within a week—not the same as “keep 7 calendar days” of daily-dated files.
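To see why `%u` cycles weekly: it expands to the ISO weekday (1–7), so two dates a week apart yield the same filename. A quick illustration in Python, whose `strftime` follows the same C-library rules as PostgreSQL's `log_filename`:

```python
from datetime import date

# %u is the ISO weekday (1 = Monday), so filenames repeat every 7 days
monday = date(2024, 1, 1)        # a Monday
next_monday = date(2024, 1, 8)   # one week later

print(monday.strftime("postgresql-%u.log"))       # postgresql-1.log
print(next_monday.strftime("postgresql-%u.log"))  # postgresql-1.log (same file, overwritten/appended)
```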
For example, to switch to one log file per calendar day:
```yaml
apiVersion: operations.kubeblocks.io/v1alpha1
kind: OpsRequest
metadata:
  name: pg-reconfigure-logs
  namespace: <namespace>
spec:
  clusterName: <cluster-name>
  reconfigures:
    - componentName: postgresql
      parameters:
        - key: log_filename
          value: "'postgresql-%Y-%m-%d.log'"
        - key: log_statement
          value: "'none'" # none, ddl, mod, all
  type: Reconfiguring
```
- `log_filename`: filename pattern; see the PostgreSQL documentation for the allowed `%` escapes.
- `log_statement`: controls which SQL statements are logged (`none`, `ddl`, `mod`, `all`).

Due to the interaction between the YAML parser and PostgreSQL's configuration parser, string values must be enclosed in single quotes and then wrapped again in double quotes. This ensures that the single quotes are preserved.
If this additional quoting is omitted, certain parameters, such as log_filename, may be parsed incorrectly, leading to errors like:
```text
syntax error in file "/home/postgres/conf/postgresql.conf" line 113, near token "%".
```
In this addon, log_filename is a static parameter (see addons/postgresql/config/pg*-config-effect-scope.yaml), so changing it requires an instance restart, not only a configuration reload.
If you need immediate space relief, you can manually remove old log files:
```bash
# Find and remove log files older than 7 days
kubectl exec -it <pod-name> -n <namespace> -- find /home/postgres/pgdata/pgroot/data/log -name "*.log" -mtime +7 -delete
```
Be careful when deleting log files manually. Ensure you have backups or have reviewed the logs before deletion.
If log management alone isn't sufficient, consider expanding the persistent volume with a volume expansion operation. This increases the storage capacity of the data volume, which typically also holds the log files.
PostgreSQL may fail to start when the password contains certain special characters. The pod logs show a Python traceback like this:
```text
  File "/usr/lib/python3/dist-packages/yaml/scanner.py", line 116, in check_token
    self.fetch_more_tokens()
  File "/usr/lib/python3/dist-packages/yaml/scanner.py", line 231, in fetch_more_tokens
    return self.fetch_anchor()
  File "/usr/lib/python3/dist-packages/yaml/scanner.py", line 621, in fetch_anchor
    self.tokens.append(self.scan_anchor(AnchorToken))
  File "/usr/lib/python3/dist-packages/yaml/scanner.py", line 929, in scan_anchor
    raise ScannerError("while scanning an %s" % name, start_mark,
yaml.scanner.ScannerError: while scanning an anchor
  in "<unicode string>", line 45, column 17:
        password: &JgE#F5x&eNwis*2dW!7& ...
                  ^
```
This is fixed in KubeBlocks v1.0.1-beta.6 and v0.9.5-beta.4; upgrade to one of those releases or later. Alternatively, you can avoid the problem by explicitly restricting the symbols allowed in the password generation policy:
```yaml
apiVersion: apps.kubeblocks.io/v1
kind: Cluster
metadata:
spec:
  componentSpecs:
    - name: postgresql
      systemAccounts:
        - name: postgres
          passwordConfig:
            length: 20              # password length: 20 characters
            numDigits: 4            # at least 4 digits
            numSymbols: 2           # at least 2 symbols
            letterCase: MixedCases  # uppercase and lowercase letters
            symbolCharacters: '!'   # the allowed symbols when generating the password
      # other fields in the Cluster manifest are omitted for brevity
```
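The policy's semantics can be sketched with a small Python function. This is a hypothetical illustration, not KubeBlocks' actual generator: it draws the required counts of digits and allowed symbols and fills the remainder with mixed-case letters:

```python
import random
import string

def generate_password(length=20, num_digits=4, num_symbols=2, symbols="!"):
    """Hypothetical sketch of the passwordConfig policy: num_digits digits,
    num_symbols characters from the allowed symbol set (symbolCharacters),
    and mixed-case letters for the remainder, shuffled together."""
    chars = (
        random.choices(string.digits, k=num_digits)
        + random.choices(symbols, k=num_symbols)
        + random.choices(string.ascii_letters, k=length - num_digits - num_symbols)
    )
    random.shuffle(chars)
    return "".join(chars)

print(generate_password())  # e.g. 'xK3!fQ9mTz1!aBc7DeFg'
```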
Pick the correct pods: the standby runs pg_last_xact_replay_timestamp(); pg_stat_replication exists only on the primary. Patroni may place the primary on any ordinal (e.g. postgresql-0 or postgresql-1)—use kubectl get pods -n <namespace> -l app.kubernetes.io/instance=<cluster> --show-labels and the kubeblocks.io/role (or Patroni role) labels instead of assuming pod indexes.
Connect to a standby pod and query the replication status:
```bash
kubectl exec -it pg-cluster-postgresql-1 -n demo -- \
  env PGUSER=kbadmin PGPASSWORD=<password> PGDATABASE=postgres \
  psql -c "SELECT now() - pg_last_xact_replay_timestamp() AS replication_lag;"
```
From the primary, check all connected standbys:
```bash
kubectl exec -it pg-cluster-postgresql-0 -n demo -- \
  env PGUSER=kbadmin PGPASSWORD=<password> PGDATABASE=postgres \
  psql -c "SELECT client_addr, state, sent_lsn, write_lsn, flush_lsn, replay_lsn,
                  (sent_lsn - replay_lsn) AS lag_bytes
           FROM pg_stat_replication;"
```
A NULL result for pg_last_xact_replay_timestamp() means no WAL has been replayed yet — the replica may still be catching up from a base backup.
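The `lag_bytes` column is plain LSN arithmetic: an LSN such as `0/3000060` is a 64-bit WAL position written as two hex halves, and subtracting two LSNs yields a byte offset. A quick sketch of the computation PostgreSQL performs:

```python
def lsn_to_bytes(lsn: str) -> int:
    """Convert a PostgreSQL LSN ('hi/lo' in hex) to an absolute byte position."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

# Example: the primary has sent up to 0/3000060 while the replica replayed 0/3000000
sent, replay = "0/3000060", "0/3000000"
print(lsn_to_bytes(sent) - lsn_to_bytes(replay))  # 96 bytes of lag
```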
The default ClusterIP service ({cluster}-postgresql-postgresql) always routes to the primary pod via roleSelector. To send queries to a specific replica, connect through the headless service using the pod's stable DNS name:
```text
{pod-name}.{cluster}-postgresql-headless.{namespace}.svc.cluster.local:5432
```
For example, to connect to pg-cluster-postgresql-1 in namespace demo:
```bash
kubectl exec -it pg-cluster-postgresql-0 -n demo -- \
  env PGUSER=kbadmin PGPASSWORD=<password> PGDATABASE=postgres \
  psql -h pg-cluster-postgresql-1.pg-cluster-postgresql-headless.demo.svc.cluster.local
```
Hot standbys accept only read-only transactions; writes fail with ERROR: cannot execute ... in a read-only transaction. Patroni-managed replicas run PostgreSQL in recovery; session behavior matches standard PostgreSQL standby semantics.
Failover is fully automatic and requires no manual intervention. The sequence is:

1. Patroni detects the failed primary when its leader lease in the DCS expires; the timeout comes from the `bootstrap.dcs` settings (often tens of seconds; verify in your config).
2. A healthy replica acquires the leader lock and promotes itself (Patroni coordinates this through the cluster's `scope` objects in the API).
3. KubeBlocks detects the role change via the `roleProbe` on the `dbctl` sidecar (HTTP GET `/v1.0/getrole` on port 5001, see the PostgreSQL addon `cmpd.yaml`) and updates the `kubeblocks.io/role` pod label.
4. Service Endpoints automatically switch to the new primary.

Total failover time is typically on the order of one to two probe/TTL cycles, depending on Patroni and network latency.
For a planned switchover (e.g., before maintenance), use the Switchover operation, which performs a graceful demotion with zero data loss.
Each pod runs a per-pod pgbouncer instance on port 6432. The addon template sets pool_mode = session by default (see addons/postgresql/config/pgbouncer-ini.tpl); it proxies connections to the local pod's PostgreSQL instance (not to the primary).
You cannot change pgbouncer with a Reconfiguring OpsRequest. Reconfiguring only applies to PostgreSQL parameters in postgresql.conf. pgbouncer.ini is supplied from a separate config template in the addon and is not part of that mechanism.
That is a real gap today: pool sizing and other pgbouncer knobs are not first-class tunables on the Cluster after deployment. The shipped pgbouncer-ini.tpl does not set default_pool_size, so PgBouncer's built-in default (20 connections per database/user pair) applies unless you customize the addon (template / chart) or manage the generated config through your own operational process.
Built-in template defaults (reference):

| Parameter | Default | Description |
|---|---|---|
| `pool_mode` | `session` | Pooling granularity in the shipped template (`session`, `transaction`, or `statement` if you change the template). |
| `max_client_conn` | Template-derived | In `pgbouncer-ini.tpl`: if PostgreSQL container memory is visible, `min(memory_bytes / 9531392, 5000)` (integer division, cap 5000); otherwise 10000. |
| `default_pool_size` | 20 (PgBouncer default) | Not set in the shipped template; not adjustable via Reconfiguring. |
pgbouncer is bundled on each PostgreSQL pod in this addon and cannot be disabled via a single flag. Because it proxies to the local pod, steering read/write traffic to the current primary is handled by the ClusterIP service's roleSelector, not by pgbouncer.
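The memory-derived `max_client_conn` rule from the table can be sketched as follows (a reading of the template logic described above, not the template itself):

```python
def max_client_conn(memory_bytes):
    """Mirror of the pgbouncer-ini.tpl sizing rule: roughly one client
    connection per 9.1 MiB of container memory, capped at 5000;
    falls back to 10000 when the memory limit is not visible."""
    if memory_bytes is None:
        return 10000
    return min(memory_bytes // 9531392, 5000)

print(max_client_conn(1 << 30))        # 1 GiB   -> 112
print(max_client_conn(64 * (1 << 30))) # 64 GiB  -> 5000 (cap)
print(max_client_conn(None))           # unknown -> 10000
```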