KubeBlocks PostgreSQL uses the Kubernetes API itself as the DCS (Distributed Config Store) by default. However, when the control plane is under extremely high load, this can lead to unexpected demotion of the primary replica. In such cases it is recommended to use etcd as the DCS instead.
```yaml
apiVersion: apps.kubeblocks.io/v1
kind: Cluster
metadata:
  name: pg-cluster-etcd
  namespace: demo
spec:
  terminationPolicy: Delete
  clusterDef: postgresql
  topology: replication
  componentSpecs:
    - name: postgresql
      serviceVersion: "16.4.0"
      env:
        - name: DCS_ENABLE_KUBERNETES_API
          value: "" # disable Kubernetes API DCS; required when using etcd or ZooKeeper with Patroni/Spilo
        - name: ETCD3_HOST
          value: 'etcd-cluster-etcd-headless.demo.svc.cluster.local:2379' # Spilo/Patroni etcd v3 endpoint(s); adjust to your etcd Service
        # - name: ZOOKEEPER_HOSTS
        #   value: 'myzk-zookeeper-0.myzk-zookeeper-headless.demo.svc.cluster.local:2181' # your ZooKeeper endpoint(s)
      replicas: 2
      resources:
        limits:
          cpu: "0.5"
          memory: "0.5Gi"
        requests:
          cpu: "0.5"
          memory: "0.5Gi"
      volumeClaimTemplates:
        - name: data
          spec:
            storageClassName: ""
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 20Gi
```
The key fields are:

- `DCS_ENABLE_KUBERNETES_API`: set to `""` (empty string) so Patroni does not use the Kubernetes API as DCS.
- `ETCD3_HOST` or `ETCD3_HOSTS`: etcd endpoint(s) for Spilo/Patroni when using etcd as DCS.

Preferred on KubeBlocks: declare a `serviceRefs` entry named `etcd` on the postgresql component that points at your etcd Cluster. The PostgreSQL ComponentDefinition maps that to `PATRONI_DCS_ETCD_SERVICE_ENDPOINT`, which the startup script uses to switch off Kubernetes DCS and configure etcd (see the postgresql addon `cmpd.yaml`). The env-only snippet above follows the same pattern as `examples/postgresql/cluster-with-etcd.yaml` in kubeblocks-addons.
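A minimal sketch of the `serviceRefs` approach is shown below. The etcd cluster name `etcd-cluster` and the service/port names are assumptions; adjust them to match your etcd deployment:

```yaml
componentSpecs:
  - name: postgresql
    serviceVersion: "16.4.0"
    serviceRefs:
      - name: etcd                # must be "etcd" to match the serviceRef declared in the ComponentDefinition
        namespace: demo
        clusterServiceSelector:
          cluster: etcd-cluster   # assumed name of your etcd Cluster
          service:
            component: etcd
            service: headless     # assumed service exposing the client port
            port: client          # assumed port name for 2379
```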
You can also use ZooKeeper as DCS by setting DCS_ENABLE_KUBERNETES_API to "" and setting ZOOKEEPER_HOSTS to your ZooKeeper endpoints (per Spilo environment variables).
KubeBlocks provides etcd and ZooKeeper addons in the kubeblocks-addons repository; refer to that repository for deployment details.

To verify that Patroni is using etcd, you can shell into one of the etcd containers and inspect the Patroni state with `etcdctl`:

```bash
etcdctl get /service --prefix
```
PostgreSQL log files can accumulate and consume significant disk space over time. Here are several approaches to manage log file storage:
First, check the disk usage of your PostgreSQL pod:
```bash
kubectl exec -it <pod-name> -n <namespace> -- df -h /home/postgres/pgdata/pgroot/data/log
```
You can adjust PostgreSQL's built-in log filename pattern and log verbosity settings by modifying the cluster configuration.
The addon's shipped postgresql.conf templates (addons/postgresql/config/pg*-config.tpl) set log_directory = 'log' (relative to PGDATA under /home/postgres/pgdata/pgroot/data) and log_filename = 'postgresql-%u.log' by default. The % sequences follow PostgreSQL's strftime(3)-style rules; %u is the ISO-8601 weekday (1–7, Monday = 1), so filenames cycle within a week—not the same as “keep 7 calendar days” of daily-dated files.
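To see why `%u` cycles weekly: it expands to the ISO weekday (1–7), so two dates a week apart yield the same filename. A quick illustration in Python, whose `strftime` follows the same C-library rules as PostgreSQL's `log_filename`:

```python
from datetime import date

# %u is the ISO weekday (1 = Monday), so filenames repeat every 7 days
monday = date(2024, 1, 1)        # a Monday
next_monday = date(2024, 1, 8)   # one week later

print(monday.strftime("postgresql-%u.log"))       # postgresql-1.log
print(next_monday.strftime("postgresql-%u.log"))  # postgresql-1.log (same file, overwritten/appended)
```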
For example, to switch to one log file per calendar day:
```yaml
apiVersion: operations.kubeblocks.io/v1alpha1
kind: OpsRequest
metadata:
  name: pg-reconfigure-logs
  namespace: <namespace>
spec:
  clusterName: <cluster-name>
  reconfigures:
    - componentName: postgresql
      parameters:
        - key: log_filename
          value: "'postgresql-%Y-%m-%d.log'"
        - key: log_statement
          value: "'none'" # none, ddl, mod, all
  type: Reconfiguring
```
- `log_filename`: filename pattern; see the PostgreSQL documentation for the allowed `%` escapes.
- `log_statement`: controls which SQL statements are logged (`none`, `ddl`, `mod`, `all`).

Due to the interaction between the YAML parser and PostgreSQL's configuration parser, string values must be enclosed in single quotes and then wrapped again in double quotes. This ensures that the single quotes are preserved.
If this additional quoting is omitted, certain parameters, such as log_filename, may be parsed incorrectly, leading to errors like:
```text
syntax error in file "/home/postgres/conf/postgresql.conf" line 113, near token "%".
```
In this addon, log_filename is a static parameter (see addons/postgresql/config/pg*-config-effect-scope.yaml), so changing it requires an instance restart, not only a configuration reload.
If you need immediate space relief, you can manually remove old log files:
```bash
# Find and remove log files older than 7 days
kubectl exec -it <pod-name> -n <namespace> -- find /home/postgres/pgdata/pgroot/data/log -name "*.log" -mtime +7 -delete
```
Be careful when deleting log files manually. Ensure you have backups or have reviewed the logs before deletion.
If log management alone isn't sufficient, consider expanding the persistent volume with a volume expansion operation. This increases the storage capacity of the data volume, which typically also holds the log files.
PostgreSQL may fail to start when the password contains certain special characters. The pod logs show a Python traceback like this:
```text
  File "/usr/lib/python3/dist-packages/yaml/scanner.py", line 116, in check_token
    self.fetch_more_tokens()
  File "/usr/lib/python3/dist-packages/yaml/scanner.py", line 231, in fetch_more_tokens
    return self.fetch_anchor()
  File "/usr/lib/python3/dist-packages/yaml/scanner.py", line 621, in fetch_anchor
    self.tokens.append(self.scan_anchor(AnchorToken))
  File "/usr/lib/python3/dist-packages/yaml/scanner.py", line 929, in scan_anchor
    raise ScannerError("while scanning an %s" % name, start_mark,
yaml.scanner.ScannerError: while scanning an anchor
  in "<unicode string>", line 45, column 17:
        password: &JgE#F5x&eNwis*2dW!7& ...
                  ^
```
This is fixed in KubeBlocks v1.0.1-beta.6 and v0.9.5-beta.4; upgrade to one of those releases or later. Alternatively, you can avoid the problem by explicitly restricting the symbols allowed in the password generation policy:
```yaml
apiVersion: apps.kubeblocks.io/v1
kind: Cluster
metadata:
spec:
  componentSpecs:
    - name: postgresql
      systemAccounts:
        - name: postgres
          passwordConfig:
            length: 20              # password length: 20 characters
            numDigits: 4            # at least 4 digits
            numSymbols: 2           # at least 2 symbols
            letterCase: MixedCases  # uppercase and lowercase letters
            symbolCharacters: '!'   # the allowed symbols when generating the password
      # other fields in the Cluster manifest are omitted for brevity
```
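The policy's semantics can be sketched with a small Python function. This is a hypothetical illustration, not KubeBlocks' actual generator: it draws the required counts of digits and allowed symbols and fills the remainder with mixed-case letters:

```python
import random
import string

def generate_password(length=20, num_digits=4, num_symbols=2, symbols="!"):
    """Hypothetical sketch of the passwordConfig policy: num_digits digits,
    num_symbols characters from the allowed symbol set (symbolCharacters),
    and mixed-case letters for the remainder, shuffled together."""
    chars = (
        random.choices(string.digits, k=num_digits)
        + random.choices(symbols, k=num_symbols)
        + random.choices(string.ascii_letters, k=length - num_digits - num_symbols)
    )
    random.shuffle(chars)
    return "".join(chars)

print(generate_password())  # e.g. 'xK3!fQ9mTz1!aBc7DeFg'
```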
Pick the correct pods: the standby runs pg_last_xact_replay_timestamp(); pg_stat_replication exists only on the primary. Patroni may place the primary on any ordinal (e.g. postgresql-0 or postgresql-1)—use kubectl get pods -n <namespace> -l app.kubernetes.io/instance=<cluster> --show-labels and the kubeblocks.io/role (or Patroni role) labels instead of assuming pod indexes.
Connect to a standby pod and query the replication status:
```bash
kubectl exec -it pg-cluster-postgresql-1 -n demo -- \
  env PGUSER=kbadmin PGPASSWORD=<password> PGDATABASE=postgres \
  psql -c "SELECT now() - pg_last_xact_replay_timestamp() AS replication_lag;"
```
From the primary, check all connected standbys:
```bash
kubectl exec -it pg-cluster-postgresql-0 -n demo -- \
  env PGUSER=kbadmin PGPASSWORD=<password> PGDATABASE=postgres \
  psql -c "SELECT client_addr, state, sent_lsn, write_lsn, flush_lsn, replay_lsn,
                  (sent_lsn - replay_lsn) AS lag_bytes
           FROM pg_stat_replication;"
```
A NULL result for pg_last_xact_replay_timestamp() means no WAL has been replayed yet — the replica may still be catching up from a base backup.
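The `lag_bytes` column is plain LSN arithmetic: an LSN such as `0/3000060` is a 64-bit WAL position written as two hex halves, and subtracting two LSNs yields a byte offset. A quick sketch of the computation PostgreSQL performs:

```python
def lsn_to_bytes(lsn: str) -> int:
    """Convert a PostgreSQL LSN ('hi/lo' in hex) to an absolute byte position."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

# Example: the primary has sent up to 0/3000060 while the replica replayed 0/3000000
sent, replay = "0/3000060", "0/3000000"
print(lsn_to_bytes(sent) - lsn_to_bytes(replay))  # 96 bytes of lag
```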
The default ClusterIP service ({cluster}-postgresql-postgresql) always routes to the primary pod via roleSelector. To send queries to a specific replica, connect through the headless service using the pod's stable DNS name:
```text
{pod-name}.{cluster}-postgresql-headless.{namespace}.svc.cluster.local:5432
```
For example, to connect to pg-cluster-postgresql-1 in namespace demo:
```bash
kubectl exec -it pg-cluster-postgresql-0 -n demo -- \
  env PGUSER=kbadmin PGPASSWORD=<password> PGDATABASE=postgres \
  psql -h pg-cluster-postgresql-1.pg-cluster-postgresql-headless.demo.svc.cluster.local
```
Hot standbys accept only read-only transactions; writes fail with ERROR: cannot execute ... in a read-only transaction. Patroni-managed replicas run PostgreSQL in recovery; session behavior matches standard PostgreSQL standby semantics.
Failover is fully automatic and requires no manual intervention. The sequence is:

1. Patroni detects the failed primary when its leader lease in the DCS expires; the timeout comes from the `bootstrap.dcs` settings (often tens of seconds; verify in your config).
2. A healthy replica acquires the leader lock and promotes itself (Patroni coordinates this through the cluster's `scope` objects in the API).
3. KubeBlocks detects the role change via the `roleProbe` on the `dbctl` sidecar (HTTP GET `/v1.0/getrole` on port 5001, see the PostgreSQL addon `cmpd.yaml`) and updates the `kubeblocks.io/role` pod label.
4. Service Endpoints automatically switch to the new primary.

Total failover time is typically on the order of one to two probe/TTL cycles, depending on Patroni and network latency.
For a planned switchover (e.g., before maintenance), use the Switchover operation, which performs a graceful demotion with zero data loss.
Each pod runs a per-pod pgbouncer instance on port 6432. The addon template sets pool_mode = session by default (see addons/postgresql/config/pgbouncer-ini.tpl); it proxies connections to the local pod's PostgreSQL instance (not to the primary).
You cannot change pgbouncer with a Reconfiguring OpsRequest. Reconfiguring only applies to PostgreSQL parameters in postgresql.conf. pgbouncer.ini is supplied from a separate config template in the addon and is not part of that mechanism.
That is a real gap today: pool sizing and other pgbouncer knobs are not first-class tunables on the Cluster after deployment. The shipped pgbouncer-ini.tpl does not set default_pool_size, so PgBouncer's built-in default (20 connections per database/user pair) applies unless you customize the addon (template / chart) or manage the generated config through your own operational process.
Built-in template defaults (reference):

| Parameter | Default | Description |
|---|---|---|
| `pool_mode` | `session` | Pooling granularity in the shipped template (`session`, `transaction`, or `statement` if you change the template). |
| `max_client_conn` | Template-derived | In `pgbouncer-ini.tpl`: if PostgreSQL container memory is visible, `min(memory_bytes / 9531392, 5000)` (integer division, cap 5000); otherwise 10000. |
| `default_pool_size` | 20 (PgBouncer default) | Not set in the shipped template; not adjustable via Reconfiguring. |
pgbouncer is bundled on each PostgreSQL pod in this addon and cannot be disabled via a single flag. Because it proxies to the local pod, steering read/write traffic to the current primary is handled by the ClusterIP service's roleSelector, not by pgbouncer.
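The memory-derived `max_client_conn` rule from the table can be sketched as follows (a reading of the template logic described above, not the template itself):

```python
def max_client_conn(memory_bytes):
    """Mirror of the pgbouncer-ini.tpl sizing rule: roughly one client
    connection per 9.1 MiB of container memory, capped at 5000;
    falls back to 10000 when the memory limit is not visible."""
    if memory_bytes is None:
        return 10000
    return min(memory_bytes // 9531392, 5000)

print(max_client_conn(1 << 30))        # 1 GiB   -> 112
print(max_client_conn(64 * (1 << 30))) # 64 GiB  -> 5000 (cap)
print(max_client_conn(None))           # unknown -> 10000
```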