Automating PostgreSQL Operations In Kubernetes Using KubeDB

New to KubeDB? Please start here.

Ensuring Rock-Solid PostgreSQL Uptime

A Guide to KubeDB’s High Availability and Auto-Failover

In today’s data-driven world, database downtime is not just an inconvenience; it can be a critical business failure. For teams running stateful applications on Kubernetes, ensuring the resilience of their databases is paramount. This is where KubeDB steps in, offering a robust, cloud-native way to manage PostgreSQL on Kubernetes.

One of KubeDB’s most powerful features is its built-in support for High Availability (HA) and automated failover. The KubeDB operator continuously monitors the health of your PostgreSQL cluster and, together with the sidecar it injects into each database pod to coordinate failover, it can automatically respond to failures, ensuring your database remains available with minimal disruption.

This article will guide you through KubeDB’s automated failover capabilities for PostgreSQL. We will set up an HA cluster and then simulate a leader failure to see KubeDB’s auto-recovery mechanism in action.

You will also see how fast failover happens when it is truly necessary. Failover in a KubeDB-managed PostgreSQL cluster generally completes within 2–10 seconds, depending on your cluster networking. There is one exception scenario, discussed later in this doc, where failover can take longer, up to about 45 seconds, but that case is rare.

Before You Start

To follow along with this tutorial, you will need:

  1. A running Kubernetes cluster.
  2. KubeDB installed in your cluster.
  3. kubectl command-line tool configured to communicate with your cluster.

Step 1: Create a High-Availability PostgreSQL Cluster

First, we need to deploy a PostgreSQL cluster configured for High Availability. Unlike a Standalone instance, an HA cluster consists of a primary pod and one or more standby pods that are ready to take over if the primary fails.

Save the following YAML as pg-ha-demo.yaml. This manifest defines a 3-node PostgreSQL cluster with streaming replication enabled.

apiVersion: kubedb.com/v1
kind: Postgres
metadata:
  name: pg-ha-demo
  namespace: demo
spec:
  replicas: 3
  storageType: Durable
  podTemplate:
    spec:
      containers:
      - name: postgres
        resources:
          requests:
            cpu: "200m"
            memory: "256Mi"
  storage:
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: 7Gi
  version: "17.2"

Now, create the namespace and apply the manifest:

# Create the namespace if it doesn't exist
kubectl create ns demo

# Apply the manifest to deploy the cluster
kubectl apply -f pg-ha-demo.yaml

You can monitor the status until all pods are ready:

watch kubectl get pg,petset,pods -n demo

Once everything is up, you can see that the database is Ready:

➤ kubectl get pg,petset,pods -n demo
NAME                             VERSION   STATUS   AGE
postgres.kubedb.com/pg-ha-demo   17.2      Ready    4m45s

NAME                                      AGE
petset.apps.k8s.appscode.com/pg-ha-demo   4m41s

NAME               READY   STATUS    RESTARTS   AGE
pod/pg-ha-demo-0   2/2     Running   0          4m41s
pod/pg-ha-demo-1   2/2     Running   0          2m45s
pod/pg-ha-demo-2   2/2     Running   0          2m39s

Inspect which pod is the primary and which are the standbys.

# you can inspect who is primary
# and who is secondary like below

➤ kubectl get pods -n demo --show-labels | grep role
pg-ha-demo-0   2/2     Running   0          20m   app.kubernetes.io/component=database,app.kubernetes.io/instance=pg-ha-demo,app.kubernetes.io/managed-by=kubedb.com,app.kubernetes.io/name=postgreses.kubedb.com,apps.kubernetes.io/pod-index=0,controller-revision-hash=pg-ha-demo-6c5954fd77,kubedb.com/role=primary,statefulset.kubernetes.io/pod-name=pg-ha-demo-0
pg-ha-demo-1   2/2     Running   0          19m   app.kubernetes.io/component=database,app.kubernetes.io/instance=pg-ha-demo,app.kubernetes.io/managed-by=kubedb.com,app.kubernetes.io/name=postgreses.kubedb.com,apps.kubernetes.io/pod-index=1,controller-revision-hash=pg-ha-demo-6c5954fd77,kubedb.com/role=standby,statefulset.kubernetes.io/pod-name=pg-ha-demo-1
pg-ha-demo-2   2/2     Running   0          18m   app.kubernetes.io/component=database,app.kubernetes.io/instance=pg-ha-demo,app.kubernetes.io/managed-by=kubedb.com,app.kubernetes.io/name=postgreses.kubedb.com,apps.kubernetes.io/pod-index=2,controller-revision-hash=pg-ha-demo-6c5954fd77,kubedb.com/role=standby,statefulset.kubernetes.io/pod-name=pg-ha-demo-2

The pod labeled kubedb.com/role=primary is the primary, and the pods labeled kubedb.com/role=standby are the standbys.

Let’s create a table on the primary.

# find the primary pod
➤ kubectl get pods -n demo --show-labels | grep primary | awk '{ print $1 }'
pg-ha-demo-0

# exec into the primary pod
➤ kubectl exec -it -n demo pg-ha-demo-0  -- bash
Defaulted container "postgres" out of: postgres, pg-coordinator, postgres-init-container (init)
pg-ha-demo-0:/$ psql
psql (17.2)
Type "help" for help.

postgres=# create table hello(id int);
CREATE TABLE

Verify that the table was replicated to the standbys.

➤ kubectl exec -it -n demo pg-ha-demo-1  -- bash
Defaulted container "postgres" out of: postgres, pg-coordinator, postgres-init-container (init)
pg-ha-demo-1:/$ psql
psql (17.2)
Type "help" for help.

postgres=# \dt
               List of relations
 Schema |        Name        | Type  |  Owner   
--------+--------------------+-------+----------
 public | hello              | table | postgres # created on the primary earlier, so replication is working
 public | kubedb_write_check | table | postgres
(2 rows)

Step 2: Simulating a Failover

Before simulating a failover, let’s discuss how KubeDB handles failover in a managed PostgreSQL cluster. Every database pod runs a sidecar container, and inside that sidecar we use the Raft consensus protocol to determine the viable primary of the PostgreSQL cluster. Raft elects one database pod as the leader of the cluster, and we then verify that the elected pod can actually run as primary. If everything checks out, that pod is promoted to primary. This whole failover process generally takes less than 10 seconds to complete, so you can expect very rapid failover to keep your PostgreSQL cluster highly available.

The current primary is pg-ha-demo-0. Let’s open another terminal and run the command below.

watch -n 2 "kubectl get pods -n demo -o jsonpath='{range .items[*]}{.metadata.name} {.metadata.labels.kubedb\\.com/role}{\"\\n\"}{end}'"

It will continuously show the current roles in the PostgreSQL cluster.


Case 1: Delete the current primary

Let’s delete the current primary and watch the role change happen almost immediately.

➤ kubectl delete pods -n demo pg-ha-demo-0 
pod "pg-ha-demo-0" deleted


You can see that the failover happened almost immediately. Here’s what happened internally:

  • A distributed Raft implementation runs 24/7 inside each database pod’s sidecar. You can configure this behavior as shown below.
  • As soon as pg-ha-demo-0 started terminating, the Raft instance inside pg-ha-demo-0 sensed the termination and immediately transferred leadership to another viable pod before shutting down.
  • In our case, Raft inside pg-ha-demo-1 received the leadership.
  • This leadership switch only means the Raft leader changed, not the database leader (i.e., the actual failover) yet. So at this point pg-ha-demo-1 is still running as a replica; it becomes primary after the next step.
  • Once the Raft sidecar inside pg-ha-demo-1 sees that it has become the cluster leader, it initiates the database failover process and starts running as primary.
  • So pg-ha-demo-1 is now running as primary.
# You can find this part in your db yaml by running
# kubectl get pg -n demo pg-ha-demo -oyaml
# under the db.spec section
# visit the link below for more information
# https://github.com/kubedb/apimachinery/blob/97c18a62d4e33a112e5f887dc3ad910edf3f3c82/apis/kubedb/v1/postgres_types.go#L204

leaderElection:
  electionTick: 10
  heartbeatTick: 1
  maximumLagBeforeFailover: 67108864
  period: 300ms
  transferLeadershipInterval: 1s
  transferLeadershipTimeout: 1m0s
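
If you want to tune this behavior, these fields live under spec in the Postgres object itself (as noted in the comments above). A minimal sketch reusing the pg-ha-demo object from Step 1 is shown below; the values are the defaults from the block above and the inline comments are only a rough gloss, so check the linked postgres_types.go for the exact semantics.

spec:
  leaderElection:
    electionTick: 10                     # ticks a follower waits without a heartbeat before starting an election
    heartbeatTick: 1                     # ticks between leader heartbeats
    period: 300ms                        # duration of a single tick
    maximumLagBeforeFailover: 67108864   # maximum replication lag (in bytes) a replica may have and still be promoted
    transferLeadershipInterval: 1s
    transferLeadershipTimeout: 1m0s

You can apply such a change with kubectl edit pg -n demo pg-ha-demo or by updating your manifest and re-applying it.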
  

Now that we know how failover works, let’s check whether the new primary is working.

➤ kubectl exec -it -n demo pg-ha-demo-1  -- bash
Defaulted container "postgres" out of: postgres, pg-coordinator, postgres-init-container (init)
pg-ha-demo-1:/$ psql
psql (17.2)
Type "help" for help.

postgres=# create table hi(id int);
CREATE TABLE # we were able to create a table, so the failover was successful
postgres=# 

You will see that the deleted pod (pg-ha-demo-0) is brought back by the KubeDB operator and is now assigned the standby role.


Let’s check whether the standby (pg-ha-demo-0) received the updated data from the new primary, pg-ha-demo-1.

➤ kubectl exec -it -n demo pg-ha-demo-0  -- bash
Defaulted container "postgres" out of: postgres, pg-coordinator, postgres-init-container (init)
pg-ha-demo-0:/$ psql
psql (17.2)
Type "help" for help.

postgres=# \dt
               List of relations
 Schema |        Name        | Type  |  Owner   
--------+--------------------+-------+----------
 public | hello              | table | postgres
 public | hi                 | table | postgres # this was created in the new primary
 public | kubedb_write_check | table | postgres
(3 rows)

Case 2: Delete the current primary and one replica

Let’s delete the current primary (pg-ha-demo-1) and one of the standbys (pg-ha-demo-2).

➤ kubectl delete pods -n demo pg-ha-demo-1 pg-ha-demo-2
pod "pg-ha-demo-1" deleted
pod "pg-ha-demo-2" deleted

Again, we can see that the failover happened quickly.


After 10-30 seconds, the deleted pods will be back and will have their roles assigned.


Let’s validate the cluster state from the new primary (pg-ha-demo-0).

➤ kubectl exec -it -n demo pg-ha-demo-0  -- bash
Defaulted container "postgres" out of: postgres, pg-coordinator, postgres-init-container (init)
pg-ha-demo-0:/$ psql
psql (17.2)
Type "help" for help.

postgres=# select * from pg_stat_replication;
 pid  | usesysid | usename  | application_name | client_addr | client_hostname | client_port |         backend_start         | backend_xmin |   state   | sent_lsn  | write_lsn | flush_lsn | replay_lsn |    write_lag    |    flush_lag    |   replay_lag    | sync_priority | sync_state |          reply_time           
------+----------+----------+------------------+-------------+-----------------+-------------+-------------------------------+--------------+-----------+-----------+-----------+-----------+------------+-----------------+-----------------+-----------------+---------------+------------+-------------------------------
 1098 |       10 | postgres | pg-ha-demo-1     | 10.42.0.191 |                 |       49410 | 2025-06-20 09:56:36.989448+00 |              | streaming | 0/70016A8 | 0/70016A8 | 0/70016A8 | 0/70016A8  | 00:00:00.000142 | 00:00:00.00066  | 00:00:00.000703 |             0 | async      | 2025-06-20 09:59:40.217223+00
 1129 |       10 | postgres | pg-ha-demo-2     | 10.42.0.192 |                 |       35216 | 2025-06-20 09:56:39.042789+00 |              | streaming | 0/70016A8 | 0/70016A8 | 0/70016A8 | 0/70016A8  | 00:00:00.000219 | 00:00:00.000745 | 00:00:00.00079  |             0 | async      | 2025-06-20 09:59:40.217308+00
(2 rows)

Case 3: Delete any of the replicas

Let’s delete both of the standbys.

kubectl delete pods -n demo pg-ha-demo-1 pg-ha-demo-2
pod "pg-ha-demo-1" deleted
pod "pg-ha-demo-2" deleted


Shortly, both pods will be back with their roles.


Let’s verify the cluster state.

➤ kubectl exec -it -n demo pg-ha-demo-0  -- bash
Defaulted container "postgres" out of: postgres, pg-coordinator, postgres-init-container (init)
pg-ha-demo-0:/$ psql
psql (17.2)
Type "help" for help.

postgres=# select * from pg_stat_replication;
 pid  | usesysid | usename  | application_name | client_addr | client_hostname | client_port |         backend_start         | backend_xmin |   state   | sent_lsn  | write_lsn | flush_lsn | replay_lsn |    write_lag    |    flush_lag    |   replay_lag    | sync_priority | sync_state |          reply_time           
------+----------+----------+------------------+-------------+-----------------+-------------+-------------------------------+--------------+-----------+-----------+-----------+-----------+------------+-----------------+-----------------+-----------------+---------------+------------+-------------------------------
 5564 |       10 | postgres | pg-ha-demo-2     | 10.42.0.194 |                 |       51560 | 2025-06-20 10:06:26.988807+00 |              | streaming | 0/7014A58 | 0/7014A58 | 0/7014A58 | 0/7014A58  | 00:00:00.000178 | 00:00:00.000811 | 00:00:00.000848 |             0 | async      | 2025-06-20 10:07:50.218299+00
 5572 |       10 | postgres | pg-ha-demo-1     | 10.42.0.193 |                 |       36158 | 2025-06-20 10:06:27.980841+00 |              | streaming | 0/7014A58 | 0/7014A58 | 0/7014A58 | 0/7014A58  | 00:00:00.000194 | 00:00:00.000818 | 00:00:00.000895 |             0 | async      | 2025-06-20 10:07:50.218337+00
(2 rows)

Case 4: Delete both primary and all replicas

Let’s delete all the pods.

➤ kubectl delete pods -n demo pg-ha-demo-0 pg-ha-demo-1 pg-ha-demo-2
pod "pg-ha-demo-0" deleted
pod "pg-ha-demo-1" deleted
pod "pg-ha-demo-2" deleted


Within 20-30 seconds, all of the pods should be back.


Let’s verify the cluster state now.

➤ kubectl exec -it -n demo pg-ha-demo-0  -- bash
Defaulted container "postgres" out of: postgres, pg-coordinator, postgres-init-container (init)
pg-ha-demo-0:/$ psql
psql (17.2)
Type "help" for help.

postgres=# select * from pg_stat_replication;
 pid | usesysid | usename  | application_name | client_addr | client_hostname | client_port |         backend_start         | backend_xmin |   state   | sent_lsn  | write_lsn | flush_lsn | replay_lsn |    write_lag    |    flush_lag    |   replay_lag    | sync_priority | sync_state |          reply_time           
-----+----------+----------+------------------+-------------+-----------------+-------------+-------------------------------+--------------+-----------+-----------+-----------+-----------+------------+-----------------+-----------------+-----------------+---------------+------------+-------------------------------
 132 |       10 | postgres | pg-ha-demo-2     | 10.42.0.197 |                 |       34244 | 2025-06-20 10:09:20.27726+00  |              | streaming | 0/9001848 | 0/9001848 | 0/9001848 | 0/9001848  | 00:00:00.00021  | 00:00:00.000841 | 00:00:00.000894 |             0 | async      | 2025-06-20 10:11:02.527633+00
 133 |       10 | postgres | pg-ha-demo-1     | 10.42.0.196 |                 |       40102 | 2025-06-20 10:09:20.279987+00 |              | streaming | 0/9001848 | 0/9001848 | 0/9001848 | 0/9001848  | 00:00:00.000225 | 00:00:00.000848 | 00:00:00.000905 |             0 | async      | 2025-06-20 10:11:02.527653+00
(2 rows)

We make sure that the pod with the highest LSN (you can think of the LSN as the latest data point available in your cluster) always runs as primary. So if the pod with the highest LSN is terminated, we will not perform a failover until that pod comes back online. For the case where that highest-LSN pod is not recoverable, read this to perform a force failover.
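
If you want to inspect the LSNs yourself, you can compare WAL positions across the pods using standard PostgreSQL functions. A quick check, assuming pg-ha-demo-0 is still the primary at this point in the walkthrough:

# current write position on the primary
kubectl exec -n demo pg-ha-demo-0 -c postgres -- psql -c "SELECT pg_current_wal_lsn();"

# last WAL position received and replayed on a standby
kubectl exec -n demo pg-ha-demo-1 -c postgres -- psql -c "SELECT pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn();"

The standby whose replay LSN is closest to the primary’s current LSN has the most data and is the natural failover candidate.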

A Guide to Postgres Backup And Restore

You can configure backup and restore by following the documentation below.

Backup and Restore

YouTube video links: link

A Guide to Postgres PITR

Documentation Link: PITR

Concepts and Demo: link

Basic Demo: link

Full Demo: link

A Guide to Handling Postgres Storage

It is quite possible that your database storage becomes full and your database stops working. We have got you covered: just apply a VolumeExpansion PostgresOpsRequest, your database storage will be expanded, and the database will be ready to use again.

Disaster Scenario and Recovery

Scenario

You deploy a PostgreSQL database and it runs fine for a while. Then one day its storage becomes full. Since the postgres process can no longer write to the filesystem, clients cannot connect to the database, and the database status becomes Not Ready.

Recovery

In order to recover from this, you can create a VolumeExpansion PostgresOpsRequest with expanded resource requests. As soon as you create it, KubeDB will trigger the necessary steps to expand your volume based on the specifications in the PostgresOpsRequest manifest. A sample PostgresOpsRequest manifest for VolumeExpansion is given below:

apiVersion: ops.kubedb.com/v1alpha1
kind: PostgresOpsRequest
metadata:
  name: pgops-vol-exp-ha-demo
  namespace: demo
spec:
  apply: Always
  databaseRef:
    name: pg-ha-demo
  type: VolumeExpansion
  volumeExpansion:
    mode: Online # see the notes, your storageclass must support this mode
    postgres: 20Gi # expanded resource

For more details, please check the full section here.

Note: There are two volume expansion modes: Online and Offline. Which mode should you choose? It depends on your StorageClass. If your StorageClass supports online volume expansion, go with Online; otherwise, use Offline volume expansion.
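
Whether expansion is allowed at all is advertised by the StorageClass itself, so you can check that first; whether it can happen online additionally depends on your CSI driver:

kubectl get storageclass -o custom-columns=NAME:.metadata.name,EXPANSION:.allowVolumeExpansion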

A Guide to Postgres Ops Requests

A PostgresOpsRequest lets you perform various operational and day-2 tasks on your database, such as managing TLS, applying custom configuration, upgrading the version, and scaling.

Managing PostgreSQL Database TLS

If you want encrypted connections or certificate-based authentication for clients, you can use a PostgresOpsRequest. Based on your requirements, you can add, remove, or rotate TLS certificates. For more information, please follow the documentation sections link1, link2.
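
As a rough illustration only, a ReconfigureTLS ops request that adds certificates issued by a cert-manager Issuer could look like the sketch below. The issuer name pg-issuer is hypothetical, and the exact fields supported may differ by KubeDB version, so verify against the linked documentation.

apiVersion: ops.kubedb.com/v1alpha1
kind: PostgresOpsRequest
metadata:
  name: pgops-add-tls-demo
  namespace: demo
spec:
  type: ReconfigureTLS
  databaseRef:
    name: pg-ha-demo
  tls:
    issuerRef:
      apiGroup: cert-manager.io
      kind: Issuer
      name: pg-issuer   # hypothetical cert-manager Issuer in the demo namespace
    certificates:
    - alias: server
      subject:
        organizations:
        - kubedb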

Upgrade PostgreSQL Version

Upgrading a PostgreSQL version can be a nightmare for DBAs. We make this process a lot easier: apply a PostgresOpsRequest and your database will be upgraded to your desired version. For more information, check this section of the documentation.

Note: Before upgrading, make sure your current version and the version you want to upgrade to have the same base image. Also, do not attempt a jump where the major version difference is greater than one.
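
A hedged sketch of such an upgrade request is shown below; the target version is illustrative, so list the versions actually available in your catalog with kubectl get postgresversions and double-check the field names against the linked docs for your KubeDB release.

apiVersion: ops.kubedb.com/v1alpha1
kind: PostgresOpsRequest
metadata:
  name: pgops-update-version-demo
  namespace: demo
spec:
  type: UpdateVersion
  databaseRef:
    name: pg-ha-demo
  updateVersion:
    targetVersion: "17.4"   # illustrative; must exist in your PostgresVersion catalog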

Scaling PostgreSQL Database

Being able to scale a database both horizontally and vertically is a blessing when handling increased load. But simply increasing the replica count does not work for most databases, because new replicas need to join the cluster and perform a few other database-specific tasks first. Don’t worry, we take care of those for you: you simply create a PostgresOpsRequest, and the scaling is handled automatically.

Horizontal Scaling

For scaling horizontally, follow this section of the documentation.
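
As a sketch, scaling the demo cluster from 3 to 5 replicas would follow the same PostgresOpsRequest pattern as the VolumeExpansion example above; verify the exact field names against the linked documentation for your KubeDB version.

apiVersion: ops.kubedb.com/v1alpha1
kind: PostgresOpsRequest
metadata:
  name: pgops-hscale-demo
  namespace: demo
spec:
  type: HorizontalScaling
  databaseRef:
    name: pg-ha-demo
  horizontalScaling:
    replicas: 5   # desired total number of database pods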

Vertical Scaling

For vertical scaling, follow this section.
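
A rough sketch of a vertical scaling request is given below; the nesting of the resource fields can vary slightly between KubeDB releases, so treat it as illustrative and confirm against the linked section.

apiVersion: ops.kubedb.com/v1alpha1
kind: PostgresOpsRequest
metadata:
  name: pgops-vscale-demo
  namespace: demo
spec:
  type: VerticalScaling
  databaseRef:
    name: pg-ha-demo
  verticalScaling:
    postgres:
      resources:
        requests:
          cpu: 500m
          memory: 1Gi
        limits:
          cpu: "1"
          memory: 2Gi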

Auto Scaling

We also support autoscaling! You can configure autoscaling for your database and stop worrying about the load your system might face during peak hours.

To set up and configure it, visit here for compute autoscaling and here for storage autoscaling.
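
For a feel of what the autoscaler object looks like, here is a heavily hedged sketch of a PostgresAutoscaler covering both compute and storage; the thresholds are arbitrary examples and the field names should be checked against the autoscaling documentation linked above.

apiVersion: autoscaling.kubedb.com/v1alpha1
kind: PostgresAutoscaler
metadata:
  name: pg-ha-demo-autoscaler
  namespace: demo
spec:
  databaseRef:
    name: pg-ha-demo
  compute:
    postgres:
      trigger: "On"
      podLifeTimeThreshold: 5m
      minAllowed:
        cpu: 250m
        memory: 512Mi
      maxAllowed:
        cpu: "1"
        memory: 2Gi
      controlledResources: ["cpu", "memory"]
  storage:
    postgres:
      trigger: "On"
      usageThreshold: 60     # expand when usage crosses 60%
      scalingThreshold: 50   # grow the volume by 50%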

VolumeExpansion of PostgreSQL Database

If you need to increase your database storage, you can use VolumeExpansion PostgresOpsRequest.

For more details, please check the full section here.

Reconfigure PostgreSQL Configuration Parameters

Do you need to update your PostgreSQL shared_buffers, max_connections, or other parameters? You can use our Reconfigure PostgresOpsRequest. Follow the documentation here.
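
As an illustrative sketch (the applyConfig/user.conf layout follows the pattern used in the KubeDB reconfigure docs, but verify it for your version), raising max_connections and shared_buffers could look like this:

apiVersion: ops.kubedb.com/v1alpha1
kind: PostgresOpsRequest
metadata:
  name: pgops-reconfigure-demo
  namespace: demo
spec:
  type: Reconfigure
  databaseRef:
    name: pg-ha-demo
  configuration:
    applyConfig:
      user.conf: |
        max_connections = 200
        shared_buffers = 256MB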

Remote Replica Support

Do you want a backup data center running your PostgreSQL database so you can recover from a data center failure as quickly as possible?

The concept of a remote replica is as follows:

  • You create two data centers. Let’s say one is in Singapore (client-serving) and the other is in London (disaster recovery cluster).
  • You create a client-facing PostgreSQL database using KubeDB in Singapore, and then create another PostgreSQL database (as a remote replica) in London.
  • KubeDB connects this remote replica to the primary cluster (i.e., Singapore) so that, in case of a disaster in the Singapore cluster, you can quickly promote the London cluster to serve clients.

For more information, follow here.
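
As a very rough sketch of the remote replica shape (the object name is made up, and the full prerequisites such as exposing the source primary and copying auth secrets and certificates across clusters are covered only in the linked docs), the London-side Postgres object would point back at the Singapore database roughly like this:

apiVersion: kubedb.com/v1
kind: Postgres
metadata:
  name: pg-london-replica
  namespace: demo
spec:
  version: "17.2"
  replicas: 1
  remoteReplica:
    sourceRef:          # reference to the client-facing (Singapore) database
      name: pg-ha-demo
      namespace: demo
  storageType: Durable
  storage:
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: 7Gi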

Monitoring PostgreSQL Database

When uninterrupted service for your application and database matters, monitoring is a must for your cluster. Follow here for more details.
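
Enabling the built-in Prometheus exporter is typically just a monitor section in the Postgres spec. The snippet below is a sketch; the release: prometheus label is an assumption that must match your Prometheus operator's serviceMonitorSelector.

spec:
  monitor:
    agent: prometheus.io/operator
    prometheus:
      serviceMonitor:
        labels:
          release: prometheus   # must match your Prometheus operator's label selector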

CleanUp

kubectl delete pg -n demo pg-ha-demo

What Next?

Please try the latest release and give us your valuable feedback.

  • If you want to install KubeDB, please follow the installation instructions from here.

  • If you want to upgrade KubeDB from a previous version, please follow the upgrade instructions from here.

Support

To speak with us, please leave a message on our website.

To receive product announcements, follow us on Twitter.

If you have found a bug with KubeDB or want to request new features, please file an issue.

