Announcing KubeDB v2023.12.28

We are pleased to announce the release of KubeDB v2023.12.28. This release was mainly focused on improving the kubedb-autoscaler feature. We have also added a point-in-time recovery (PITR) feature for MySQL. This post lists all the changes done in this release since the last one. Find the detailed changelogs HERE.

Improving KubeDB Autoscaler

Here is the overall workflow of kubedb-autoscaler, to better understand the problem we solved in this release:

  • The autoscaler operator watches the usage of compute (CPU, memory) and storage resources, and generates an OpsRequest CR to change the resources automatically.
  • The ops-manager operator watches the created VerticalScaling OpsRequest for compute resources, updates the database's StatefulSets, and evicts the db pods while respecting the PodDisruptionBudget.
  • The Kubernetes scheduler sees the updated resource requests on those pods and finds an appropriate node to schedule them.
  • If the scheduler can't find an appropriate node, the cloud provider's cluster autoscaler (if enabled) scales one of the node pools to make room for that pod.

This procedure works fine when scaling compute resources up: whenever a scheduling issue occurs, the cluster autoscaler automatically creates nodes from bigger node pools. But it becomes very resource-wasteful when scaling down. Since the Kubernetes scheduler sees that big nodes are already available, it never picks a smaller node where the down-scaled pods could easily run.

To solve this issue, we need a way to forcefully schedule those smaller pods onto smaller node pools. We have introduced a new CRD, NodeTopology, to achieve this. Here is an example NodeTopology CR:

apiVersion: node.k8s.appscode.com/v1alpha1
kind: NodeTopology
metadata:
  name: gke-n1-standard
spec:
  nodeSelectionPolicy: Taint
  topologyKey: nodepool_type
  nodeGroups:
  - allocatable:
      cpu: 940m
      memory: 2.56Gi
    topologyValue: n1-standard-1
  - allocatable:
      cpu: 1930m
      memory: 5.48Gi
    topologyValue: n1-standard-2
  - allocatable:
      cpu: 3920m
      memory: 12.07Gi
    topologyValue: n1-standard-4
  - allocatable:
      cpu: 7910m
      memory: 25.89Gi
    topologyValue: n1-standard-8
  - allocatable:
      cpu: 15890m
      memory: 53.57Gi
    topologyValue: n1-standard-16
  - allocatable:
      cpu: 31850m
      memory: 109.03Gi
    topologyValue: n1-standard-32
  - allocatable:
      cpu: 63770m
      memory: 224.45Gi
    topologyValue: n1-standard-64

It is a cluster-scoped resource. It supports two types of nodeSelectionPolicy: LabelSelector and Taint. Here is the general rule for choosing between the two.

If you want to run the database pods on dedicated nodes and don't want any other pods scheduled there, the Taint policy is appropriate for you. For other general cases, use LabelSelector.
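
To make this concrete, here is a minimal, illustrative sketch (our assumption of how the node pools are marked, not taken verbatim from the release) of a node in the n1-standard-2 pool that the NodeTopology above can match under each policy:

apiVersion: v1
kind: Node
metadata:
  name: example-n1-standard-2-node   # hypothetical node name
  labels:
    nodepool_type: n1-standard-2     # matched when nodeSelectionPolicy is LabelSelector
spec:
  taints:
  - key: nodepool_type               # matched (and tolerated by the db pods) when nodeSelectionPolicy is Taint
    value: n1-standard-2
    effect: NoSchedule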

It is also possible to schedule different types of db pods onto different node pools. Here is an example MongoDB CR:

apiVersion: kubedb.com/v1alpha2
kind: MongoDB
metadata:
  name: mg-database
  namespace: demo
spec:
  version: "4.4.26"
  terminationPolicy: WipeOut
  replicas: 3
  replicaSet:
    name: "rs"
  podTemplate:
    spec:
      nodeSelector:
        app: "kubedb"
        instance: "mongodb"
        component: "mg-database"
      tolerations:
      - key: nodepool_type
        value: n1-standard-2
        effect: NoSchedule
      - key: app
        value: kubedb
        effect: NoSchedule
      - key: instance
        value: mongodb
        effect: NoSchedule
      - key: component
        value: mg-database
        effect: NoSchedule
      resources:
        requests:
          "cpu": "1435m"
          "memory": "4.02Gi"
        limits:
          "cpu": "1930m"
          "memory": "5.48Gi"
  storage:
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: 5Gi

IMPORTANT: The node pool sizes, the starting resource requests, and the autoscaler configuration must be carefully choreographed for optimal behavior.

  • The database's initial resource request should sit at the mid-point of the extra resources this node pool provides over the next smaller one. More specifically: (current node pool's allocatable - next smaller node pool's allocatable)/2 + next smaller node pool's allocatable.
  • The database's initial resource limit should match the initial node pool's allocatable resources.
  • The autoscaler CR's minAllowed should be the database's initial request * 0.9 if the current node pool is the smallest one. Otherwise, calculate what the initial resource request would have been had the database been provisioned in the smallest node pool, and multiply that by 0.9.
  • The autoscaler CR's maxAllowed should be the biggest node pool's allocatable resources.

For example, in the db yaml above, requested cpu = (1930m - 940m)/2 + 940m = 1435m, and requested memory = (5.48Gi - 2.56Gi)/2 + 2.56Gi = 4.02Gi. Its cpu limit is the allocatable cpu of the n1-standard-2 pool, which is 1930m.

In the autoscaler yaml below, if n1-standard-1 were the smallest db node pool, minAllowed cpu = (940m / 2) * 0.9 = 423m and memory = (2.56Gi / 2) * 0.9 = 1.15Gi (these values are shown commented out; the active values assume n1-standard-2 is the smallest pool). Its maxAllowed cpu and memory are the biggest node pool's allocatable resources, so 63770m and 224.45Gi respectively.

You can find a list of pre-calculated values in this spreadsheet.

Lastly, for autoscaling, all we need to do is specify the name of the NodeTopology in the autoscaler yaml.

apiVersion: autoscaling.kubedb.com/v1alpha1
kind: MongoDBAutoscaler
metadata:
  name: mg-database
  namespace: demo
spec:
  databaseRef:
    name: mg-database
  opsRequestOptions:
    timeout: 10m
    apply: IfReady
  compute:
    replicaSet:
#      podLifeTimeThreshold: 15m
#      resourceDiffPercentage: 50
      trigger: "On"
      minAllowed:  # By considering `n1-standard-2` as your smallest db nodepool
        cpu: "1292m"
        memory: "3.62Gi"
#      minAllowed:  # By considering `n1-standard-1` as your smallest nodepool
#        cpu: "423m"
#        memory: "1.15Gi"
      maxAllowed:
        cpu: "63770m"
        memory: "224.45Gi"
      controlledResources: ["cpu", "memory"]
      containerControlledValues: "RequestsAndLimits"
    nodeTopology:
      name: gke-n1-standard

Now, the kubedb-autoscaler operator will decide the minimum node configuration on which the scaled (up or down) pods can be scheduled, and it will create a VerticalScaling OpsRequest carrying that topology information so that the pods are scheduled on the desired node pool.

apiVersion: ops.kubedb.com/v1alpha1
kind: MongoDBOpsRequest
metadata:
  name: mops-jghfjd
  namespace: demo
spec:
  type: VerticalScaling
  databaseRef:
    name: mg-database
  verticalScaling:
    replicaSet:
      resources:
        requests:
          memory: "8.78Gi"
          cpu: "2925m"
        limits:
          memory: "12.07Gi"
          cpu: "3920m"
      nodeSelectionPolicy: Taint
      topology:
        key: nodepool_type
        value: n1-standard-4

MySQL Archiver

This feature supports continuous archiving of a MySQL database. You can also perform point-in-time recovery (PITR), restoring the database to any point within the archived history.

To use this feature, you need KubeStash installed in your cluster. KubeStash (aka Stash 2.0) is a ground-up rewrite of Stash with various improvements planned, and it works with any existing KubeDB or Stash license key. For the continuous archiving feature, we have also introduced a new CRD on the KubeDB side, named MySQLArchiver.

Here are all the details of using MySQLArchiver. In short, you need to create the following resources:

  • BackupStorage, which refers to the cloud storage backend (S3, GCS, etc.) you prefer.
  • RetentionPolicy, which lets you set how long you'd like to retain the backup data (a minimal sketch of this and the encryption Secret follows this list).
  • Secret, which holds the restic password used to encrypt the backup snapshots.
  • VolumeSnapshotClass, which holds the CSI driver information responsible for taking VolumeSnapshots. This is vendor-specific.
  • MySQLArchiver, which ties all of this metadata together.
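
For reference, the RetentionPolicy and encryption Secret referenced by the MySQLArchiver below could look like this minimal sketch (assuming the KubeStash RetentionPolicy schema and the RESTIC_PASSWORD secret key; all values are placeholders):

apiVersion: storage.kubestash.com/v1alpha1
kind: RetentionPolicy
metadata:
  name: mysql-retention-policy
  namespace: demo
spec:
  maxRetentionPeriod: "30d"      # keep backup data for 30 days
  successfulSnapshots:
    last: 5
  failedSnapshots:
    last: 2
---
apiVersion: v1
kind: Secret
metadata:
  name: encrypt-secret
  namespace: demo
stringData:
  RESTIC_PASSWORD: "changeit"    # restic password used to encrypt the backup snapshots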

NB: All the archiver-related yamls are available in this git repository.

apiVersion: archiver.kubedb.com/v1alpha1
kind: MySQLArchiver
metadata:
  name: mysqlarchiver-sample
  namespace: demo
spec:
  pause: false
  databases:
    namespaces:
      from: Selector
      selector:
        matchLabels:
          kubernetes.io/metadata.name: demo
    selector:
      matchLabels:
        archiver: "true"
  retentionPolicy:
    name: mysql-retention-policy
    namespace: demo
  encryptionSecret:
    name: "encrypt-secret"
    namespace: "demo"
  fullBackup:
    driver: "VolumeSnapshotter"
    task:
      params:
        volumeSnapshotClassName: "longhorn-snapshot-vsc"
    scheduler:
      successfulJobsHistoryLimit: 1
      failedJobsHistoryLimit: 1
      schedule: "30 3 * * *"
    sessionHistoryLimit: 2
  manifestBackup:
    scheduler:
      successfulJobsHistoryLimit: 1
      failedJobsHistoryLimit: 1
      schedule: "30 3 * * *"
    sessionHistoryLimit: 2
  backupStorage:
    ref:
      name: "linode-storage"
      namespace: "demo"

Now, after creating this archiver CR, if we create a MySQL with the archiver: "true" label in the same namespace (as per the double opt-in configured in the .spec.databases field; an example labeled CR is sketched after the list below), the KubeDB operator will do the following:

  • Create 2 Repositories, named <db-name>-full and <db-name>-manifest.
  • Take a full backup every day at 3:30 (per .spec.fullBackup.scheduler) into the <db-name>-full repository.
  • Take a manifest backup every day at 3:30 (per .spec.manifestBackup.scheduler) into the <db-name>-manifest repository.
  • Start syncing the MySQL binlog files to the directory <db-namespace>/<db-name>.
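
Here is a minimal sketch of such a labeled MySQL CR (the name is hypothetical; version, storage, and termination policy mirror the restore example further below):

apiVersion: kubedb.com/v1alpha2
kind: MySQL
metadata:
  name: archived-mysql            # hypothetical name
  namespace: demo                 # must be selected by the archiver's .spec.databases.namespaces
  labels:
    archiver: "true"              # picked up by the archiver's .spec.databases.selector
spec:
  version: "8.2.0"
  replicas: 1
  storageType: Durable
  storage:
    storageClassName: "longhorn"
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 10Gi
  terminationPolicy: WipeOut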

For point-in-time recovery, all you need to do is set the repository reference and a recoveryTimestamp in the mysql.spec.init.archiver section.

Here is an example MySQL CR for point-in-time recovery.

apiVersion: kubedb.com/v1alpha2
kind: MySQL
metadata:
  name: restore-mysql
  namespace: demo
spec:
  init:
    archiver:
      encryptionSecret:
        name: encrypt-secret
        namespace: demo
      fullDBRepository:
        name: mysql-repository
        namespace: demo
      recoveryTimestamp: "2023-12-28T17:10:54Z"
  version: "8.2.0"
  replicas: 1
  storageType: Durable
  storage:
    storageClassName: "longhorn"
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 10Gi
  terminationPolicy: WipeOut

Postgres Archiver

We have supported the point-in-time recovery feature for Postgres with S3 backends since KubeDB v2023.12.11. In this release, we have added support for more backends, namely GCS, Azure, and local NFS.

To use these backends, you have to configure two things: BackupStorage and VolumeSnapshotClass.

Example YAMLs for Azure:

apiVersion: storage.kubestash.com/v1alpha1
kind: BackupStorage
metadata:
  name: azure-storage
  namespace: demo
spec:
  storage:
    provider: azure
    azure:
      storageAccount: storageAccountName
      container: container
      prefix: pg
      secret: azure-secret  # this secret holds the AZURE_ACCOUNT_KEY info
  usagePolicy:
    allowedNamespaces:
      from: All
  deletionPolicy: WipeOut 
  
---
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: az-vsc
driver: disk.csi.azure.com
deletionPolicy: Delete
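
The referenced azure-secret is an ordinary Kubernetes Secret. A minimal sketch (the key name is taken from the comment above; the value is a placeholder) is shown below; the gcs-secret in the next example follows the same pattern with GOOGLE_PROJECT_ID and GOOGLE_SERVICE_ACCOUNT_JSON_KEY keys:

apiVersion: v1
kind: Secret
metadata:
  name: azure-secret
  namespace: demo
stringData:
  AZURE_ACCOUNT_KEY: "<storage-account-key>"   # placeholder value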

NB: All the archiver-related yamls are available in this git repository.

Example YAMLs for GCS:

apiVersion: storage.kubestash.com/v1alpha1
kind: BackupStorage
metadata:
  name: gcs-storage
  namespace: demo
spec:
  storage:
    provider: gcs
    gcs:
      bucket: kubestash-qa
      prefix: pg
      secret: gcs-secret # This secret holds the GOOGLE_PROJECT_ID & GOOGLE_SERVICE_ACCOUNT_JSON_KEY info
  usagePolicy:
    allowedNamespaces:
      from: All
  deletionPolicy: WipeOut 

---
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: gke-vsc
driver: pd.csi.storage.gke.io
deletionPolicy: Delete

Example YAMLs for NFS:

apiVersion: storage.kubestash.com/v1alpha1
kind: BackupStorage
metadata:
  name: local-storage
  namespace: demo
spec:
  storage:
    provider: local
    local:
      mountPath: /pg/walg
      nfs:
        server: "use the server address here"
        path: "use the shared path here"
  usagePolicy:
    allowedNamespaces:
      from: All
  default: false
  deletionPolicy: WipeOut
  runtimeSettings:
    pod:
      securityContext:
        fsGroup: 70
        runAsUser: 70

And lastly, you need to reference them in the PostgresArchiver yaml like below:

apiVersion: archiver.kubedb.com/v1alpha1
kind: PostgresArchiver
metadata:
  name: pg-archiver
  namespace: demo
spec:
  pause: false
  retentionPolicy:
    name: postgres-retention-policy
    namespace: demo
  encryptionSecret:
    name: "encrypt-secret"
    namespace: "demo"
  fullBackup:
    jobTemplate:
      spec:
        securityContext:
          runAsUser: 70
          runAsGroup: 70
          fsGroup: 70
    driver: "VolumeSnapshotter"
    task:
      params:
        volumeSnapshotClassName: "longhorn-vsc" # "gke-vsc" # "az-vsc"  # Set accordingly
    scheduler:
      successfulJobsHistoryLimit: 2
      failedJobsHistoryLimit: 2
      schedule: "30 3 * * *"
    sessionHistoryLimit: 3
  backupStorage:
    ref:
      name: "s3-storage"
      namespace: "demo"

They are set in the spec.fullBackup.task.params.volumeSnapshotClassName and spec.backupStorage.ref fields.

What Next?

Please try the latest release and give us your valuable feedback.

  • If you want to install KubeDB, please follow the installation instructions from KubeDB Setup.

  • If you want to upgrade KubeDB from a previous version, please follow the upgrade instructions from KubeDB Upgrade.

Support

To speak with us, please leave a message on our website.

To receive product announcements, follow us on Twitter.

To watch tutorials on various production-grade Kubernetes tools, subscribe to our YouTube channel.

Learn More about Production-Grade Databases in Kubernetes

If you have found a bug with KubeDB or want to request new features, please file an issue.

