Postgres on Kubernetes with the Zalando operator

When I need a database I usually reach for MySQL, since I have been using it for a while and I am most familiar with it. However, I am planning to switch to Postgres for an app, so I was looking for options to run Postgres in Kubernetes. I found three candidates: the Zalando operator, the CrunchyData operator and KubeDB. KubeDB is a cool operator that supports various databases including MySQL, Postgres, Redis and more. I tried it and I really liked the features, however there are many threads on GitHub about data loss and other serious issues, so I am not going to use it for now. Admittedly, I couldn’t figure out how to set up and use the CrunchyData operator… maybe it’s just me, but I don’t find the documentation straightforward in this regard.

So after quite a bit of testing I think I’m going to use the Zalando operator, which AFAIK has been used in production by Zalando for a couple of years already, with hundreds of clusters and databases, so in theory it should be reliable. I did find some issues while testing, though, because I am using Scaleway object storage for backups instead of AWS S3 (Scaleway is S3 compatible), and I had to figure out some workarounds by looking at the code, since the documentation is lacking with regard to some configuration details. A Reddit user helped with some of this.

Installation

The installation is quite straightforward with the Helm chart in the GitHub repo. You first need to clone the repo and check out the latest version/tag (currently v1.3.1).
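
Something like this, assuming you want that tag:

git clone https://github.com/zalando/postgres-operator.git
cd postgres-operator
git checkout v1.3.1

Then edit the values file ./charts/postgres-operator/values-crd.yaml, at least as follows: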

configKubernetes:
  enable_pod_antiaffinity: true
  pod_environment_configmap: "postgres-pod-config"
  watched_namespace: "*"

configAwsOrGcp:
  aws_region: "<s3 region>"
  aws_endpoint: "<s3 endpoint url>"
  wal_s3_bucket: "<s3 bucket>"

configLogicalBackup:
  logical_backup_s3_access_key_id: "<s3 access key id>"
  logical_backup_s3_bucket: "<s3 bucket>"
  logical_backup_s3_endpoint: "<s3 endpoint url>"
  logical_backup_s3_secret_access_key: "<s3 secret access key>"
  logical_backup_s3_sse: "<server side encryption method, e.g. AES256; I leave this blank since it's not supported by Scaleway>"
  logical_backup_schedule: "<backup schedule in cron format>"

pod_environment_configmap specifies the name of a config map where we define some settings for the database pods, to enable WAL archiving to object storage and cloning/point in time recovery. watched_namespace is set to an empty string by default, which means that the operator will only watch its own namespace for the custom resources required to create clusters. Setting it to * makes it possible to create clusters in different namespaces, which is likely what you want. I am not 100% sure whether the configAwsOrGcp section is actually needed when using s3 compatible storage.

The logical backup settings should be self explanatory. Please note that there is currently no setting for the region of the logical backups, so I have submitted a PR to add a logical_backup_s3_region setting, since the region is required with some s3 compatible services like Scaleway and without it the upload of logical backups fails. Hopefully it will be merged soon. In the meantime I have built a custom image for the logical backups using the relevant Dockerfile in the repo, adding the --region parameter to the aws CLI command in dump.sh.

Once the configuration is ready, we can install the operator with Helm:

kubectl create ns postgres-operator

helm upgrade --install \
  --namespace postgres-operator \
  -f ./charts/postgres-operator/values-crd.yaml \
  postgres-operator \
  ./charts/postgres-operator 

I am assuming Helm 3 here so you need to create the namespace first (I use postgres-operator).
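
One more note on the custom logical backup image I mentioned earlier: you should be able to point the operator at it with an extra --set flag on the same command. configLogicalBackup.logical_backup_docker_image is the key my version of the chart exposes for this, so double check yours:

helm upgrade --install \
  --namespace postgres-operator \
  -f ./charts/postgres-operator/values-crd.yaml \
  --set configLogicalBackup.logical_backup_docker_image=<your custom logical backup image> \
  postgres-operator \
  ./charts/postgres-operator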

PgAdmin

I like to install PgAdmin as well so I can have a handy web interface for my clusters:

helm upgrade --install \
  --namespace postgres-operator \
  --set env.email=<your email used as username for login> \
  --set env.password=<password> \
  --set persistence.enabled=true \
  --set persistence.size=1Gi \
  --set service.type=ClusterIP \
  --set ingress.enabled=false \
  pgadmin \
  stable/pgadmin

You can optionally enable an ingress, but I prefer not to expose PgAdmin to the web and open it with a port-forward instead:

kubectl -n postgres-operator port-forward svc/pgadmin <some local port>:80

Creating a cluster

To create a cluster, first create a namespace, e.g. postgres-cluster. Then you need to create a config map which will be used by the database pods. As I mentioned earlier, this config map includes the information required both for archiving WAL to S3, and for clone/point-in-time recovery from S3 using the WAL archives. To create the config map, run:

kubectl create ns postgres-cluster

kubectl apply -f - <<EOF
apiVersion: v1 
kind: ConfigMap 
metadata: 
  name: postgres-pod-config # The name must be the same as the one configured in the values file earlier
  namespace: postgres-cluster
data: 
  BACKUP_SCHEDULE: "0 */12 * * *" # Schedule a base backup every 12 hours; you can customise as you wish
  USE_WALG_BACKUP: "true" # Use the golang backup tool (faster) 
  BACKUP_NUM_TO_RETAIN: "14" # For 2 backups per day, keep 7 days of base backups
  AWS_ACCESS_KEY_ID: <s3 access key>
  AWS_SECRET_ACCESS_KEY: <s3 secret key>
  AWS_ENDPOINT: <s3 endpoint url>
  AWS_REGION: <s3 region>
  WALG_DISABLE_S3_SSE: "true" # I disable this because it's not supported by Scaleway
EOF

Next, edit manifests/complete-postgres-manifest.yaml as follows:

metadata:
  name: postgres-cluster
  namespace: postgres-cluster

spec:
  teamId: "postgres"
  volume:
    size: <volume size>
    storageClass: <storage class>
  numberOfInstances: 2
  enableShmVolume: true
  postgresql:
    parameters:
      max_connections: "250"
  enableLogicalBackup: true
  logicalBackupSchedule: "00 05 * * *"

The teamId must be set to the first word of the cluster name, if I remember/understood correctly. I specify at least 2 instances so as to have a replica for HA. By default the cluster allows for 10 connections, so I bump this up since I usually need more than that. I think you can get by with fewer connections by using something like PgBouncer, but I’m not familiar with it and I am not sure I would actually need it, since I can just increase the number of connections in the cluster directly. Of course I enable the logical backups as well; I would likely use point in time recovery if needed, but it’s nice to have daily dumps too.

In the manifest, comment out the section about init containers, because you will likely not need it. I also comment out the sections about users and databases, since I create them once I have access to the cluster, but you can also let the operator create these if you prefer.
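
For reference, once the cluster is up, this is more or less what I run to create an app user and database myself. The names and password below are just examples, and you need to run psql from somewhere that can reach the cluster (a pod inside the cluster, or your machine via a port-forward):

# Example only: replace myapp/changeme with your own names and credentials.
psql -U postgres -h postgres-cluster -c "CREATE ROLE myapp WITH LOGIN PASSWORD 'changeme';"
psql -U postgres -h postgres-cluster -c "CREATE DATABASE myapp OWNER myapp;"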

Finally, to create the cluster apply the manifest:

kubectl apply -f manifests/complete-postgres-manifest.yaml
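
You can keep an eye on the provisioning with the usual kubectl commands; the postgresql custom resource should eventually report a running status, and you should see one pod per instance:

kubectl -n postgres-cluster get postgresql
kubectl -n postgres-cluster get pods -w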

This will usually take a few minutes. Once ready, to access the cluster with a client or from an app you can retrieve the password for the postgres user with the following command:

kubectl -n postgres-cluster get secret postgres.postgres-cluster.credentials.postgresql.acid.zalan.do -o 'jsonpath={.data.password}' | base64 -d

Of course change the namespace if needed. That’s it. You now have a working Postgres cluster. You can create as many as you wish and the operator will take care of managing them.
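
To connect from your own machine instead of from inside the cluster, you can port-forward the service the operator creates for the master (in my setup it is named after the cluster) and reuse the password command from above; a minimal sketch:

kubectl -n postgres-cluster port-forward svc/postgres-cluster 5432:5432

# In another terminal, assuming psql is installed locally:
PGPASSWORD=$(kubectl -n postgres-cluster get secret postgres.postgres-cluster.credentials.postgresql.acid.zalan.do -o 'jsonpath={.data.password}' | base64 -d) \
  psql -U postgres -h localhost -p 5432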

Logical backups

Together with the cluster, the operator will create a cron job in the same namespace for logical backups/dumps, if you have enabled them. You can trigger a logical backup manually at any time with this command:

kubectl -n postgres-cluster create job backup-name --from=cronjob/logical-backup-postgres-cluster

You should see that a path ending in logical-backups has been created in the s3 bucket, under the spilo/ prefix for your cluster, and that it contains a compressed dump.
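
To double check from your machine, you can list the bucket with the aws CLI, assuming it is installed and using the same credentials and endpoint as earlier:

AWS_ACCESS_KEY_ID=<s3 access key> AWS_SECRET_ACCESS_KEY=<s3 secret key> AWS_DEFAULT_REGION=<s3 region> \
  aws s3 ls --recursive --endpoint-url <s3 endpoint url> s3://<s3 bucket>/spilo/ | grep logical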

Restoring a logical backup

As far as I know, the operator doesn’t yet support restoring from logical backups in an automated way, but restoring one manually is pretty straightforward anyway. What we need to do is create a new cluster (unless we want to restore the backup into an existing cluster), deploy a temporary container with e.g. the Ubuntu image, and then download and import the desired dump into the cluster; the download and import happen in one step, streaming the dump so we don’t have to store it somewhere first. If you are restoring into an existing cluster you should first drop the databases you are restoring; perhaps it’s cleaner to just create a new cluster. To download/stream the backup we can use s3cmd, a simple s3 client that we can install with pip. The whole process looks like this:
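
First, start a throwaway shell in the cluster’s namespace. The pod name restore-shell and the Ubuntu image/tag are just examples; any image with apt and network access to the cluster will do:

kubectl -n postgres-cluster run restore-shell --rm -it --image=ubuntu:20.04 -- bash

Then, inside that shell: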

apt update
apt install -y python3-pip postgresql-client

pip3 install s3cmd

cat > ~/.pgpass <<EOF
postgres-cluster:5432:postgres:postgres:<password for the postgres user in the target cluster>
EOF

chmod 0600 ~/.pgpass

DUMP_URL=s3://.....  # URL of the .sql.gz dump

/usr/local/bin/s3cmd \
  --no-check-certificate \
  --access_key=<s3 access key> \
  --secret_key=<s3 secret key> \
  --host=<s3 endpoint url> \
  --host-bucket="<s3 host bucket - see s3cmd reference for your object storage service>" \
  get --no-progress ${DUMP_URL} - | gunzip | psql -U postgres -h postgres-cluster

The hostname for the import with psql must be the name of the target cluster. This process will take some time depending on the size of the backup of course.
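
Once the pipe completes, a quick sanity check from the same temporary pod doesn’t hurt, e.g. listing the databases and their sizes:

psql -U postgres -h postgres-cluster -c '\l+'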

Point in time recovery with WAL

Logical backups are handy to have, but even nicer is the ability to restore the Postgres cluster to (almost) any point in time within the retention period configured in the config map we created earlier, using the WAL archive. If I am not mistaken, by default the WAL archive in the bucket is updated every 16MB of changes or so, which is why you can only recover to “almost” any point in time. I am sure this is customisable but I haven’t looked into it.
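
If you want to see which base backups are available for point in time recovery, one option is to run wal-g from inside the master pod. This is only a sketch that depends on the internals of the Spilo image: the pod name and the /run/etc/wal-e.d/env directory are what I see in my pods, so adjust as needed:

# Sketch: pod name and env dir may differ in your setup.
kubectl -n postgres-cluster exec -it postgres-cluster-0 -- \
  bash -c 'envdir /run/etc/wal-e.d/env wal-g backup-list'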

For point in time recovery, you create a cluster in almost the same way as we created a new cluster earlier. The only difference is that we need to add a section to the config map which specifies where the WAL archive is located in s3, and the date/time we want to restore to. So, like we did for a brand new cluster, you need to create the namespace first and then the config map as follows:

kubectl create ns postgres-cluster

kubectl apply -f - <<EOF
apiVersion: v1 
kind: ConfigMap 
metadata: 
  name: postgres-pod-config # The name must be the same as the one configured in the values file earlier
  namespace: postgres-cluster
data: 
  BACKUP_SCHEDULE: "0 */12 * * *" # Schedule a base backup every 12 hours; you can customise as you wish
  USE_WALG_BACKUP: "true" # Use the golang backup tool (faster) 
  BACKUP_NUM_TO_RETAIN: "14" # For 2 backups per day, keep 7 days of base backups
  AWS_ACCESS_KEY_ID: <s3 access key>
  AWS_SECRET_ACCESS_KEY: <s3 secret key>
  AWS_ENDPOINT: <s3 endpoint url>
  AWS_REGION: <s3 region>
  WALG_DISABLE_S3_SSE: "true" # I disable this because it's not supported by Scaleway


  # Optional for point in time recovery/restore
  USE_WALG_RESTORE: "true"
  CLONE_METHOD: CLONE_WITH_WALE
  CLONE_AWS_ACCESS_KEY_ID: <s3 access key> 
  CLONE_AWS_SECRET_ACCESS_KEY: <s3 secret key>
  CLONE_AWS_ENDPOINT: <s3 endpoint url>
  CLONE_AWS_REGION: <s3 region>
  CLONE_WAL_S3_BUCKET: <s3 bucket>
  CLONE_WAL_BUCKET_SCOPE_SUFFIX: "/<id of the original cluster>"
  CLONE_TARGET_TIME: "2020-02-04T12:49:03+00:00" # time in the WAL archive that we want to restore
  CLONE_SCOPE: postgres-cluster # name of the original cluster
EOF

Pay attention to the CLONE_WAL_BUCKET_SCOPE_SUFFIX setting, which should be a string including the ID of the original cluster prefixed with a “/”. Finally, to restore the cluster just edit and apply the manifest like we did for a brand new cluster.
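
If you are not sure what the ID of the original cluster is, the simplest way is to look at the paths in the bucket under spilo/<cluster name>/. In my case it matches the Kubernetes UID of the original postgresql resource, which (assuming the original cluster still exists) you can also look up with:

kubectl -n postgres-cluster get postgresql postgres-cluster -o 'jsonpath={.metadata.uid}'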

Conclusion

I had only used Postgres occasionally in the past and never looked into it much, but so far I like what I see with this operator by Zalando. Creating a cluster and performing backups and restores isn’t complicated, and the setup should be pretty reliable considering that Zalando has been running the operator in production for a while now with lots of clusters. There’s also a nice UI you can optionally install to create clusters etc. if you don’t want to write manifests manually. Hope this was useful!