How to back up a Postgres database running inside Docker to S3

Published: March 13, 2019

One of the first things to set up when you're running your own database is regular backups of the data. The backups should be stored somewhere that's easy for you to access, secure, and unlikely to disappear. Amazon's S3 service fits the bill, and it's super cheap to store data there.

Automating these backups is fairly straightforward: you can do it with a bash script that runs from a nightly cron job, plus the aws cli tools.

The process of backing up data looks like this:

  • Dump the database contents to a file inside the running container
  • Copy the dump file out of the container to the host running the Docker daemon
  • Remove the dump file inside the container, since we've got a copy on the host
  • Compress the SQL dump using gzip
  • Upload the file to an S3 bucket
  • Remove the local file to clean up

The script which orchestrates all this looks something like this:

#!/bin/bash

if [ "$1" = "" ] || [ "$2" = "" ]
then
    echo "Usage: $0 <service_name> <database>..."
    echo "Example: $0 yourapp_service_name_postgres dbname"
    exit 1
fi

# https://vaneyckt.io/posts/safer_bash_scripts_with_set_euxo_pipefail/
set -euxo pipefail

export PATH=/usr/local/bin:/usr/local/sbin:/bin:/sbin:/usr/bin:/usr/sbin

service_name=$1
database_name=$2
date=$(date +%Y-%m-%d_%H_%M_%S)
backup_filename="${database_name}_${date}.sql"
backup_filename_zipped="${backup_filename}.gz"
s3_location="s3://my-s3-bucket-name/database/"

docker_bin=$(which docker)
aws_bin=$(which aws)

container_id=$($docker_bin ps | grep "$service_name" | awk '{print $1}')

# create the backup inside the container
$docker_bin exec "$container_id" pg_dump -U postgres -f "/tmp/$backup_filename" "$database_name"

# copy the file inside the container to the host
$docker_bin cp "$container_id:/tmp/$backup_filename" .

# remove the file in the container
$docker_bin exec "$container_id" rm "/tmp/$backup_filename"

# compress
gzip "$backup_filename"

# upload to s3
$aws_bin s3 cp "$backup_filename_zipped" "$s3_location"

rm "$backup_filename_zipped"

echo "Done."

This script assumes a few things:

  • You're running the Postgres container as part of a Docker Swarm service. If that's not the case, you can change the container_id variable so that it holds the ID of your Postgres container (see the sketch after this list)
  • The database is owned by the postgres user and has no password. If that's not the case, you can update the user and password that are passed to pg_dump (also shown below)
  • The aws cli tools are installed on the host machine, with credentials that can access the S3 bucket you're backing up to
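
For example, if you're running a standalone container and your database user needs a password, you could swap in something like the following. This is just a sketch: my_postgres_container, myuser, and s3cret are placeholders for your own container name, database user, and password.

# look up a standalone container by name rather than grepping the swarm service
container_id=$($docker_bin ps -q --filter "name=my_postgres_container")

# pass the password to pg_dump via the standard PGPASSWORD environment variable
$docker_bin exec -e PGPASSWORD=s3cret "$container_id" \
    pg_dump -U myuser -f "/tmp/$backup_filename" "$database_name"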

Once you've got the script working for your environment, install a cron job that will run it nightly, something like:

0 0 * * * (/home/user/backup_postgres.sh swarm_service_postgres dbname > /home/user/cron.log 2>&1)
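
To install it, run crontab -e as a user that can talk to the Docker daemon and add the line above. After the first nightly run, you can sanity-check that the dump landed in the bucket (the path here matches the s3_location variable in the script):

aws s3 ls s3://my-s3-bucket-name/database/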