One of the first things to set up when you're running your own database is regular backups of the data. The backups should be stored somewhere that's easy for you to access, secure, and unlikely to disappear. Amazon's S3 service fits the bill, and it's very cheap to store data there.
Automating these backups is fairly straightforward: you can do it with a bash script that runs as a nightly cron job and uses the AWS CLI tools.
The process of backing up data looks like this:
- Dump the database contents to a file inside the running container
- Copy the dump file out to the host running the Docker daemon
- Remove the dump file inside the container, as we've got a copy on the host
- Compress the SQL dump using gzip
- Upload the compressed file to an S3 bucket
- Remove the local file to clean up
The script which orchestrates all this looks something like this:
#!/bin/bash
if [ "$1" = "" ] || [ "$2" = "" ]
then
    echo "Usage: $0 <service_name> <database>"
    echo "Example: $0 yourapp_service_name_postgres dbname"
    exit 1
fi
# https://vaneyckt.io/posts/safer_bash_scripts_with_set_euxo_pipefail/
set -euxo pipefail
export PATH=/usr/local/bin:/usr/local/sbin:/bin:/sbin:/usr/bin:/usr/sbin
service_name=$1
database_name=$2
date=$(date +%Y-%m-%d"_"%H_%M_%S)
backup_filename="${database_name}_${date}.sql"
backup_filename_zipped="${backup_filename}.gz"
s3_location="s3://my-s3-bucket-name/database/"
docker_bin=$(which docker)
aws_bin=$(which aws)
# look up the container by service name (assumes exactly one running container matches)
container_id=$("$docker_bin" ps | grep "$service_name" | awk '{print $1}')
# create the backup
"$docker_bin" exec "$container_id" pg_dump -U postgres -f "/tmp/$backup_filename" "$database_name"
# copy the file inside the container to the host
"$docker_bin" cp "$container_id:/tmp/$backup_filename" .
# remove the file in the container
"$docker_bin" exec "$container_id" rm "/tmp/$backup_filename"
# compress
gzip "$backup_filename"
# upload to s3
"$aws_bin" s3 cp "$backup_filename_zipped" "$s3_location"
# remove the local copy
rm "$backup_filename_zipped"
echo "Done."
This script assumes a few things:
- You're running the Docker container as part of a Docker swarm service. If this is not the case, you can change the container_id variable so that it holds the ID of your postgres Docker container
- The database is owned by the postgres user and has no password. If this is not the case, you can change the user passed to pg_dump with -U and supply the password, for example through the PGPASSWORD environment variable
- The AWS CLI tools are installed on the host machine, with the appropriate credentials to access the S3 bucket that you're backing up to
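It can help to check these assumptions before the backup starts, so a missing tool or a stopped container fails with a readable message. The following is a minimal sketch, not part of the original script: the function names require_commands and find_container_id are hypothetical, and find_container_id assumes that filtering `docker ps` by container name is enough to find your postgres container.

```bash
# Hypothetical helper: fail early if a required tool is not on PATH.
require_commands() {
    local missing=0
    local cmd
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "Missing required command: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Hypothetical helper: print the ID of the first running container whose
# name contains the given string, or return 1 if nothing matches.
find_container_id() {
    local name_filter=$1
    local ids
    ids=$(docker ps --quiet --filter "name=${name_filter}")
    if [ -z "$ids" ]; then
        echo "No running container matches '${name_filter}'" >&2
        return 1
    fi
    # Several containers may match; keep only the first one.
    printf '%s\n' "$ids" | head -n 1
}
```

In the backup script these could be called as `require_commands docker aws gzip` followed by `container_id=$(find_container_id "$service_name")`. For a container that is not part of a swarm service, passing the container's own name should work the same way.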
Once you've got the script working for your environment, install a cron job that will run it nightly, something like:
0 0 * * * (/home/user/backup_postgres.sh swarm_service_postgres dbname > /home/user/cron.log 2>&1)
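Note that the `>` redirect in the cron entry overwrites cron.log on every run, so only the most recent night's output is kept. If you would rather keep a history, one option is to append with `>>` and prefix each message with a timestamp. A minimal sketch, assuming a hypothetical log helper added to the backup script:

```bash
# Hypothetical helper: print a message prefixed with the current date and time,
# so entries from different nights can be told apart in an appended log file.
log() {
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*"
}
```

Inside the script, `log "Done."` would then replace `echo "Done."`, and the cron entry would redirect with `>> /home/user/cron.log 2>&1` instead of `>`.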