Infrastructure at your Service

Daniel Westermann

How we build our customized PostgreSQL Docker image

Docker becomes more and more popular these days and a lot of companies start to really use it. At one project we decided to build our own customized Docker image instead of using the official PostgreSQL one. The main reason for that is that we wanted to compile from source so that we only get want is really required. Why having PostgreSQL compiled with tcl support when nobody will ever use that? Here is how we did it …

To dig in right away, this is the simplified Dockerfile:

FROM debian

# make the "en_US.UTF-8" locale so postgres will be utf-8 enabled by default
ENV LANG en_US.utf8
ENV PG_MAJOR 10
ENV PG_VERSION 10.1
ENV PG_SHA256 3ccb4e25fe7a7ea6308dea103cac202963e6b746697366d72ec2900449a5e713
ENV PGDATA /u02/pgdata
ENV PGDATABASE "" \
    PGUSERNAME "" \
    PGPASSWORD ""

COPY docker-entrypoint.sh /

RUN set -ex \
        \
        && apt-get update && apt-get install -y \
           ca-certificates \
           curl \
           procps \
           sysstat \
           libldap2-dev \
           libpython-dev \
           libreadline-dev \
           libssl-dev \
           bison \
           flex \
           libghc-zlib-dev \
           libcrypto++-dev \
           libxml2-dev \
           libxslt1-dev \
           bzip2 \
           make \
           gcc \
           unzip \
           python \
           locales \
        \
        && rm -rf /var/lib/apt/lists/* \
        && localedef -i en_US -c -f UTF-8 en_US.UTF-8 \
        && mkdir /u01/ \
        \
        && groupadd -r postgres --gid=999 \
        && useradd -m -r -g postgres --uid=999 postgres \
        && chown postgres:postgres /u01/ \
        && mkdir -p "$PGDATA" \
        && chown -R postgres:postgres "$PGDATA" \
        && chmod 700 "$PGDATA" \
        \
        && curl -o /home/postgres/postgresql.tar.bz2 "https://ftp.postgresql.org/pub/source/v$PG_VERSION/postgresql-$PG_VERSION.tar.bz2" \
        && echo "$PG_SHA256 /home/postgres/postgresql.tar.bz2" | sha256sum -c - \
        && mkdir -p /home/postgres/src \
        && chown -R postgres:postgres /home/postgres \
        && su postgres -c "tar \
                --extract \
                --file /home/postgres/postgresql.tar.bz2 \
                --directory /home/postgres/src \
                --strip-components 1" \
        && rm /home/postgres/postgresql.tar.bz2 \
        \
        && cd /home/postgres/src \
        && su postgres -c "./configure \
                --enable-integer-datetimes \
                --enable-thread-safety \
                --with-pgport=5432 \
                --prefix=/u01/app/postgres/product/$PG_VERSION \\
                --with-ldap \
                --with-python \
                --with-openssl \
                --with-libxml \
                --with-libxslt" \
        && su postgres -c "make -j 4 all" \
        && su postgres -c "make install" \
        && su postgres -c "make -C contrib install" \
        && rm -rf /home/postgres/src \
        \
        && apt-get update && apt-get purge --auto-remove -y \
           libldap2-dev \
           libpython-dev \
           libreadline-dev \
           libssl-dev \
           libghc-zlib-dev \
           libcrypto++-dev \
           libxml2-dev \
           libxslt1-dev \
           bzip2 \
           gcc \
           make \
           unzip \
        && apt-get install -y libxml2 \
        && rm -rf /var/lib/apt/lists/*

ENV LANG en_US.utf8
USER postgres
EXPOSE 5432
ENTRYPOINT ["/docker-entrypoint.sh"]

We based the image on the latest Debian image, that is line 1. The following lines define the PostgreSQL version we will use and define some environment variables we will user later. What follows is basically installing all the packages required for building PostgreSQL from source, adding the operating system user and group, preparing the directories, fetching the PostgreSQL source code, configure, make and make install. Pretty much straight forward. Finally, to shrink the image, we remove all the packages that are not any more required after PostgreSQL was compiled and installed.

The final setup of the PostgreSQL instance happens in the docker-entrypoint.sh script which is referenced at the very end of the Dockerfile:

#!/bin/bash

# this are the environment variables which need to be set
PGDATA=${PGDATA}/${PG_MAJOR}
PGHOME="/u01/app/postgres/product/${PG_VERSION}"
PGAUTOCONF=${PGDATA}/postgresql.auto.conf
PGHBACONF=${PGDATA}/pg_hba.conf
PGDATABASENAME=${PGDATABASE}
PGUSERNAME=${PGUSERNAME}
PGPASSWD=${PGPASSWORD}

# create the database and the user
_pg_create_database_and_user()
{
    ${PGHOME}/bin/psql -c "create user ${PGUSERNAME} with login password '${PGPASSWD}'" postgres
    ${PGHOME}/bin/psql -c "create database ${PGDATABASENAME} with owner = ${PGUSERNAME}" postgres
}

# start the PostgreSQL instance
_pg_prestart()
{
    ${PGHOME}/bin/pg_ctl -D ${PGDATA} -w start
}

# start postgres and do not disconnect
# required for docker
_pg_start()
{
    ${PGHOME}/bin/postgres "-D" "${PGDATA}"
}

# stop the PostgreSQL instance
_pg_stop()
{
    ${PGHOME}/bin/pg_ctl -D ${PGDATA} stop -m fast
}

# initdb a new cluster
_pg_initdb()
{
    ${PGHOME}/bin/initdb -D ${PGDATA} --data-checksums
}


# adjust the postgresql parameters
_pg_adjust_config() {
    # PostgreSQL parameters
    echo "shared_buffers='128MB'" >> ${PGAUTOCONF}
    echo "effective_cache_size='128MB'" >> ${PGAUTOCONF}
    echo "listen_addresses = '*'" >> ${PGAUTOCONF}
    echo "logging_collector = 'on'" >> ${PGAUTOCONF}
    echo "log_truncate_on_rotation = 'on'" >> ${PGAUTOCONF}
    echo "log_filename = 'postgresql-%a.log'" >> ${PGAUTOCONF}
    echo "log_rotation_age = '1440'" >> ${PGAUTOCONF}
    echo "log_line_prefix = '%m - %l - %p - %h - %u@%d '" >> ${PGAUTOCONF}
    echo "log_directory = 'pg_log'" >> ${PGAUTOCONF}
    echo "log_min_messages = 'WARNING'" >> ${PGAUTOCONF}
    echo "log_autovacuum_min_duration = '60s'" >> ${PGAUTOCONF}
    echo "log_min_error_statement = 'NOTICE'" >> ${PGAUTOCONF}
    echo "log_min_duration_statement = '30s'" >> ${PGAUTOCONF}
    echo "log_checkpoints = 'on'" >> ${PGAUTOCONF}
    echo "log_statement = 'none'" >> ${PGAUTOCONF}
    echo "log_lock_waits = 'on'" >> ${PGAUTOCONF}
    echo "log_temp_files = '0'" >> ${PGAUTOCONF}
    echo "log_timezone = 'Europe/Zurich'" >> ${PGAUTOCONF}
    echo "log_connections=on" >> ${PGAUTOCONF}
    echo "log_disconnections=on" >> ${PGAUTOCONF}
    echo "log_duration=off" >> ${PGAUTOCONF}
    echo "client_min_messages = 'WARNING'" >> ${PGAUTOCONF}
    echo "wal_level = 'replica'" >> ${PGAUTOCONF}
    echo "hot_standby_feedback = 'on'" >> ${PGAUTOCONF}
    echo "max_wal_senders = '10'" >> ${PGAUTOCONF}
    echo "cluster_name = '${PGDATABASENAME}'" >> ${PGAUTOCONF}
    echo "max_replication_slots = '10'" >> ${PGAUTOCONF}
    echo "work_mem=8MB" >> ${PGAUTOCONF}
    echo "maintenance_work_mem=64MB" >> ${PGAUTOCONF}
    echo "wal_compression=on" >> ${PGAUTOCONF}
    echo "max_wal_senders=20" >> ${PGAUTOCONF}
    echo "shared_preload_libraries='pg_stat_statements'" >> ${PGAUTOCONF}
    echo "autovacuum_max_workers=6" >> ${PGAUTOCONF}
    echo "autovacuum_vacuum_scale_factor=0.1" >> ${PGAUTOCONF}
    echo "autovacuum_vacuum_threshold=50" >> ${PGAUTOCONF}
    # Authentication settings in pg_hba.conf
    echo "host    all             all             0.0.0.0/0            md5" >> ${PGHBACONF}
}

# initialize and start a new cluster
_pg_init_and_start()
{
    # initialize a new cluster
    _pg_initdb
    # set params and access permissions
    _pg_adjust_config
    # start the new cluster
    _pg_prestart
    # set username and password
    _pg_create_database_and_user
}

# check if $PGDATA exists
if [ -e ${PGDATA} ]; then
    # when $PGDATA exists we need to check if there are files
    # because when there are files we do not want to initdb
    if [ -e "${PGDATA}/base" ]; then
        # when there is the base directory this
        # probably is a valid PostgreSQL cluster
        # so we just start it
        _pg_prestart
    else
        # when there is no base directory then we
        # should be able to initialize a new cluster
        # and then start it
        _pg_init_and_start
    fi
else
    # initialze and start the new cluster
    _pg_init_and_start
    # create PGDATA
    mkdir -p ${PGDATA}
    # create the log directory
    mkdir -p ${PGDATA}/pg_log
fi
# restart and do not disconnect from the postgres daemon
_pg_stop
_pg_start

The important point here is: PGDATA is a persistent volume that is linked into the Docker container. When the container comes up we need to check if something that looks like a PostgreSQL data directory is already there. If yes, then we just start the instance with what is there. If nothing is there we create a new instance. Remember: This is just a template and you might need to do more checks in your case. The same is true for what we add to pg_hba.conf here: This is nothing you should do on real systems but can be handy for testing.

Hope this helps …

 

Leave a Reply

Daniel Westermann
Daniel Westermann

Senior Consultant and Technology Leader Open Infrastructure