Docker is becoming more and more popular these days, and a lot of companies are starting to really use it. On one project we decided to build our own customized Docker image instead of using the official PostgreSQL one. The main reason was that we wanted to compile from source so that we only get what is really required. Why have PostgreSQL compiled with Tcl support when nobody will ever use it? Here is how we did it …
To dig in right away, this is the simplified Dockerfile:
```dockerfile
FROM debian

# make the "en_US.UTF-8" locale so postgres will be utf-8 enabled by default
ENV LANG en_US.utf8
ENV PG_MAJOR 10
ENV PG_VERSION 10.1
ENV PG_SHA256 3ccb4e25fe7a7ea6308dea103cac202963e6b746697366d72ec2900449a5e713
ENV PGDATA /u02/pgdata
# setting several variables in one ENV instruction requires the key=value form
ENV PGDATABASE="" PGUSERNAME="" PGPASSWORD=""

COPY docker-entrypoint.sh /

# zlib1g-dev provides the zlib headers PostgreSQL links against
# (libghc-zlib-dev is the Haskell binding and would pull in GHC)
RUN set -ex \
    && apt-get update \
    && apt-get install -y ca-certificates curl procps sysstat libldap2-dev libpython-dev libreadline-dev libssl-dev bison flex zlib1g-dev libcrypto++-dev libxml2-dev libxslt1-dev bzip2 make gcc unzip python locales \
    && rm -rf /var/lib/apt/lists/* \
    && localedef -i en_US -c -f UTF-8 en_US.UTF-8 \
    && chmod 755 /docker-entrypoint.sh \
    && mkdir /u01/ \
    && groupadd -r postgres --gid=999 \
    && useradd -m -r -g postgres --uid=999 postgres \
    && chown postgres:postgres /u01/ \
    && mkdir -p "$PGDATA" \
    && chown -R postgres:postgres "$PGDATA" \
    && chmod 700 "$PGDATA" \
    && curl -o /home/postgres/postgresql.tar.bz2 "https://ftp.postgresql.org/pub/source/v$PG_VERSION/postgresql-$PG_VERSION.tar.bz2" \
    && echo "$PG_SHA256  /home/postgres/postgresql.tar.bz2" | sha256sum -c - \
    && mkdir -p /home/postgres/src \
    && chown -R postgres:postgres /home/postgres \
    && su postgres -c "tar --extract --file /home/postgres/postgresql.tar.bz2 --directory /home/postgres/src --strip-components 1" \
    && rm /home/postgres/postgresql.tar.bz2 \
    && cd /home/postgres/src \
    && su postgres -c "./configure --enable-integer-datetimes --enable-thread-safety --with-pgport=5432 --prefix=/u01/app/postgres/product/$PG_VERSION \
         --with-ldap --with-python --with-openssl --with-libxml --with-libxslt" \
    && su postgres -c "make -j 4 all" \
    && su postgres -c "make install" \
    && su postgres -c "make -C contrib install" \
    && cd / \
    && rm -rf /home/postgres/src \
    && apt-get update \
    && apt-get purge --auto-remove -y libldap2-dev libpython-dev libreadline-dev libssl-dev zlib1g-dev libcrypto++-dev libxml2-dev libxslt1-dev bzip2 gcc make unzip \
    && apt-get install -y libxml2 \
    && rm -rf /var/lib/apt/lists/*

USER postgres
EXPOSE 5432
ENTRYPOINT ["/docker-entrypoint.sh"]
```
We based the image on the latest Debian image; that is line 1. The following lines define the PostgreSQL version we will use, along with some environment variables we will use later. What follows is basically installing all the packages required to build PostgreSQL from source, adding the operating system user and group, preparing the directories, fetching the PostgreSQL source code and verifying its SHA-256 checksum, and then running configure, make and make install (including contrib, which provides pg_stat_statements). Pretty much straightforward. Finally, to shrink the image, we remove all the packages that are no longer required once PostgreSQL has been compiled and installed.
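The tarball is only extracted after `sha256sum -c` confirms that it matches the pinned `PG_SHA256`. If you build PostgreSQL outside Docker as well, the same check can be pulled out into a small reusable function. This is a minimal sketch assuming GNU coreutils `sha256sum`; the function name `verify_sha256` is hypothetical and not part of the image.

```shell
#!/bin/bash
# Minimal sketch: check a downloaded file against a pinned SHA-256 digest,
# the same way the Dockerfile checks the PostgreSQL tarball.
# verify_sha256 is a hypothetical helper name; assumes GNU coreutils.

# usage: verify_sha256 DIGEST FILE
# returns 0 if FILE matches DIGEST, non-zero otherwise
verify_sha256() {
    local expected="$1"
    local file="$2"
    # sha256sum -c reads "<digest>  <file>" lines from stdin ("-");
    # --status suppresses all output so only the exit code reports the result
    echo "${expected}  ${file}" | sha256sum -c --status -
}
```

For example, `verify_sha256 "$PG_SHA256" /home/postgres/postgresql.tar.bz2 || exit 1` stops a build script before an unexpected tarball gets extracted and compiled.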
The final setup of the PostgreSQL instance happens in the docker-entrypoint.sh script which is referenced at the very end of the Dockerfile:
```shell
#!/bin/bash

# these are the environment variables which need to be set
PGDATA=${PGDATA}/${PG_MAJOR}
PGHOME="/u01/app/postgres/product/${PG_VERSION}"
PGAUTOCONF=${PGDATA}/postgresql.auto.conf
PGHBACONF=${PGDATA}/pg_hba.conf
PGDATABASENAME=${PGDATABASE}
PGUSERNAME=${PGUSERNAME}
PGPASSWD=${PGPASSWORD}

# create the database and the user
_pg_create_database_and_user()
{
    "${PGHOME}/bin/psql" -c "create user ${PGUSERNAME} with login password '${PGPASSWD}'" postgres
    "${PGHOME}/bin/psql" -c "create database ${PGDATABASENAME} with owner = ${PGUSERNAME}" postgres
}

# start the PostgreSQL instance in the background
_pg_prestart()
{
    "${PGHOME}/bin/pg_ctl" -D "${PGDATA}" -w start
}

# start postgres in the foreground and do not disconnect (required for docker);
# exec replaces this shell, so postgres becomes PID 1 and receives the
# signals sent by "docker stop"
_pg_start()
{
    exec "${PGHOME}/bin/postgres" -D "${PGDATA}"
}

# stop the PostgreSQL instance
_pg_stop()
{
    "${PGHOME}/bin/pg_ctl" -D "${PGDATA}" stop -m fast
}

# initdb a new cluster
_pg_initdb()
{
    "${PGHOME}/bin/initdb" -D "${PGDATA}" --data-checksums
}

# adjust the postgresql parameters
# (each key is listed once; if a key appears twice, the last value wins)
_pg_adjust_config()
{
    # PostgreSQL parameters
    cat >> "${PGAUTOCONF}" <<EOF
shared_buffers='128MB'
effective_cache_size='128MB'
listen_addresses = '*'
logging_collector = 'on'
log_truncate_on_rotation = 'on'
log_filename = 'postgresql-%a.log'
log_rotation_age = '1440'
log_line_prefix = '%m - %l - %p - %h - %u@%d '
log_directory = 'pg_log'
log_min_messages = 'WARNING'
log_autovacuum_min_duration = '60s'
log_min_error_statement = 'NOTICE'
log_min_duration_statement = '30s'
log_checkpoints = 'on'
log_statement = 'none'
log_lock_waits = 'on'
log_temp_files = '0'
log_timezone = 'Europe/Zurich'
log_connections=on
log_disconnections=on
log_duration=off
client_min_messages = 'WARNING'
wal_level = 'replica'
hot_standby_feedback = 'on'
max_wal_senders = '20'
cluster_name = '${PGDATABASENAME}'
max_replication_slots = '10'
work_mem=8MB
maintenance_work_mem=64MB
wal_compression=on
shared_preload_libraries='pg_stat_statements'
autovacuum_max_workers=6
autovacuum_vacuum_scale_factor=0.1
autovacuum_vacuum_threshold=50
EOF
    # Authentication settings in pg_hba.conf
    echo "host all all 0.0.0.0/0 md5" >> "${PGHBACONF}"
}

# initialize and start a new cluster
_pg_init_and_start()
{
    # initialize a new cluster
    _pg_initdb
    # set params and access permissions
    _pg_adjust_config
    # start the new cluster
    _pg_prestart
    # set username and password
    _pg_create_database_and_user
}

# check if $PGDATA exists
if [ -e "${PGDATA}" ]; then
    # when $PGDATA exists we need to check if there are files
    # because when there are files we do not want to initdb
    if [ -e "${PGDATA}/base" ]; then
        # when there is the base directory this
        # probably is a valid PostgreSQL cluster
        # so we just start it
        _pg_prestart
    else
        # when there is no base directory then we
        # should be able to initialize a new cluster
        # and then start it
        _pg_init_and_start
    fi
else
    # create PGDATA (initdb accepts an existing empty directory)
    mkdir -p "${PGDATA}"
    # initialize and start the new cluster
    _pg_init_and_start
    # create the log directory
    mkdir -p "${PGDATA}/pg_log"
fi

# restart and do not disconnect from the postgres daemon
_pg_stop
_pg_start
```
The important point here is that PGDATA (/u02/pgdata, with the cluster itself in a subdirectory named after the major version) lives on a persistent volume mounted into the Docker container. When the container comes up, we need to check whether something that looks like a PostgreSQL data directory is already there. If so, we just start the instance with what is there; if not, we create a new instance. Remember: this is just a template, and you might need to do more checks in your case. The same is true for the line we add to pg_hba.conf, which allows md5 password authentication from any IPv4 address: this is nothing you should do on real systems, but it can be handy for testing.
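Checking only for the base directory is a fairly loose test. A stricter variant could also read the PG_VERSION file that initdb writes into every data directory, so the entrypoint neither starts a cluster created by a different major release nor runs initdb over a non-empty directory that is not a cluster at all. This is a sketch under those assumptions; `pg_datadir_state` and its four result strings are hypothetical names, not part of the image.

```shell
#!/bin/bash
# Sketch of a stricter data directory check than testing for base/ alone.
# pg_datadir_state is a hypothetical helper; it prints one of:
#   empty    - directory missing or empty, safe to run initdb
#   valid    - looks like a cluster of the expected major version
#   mismatch - a cluster, but created by another major version
#   foreign  - not empty, but not a PostgreSQL cluster either

# usage: pg_datadir_state DATADIR MAJOR_VERSION
pg_datadir_state() {
    local datadir="$1"
    local major="$2"

    # missing or empty directory: initdb can create or use it
    if [ ! -d "${datadir}" ] || [ -z "$(ls -A "${datadir}")" ]; then
        echo "empty"
        return 0
    fi

    # initdb writes the major version (e.g. "10") into PG_VERSION
    # and creates base/ for the per-database directories
    if [ -f "${datadir}/PG_VERSION" ] && [ -d "${datadir}/base" ]; then
        if [ "$(cat "${datadir}/PG_VERSION")" = "${major}" ]; then
            echo "valid"
        else
            echo "mismatch"
        fi
        return 0
    fi

    # something else lives here: never run initdb over it
    echo "foreign"
}
```

In the entrypoint, a `case "$(pg_datadir_state "${PGDATA}" "${PG_MAJOR}")" in empty) _pg_init_and_start ;; valid) _pg_prestart ;; *) exit 1 ;; esac` could replace the nested `if`, so that a mismatching or foreign directory stops the container instead of being started or overwritten. A mismatch usually means the cluster has to be upgraded first, for example with pg_upgrade.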
Hope this helps …