Almost every PostgreSQL instance I come across is not configured to use huge pages, which is quite a surprise as they can give you a performance boost. Actually, it is not the PostgreSQL instance you need to configure but the operating system, which has to provide them. PostgreSQL will use huge pages by default when they are configured and will fall back to normal pages otherwise. The parameter which controls that in PostgreSQL is huge_pages, which defaults to “try”, leading to the behavior just described: try to get them, otherwise use normal pages. Let’s see how you can do that on Red Hat and CentOS. I’ll write another post about how you do that for Debian-based distributions shortly.
What you need to know is that Red Hat as well as CentOS come with tuned profiles by default. This means kernel parameters and other settings are managed dynamically through profiles and no longer by adjusting /etc/sysctl.conf (although that still works as well). When you are in a virtualized environment (VirtualBox in my case) you will probably see something like this:
[email protected]:/home/postgres/ [PG10] tuned-adm active
Current active profile: virtual-guest
The virtual-guest profile is maybe not the best choice for a database server as it comes with these settings (especially vm.dirty_ratio and vm.swappiness):
[email protected]:/home/postgres/ [PG10] cat /usr/lib/tuned/virtual-guest/tuned.conf | egrep -v "^$|^#"
[main]
summary=Optimize for running inside a virtual guest
include=throughput-performance
[sysctl]
vm.dirty_ratio = 30
vm.swappiness = 30
What we do at dbi services is to provide our own profile with settings that are better suited for a database server:
[email protected]:/home/postgres/ [PG10] cat /etc/tuned/dbi-postgres/tuned.conf | egrep -v "^$|^#"
[main]
summary=dbi services tuned profile for PostgreSQL servers
[cpu]
governor=performance
energy_perf_bias=performance
min_perf_pct=100
[disk]
readahead=>4096
[sysctl]
vm.overcommit_memory=2
vm.swappiness=0
vm.dirty_ratio=2
vm.dirty_background_ratio=1
What does all this have to do with huge pages, you might think. Well, tuned profiles can also be used to configure them, and for us this is the preferred method because we can do it all in one file. But before we do that, let’s look at the PostgreSQL instance:
postgres=# select version();
                                                          version
----------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 10.0 build on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-16), 64-bit
(1 row)

postgres=# show huge_pages;
 huge_pages
------------
 try
(1 row)
As said at the beginning of this post the default behavior of PostgreSQL is to use them if available. The question now is: How can you check if you have huge pages configured on the operating system level? The answer is in the virtual /proc/meminfo file:
postgres=# \! cat /proc/meminfo | grep -i huge
AnonHugePages:      6144 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
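If you want to script that check, here is a minimal sketch (not part of the original setup). The function names hugepage_stats and hugepages_configured are made up for illustration, and the optional file argument is an assumption that only exists so the functions can be pointed at a saved copy of /proc/meminfo:

```shell
# Print the HugePages_* counters from a meminfo-style file as key=value pairs.
# $1 is optional and defaults to /proc/meminfo (hypothetical convenience argument).
hugepage_stats() {
    awk -F':[ \t]*' '/^HugePages_/ { print $1 "=" $2 }' "${1:-/proc/meminfo}"
}

# Return success only if at least one huge page is configured.
hugepages_configured() {
    total=$(awk '/^HugePages_Total:/ { print $2 }' "${1:-/proc/meminfo}")
    [ "${total:-0}" -gt 0 ]
}
```

On the system above, hugepages_configured would return a non-zero exit code because HugePages_Total is 0.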
All “HugePages” statistics report zero, so this system definitely is not configured to provide huge pages to PostgreSQL. AnonHugePages is for transparent huge pages (THP), and it is a common recommendation to disable them for database servers. So we have two tasks to complete:
- Disable transparent huge pages
- Configure the system to provide enough huge pages for our PostgreSQL instance
For disabling transparent huge pages we just need to add the following lines to our tuning profile:
[email protected]:/home/postgres/ [PG10] printf '[vm]\ntransparent_hugepages=never\n' | sudo tee -a /etc/tuned/dbi-postgres/tuned.conf
When transparent huge pages are enabled you can see that in the following file:
[email protected]:/home/postgres/ [PG10] cat /sys/kernel/mm/transparent_hugepage/enabled
[always] madvise never
Once we switch the profile to our own profile:
[email protected]:/home/postgres/ [PG10] sudo tuned-adm profile dbi-postgres
[email protected]:/home/postgres/ [PG10] sudo tuned-adm active
Current active profile: dbi-postgres
… you’ll notice that it is disabled from now on:
[email protected]:/home/postgres/ [PG10] cat /sys/kernel/mm/transparent_hugepage/enabled
always madvise [never]
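The active mode is the word in square brackets. A small sketch for picking it out in a script, assuming the file keeps this format (thp_mode is a hypothetical helper name, and the optional file argument is only there for testing against a copy):

```shell
# Print the active transparent huge page mode, i.e. the bracketed word.
# $1 is optional and defaults to the sysfs file shown above.
thp_mode() {
    sed -n 's/.*\[\([a-z]*\)\].*/\1/p' "${1:-/sys/kernel/mm/transparent_hugepage/enabled}"
}
```

Something like `[ "$(thp_mode)" = "never" ]` could then be used in a check script after switching the profile.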
Task one completed. For configuring the operating system to provide huge pages for our PostgreSQL we need to know how many huge pages we require. How do we do that? The procedure is documented in the PostgreSQL documentation. Basically you start your instance and then check how many you would require. In my case, to get the PID of the postmaster process:
[email protected]:/home/postgres/ [PG10] head -1 $PGDATA/postmaster.pid
1640
To get the VmPeak for that process:
[email protected]:/home/postgres/ [PG10] grep ^VmPeak /proc/1640/status
VmPeak:   344340 kB
As the huge page size is 2MB on my system (which should be default for most systems):
[email protected]:/home/postgres/ [PG10] grep ^Hugepagesize /proc/meminfo
Hugepagesize:       2048 kB
… we will require at least 344340/2048 huge pages for this PostgreSQL instance:
[email protected]:/home/postgres/ [PG10] echo "344340/2048" | bc
168
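Note that bc truncates here: 344340/2048 is really about 168.13, so 168 pages would fall just short. A minimal sketch of the same calculation with rounding up (hugepages_needed is a hypothetical helper name, not something PostgreSQL or the kernel provides):

```shell
# Ceiling division: number of huge pages needed to cover a given size.
# $1 = size in kB (for example VmPeak), $2 = huge page size in kB.
hugepages_needed() {
    size_kb=$1
    page_kb=$2
    echo $(( (size_kb + page_kb - 1) / page_kb ))
}
```

Called as `hugepages_needed 344340 2048` this gives 169 for the values above.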
All we need to do is to add this, rounded up to 170 to be on the safe side, to our tuning profile in the “[sysctl]” section:
[email protected]:/home/postgres/ [PG10] grep nr_hugepages /etc/tuned/dbi-postgres/tuned.conf
vm.nr_hugepages=170
Re-set the profile and we’re done:
[email protected]:/home/postgres/ [PG10] sudo tuned-adm profile dbi-postgres
[email protected]:/home/postgres/ [PG10] cat /proc/meminfo | grep -i huge
AnonHugePages:      4096 kB
HugePages_Total:     170
HugePages_Free:      170
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
This confirms that we now have 170 huge pages, all of which are free to consume. Now let’s configure PostgreSQL to only start when it can get the required amount of huge pages by switching the “huge_pages” parameter to “on” and restarting the instance:
[email protected]:/home/postgres/ [PG10] psql -c "alter system set huge_pages=on" postgres
ALTER SYSTEM
Time: 0.719 ms
[email protected]:/home/postgres/ [PG10] pg_ctl -D $PGDATA restart -m fast
waiting for server to shut down.... done
server stopped
waiting for server to start....2018-02-25 11:21:29.107 CET - 1 - 3170 - - @ LOG: listening on IPv4 address "0.0.0.0", port 5441
2018-02-25 11:21:29.107 CET - 2 - 3170 - - @ LOG: listening on IPv6 address "::", port 5441
2018-02-25 11:21:29.110 CET - 3 - 3170 - - @ LOG: listening on Unix socket "/tmp/.s.PGSQL.5441"
2018-02-25 11:21:29.118 CET - 4 - 3170 - - @ LOG: redirecting log output to logging collector process
2018-02-25 11:21:29.118 CET - 5 - 3170 - - @ HINT: Future log output will appear in directory "pg_log".
 done
server started
As the instance started all should be fine and we can confirm that by looking at the statistics in /proc/meminfo:
[email protected]:/home/postgres/ [PG10] cat /proc/meminfo | grep -i huge
AnonHugePages:      4096 kB
HugePages_Total:     170
HugePages_Free:      162
HugePages_Rsvd:       64
HugePages_Surp:        0
Hugepagesize:       2048 kB
You might be surprised that not all (actually only 8) huge pages are used right now but this will change as soon as you put some load on the system:
postgres=# create table t1 as select * from generate_series(1,1000000);
SELECT 1000000
postgres=# select count(*) from t1;
  count
---------
 1000000
(1 row)

postgres=# \! cat /proc/meminfo | grep -i huge
AnonHugePages:      4096 kB
HugePages_Total:     170
HugePages_Free:      153
HugePages_Rsvd:       55
HugePages_Surp:        0
Hugepagesize:       2048 kB
postgres=#
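To read these counters: HugePages_Rsvd counts pages the kernel has promised to a process but which have not been touched yet, so they still show up as free. A rough sketch for computing both numbers from the output above (the helper names and the optional file argument are my own additions for illustration):

```shell
# Pages actually faulted in: Total - Free.
# $1 is optional and defaults to /proc/meminfo.
hugepages_in_use() {
    awk '/^HugePages_Total:/ { t = $2 }
         /^HugePages_Free:/  { f = $2 }
         END { print t - f }' "${1:-/proc/meminfo}"
}

# Pages in use plus pages reserved but not yet touched: Total - Free + Rsvd.
hugepages_committed() {
    awk '/^HugePages_Total:/ { t = $2 }
         /^HugePages_Free:/  { f = $2 }
         /^HugePages_Rsvd:/  { r = $2 }
         END { print t - f + r }' "${1:-/proc/meminfo}"
}
```

With the numbers from this post that gives 8 pages in use and 72 committed right after the restart, and 17 in use with still 72 committed after the small load test.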
Hope this helps …
Hi Daniel,
concerning transparent hugepages (THP): The indicator that THP are disabled is that AnonHugePages is 0 in /proc/meminfo. I tried that on Redhat Enterprise Linux 7.4:
[[email protected] dbi-postgres]# more /etc/redhat-release
Red Hat Enterprise Linux Server release 7.4 (Maipo)
[[email protected] dbi-postgres]# uname -r
3.10.0-693.17.1.el7.x86_64
When configuring the tunable like you did above with
[vm]
transparent_hugepages=never
in /usr/lib/tuned/dbi-postgres/tuned.conf and enabled the tunable (and rebooted) I do get this:
[[email protected] dbi-postgres]# tuned-adm active
Current active profile: dbi-postgres
[[email protected] dbi-postgres]# cat /sys/kernel/mm/transparent_hugepage/enabled
always madvise [never]
[[email protected] dbi-postgres]# grep AnonHuge /proc/meminfo
AnonHugePages: 6144 kB
So THP are not fully disabled. To really disable it I had to do the following:
1.) Modify the tunable and add
[bootloader]
cmdline = "transparent_hugepage=never"
2.) Adjust grub according to my tunable change
# grub2-mkconfig -o /boot/grub2/grub.cfg
3.) Reboot
Afterwards everything looks correct:
[[email protected] ~]# cat /sys/kernel/mm/transparent_hugepage/enabled
always madvise [never]
[[email protected] ~]# grep AnonHuge /proc/meminfo
AnonHugePages: 0 kB
Regards
Clemens
Hi Clemens,
thanks for pointing that out.
Cheers,
Daniel
Hi. Daniel,
I’m wondering if we need to set it to ‘on’ when it is ‘try’? Would it try to use huge pages once they are configured?
I just want to avoid shutting down the DB.
Thanks,
Betty
Hi Betty,
‘try’ will use huge pages if they are available, but only when PostgreSQL is starting up. So you would need to restart anyway.
‘on’ forces PostgreSQL to use them, and PostgreSQL will not start if they are not available.
Cheers,
Daniel
Calculating the number of huge pages from VmPeak is incorrect. VmPeak is relative to a particular process, not the total virtual memory available on the system. Take the total system RAM in MB, divide it by 2 MB, then take ~70% of that (how much you leave to the system depends on the situation) to get the number of huge pages you allow. You can add vm.nr_overcommit_hugepages to have < 30% if you need. In the PostgreSQL context, also configure swap and swappiness=1. THP != HP, and THP should be set to "never" as correctly indicated in the post.
Hi Andrius,
the calculation was based on the PostgreSQL documentation: https://www.postgresql.org/docs/10/kernel-resources.html
But apparently the documentation changed starting with version 11:
https://www.postgresql.org/docs/11/kernel-resources.html
Now it is based on pmap.
Anyway, thanks for your input.
Cheers,
Daniel
This is very helpful thanks. Could you include a final ‘printout’ of /etc/tuned/dbi-postgres/tuned.conf ?
Hi David, you’re welcome. Here is an example:
cat templates/tuned/dbi-postgres/tuned.conf
#
# dbi services tuned profile for PostgreSQL servers
#
[main]
summary=dbi services tuned profile for PostgreSQL servers
include=throughput-performance
[bootloader]
cmdline = "transparent_hugepage=never"
[cpu]
governor=performance
energy_perf_bias=performance
min_perf_pct=100
[disk]
readahead=>4096
[sysctl]
kernel.sched_min_granularity_ns = 10000000
kernel.sched_wakeup_granularity_ns = 15000000
# this one is for pgpool
## http://www.pgpool.net/docs/latest/en/html/runtime-config-connection.html => num_init_children
net.core.somaxconn=256
vm.overcommit_memory=2
vm.overcommit_ratio=75
vm.swappiness=1
vm.dirty_ratio=2
vm.dirty_background_ratio=1
#vm.nr_hugepages=1200
[vm]
transparent_hugepages=never