Why does the sysctl.conf value for swappiness on Oracle Linux 7.x not survive a reboot?
For a project I applied the dbi services best practices for Oracle databases. One of these is to adjust the swappiness parameter. We recommend a very low swappiness value, such as 10 or even lower, to reduce the risk of Linux starting to swap. Swapping on a database server is not a problem per se, but it generates disk activity, which can negatively impact the performance of a database.
At the end of this project I did the handover to our service desk. The service desk has a lot of things to validate for every installation and has developed scripts to check the dbi services best practices against a new system before it gets under contract and/or monitoring. One of these scripts detected that the swappiness value on the system was set to 30. After a few hours of investigation we identified the issue. More about the swappiness part of this later in this blog.
In previous versions of Linux we applied or modified parameters in the /etc/sysctl.conf file to tune the kernel, network, disks, etc. One of these parameters is vm.swappiness.
[root]# grep -A1 "^# dbi" /etc/sysctl.conf
# dbi services reduces the possibility for swapping, 0 = disable, 10 = reduce the paging possibility to 10%
vm.swappiness = 0
To activate this setting we then run:
[root]# sysctl -p | grep "^vm"
vm.swappiness = 0
To verify the setting we can query the current value:
[root]# cat /proc/sys/vm/swappiness
0
After a reboot of the system, we check the value again:
[root]# cat /proc/sys/vm/swappiness
30
What a surprise! My value did not survive a reboot.
Why is the default value of 30 applied?
There are some important changes when it comes to setting the system values for kernel, network, disk, etc. in the recent versions of Red Hat Enterprise Linux 7.
- Since version 7, the tuned.service is enabled per default, even in a minimal installation.
- The tuned.service applies some values after the sysctl.conf values have been loaded.
- The default tuned profile that gets applied is throughput-performance on a physical machine and virtual-guest on a virtual machine.
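These facts are easy to verify on a running system. Below is a minimal sketch, assuming a systemd-based EL7 box; the tuned commands are guarded so the script also runs on a machine where the package is absent:

```shell
#!/bin/sh
# Show whether tuned is enabled and which profile it applies (if installed).
if command -v tuned-adm >/dev/null 2>&1; then
    systemctl is-enabled tuned || true   # "enabled" per default on a minimal EL7 install
    tuned-adm active || true
fi

# The value effectively in force, regardless of what /etc/sysctl.conf says:
cat /proc/sys/vm/swappiness
```

If the value printed last differs from the one in /etc/sysctl.conf, something applied settings after sysctl did.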
Once we were aware of these facts, we looked at the values which are set by default.
The tuned.service profiles are located under /usr/lib/tuned/
[root]# ls -als /usr/lib/tuned/
total 36
 4 drwxr-xr-x. 13 root root  4096 Apr  8 14:13 .
12 dr-xr-xr-x. 41 root root  8192 Mar 11 09:39 ..
 0 drwxr-xr-x.  2 root root    23 Apr  6 14:56 balanced
 0 drwxr-xr-x.  2 root root    23 Apr  6 14:56 desktop
16 -rw-r--r--.  1 root root 12294 Mar 31 18:46 functions
 0 drwxr-xr-x.  2 root root    23 Apr  6 14:56 latency-performance
 0 drwxr-xr-x.  2 root root    23 Apr  6 14:56 network-latency
 0 drwxr-xr-x.  2 root root    23 Apr  6 14:56 network-throughput
 0 drwxr-xr-x.  2 root root    39 Apr  6 14:56 powersave
 4 -rw-r--r--.  1 root root  1288 Jul 31  2015 recommend.conf
 0 drwxr-xr-x.  2 root root    23 Apr  6 14:56 throughput-performance
 0 drwxr-xr-x.  2 root root    23 Apr  6 14:56 virtual-guest   <- default
 0 drwxr-xr-x.  2 root root    23 Apr  6 14:56 virtual-host
There is a list of predefined profiles.
[root]# tuned-adm list
Available profiles:
- balanced
- desktop
- latency-performance
- network-latency
- network-throughput
- powersave
- throughput-performance
- virtual-guest
- virtual-host
The currently active profile is: virtual-guest
To find out more about the tuned profiles:
[root]# man tuned-profiles
TUNED_PROFILES(7)                     tuned                     TUNED_PROFILES(7)

NAME
       tuned-profiles - description of basic tuned profiles

DESCRIPTION
       These are the base profiles which are mostly shipped in the base tuned
       package. They are targeted to various goals. Mostly they provide
       performance optimizations but there are also profiles targeted to low
       power consumption, low latency and others. You can mostly deduce the
       purpose of the profile by its name or you can see full description below.

       The profiles are stored in subdirectories below /usr/lib/tuned. If you
       need to customize the profiles, you can copy them to /etc/tuned and
       modify them as you need. When loading profiles with the same name, the
       /etc/tuned takes precedence. In such case you will not lose your
       customized profiles between tuned updates.

       The power saving profiles contain settings that are typically not
       enabled by default as they will noticeably impact the latency/performance
       of your system as opposed to the power saving mechanisms that are enabled
       by default. On the other hand the performance profiles disable the
       additional power saving mechanisms of tuned as they would negatively
       impact throughput or latency.

PROFILES
       At the moment we're providing the following pre-defined profiles:

       balanced
              It is the default profile. It provides balanced power saving and
              performance. At the moment it enables CPU and disk plugins of
              tuned and it makes sure the ondemand governor is active (if
              supported by the current cpufreq driver). It enables ALPM power
              saving for SATA host adapters and sets the link power management
              policy to medium_power. It also sets the CPU energy performance
              bias to normal. It also enables AC97 audio power saving or (it
              depends on your system) HDA-Intel power savings with 10 seconds
              timeout. In case your system contains supported Radeon graphics
              card (with enabled KMS) it configures it to automatic power
              saving.

       powersave
              Maximal power saving, at the moment it enables USB autosuspend
              (in case environment variable USB_AUTOSUSPEND is set to 1),
              enables ALPM power saving for SATA host adapters and sets the
              link power management policy to min_power. It also enables WiFi
              power saving, enables multi core power savings scheduler for low
              wakeup systems and makes sure the ondemand governor is active
              (if supported by the current cpufreq driver). It sets the CPU
              energy performance bias to powersave. It also enables AC97 audio
              power saving or (it depends on your system) HDA-Intel power
              savings (with 10 seconds timeout). In case your system contains
              supported Radeon graphics card (with enabled KMS) it configures
              it to automatic power saving. On Asus Eee PCs dynamic Super
              Hybrid Engine is enabled.

       throughput-performance
              Profile for typical throughput performance tuning. Disables
              power saving mechanisms and enables sysctl settings that improve
              the throughput performance of your disk and network IO. CPU
              governor is set to performance and CPU energy performance bias
              is set to performance. Disk readahead values are increased.

       latency-performance
              Profile for low latency performance tuning. Disables power
              saving mechanisms. CPU governor is set to performance and locked
              to the low C states (by PM QoS). CPU energy performance bias to
              performance.

       network-throughput
              Profile for throughput network tuning. It is based on the
              throughput-performance profile. It additionally increases kernel
              network buffers.

       network-latency
              Profile for low latency network tuning. It is based on the
              latency-performance profile. It additionally disables
              transparent hugepages, NUMA balancing and tunes several other
              network related sysctl parameters.

       desktop
              Profile optimized for desktops based on balanced profile. It
              additionally enables scheduler autogroups for better response of
              interactive applications.

       virtual-guest
              Profile optimized for virtual guests based on
              throughput-performance profile. It additionally decreases
              virtual memory swappiness and increases dirty_ratio settings.

       virtual-host
              Profile optimized for virtual hosts based on
              throughput-performance profile. It additionally enables more
              aggressive writeback of dirty pages.

FILES
       /etc/tuned/*
       /usr/lib/tuned/*

SEE ALSO
       tuned(8) tuned-adm(8) tuned-profiles-atomic(7) tuned-profiles-sap(7)
       tuned-profiles-sap-hana(7) tuned-profiles-oracle(7)
       tuned-profiles-realtime(7) tuned-profiles-nfv(7)
       tuned-profiles-compat(7)

AUTHOR
       Jaroslav Škarvada <[email protected]>
       Jan Kaluža <[email protected]>
       Jan Včelák <[email protected]>
       Marcela Mašláňová <[email protected]>
       Phil Knirsch <[email protected]>
       Fedora Power Management SIG

23 Sep 2014                                                     TUNED_PROFILES(7)
Let's look at the values inside the default profile (virtual-guest), which includes the throughput-performance profile. We focus on the swappiness value, which is set to 30.
[root]# cat /usr/lib/tuned/virtual-guest/tuned.conf
#
# tuned configuration
#

[main]
include=throughput-performance

[sysctl]
# If a workload mostly uses anonymous memory and it hits this limit, the entire
# working set is buffered for I/O, and any more write buffering would require
# swapping, so it's time to throttle writes until I/O can catch up. Workloads
# that mostly use file mappings may be able to use even higher values.
#
# The generator of dirty data starts writeback at this percentage (system default
# is 20%)
vm.dirty_ratio = 30

# Filesystem I/O is usually much more efficient than swapping, so try to keep
# swapping low. It's usually safe to go even lower than this on systems with
# server-grade storage.
vm.swappiness = 30
An important point: the tuned profile virtual-guest includes the settings from the tuned profile throughput-performance:
[root]# cat /usr/lib/tuned/throughput-performance/tuned.conf
#
# tuned configuration
#

[cpu]
governor=performance
energy_perf_bias=performance
min_perf_pct=100

[disk]
readahead=>4096

[sysctl]
# ktune sysctl settings for rhel6 servers, maximizing i/o throughput
#
# Minimal preemption granularity for CPU-bound tasks:
# (default: 1 msec * (1 + ilog(ncpus)), units: nanoseconds)
kernel.sched_min_granularity_ns = 10000000

# SCHED_OTHER wake-up granularity.
# (default: 1 msec * (1 + ilog(ncpus)), units: nanoseconds)
#
# This option delays the preemption effects of decoupled workloads
# and reduces their over-scheduling. Synchronous workloads will still
# have immediate wakeup/sleep latencies.
kernel.sched_wakeup_granularity_ns = 15000000

# If a workload mostly uses anonymous memory and it hits this limit, the entire
# working set is buffered for I/O, and any more write buffering would require
# swapping, so it's time to throttle writes until I/O can catch up. Workloads
# that mostly use file mappings may be able to use even higher values.
#
# The generator of dirty data starts writeback at this percentage (system default
# is 20%)
vm.dirty_ratio = 40

# Start background writeback (via writeback threads) at this percentage (system
# default is 10%)
vm.dirty_background_ratio = 10

# PID allocation wrap value. When the kernel's next PID value
# reaches this value, it wraps back to a minimum PID value.
# PIDs of value pid_max or larger are not allocated.
#
# A suggested value for pid_max is 1024 * <# of cpu cores/threads in system>
# e.g., a box with 32 cpus, the default of 32768 is reasonable, for 64 cpus,
# 65536, for 4096 cpus, 4194304 (which is the upper limit possible).
#kernel.pid_max = 65536

# The swappiness parameter controls the tendency of the kernel to move
# processes out of physical memory and onto the swap disk.
# 0 tells the kernel to avoid swapping processes out of physical memory
# for as long as possible
# 100 tells the kernel to aggressively swap processes out of physical memory
# and move them to swap cache
vm.swappiness=10
There are various approaches to solve this issue:
- disable the tuned.service to switch back to the /etc/sysctl.conf values
- adapt the values in the virtual-guest profile, but what if they are updated automatically by the OS vendor in future patches or releases?
- create a new tuned profile based on virtual-guest and adapt the values
- use the tuned profile which is deployed by Oracle in the Oracle Linux repository
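For option 3, the man page quoted above points the way: profiles placed under /etc/tuned take precedence over those under /usr/lib/tuned and survive tuned package updates. A sketch of such a profile follows; the profile name dbi-virtual-guest is an illustrative assumption, and the TUNED_DIR variable only exists so the sketch can run without root (on the real system it would be /etc/tuned):

```shell
#!/bin/sh
# Create a local tuned profile that inherits everything from virtual-guest
# and overrides only the swappiness value.
# NOTE: TUNED_DIR is a stand-in for /etc/tuned so this sketch runs unprivileged.
TUNED_DIR="${TUNED_DIR:-$(mktemp -d)}"

mkdir -p "$TUNED_DIR/dbi-virtual-guest"
cat > "$TUNED_DIR/dbi-virtual-guest/tuned.conf" <<'EOF'
#
# dbi services: virtual-guest with a low swappiness
#
[main]
include=virtual-guest

[sysctl]
vm.swappiness = 10
EOF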
I prefer solution 4, which is also the most practical way.
Here is what we need to do:
First of all install the corresponding package from the Oracle Linux 7 repository:
[root]# yum info *tuned-profile*
Loaded plugins: ulninfo
Available Packages
Name        : tuned-profiles-oracle
Arch        : noarch
Version     : 2.5.1
Release     : 4.el7_2.3
Size        : 1.5 k
Repo        : installed
From repo   : ol7_latest
Summary     : Additional tuned profile(s) targeted to Oracle loads
URL         : https://fedorahosted.org/tuned/
License     : GPLv2+
Description : Additional tuned profile(s) targeted to Oracle loads.
Let's look at the values inside this tuned profile:
[root]# cat /usr/lib/tuned/oracle/tuned.conf
#
# tuned configuration
#

[main]
include=throughput-performance

[sysctl]
vm.swappiness = 1
vm.dirty_background_ratio = 3
vm.dirty_ratio = 80
vm.dirty_expire_centisecs = 500
vm.dirty_writeback_centisecs = 100
kernel.shmmax = 4398046511104
kernel.shmall = 1073741824
kernel.shmmni = 4096
kernel.sem = 250 32000 100 128
fs.file-max = 6815744
fs.aio-max-nr = 1048576
net.ipv4.ip_local_port_range = 9000 65500
net.core.rmem_default = 262144
net.core.rmem_max = 4194304
net.core.wmem_default = 262144
net.core.wmem_max = 1048576
kernel.panic_on_oops = 1

[vm]
transparent_hugepages=never
Activate the profile, check which profile is really active, and then check the current value of the swappiness parameter:
[root]# tuned-adm profile oracle
[root]# tuned-adm active
Current active profile: oracle
[root]# cat /proc/sys/vm/swappiness
1
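The choice made with tuned-adm profile is persistent: tuned records it on disk and the service re-applies it at boot, which is exactly what /etc/sysctl.conf alone could not guarantee here. A small sanity-check sketch; the active_profile path is how tuned on EL7 stores the selection, and the checks are guarded in case tuned is not installed on the machine running this:

```shell
#!/bin/sh
# Where tuned on EL7 records the profile that will be re-applied at boot.
if [ -f /etc/tuned/active_profile ]; then
    echo "profile re-applied at boot: $(cat /etc/tuned/active_profile)"
fi

# The effective swappiness; it should read 1 once the oracle profile is active.
cat /proc/sys/vm/swappiness
```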
Now we have the oracle tuned profile applied, which overrides some values that also come from Oracle with the oracle-rdbms-server-11gR2-preinstall or oracle-rdbms-server-12cR1-preinstall packages. In my case this is the list of duplicated parameters:
/usr/lib/tuned/oracle/tuned.conf:

vm.swappiness = 1
vm.dirty_background_ratio = 3
vm.dirty_ratio = 80
vm.dirty_expire_centisecs = 500
vm.dirty_writeback_centisecs = 100
kernel.shmmax = 4398046511104
kernel.shmall = 1073741824
kernel.shmmni = 4096
kernel.sem = 250 32000 100 128
fs.file-max = 6815744
fs.aio-max-nr = 1048576
net.ipv4.ip_local_port_range = 9000 65500
net.core.rmem_default = 262144
net.core.rmem_max = 4194304
net.core.wmem_default = 262144
net.core.wmem_max = 1048576
kernel.panic_on_oops = 1

/etc/sysctl.conf from oracle-rdbms-server-11gR2-preinstall:

# oracle-rdbms-server-11gR2-preinstall setting for fs.file-max is 6815744
fs.file-max = 6815744
# oracle-rdbms-server-11gR2-preinstall setting for kernel.sem is '250 32000 100 128'
kernel.sem = 250 32000 100 128
# oracle-rdbms-server-11gR2-preinstall setting for kernel.shmmni is 4096
kernel.shmmni = 4096
# oracle-rdbms-server-11gR2-preinstall setting for kernel.shmall is 1073741824 on x86_64
# oracle-rdbms-server-11gR2-preinstall setting for kernel.shmall is 2097152 on i386
kernel.shmall = 1073741824
# oracle-rdbms-server-11gR2-preinstall setting for kernel.shmmax is 4398046511104 on x86_64
# oracle-rdbms-server-11gR2-preinstall setting for kernel.shmmax is 4294967295 on i386
kernel.shmmax = 4398046511104
# oracle-rdbms-server-11gR2-preinstall setting for kernel.panic_on_oops is 1 per Orabug 19212317
kernel.panic_on_oops = 1
# oracle-rdbms-server-11gR2-preinstall setting for net.core.rmem_default is 262144
net.core.rmem_default = 262144
# oracle-rdbms-server-11gR2-preinstall setting for net.core.rmem_max is 4194304
net.core.rmem_max = 4194304
# oracle-rdbms-server-11gR2-preinstall setting for net.core.wmem_default is 262144
net.core.wmem_default = 262144
# oracle-rdbms-server-11gR2-preinstall setting for net.core.wmem_max is 1048576
net.core.wmem_max = 1048576
# oracle-rdbms-server-11gR2-preinstall setting for net.ipv4.conf.all.rp_filter is 2
net.ipv4.conf.all.rp_filter = 2
# oracle-rdbms-server-11gR2-preinstall setting for net.ipv4.conf.default.rp_filter is 2
net.ipv4.conf.default.rp_filter = 2
# oracle-rdbms-server-11gR2-preinstall setting for fs.aio-max-nr is 1048576
fs.aio-max-nr = 1048576
# oracle-rdbms-server-11gR2-preinstall setting for net.ipv4.ip_local_port_range is 9000 65500
net.ipv4.ip_local_port_range = 9000 65500
When we modify such values, we have to be aware of exactly what we apply and also how we apply it. The most important point is to verify the outcome, as described in this blog, whenever we set values in /etc/sysctl.conf. Tuned profiles are a good solution that lets manufacturers and suppliers distribute optimized values with their Linux distributions.
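That verification step can itself be scripted. The sketch below compares the values a sysctl-style file requests with what the kernel actually reports under /proc/sys (a key's dots map to slashes in the /proc path); any override by tuned shows up as a mismatch. The CONF variable and the output wording are illustrative assumptions:

```shell
#!/bin/sh
# Compare requested sysctl values with the values effectively in force.
# On a real box, point CONF at /etc/sysctl.conf (the default below).
CONF="${CONF:-/etc/sysctl.conf}"

if [ -r "$CONF" ]; then
    # Only lines that look like "key = value" (skip comments and blanks).
    grep -E '^[a-z]' "$CONF" | while IFS='=' read -r key want; do
        key=$(echo "$key" | tr -d '[:space:]')
        want=$(echo "$want" | sed 's/^[[:space:]]*//')
        # vm.swappiness -> /proc/sys/vm/swappiness
        path="/proc/sys/$(echo "$key" | tr '.' '/')"
        if [ -r "$path" ]; then
            have=$(cat "$path")
            [ "$have" = "$want" ] || echo "MISMATCH $key: conf says '$want', kernel has '$have'"
        fi
    done
fi
```

Run after every reboot (or from the service desk validation scripts), this catches exactly the kind of silent override described in this blog.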