16 Jul 2019

Why should Transparent Huge Pages be disabled for database servers?

Linux treats physical pages as the basic unit of memory, splitting the available memory into page-sized ‘chunks’, and, no matter how little memory an application actually needs to write, it will be allocated at least one page (for example, 4096 bytes). A page is always contiguous, meaning that all of the memory in a page is physically together.

When user-space applications use system memory, they typically do so using virtual addressing, meaning that two processes can hold different data at the same virtual address while that data sits at different locations in physical memory. The Memory Management Unit (MMU) maps each virtual address to the real, physical address by looking it up in a “page table” to find which physical page it relates to.

Allocating virtual memory thus has overhead, as the kernel must set up page table entries, and every memory lookup now requires at least two accesses: one for the page table, and another for the physical page. Also, these entries must be created for each individual page (each page is contiguous, but neighbouring virtual pages may be scattered across physical memory), so given a page size of 4KB, 4MB of virtual memory will require 1024 page table entries.
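The arithmetic behind that figure can be sketched in a few lines of shell. This is only an illustration: pte_count is a hypothetical helper name, not a system tool, and it assumes one page table entry per page with sizes given in bytes.

```shell
# pte_count REGION_BYTES PAGE_BYTES
# Hypothetical helper: prints how many page table entries are needed
# to map a region, assuming one entry per page.
pte_count() {
    # Round up, so a partly used page still gets its own entry.
    echo $(( ($1 + $2 - 1) / $2 ))
}

# 4MB of virtual memory mapped with 4KB (4096-byte) pages.
pte_count $(( 4 * 1024 * 1024 )) 4096
```

On most Linux systems, getconf PAGESIZE reports the page size actually in use, which can be passed as the second argument instead of 4096.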

Lookups in the page table are quick (around 100ns, as it’s stored in main memory) but still add overhead, and need to be done for every single memory access. In an attempt to reduce this overhead, the MMU has a cache (often in the CPU itself) called the Translation Lookaside Buffer (TLB), which can be accessed much more quickly; in some cases in around 20ns.

The TLB is limited to around 100 entries, which covers only 400KB of memory when using 4KB pages. Transparent Huge Pages (THP) allow these pages to be a lot larger, meaning that the same 100-entry cache refers to much more physical memory, and so lookups for contiguous memory can be significantly faster (note that ‘transparent’ in this context means that the kernel does this without the application being aware of it). Larger pages also help to reduce the cost of TLB misses: if a page entry isn’t found in the cache, it has to be looked up in the page table, and the result is then stored in the TLB. The TLB is then checked again, meaning that each cache miss involves at least three TLB accesses plus two memory lookups, or potentially around a 30% overhead compared to using the page table alone.
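To put rough numbers on how much memory the TLB can cover, here is a small sketch. tlb_reach_kb is a hypothetical helper, the 100-entry figure is the approximation used above, and the 2MB huge page size is an assumption (it is the usual THP size on x86_64; other architectures differ).

```shell
# tlb_reach_kb ENTRIES PAGE_KB
# Hypothetical helper: prints how many KB of memory a TLB with the
# given number of entries can cover at the given page size.
tlb_reach_kb() {
    echo $(( $1 * $2 ))
}

tlb_reach_kb 100 4       # 4KB pages: 400KB covered
tlb_reach_kb 100 2048    # 2MB huge pages (assumed size): 204800KB, i.e. 200MB
```

The huge page size on a given machine is listed as Hugepagesize in /proc/meminfo.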

However, there’s no such thing as a free lunch, and Transparent Huge Pages don’t come without their own downsides.

Firstly, larger pages use more memory. The memory allocation function in the kernel will allocate at least the requested page size, and possibly more (rounded up to fit within the available memory). No matter how little memory your application actually requires, at least one full page will be allocated to it, and with huge pages that one page is much larger.

Secondly, as mentioned earlier, a page must be contiguous in memory, and this is true for ‘huge pages’ too. So, if the kernel cannot find a contiguous free region large enough for a huge page, it will defragment the memory before allocating it, which can impact performance.

InnoDB is built on B+tree indexes, meaning that its workload will usually have sparse, rather than contiguous, memory access, and, as such, it is likely to perform noticeably worse with THP.

Also, if you are using jemalloc with THP, you can eventually end up with the server running out of memory, as it is unable to free unused memory: see Why TokuDB hates Transparent HugePages.

For these reasons we recommend disabling Transparent Huge Pages on database servers, especially those using the InnoDB, TokuDB or RocksDB engines. MongoDB also recommends running without THP for all storage engines, and will print a multi-line warning to the logs and database shell if the setting is enabled.

You can check if Transparent Huge Pages are enabled with:

$ cat /sys/kernel/mm/transparent_hugepage/enabled
 [always] madvise never

The keyword in square brackets is the active setting, so in the example above THP is always enabled.
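If the check needs to be scripted (for example in a monitoring job), the bracketed keyword can be pulled out with sed. This is a minimal sketch: thp_active is a hypothetical helper name, and it assumes the file keeps the bracketed format shown above.

```shell
# thp_active LINE
# Hypothetical helper: prints the word inside square brackets from a
# line such as "[always] madvise never".
thp_active() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

thp_file=/sys/kernel/mm/transparent_hugepage/enabled
if [ -r "$thp_file" ]; then
    echo "THP mode: $(thp_active "$(cat "$thp_file")")"
else
    # The file is absent when the kernel has no THP support.
    echo "THP setting not found at $thp_file"
fi
```

Anything other than never means the kernel may still hand out huge pages to at least some processes.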

You can disable the feature until the next reboot by running the following commands as root:

echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag
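
The same two writes can be wrapped in a small function that reports a failure instead of silently doing nothing, which is useful when the commands are run from a script. This is a sketch only: disable_thp is a hypothetical helper, and its optional argument (an alternative directory) exists purely so it can be tried against a scratch directory rather than the real sysfs path.

```shell
# disable_thp [DIR]
# Hypothetical helper: writes "never" to the enabled and defrag settings.
# DIR defaults to the real sysfs location; pass another directory for a dry run.
disable_thp() {
    thp_dir=${1:-/sys/kernel/mm/transparent_hugepage}
    for knob in enabled defrag; do
        if [ -w "$thp_dir/$knob" ]; then
            echo never > "$thp_dir/$knob"
        else
            echo "cannot write $thp_dir/$knob (not root, or THP unavailable)" >&2
            return 1
        fi
    done
}

# Usage, as root:
#   disable_thp
```

As with the plain echo commands, the change made this way lasts only until the next reboot.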

To disable THP at boot, see the Red Hat Customer Portal article “How to use, monitor, and disable transparent hugepages in Red Hat Enterprise Linux 6 and 7?”.