JVM Configuration
LogScale runs on the Java Virtual Machine (JVM). For supported Java versions, see the System Requirements page for the version of LogScale you are using. For information on installing a JVM, see the Java documentation.
In this section we describe things you should consider when selecting and configuring your JVM for LogScale.
Java Memory Options
Important
The LogScale Launcher Script automatically selects a suitable RAM configuration for the environment. Manual configuration of these values is not required.
We recommend that systems running LogScale have as much RAM as possible, but not for the JVM. LogScale will operate comfortably within 10 GB for most workloads. The remainder of the RAM in your system should remain available for use as filesystem page cache.
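As a quick illustration (output will vary by host), the buff/cache column reported by free shows how much of that remaining RAM the kernel is currently using for the filesystem page cache:
$ free -h
The larger that figure, the more data LogScale can read from memory rather than from disk.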
Attention
When installing in a bare metal environment, disable swap, as it gives Java and LogScale the false impression that more memory is available than physically exists.
To disable swap, remove any swap entries from /etc/fstab.
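A minimal sketch of disabling swap on a typical Linux host (the sed pattern is illustrative; review /etc/fstab before editing it):
# swapoff -a
# sed -i.bak '/\sswap\s/ s/^/#/' /etc/fstab
# swapon --show
swapoff -a disables active swap immediately, the sed command comments out swap entries so they do not return after a reboot, and swapon --show should then produce no output.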
The LogScale Launcher Script will configure these values automatically based on your infrastructure. You should not need to configure them manually.
Garbage Collection
Important
The LogScale Launcher Script automatically configures suitable garbage collection settings. Manual configuration of these values is not required.
Transparent Huge Pages (THP) on Linux Systems
Huge pages are helpful in virtual memory management on Linux systems. As the name suggests, they are memory pages larger than the standard 4 KB.
In virtual memory management, the kernel maintains a table mapping virtual memory addresses to physical addresses, and every page access requires the relevant mapping to be resolved. With small pages, more pages (and therefore more mapping entries) are needed to cover the same amount of memory, which decreases performance.
Using huge pages means fewer pages are needed, which reduces the number of mapping entries the kernel has to load and look up. This increases kernel-level performance, ultimately benefiting your application.
In short, enabling huge pages means less system overhead to access and maintain memory pages.
Transparent Huge Pages (THP) is a Linux memory management system that reduces the overhead of Translation Lookaside Buffer (TLB) lookups on machines with large amounts of memory by using larger memory pages.
To start, check the THP setting:
# cat /sys/kernel/mm/transparent_hugepage/enabled
The command will return the current setting, with square brackets around the enabled value:
always [madvise] never
This should be updated to always to enable THP for all processes. Other settings and values should also be checked using one or all of the following commands:
# grep Huge /proc/meminfo
AnonHugePages:         0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       1024 kB

# egrep 'trans|thp' /proc/vmstat
nr_anon_transparent_hugepages 2018
thp_fault_alloc 7302
thp_fault_fallback 0
thp_collapse_alloc 401
thp_collapse_alloc_failed 0
thp_split 21

# cat /proc/sys/vm/nr_hugepages
1024

# sysctl vm.nr_hugepages
vm.nr_hugepages = 1024

# grep -i HugePages_Total /proc/meminfo
HugePages_Total:    1024
Check the grub.conf configuration parameter transparent_hugepage:
transparent_hugepage=never
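To see which value the running kernel was actually booted with, and whether a default is set for the next boot, you can inspect the kernel command line and the GRUB defaults (file locations vary by distribution; /etc/default/grub is typical):
# tr ' ' '\n' < /proc/cmdline | grep transparent_hugepage
# grep transparent_hugepage /etc/default/grub
No output from either command simply means the parameter is not set and the kernel's built-in default applies.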
To enable transparent huge pages:
Update the grub.conf configuration parameter transparent_hugepage:
transparent_hugepage=always
Enable THP in Java by using the JVM flag -XX:+UseTransparentHugePages.
Update the active configuration to enable THP for all applications:
# echo always > /sys/kernel/mm/transparent_hugepage/enabled
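Writing to /sys only changes the running kernel; the setting is lost on reboot. A minimal sketch of making it persistent, assuming a GRUB-based system (file locations and the regeneration command vary by distribution): add transparent_hugepage=always to GRUB_CMDLINE_LINUX in /etc/default/grub, then regenerate the boot configuration and reboot.
# update-grub                                (Debian/Ubuntu)
# grub2-mkconfig -o /boot/grub2/grub.cfg     (RHEL-family)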
Note
If you are using containers then huge page support must be enabled in the host operating system.
NUMA Multi-Socket Systems
LogScale fully utilizes the available IO channels (network and disk), physical memory, and CPU during query execution. Coordinating memory across cores will slow LogScale down, leaving hardware resources underutilized. NUMA hardware is more sensitive to this.
Non-uniform memory access (NUMA) on multi-socket hardware is challenging for multithreaded software with mutable shared state, and multithreaded object-oriented systems fit that description, so this is something to be aware of. Thankfully the JVM, our platform, has had some NUMA support since version 6 (also known as 1.6), though only some of its garbage collectors implement those optimizations.
There are two strategies for NUMA deployment:
Use the operating system to split each socket into a separate logical space, and
Configure the JVM to be NUMA-aware in hopes that it will make good choices about thread and memory affinity.
Strategy 1: Run One JVM per NUMA Node
The intent here is to pin a JVM process to each NUMA node (not CPU, not core, not thread) and that node's associated memory (RAM) sockets only, and thereby avoid the overhead of cross-NUMA-node coordination (which is expensive) using tools provided by the operating system. We have successfully done this on Linux, but certainly other operating systems have similar primitives.
Using this strategy, it is important that you do not enable any of the JVM flags related to NUMA.
On Linux you'll use numactl in your startup script to confine a JVM process to a NUMA node and that node's memory:
/usr/bin/numactl --cpunodebind=%i --membind=%i --localalloc -- command {arguments ...}
Here command is java and arguments are those passed to the JVM at startup.
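As an illustration only (the node number, heap size, and jar path are placeholders, not LogScale defaults), binding a JVM to NUMA node 0 might look like this; numactl --hardware lists the nodes available on the host:
# numactl --hardware
# /usr/bin/numactl --cpunodebind=0 --membind=0 --localalloc -- java -Xms8G -Xmx8G -jar server.jar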
Pros:
You have more LogScale nodes in your cluster, which looks cool.
You can use any GC algorithm you'd like.
We at LogScale have deployed on NUMA hardware using this strategy.
Cons:
You have more LogScale nodes in your cluster, which can be confusing.
You'll use a multiple of the RAM needed by the JVM (heap -Xms/-Xmx, stack -Xss), one JVM per NUMA node, reducing the RAM available for filesystem buffers.
Strategy 2: Run One JVM per System with a NUMA-aware Garbage Collector
The intent here is to run one JVM per system across all NUMA nodes and let the JVM deal with the performance penalties/benefits of such a hardware layout. In our experience, the JVM does not reach the full potential of NUMA hardware when running in this manner but that is changing (as the JVM and the Scala language mature), and we expect someday that it will be a simpler, higher performance, and more efficient configuration.
For this strategy it is important to enable NUMA support in the JVM by specifying the -XX:+UseNUMA option.
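A minimal sketch of the relevant options (heap sizes are placeholders; pair -XX:+UseNUMA with a NUMA-aware collector such as the parallel collector described below):
java -XX:+UseNUMA -XX:+UseParallelGC -Xms10G -Xmx10G -jar server.jar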
Pros:
You'll use less RAM per JVM, leaving more available for filesystem caching.
You'll have less contention on the PCI bus and network hardware.
Cons:
You have to choose a NUMA-aware GC algorithm; you can't use just any collector you'd like.
You have to remember to enable the NUMA-specific code in the JVM.
We at LogScale have NUMA hardware in production running LogScale; we don't use this strategy for performance reasons.
NUMA-aware Garbage Collectors in the JVM
A NUMA-aware JVM will partition the heap with respect to the NUMA nodes, and when a thread creates a new object the memory allocated resides on RAM associated with the NUMA node of the core that is running the thread. Later, if the same thread uses that object it will be in cache or the NUMA node's local memory (read: close by, so fast to access). Also when compacting the heap, the NUMA-aware JVM avoids moving large data chunks between nodes (and reduces the length of stop-the-world events).
The parallel collector (enabled with the -XX:+UseParallelGC flag) has been NUMA-aware for years and works well; it should be your first choice. Should you choose G1GC, please also add -XX:-G1UseAdaptiveIHOP, as it will improve predictable performance under load and lower GC overhead.
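For illustration, assuming the launcher script is not already managing GC flags for you, the G1 alternative described above would look like this (heap sizes are placeholders):
java -XX:+UseG1GC -XX:-G1UseAdaptiveIHOP -Xms10G -Xmx10G -jar server.jar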
Shenandoah GC does not currently include support specific to running on NUMA hardware, which means it isn't suitable for use on such systems.
The Z Garbage Collector (ZGC) has only basic NUMA support, which it enables by default on multi-socket systems unless the process is pinned to a single NUMA node.
Additional Assistance
Configuring the JVM for optimum performance is a black art, not a science, and LogScale will perform vastly differently on the same hardware with different JVM configurations. Please reach out to us for help; this is an important and subtle topic. When deploying LogScale, please read carefully and feel free to consult with us through our support channels if you have any concerns or simply want advice.
Below are additional Java and JVM resources you might find useful: