JVM Configuration

LogScale runs on the Java Virtual Machine (JVM). For supported Java versions, see the System Requirements page for the version you are using. For information on installing the JVM, see the Java documentation page.

In this section we describe things you should consider when selecting and configuring your JVM for LogScale.

Java Memory Options

Important

The LogScale Launcher Script automatically configures a suitable RAM configuration according to the environment. Manual configuration of these values is not required.

We recommend that systems running LogScale have as much RAM as possible, but not for the JVM's sake: LogScale operates comfortably within 10 GB for most workloads. The remainder of the RAM in your system should remain available for use as filesystem page cache.

Attention

When installing in a bare-metal environment, disable swap memory: it gives Java and LogScale the false impression that more memory is available than the machine physically has.

To disable swap memory, remove any swap entries from /etc/fstab.
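
A minimal sketch of what this might look like on a typical Linux host (run as root; the exact fstab entries vary between systems, so review the file before editing it):

shell
# Turn off all active swap devices immediately.
swapoff -a
# Comment out any swap lines in /etc/fstab so swap stays disabled after reboot.
sed -i.bak '/\sswap\s/s/^/#/' /etc/fstab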

The Launcher Script will configure these values automatically based on your infrastructure. You should not need to configure them manually.

Garbage Collection

Important

The LogScale Launcher Script automatically configures a suitable garbage collection configuration. Manual configuration of these values is not required.

Transparent Huge Pages (THP) on Linux Systems

Huge pages are helpful for virtual memory management on Linux systems. As the name suggests, they are memory pages larger than the standard 4 KB.

In virtual memory management, the kernel maintains a table that maps virtual memory addresses to physical addresses. For every page transaction, the kernel needs to load the related mapping. With small pages, the same amount of memory spans more pages, so the kernel has to load more mappings, which decreases performance.

Using huge pages means fewer pages are needed. This decreases the number of mappings the kernel has to load, which improves kernel-level performance and ultimately benefits your application.
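
For example, mapping 16 GB of memory takes roughly 4 million page-table entries with standard 4 KB pages, but only about 8,000 entries with 2 MB huge pages.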

In short, enabling huge pages means less system overhead to access and maintain them.

Transparent Huge Pages (THP) is a Linux memory management system that reduces the overhead of Translation Lookaside Buffer (TLB) lookups on machines with large amounts of memory by using larger memory pages.

To find out what is available and being used on your Linux system, run some of the following commands as root:

shell
# grep Huge /proc/meminfo
AnonHugePages:         0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       1024 kB
# egrep 'trans|thp' /proc/vmstat
nr_anon_transparent_hugepages 2018
thp_fault_alloc 7302
thp_fault_fallback 0
thp_collapse_alloc 401
thp_collapse_alloc_failed 0
thp_split 21
# cat /proc/sys/vm/nr_hugepages
1024
# sysctl vm.nr_hugepages
vm.nr_hugepages = 1024
# grep -i HugePages_Total /proc/meminfo
HugePages_Total:       1024
# cat /proc/sys/vm/nr_hugepages
1024
# sysctl vm.nr_hugepages
vm.nr_hugepages = 1024

Make sure that grub.conf doesn't include: transparent_hugepage=never.

The JVM flag to enable the use of this feature is -XX:+UseTransparentHugePages.
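 
As a sketch only (if you start the JVM directly rather than through the launcher script, which normally assembles the JVM arguments for you), passing the flag might look like:

shell
# Sketch: enable transparent huge pages for this JVM process.
java -XX:+UseTransparentHugePages {arguments ...}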

The simplest way to configure your Linux system so applications can use huge pages is to add the following to your /etc/rc.local file.

shell
echo madvise > /sys/kernel/mm/transparent_hugepage/enabled
echo advise > /sys/kernel/mm/transparent_hugepage/shmem_enabled
echo defer > /sys/kernel/mm/transparent_hugepage/defrag
echo 1 > /sys/kernel/mm/transparent_hugepage/khugepaged/defrag
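
After the next boot (or after running the commands by hand), you can confirm the settings took effect; the active value is shown in brackets:

shell
# cat /sys/kernel/mm/transparent_hugepage/enabled
always [madvise] never
# cat /sys/kernel/mm/transparent_hugepage/shmem_enabled
always within_size [advise] never deny force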

Note

If you are using containers then huge page support must be enabled in the host operating system.

NUMA Multi-Socket Systems

LogScale fully utilizes the available IO channels (network and disk), physical memory, and CPU during query execution. Coordinating memory across cores will slow LogScale down, leaving hardware resources underutilized. NUMA hardware is more sensitive to this.

Non-uniform memory access (NUMA) on multi-socket hardware is challenging for multithreaded software with mutable shared state. Multithreaded object-oriented systems fit this description, so this is something to be aware of. Thankfully, the JVM, our platform, has had some NUMA support since version 6 (aka 1.6), though support for those optimizations varies across garbage collectors.

There are two strategies for NUMA deployment:

  1. Use the operating system to split each socket into a separate logical space, and

  2. Configure the JVM to be NUMA-aware in hopes that it will make good choices about thread and memory affinity.

Strategy 1: Run One JVM per NUMA Node

The intent here is to pin a JVM process to each NUMA node (not CPU, not core, not thread) and that node's associated memory (RAM) sockets only, and thereby avoid the overhead of cross-NUMA-node coordination (which is expensive) using tools provided by the operating system. We have successfully done this on Linux, but certainly other operating systems have similar primitives.

When using this strategy, it is important that you do not enable any of the JVM flags related to NUMA.

On Linux you'll use numactl in your startup script to confine a JVM process to a NUMA node and that node's memory.

shell
/usr/bin/numactl --cpunodebind=%i --membind=%i --localalloc -- command {arguments ...}

Here, command is java and arguments are the options passed to the JVM at startup.
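
For example, an illustrative sketch of binding a single LogScale JVM to NUMA node 0 (the node number is an example; run numactl --hardware to see the nodes on your host):

shell
# Show the NUMA topology of this host.
numactl --hardware
# Bind the JVM to node 0 and allocate memory from that node only.
/usr/bin/numactl --cpunodebind=0 --membind=0 --localalloc -- java {arguments ...}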

  • Pros:

    • You have more LogScale nodes in your cluster, which looks cool.

    • You can use any GC algorithm you'd like.

    • We at LogScale have deployed on NUMA hardware using this strategy.

  • Cons:

    • You have more LogScale nodes in your cluster, which can be confusing.

    • You'll use a multiple of the RAM used for the JVM (heap -Xmx, stack -Xss) for operating each NUMA node, reducing the available RAM for filesystem buffers.

Strategy 2: Run One JVM per System with a NUMA-aware Garbage Collector

The intent here is to run one JVM per system across all NUMA nodes and let the JVM deal with the performance penalties/benefits of such a hardware layout. In our experience, the JVM does not reach the full potential of NUMA hardware when running in this manner but that is changing (as the JVM and the Scala language mature), and we expect someday that it will be a simpler, higher performance, and more efficient configuration.

For this strategy it is important to enable NUMA support in the JVM by specifying the -XX:+UseNUMA option.

  • Pros:

    • You'll use less RAM per JVM, leaving more available for filesystem caching.

    • You'll have less contention on the PCI bus and network hardware.

  • Cons:

    • You have to choose a NUMA-aware GC algorithm.

    • You have to remember to enable the NUMA-specific code in the JVM.

    • You can't use any GC algorithm you'd like; you have to choose one that is NUMA-aware.

    • We at LogScale have NUMA hardware in production running LogScale; we don't use this strategy for performance reasons.

NUMA-aware Garbage Collectors in the JVM

Collector    Version   Distribution
ParallelGC   JDK8+     *
G1GC         JDK14+    *
C4           JDK8+     Azul Zing

A NUMA-aware JVM will partition the heap with respect to the NUMA nodes, and when a thread creates a new object the memory allocated resides on RAM associated with the NUMA node of the core that is running the thread. Later, if the same thread uses that object it will be in cache or the NUMA node's local memory (read: close by, so fast to access). Also when compacting the heap, the NUMA-aware JVM avoids moving large data chunks between nodes (and reduces the length of stop-the-world events).

The parallel collector (enabled with the -XX:+UseParallelGC flag) has been NUMA-aware for years and works well; it should be your first choice. Should you choose G1GC, please also add -XX:-G1UseAdaptiveIHOP, as it will make performance under load more predictable and lower GC overhead.
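
As an illustration only (the launcher script normally sets GC options for you, and the exact flag set depends on your deployment), enabling the NUMA-aware parallel collector for Strategy 2 might look like:

shell
# Sketch: NUMA-aware parallel collector.
java -XX:+UseNUMA -XX:+UseParallelGC {arguments ...}
# Alternative sketch using G1 (NUMA-aware from JDK 14 onward):
# java -XX:+UseNUMA -XX:+UseG1GC -XX:-G1UseAdaptiveIHOP {arguments ...}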

Shenandoah GC does not include support specific to running on NUMA hardware at this time, which means it isn't suitable for use on such systems.

The Z Garbage Collector (ZGC) has only basic NUMA support, which it enables by default on multi-socket systems unless the process is pinned to a single NUMA node.

Additional Assistance

Configuring the JVM for optimum performance is a black art, not a science, and LogScale will perform vastly differently on the same hardware with different JVM configurations. Please reach out to us for help; this is an important and subtle topic. When deploying LogScale, please read carefully and feel free to consult with us through our support channels if you have any concerns or simply want advice.

Below are additional Java and JVM resources you might find useful: