Java performance tuning tips or everything you want to know about Java performance in 15 minutes

Last updated: 07 February 2015

This is a summary of the Java performance tuning tips described on the java-performance.info website. This page is updated whenever a new article is published on the website.

Unlike most Java performance books, this guide targets tuning your Java code instead of your JVM settings. This makes it useful for developers of any low-latency or high-throughput application (especially in the high-frequency trading area).

Tools

Introduction to JMH: an overview of JMH, a new microbenchmarking framework from Oracle. I have covered most of the essential functionality in the framework. JMH profilers will be the subject of a separate article. This article covers JMH 1.0.

Tags: JMH, microbenchmarking.

Introduction to JMH Profilers: an overview of the profilers bundled with JMH, a new microbenchmarking framework from Oracle. They give you insight into your microbenchmarks that normal profilers cannot provide, because your tests may be too short and the test code has to be separated from the framework code.

Tags: JMH, profiler, microbenchmarking.

JDK classes

NEW:

Large HashMap overview: JDK, FastUtil, Goldman Sachs, HPPC, Koloboke, Trove - January 2015 version: a quick overview of all major libraries implementing hashmaps.

Tags: hash map, FastUtil, GS collections, HPPC, Koloboke, Trove.

Core Java 7 Change Log: a list of all changes in the core Java classes of JDK7 releases.

Tags: Java 7, changes.

I will keep track of all performance-related changes in core Java JDK7 classes on this page. All JDK updates up to Java 7u45 are now covered.

Using double/long vs BigDecimal for monetary calculations: double, long, java.math.BigDecimal, java.lang.String:

Tags: finance, money, HFT, low latency.
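To illustrate the trade-off named in the title, here is a minimal sketch comparing a double, a long number of cents and a BigDecimal for the same sum. MoneyDemo and its methods are hypothetical names, and rounding to two decimal places is an assumption about the currency, not necessarily the article's exact recommendation.

```java
import java.math.BigDecimal;

// Hypothetical demo class: three ways to add ten amounts of 0.10.
final class MoneyDemo {
    // Binary floating point cannot represent 0.1 exactly, so the error accumulates.
    static double sumDouble() {
        double total = 0;
        for (int i = 0; i < 10; i++) total += 0.1;
        return total;
    }

    // Amounts stored as a long number of cents: additions are exact and allocation-free.
    static long sumCents() {
        long totalCents = 0;
        for (int i = 0; i < 10; i++) totalCents += 10; // 0.10 == 10 cents
        return totalCents;
    }

    // BigDecimal is exact as well, but every add() allocates a new object.
    static BigDecimal sumBigDecimal() {
        BigDecimal total = BigDecimal.ZERO;
        BigDecimal tenCents = new BigDecimal("0.10");
        for (int i = 0; i < 10; i++) total = total.add(tenCents);
        return total;
    }

    // A double result can be rounded back to cents when the accumulated error is tiny.
    static double roundToCents(double value) {
        return Math.round(value * 100) / 100.0;
    }
}
```

sumDouble() returns 0.9999999999999999 rather than 1.0, which is why double-based money code has to round explicitly after calculations.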

Changes to String internal representation made in Java 1.7.0_06: java.lang.String, java.util.HashMap, java.util.Hashtable, java.util.HashSet, java.util.LinkedHashMap, java.util.LinkedHashSet, java.util.WeakHashMap and java.util.concurrent.ConcurrentHashMap:

Tags: String.substring, Java 7, memory consumption, low latency.

String deduplication feature (from Java 8 update 20): this article describes the string deduplication feature added in Java 8 update 20. It lets you save memory occupied by duplicate strings without writing a single line of Java code. While this is not the most efficient memory saving tool in absolute terms, it is definitely the winner in the "achievement vs developer effort" category.

Tags: String, Java 8, memory consumption.

Performance of various methods of binary serialization in Java: java.nio.ByteBuffer, sun.misc.Unsafe, java.io.DataInputStream, java.io.DataOutputStream, java.io.ByteArrayInputStream, java.io.ByteArrayOutputStream: comparison of binary serialization performance using various classes:

Tags: serialization in Java, unsafe memory access in Java, high throughput, low latency.
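As a small illustration of two of the classes compared in that article, the sketch below writes the same values with java.nio.ByteBuffer and with DataOutputStream. Both use big-endian byte order by default, so the output is byte-for-byte identical; only the cost differs. BinarySerializationDemo is a hypothetical name.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

// Hypothetical demo class: the same record serialized two ways.
final class BinarySerializationDemo {
    // ByteBuffer writes directly into a preallocated array (big-endian by default).
    static byte[] withByteBuffer(int i, long l, double d) {
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES + Long.BYTES + Double.BYTES);
        buf.putInt(i).putLong(l).putDouble(d);
        return buf.array();
    }

    // DataOutputStream goes through a stream chain and is also big-endian.
    static byte[] withDataOutputStream(int i, long l, double d) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream(20);
        try (DataOutputStream out = new DataOutputStream(bos)) {
            out.writeInt(i);
            out.writeLong(l);
            out.writeDouble(d);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // cannot happen for an in-memory stream
        }
        return bos.toByteArray();
    }
}
```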

Java collections overview: all JDK 1.6/1.7 standard collections are described and categorized in this overview.

Tags: Java 1.6 collections, Java 1.7 collections, Java collections guide, overview.

Here is a very brief summary of all JDK collections:

Lists
  Single threaded:
  • ArrayList - generic array-based
  • LinkedList - do not use
  • Vector - deprecated
  Concurrent:
  • CopyOnWriteArrayList - seldom updated, often traversed
Queues / deques
  Single threaded:
  • ArrayDeque - generic array-based
  • Stack - deprecated
  • PriorityQueue - sorted retrieval operations
  Concurrent:
  • ArrayBlockingQueue - bounded blocking queue
  • ConcurrentLinkedDeque / ConcurrentLinkedQueue - unbounded linked queue (CAS)
  • DelayQueue - queue with delays on each element
  • LinkedBlockingDeque / LinkedBlockingQueue - optionally bounded linked queue (locks)
  • LinkedTransferQueue - may transfer elements w/o storing
  • PriorityBlockingQueue - concurrent PriorityQueue
  • SynchronousQueue - Exchanger with Queue interface
Maps
  Single threaded:
  • HashMap - generic map
  • EnumMap - enum keys
  • Hashtable - deprecated
  • IdentityHashMap - keys compared with ==
  • LinkedHashMap - keeps insertion order
  • TreeMap - sorted keys
  • WeakHashMap - useful for caches
  Concurrent:
  • ConcurrentHashMap - generic concurrent map
  • ConcurrentSkipListMap - sorted concurrent map
Sets
  Single threaded:
  • HashSet - generic set
  • EnumSet - set of enums
  • BitSet - set of bits/dense integers
  • LinkedHashSet - keeps insertion order
  • TreeSet - sorted set
  Concurrent:
  • ConcurrentSkipListSet - sorted concurrent set
  • CopyOnWriteArraySet - seldom updated, often traversed

java.util.ArrayList performance guide: java.util.ArrayList:

Tags: low latency, high throughput, CPU cache friendly, Java collections, CPU optimization, memory optimization.

Try to follow these rules while using ArrayList:

java.util.LinkedList performance: java.util.LinkedList, java.util.ArrayDeque:

Tags: Java collections, CPU optimization, avoid it.

If you need to write fast LinkedList code, try to stick to these rules:

Bit sets: java.util.BitSet, java.util.Set<Integer>: representing set of integers in the most compact form, using bit sets to store set of Long/long values:

Tags: low latency, high throughput, CPU cache friendly, Java collections, CPU optimization, memory optimization.
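A minimal sketch of the bit set idea: each small non-negative int is stored as a single bit instead of a boxed Integer in a Set<Integer>. BitSetDemo is a hypothetical name. Note that BitSet memory grows with the largest stored value, so it suits dense sets of small integers.

```java
import java.util.BitSet;

// Hypothetical demo class: a set of ints stored as bits.
final class BitSetDemo {
    // Value n sets bit n; duplicates are naturally ignored.
    static BitSet toBitSet(int... values) {
        BitSet bits = new BitSet();
        for (int v : values) bits.set(v);
        return bits;
    }

    // Iterates the set bits in increasing order without any boxing.
    static int sum(BitSet bits) {
        int total = 0;
        for (int i = bits.nextSetBit(0); i >= 0; i = bits.nextSetBit(i + 1)) {
            total += i;
        }
        return total;
    }
}
```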

java.util.IdentityHashMap: a discussion of why IdentityHashMap is so special and what alternatives it has.

Tags: Java collections, object graph, avoid it.
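What makes IdentityHashMap special is that it compares keys with == (and uses System.identityHashCode) instead of equals()/hashCode(). The hypothetical IdentityDemo below puts two equal but distinct String objects into both map types.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical demo class: equals()-based vs identity-based key comparison.
final class IdentityDemo {
    static int[] sizes() {
        // Two distinct String objects with equal contents.
        String a = new String("key");
        String b = new String("key");

        Map<String, Integer> byEquals = new HashMap<>();
        byEquals.put(a, 1);
        byEquals.put(b, 2); // replaces the first entry: a.equals(b)

        Map<String, Integer> byIdentity = new IdentityHashMap<>();
        byIdentity.put(a, 1);
        byIdentity.put(b, 2); // separate entry: a != b

        return new int[] { byEquals.size(), byIdentity.size() };
    }
}
```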

Regexp-related methods of String: java.util.regex.Pattern, java.util.regex.Matcher, java.lang.String: pattern/matcher logic:

Tags: low latency, high throughput, CPU optimization.
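One common pattern/matcher pitfall is that String.matches() compiles its regular expression on every call. The sketch below, with the hypothetical class RegexDemo, compiles the Pattern once and reuses it; Pattern instances are immutable and safe to share between threads, while Matcher instances are not.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: one-off vs precompiled regular expressions.
final class RegexDemo {
    // Compiled once when the class is initialized.
    private static final Pattern KEY_VALUE = Pattern.compile("(\\w+)=(\\d+)");

    // String.matches() delegates to Pattern.matches(), which compiles the regex each time.
    static boolean slowIsKeyValue(String s) {
        return s.matches("(\\w+)=(\\d+)");
    }

    // Only a lightweight Matcher is created per call.
    static boolean fastIsKeyValue(String s) {
        return KEY_VALUE.matcher(s).matches();
    }

    // Extracts the numeric value, or null when the input does not match.
    static String valueOf(String s) {
        Matcher m = KEY_VALUE.matcher(s);
        return m.matches() ? m.group(2) : null;
    }
}
```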

java.util.Date, java.util.Calendar and java.text.SimpleDateFormat performance: java.util.Date, java.util.Calendar, java.text.SimpleDateFormat: date storage, parsing and converting back to string:

Tags: low latency, high throughput, finance, CPU optimization, memory optimization.
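SimpleDateFormat is relatively expensive to create and is not thread-safe, so a widely used way to avoid creating one per call is to keep one instance per thread. This is a minimal sketch of that technique, not necessarily the approach the article settles on; DateFormatCache, the pattern and the fixed UTC zone are assumptions for the example.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper: one SimpleDateFormat per thread, reused across calls.
final class DateFormatCache {
    private static final ThreadLocal<SimpleDateFormat> FORMAT = ThreadLocal.withInitial(() -> {
        SimpleDateFormat f = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss", Locale.ROOT);
        f.setTimeZone(TimeZone.getTimeZone("UTC")); // fixed zone for reproducible output
        return f;
    });

    static String format(long epochMillis) {
        return FORMAT.get().format(new Date(epochMillis));
    }

    static long parse(String text) {
        try {
            return FORMAT.get().parse(text).getTime();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Bad date: " + text, e);
        }
    }
}
```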

Joda Time library performance: org.joda.time.DateTime, org.joda.time.format.DateTimeFormat, org.joda.time.format.DateTimeFormatter.
This is a comparison of the performance of Joda Time library classes with that of standard JDK classes (java.util.Date, java.util.Calendar, java.text.SimpleDateFormat). I advise you to read this article together with the java.util.Date, java.util.Calendar and java.text.SimpleDateFormat performance article. This article was tested on Joda Time versions 2.1-2.3.

Tags: low latency, high throughput, finance, CPU optimization, memory optimization.

JSR 310 - Java 8 Date/Time library performance (as well as Joda Time 2.3 and j.u.Calendar): an overview of a new Java 8 date/time implementation also known as JSR-310 and its performance comparison with Joda Time 2.3 and j.u.GregorianCalendar.

Tags: Java 8, overview, CPU optimization, memory optimization.

java.io.ByteArrayOutputStream: java.io.ByteArrayOutputStream, java.nio.ByteBuffer: why you should not use ByteArrayOutputStream in performance-critical code.

Tags: Java IO, avoid it.
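Two well-known costs of ByteArrayOutputStream are that its write methods are synchronized and that every toByteArray() call allocates and copies a new array. The sketch below shows the copy and a reusable ByteBuffer alternative; BaosDemo, writeMessage and the 12-byte record layout are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

// Hypothetical demo class.
final class BaosDemo {
    // Each toByteArray() call returns a freshly allocated copy of the internal buffer.
    static boolean toByteArrayCopies() {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        bos.write(42);
        byte[] first = bos.toByteArray();
        byte[] second = bos.toByteArray();
        return first != second && first[0] == second[0]; // same contents, different arrays
    }

    // A reusable ByteBuffer: clear() resets the position, so one backing array serves many messages.
    static int writeMessage(ByteBuffer reusable, long id, int qty) {
        reusable.clear();
        reusable.putLong(id).putInt(qty);
        return reusable.position(); // bytes written, readable via reusable.array()
    }
}
```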

java.io.BufferedInputStream and java.util.zip.GZIPInputStream: java.io.BufferedInputStream, java.util.zip.GZIPInputStream, java.nio.channels.FileChannel: some minor performance pitfalls in these two streams.

Tags: high throughput, CPU optimization, memory optimization, data compression.
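One pitfall worth knowing here is that GZIPInputStream and GZIPOutputStream use a 512-byte internal buffer by default; both have constructors that accept a larger size. The sketch below assumes a 64 KB buffer (an arbitrary choice) and uses in-memory streams so that it is self-contained; GzipDemo is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class: GZIP round trip with explicit buffer sizes.
final class GzipDemo {
    private static final int BUFFER_SIZE = 64 * 1024; // assumed size; the default is 512 bytes

    static byte[] compress(byte[] data) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bos, BUFFER_SIZE)) {
            out.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    static byte[] decompress(byte[] gzipped) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        byte[] chunk = new byte[BUFFER_SIZE];
        // The size argument sets GZIPInputStream's own buffer for compressed input,
        // so the underlying stream is read in large chunks.
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(gzipped), BUFFER_SIZE)) {
            int n;
            while ((n = in.read(chunk)) != -1) bos.write(chunk, 0, n);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }
}
```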

NEW:

Performance of various general compression algorithms - some of them are unbelievably fast!: java.util.zip.GZIPInputStream / GZIPOutputStream / DeflaterOutputStream / InflaterInputStream, LZ4, Snappy: checking performance of various general purpose Java compressors.

Tags: high throughput, CPU optimization, storage optimization, data compression, GZIP, deflate, LZ4, Snappy.

java.lang.Byte, Short, Integer, Long, Character (boxing and unboxing): java.lang.Byte, java.lang.Short, java.lang.Integer, java.lang.Long, java.lang.Character:

Tags: low latency, high throughput, CPU optimization, memory optimization.

Range of values cached by valueOf() (and therefore by autoboxing):
  • Byte, Short, Long - from -128 to 127
  • Character - from 0 to 127
  • Integer - from -128 to java.lang.Integer.IntegerCache.high or 127, whichever is bigger
  • Float, Double - no caching
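The practical consequence of these caches: the Java Language Specification guarantees that boxing an int between -128 and 127, or a char from 0 to 127, yields the same object every time, so == happens to work there. Outside the cache a fresh object may be created, so boxed values should always be compared with equals(). BoxingCacheDemo is a hypothetical name.

```java
// Hypothetical demo class: identity of boxed values inside the cached range.
final class BoxingCacheDemo {
    static boolean sameInteger(int value) {
        Integer a = value; // autoboxing calls Integer.valueOf(value)
        Integer b = value;
        return a == b; // reference comparison: true only when both come from the cache
    }

    static boolean sameCharacter(char value) {
        Character a = value; // autoboxing calls Character.valueOf(value)
        Character b = value;
        return a == b;
    }
}
```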

Map.containsKey/Set.contains: java.util.Map, java.util.Set and most of their implementations:

Tags: low latency, high throughput, CPU optimization, Java collections.
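A typical issue in this area is calling containsKey() followed by get(), which performs the same hash lookup twice. A minimal sketch of two alternatives, assuming the map never stores null values; LookupDemo is a hypothetical name.

```java
import java.util.Map;

// Hypothetical demo class: incrementing a per-key counter.
final class LookupDemo {
    // One get() plus one put(), no separate containsKey() call.
    // A null result means "absent" because null values are never stored.
    static int increment(Map<String, Integer> counts, String key) {
        Integer old = counts.get(key);
        int next = (old == null) ? 1 : old + 1;
        counts.put(key, next);
        return next;
    }

    // Java 8: merge() inserts 1 or adds 1 to the existing value in a single call.
    static int incrementMerge(Map<String, Integer> counts, String key) {
        return counts.merge(key, 1, Integer::sum);
    }
}
```

HashMap overrides merge() so that it locates the entry in a single pass, which makes it a good fit for counters.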

java.util.zip.CRC32 and java.util.zip.Adler32 performance: java.util.zip.CRC32, java.util.zip.Adler32 and java.util.zip.Checksum:

Tags: CPU optimization, checksum.
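Both classes implement the java.util.zip.Checksum interface, so code can be written once and switched between them. A minimal sketch (ChecksumDemo is a hypothetical name) that reuses one instance via reset() and feeds it an array range:

```java
import java.util.zip.Checksum;

// Hypothetical demo class: works with CRC32, Adler32 or any other Checksum.
final class ChecksumDemo {
    static long checksum(Checksum c, byte[] data) {
        c.reset();                       // allows the same instance to be reused
        c.update(data, 0, data.length);  // bulk update is much cheaper than byte-by-byte
        return c.getValue();
    }
}
```

For example, the CRC-32 of the ASCII string "123456789" is the standard check value 0xCBF43926.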

hashCode method performance tuning: java.lang.String, java.util.HashMap, java.util.HashSet, java.util.Arrays:

Tags: low latency, high throughput, CPU optimization, memory optimization.
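One hashCode tuning technique, used by java.lang.String itself, is to cache the hash code of an immutable object the first time it is computed. A minimal sketch with a hypothetical CompositeKey class:

```java
import java.util.Arrays;

// Hypothetical immutable key that computes its hash code lazily and caches it.
final class CompositeKey {
    private final String name;
    private final int[] parts;
    private int hash; // 0 means "not computed yet"

    CompositeKey(String name, int... parts) {
        this.name = name;
        this.parts = parts.clone(); // defensive copy keeps the key immutable
    }

    @Override
    public int hashCode() {
        int h = hash;
        if (h == 0) {
            h = 31 * name.hashCode() + Arrays.hashCode(parts);
            hash = h; // benign race: every thread computes the same value
        }
        return h;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof CompositeKey)) return false;
        CompositeKey other = (CompositeKey) o;
        return name.equals(other.name) && Arrays.equals(parts, other.parts);
    }
}
```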

Creating an exception in Java is very slow: why creating exceptions in Java is so expensive and how you can avoid those costs: java.lang.Throwable, java.lang.Exception, java.lang.RuntimeException, sun.misc.BASE64Decoder, java.lang.NumberFormatException:

Tags: low latency, high throughput, CPU optimization.
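Most of the cost of creating an exception comes from capturing the stack trace in fillInStackTrace(). Since Java 7, Throwable has a protected constructor that can disable this; overriding fillInStackTrace() is the older alternative. A minimal sketch with a hypothetical FastException class:

```java
// Hypothetical exception type that skips stack trace capture.
final class FastException extends RuntimeException {
    FastException(String message) {
        // cause = null, enableSuppression = false, writableStackTrace = false:
        // the stack trace is never filled in, which is what makes creation cheap.
        super(message, null, false, false);
    }
}
```

Such exceptions are only suitable for expected control-flow failures, because getStackTrace() returns an empty array and the stack trace gives no debugging help.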

Java logging performance pitfalls: how to lose as little performance as possible while writing log messages: java.util.logging.Logger, java.util.logging.Handler, java.util.logging.Formatter, java.text.MessageFormat:

Tags: low latency, high throughput, CPU optimization, logging.
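A basic rule with java.util.logging is not to build an expensive message for a level that is disabled. The sketch below shows the classic isLoggable() guard and the Java 8 Supplier overload; LoggingDemo, the logger name and the INFO level are assumptions for the example.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical demo class.
final class LoggingDemo {
    // Strong reference: LogManager holds loggers weakly, so a configured level could otherwise be lost.
    private static final Logger LOG = Logger.getLogger("demo.orders"); // hypothetical logger name
    static int formatCalls = 0; // counts how often the expensive message was built

    static {
        LOG.setLevel(Level.INFO); // demo assumption: FINE messages are disabled
    }

    static String expensiveDescription(long orderId) {
        formatCalls++;
        return "order " + orderId + " with full details";
    }

    static void onOrder(long orderId) {
        // Guarded call: the message string is only built when FINE is enabled.
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine(expensiveDescription(orderId));
        }
        // Java 8 alternative: the Supplier is only invoked when FINE is enabled.
        LOG.fine(() -> expensiveDescription(orderId));
    }
}
```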

Base64 encoding and decoding performance: an overview of several well-known Base64 Java implementations from the performance perspective: sun.misc.BASE64Encoder, sun.misc.BASE64Decoder, java.util.Base64 (Java 8 specific), javax.xml.bind.DatatypeConverter (Java 6+), org.apache.commons.codec.binary.Base64, com.google.common.io.BaseEncoding (Google Guava), http://iharder.net/base64, MiGBase64:

Tags: low latency, high throughput, CPU optimization, serialization in Java, Java 8.
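For reference, this is how the Java 8 java.util.Base64 implementation from that list is used. Encoder and decoder instances are thread-safe, so this sketch (with the hypothetical class Base64Demo) keeps them in static fields.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical demo class: round trip through java.util.Base64.
final class Base64Demo {
    private static final Base64.Encoder ENCODER = Base64.getEncoder();
    private static final Base64.Decoder DECODER = Base64.getDecoder();

    static String encode(String text) {
        return ENCODER.encodeToString(text.getBytes(StandardCharsets.UTF_8));
    }

    static String decode(String base64) {
        return new String(DECODER.decode(base64), StandardCharsets.UTF_8);
    }
}
```

For example, encode("hello") returns "aGVsbG8=".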

A possible memory leak in the manual MultiMap implementation: an overview of multimap implementations in Java 8, Google Guava and Scala 2.10 as well as a description of a possible memory leak you can have while manually implementing a multimap using Java 6 or 7.

Tags: Java collections, Java 8, Scala, Google Guava.
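A hand-written multimap is usually a Map<K, List<V>>, and the kind of leak meant here is, presumably, that removing the last value leaves an empty list behind for every key ever used. A minimal Java 8 sketch with a hypothetical SimpleMultiMap class that avoids it:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical multimap built on a plain HashMap.
final class SimpleMultiMap<K, V> {
    private final Map<K, List<V>> map = new HashMap<>();

    void put(K key, V value) {
        map.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    boolean remove(K key, V value) {
        List<V> values = map.get(key);
        if (values == null || !values.remove(value)) return false;
        // Without this step an empty list stays in the map forever.
        if (values.isEmpty()) map.remove(key);
        return true;
    }

    int keyCount() {
        return map.size();
    }
}
```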

java.util.Random and java.util.concurrent.ThreadLocalRandom in multithreaded environments: an overview of java.util.Random and java.util.concurrent.ThreadLocalRandom in single and multithreaded environments as well as some low level analysis of their performance.

Tags: Java Random, Java 7, ThreadLocalRandom, multithreading, CAS.
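The usage pattern that avoids contention on a shared java.util.Random seed is to call ThreadLocalRandom.current() in the thread that needs the number, rather than storing the result in a shared field. A minimal sketch with the hypothetical class RandomDemo:

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.stream.IntStream;

// Hypothetical demo class.
final class RandomDemo {
    // Each thread uses its own generator, so there is no CAS loop on a shared seed.
    static int rollDie() {
        return ThreadLocalRandom.current().nextInt(1, 7); // 1..6 inclusive
    }

    // Called from several worker threads of the common fork/join pool.
    static long sumOfRolls(int n) {
        return IntStream.range(0, n).parallel().map(i -> rollDie()).asLongStream().sum();
    }
}
```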

Charset encoding and decoding in Java 7/8: we will check how fast Charset encoders/decoders are in Java 7 and what performance improvements were made in Java 8.

Tags: Charset, ISO-8859-1, Java 8.
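For context, the sketch below shows encoding and decoding with a Charset object (rather than a charset name, which requires handling UnsupportedEncodingException). With ISO-8859-1 every char up to 0xFF maps to one byte, and unmappable chars are replaced with '?'. CharsetDemo is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: ISO-8859-1 (Latin-1) round trip.
final class CharsetDemo {
    // getBytes(Charset) never throws; unmappable characters become the replacement byte '?'.
    static byte[] latin1(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    static String fromLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }
}
```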

String switch performance: we will check how fast various ways of implementing a string-based switch are.

Tags: switch, Java 8, String.equals/equalsIgnoreCase.
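The baseline for such comparisons is the Java 7+ switch on strings, which javac compiles into a switch on hashCode() followed by equals() checks. A minimal sketch; SwitchDemo and the order side strings are hypothetical.

```java
// Hypothetical demo class: mapping order side strings to numbers.
final class SwitchDemo {
    // Case-sensitive; a null argument throws NullPointerException.
    static int side(String s) {
        switch (s) {
            case "BUY":
                return 1;
            case "SELL":
                return -1;
            default:
                return 0;
        }
    }
}
```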

Memory optimization

An overview of memory saving techniques in Java: this article gives you basic advice on memory optimization in Java. Most other Java memory optimization techniques are based on this advice.

Tags: low latency, high throughput, finance, CPU optimization, memory optimization.

Memory consumption of popular Java data types - part 1: this article will describe the memory consumption of enums and EnumMap / EnumSet / BitSet / ArrayList / LinkedList / ArrayDeque JDK classes in Java 7.

Tags: low latency, high throughput, finance, CPU optimization, memory optimization, collections.

The following table summarizes the storage occupied per stored value, assuming that a Java object reference occupies 4 bytes. Note that you must spend 4 bytes per Object reference in any case, so subtract 4 bytes from the values in the following table to find out the storage overhead.

  • EnumSet, BitSet - 1 bit per value
  • EnumMap - 4 bytes (for the value, nothing for the key)
  • ArrayList - 4 bytes (but may be more if the ArrayList capacity is significantly larger than its size)
  • LinkedList - 24 bytes (fixed)
  • ArrayDeque - 4 to 8 bytes, 6 bytes on average

Memory consumption of popular Java data types - part 2: this article will describe the memory consumption of HashMap / HashSet, LinkedHashMap / LinkedHashSet, TreeMap / TreeSet and PriorityQueue JDK classes in Java 7 as well as their Trove replacements.

Tags: low latency, high throughput, finance, CPU optimization, memory optimization, collections, primitive collections.

Size of JDK collections and their possible Trove substitutions:
  • HashMap - 32 * SIZE + 4 * CAPACITY bytes; Trove substitution: THashMap, 8 * CAPACITY bytes
  • HashSet - 32 * SIZE + 4 * CAPACITY bytes; Trove substitution: THashSet, 4 * CAPACITY bytes
  • LinkedHashMap - 40 * SIZE + 4 * CAPACITY bytes; no Trove substitution
  • LinkedHashSet - 40 * SIZE + 4 * CAPACITY bytes; Trove substitution: TLinkedHashSet, 8 * CAPACITY bytes
  • TreeMap, TreeSet - 40 * SIZE bytes; no Trove substitution
  • PriorityQueue - 4 * CAPACITY bytes; no Trove substitution
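To make the HashMap formula concrete, here is a small calculator, a sketch assuming 4-byte references (as above), the default HashMap settings (initial capacity 16, load factor 0.75) and entries added one by one. HashMapFootprint is a hypothetical name; Trove uses a different capacity policy, so the Trove column is not modelled.

```java
// Hypothetical calculator for the HashMap row of the table above.
final class HashMapFootprint {
    // Mirrors HashMap growth: the table doubles once size exceeds capacity * 0.75.
    static long capacityAfterInserts(long size) {
        long capacity = 16;
        while (size > capacity * 0.75) capacity *= 2;
        return capacity;
    }

    // 32 * SIZE + 4 * CAPACITY bytes, as listed in the table.
    static long estimatedBytes(long size) {
        return 32 * size + 4 * capacityAfterInserts(size);
    }
}
```

For 1,000,000 entries the table grows to 2,097,152 buckets, giving 32 * 1,000,000 + 4 * 2,097,152 = 40,388,608 bytes, or about 40.4 million bytes.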

A few more memory saving techniques in Java: this article describes the advantages of static inner classes, string pooling, boolean flag collections as well as special classes for tiny collections in JDK.

Tags: low latency, high throughput, finance, CPU optimization, memory optimization, collections.

String.intern in Java 6, 7 and 8 - string pooling: this article describes how the String.intern() method was implemented in Java 6 and what changes were made to it in Java 7 and Java 8 (which finally made it extremely useful).

Tags: CPU optimization, memory optimization.

String.intern in Java 6, 7 and 8 - multithreaded access: This article describes the performance impact of the multithreaded calls to String.intern().

Tags: CPU optimization, memory optimization.

String.intern in Java 6, 7 and 8 - part 3: String.intern() usage best practices.

Tags: CPU optimization, memory optimization.

NEW:

Going over Xmx32G heap boundary means you will have less memory available: increasing a Java application heap can be a routine response to data growth. It stays routine until the heap exceeds 32 gigabytes... At that point you may be surprised to see that your application behaves even worse than before :( What has happened? At Xmx32G the JVM can no longer use compressed (32-bit) object references and switches to 64-bit references, which means that your application memory footprint instantly grows by 15-25%. What should you do?

  1. Go directly to Xmx38G and keep increasing the Java heap from that point
  2. or tune your application memory consumption!

Tags: memory optimization, Java 32G heap, Java 64 bit object references.
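To check from inside a running application whether compressed object references are in use, one option on HotSpot JVMs is the HotSpotDiagnosticMXBean, which can read the UseCompressedOops flag. This is a HotSpot-specific sketch (other JVMs may not provide this bean), and CompressedOopsCheck is a hypothetical name.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical HotSpot-only helper.
final class CompressedOopsCheck {
    // Reads the current value of the UseCompressedOops VM flag.
    static boolean compressedOopsEnabled() {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return Boolean.parseBoolean(bean.getVMOption("UseCompressedOops").getValue());
    }

    // Maximum heap size the JVM will attempt to use (roughly the -Xmx value).
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }
}
```

Logging both values at startup makes it obvious when a heap increase has silently switched the JVM to 64-bit references.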

Trove library: using primitive collections for performance: this is an overview of the Trove library, which is a primitive type collection library. There are also guidelines for migrating your code from JDK collections to Trove.

Tags: low latency, high throughput, finance, CPU optimization, memory optimization, primitive collections, CPU cache friendly.

Various types of memory allocation in Java: how to allocate a large memory buffer in Java and how to write any Java types into such buffer.

Tags: low latency, high throughput, finance, CPU optimization, low level memory access in Java.
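One of the allocation options is a direct ByteBuffer, whose data lives outside the Java heap and can hold any primitive type at an absolute offset. A minimal sketch assuming a hypothetical fixed 16-byte record (a long id followed by a double price); OffHeapDemo is a hypothetical name. Note that a single ByteBuffer is limited to int offsets (about 2 GB), which is one reason larger buffers need other techniques.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical demo class: fixed-size records in a direct (off-heap) buffer.
final class OffHeapDemo {
    private static final int RECORD_SIZE = 16; // 8-byte long id + 8-byte double price

    static ByteBuffer allocate(int records) {
        return ByteBuffer.allocateDirect(records * RECORD_SIZE).order(ByteOrder.nativeOrder());
    }

    // Absolute put/get methods do not move the buffer position.
    static void writeRecord(ByteBuffer buf, int index, long id, double price) {
        int offset = index * RECORD_SIZE;
        buf.putLong(offset, id);
        buf.putDouble(offset + 8, price);
    }

    static long readId(ByteBuffer buf, int index) {
        return buf.getLong(index * RECORD_SIZE);
    }

    static double readPrice(ByteBuffer buf, int index) {
        return buf.getDouble(index * RECORD_SIZE + 8);
    }
}
```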

Memory introspection using sun.misc.Unsafe and reflection: how to find out Java object memory layout using sun.misc.Unsafe and reflection.

Tags: memory usage in Java, memory allocation in Java.

Protobuf data encoding for numeric datatypes: what type of numeric data encoding is used in Google Protobuf, how it impacts the compressed data size and how fast it is.

Tags: low latency, high throughput, finance, CPU optimization, memory optimization, serialization in Java.
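A sketch of the two numeric encodings described in the Protocol Buffers documentation: base-128 varints (7 payload bits per byte, high bit set on all bytes but the last) and ZigZag mapping for signed values. This is an independent illustration, not the Protobuf library's code; VarintDemo is a hypothetical name.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical re-implementation of varint and ZigZag encoding.
final class VarintDemo {
    static byte[] encodeVarint(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(10);
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80)); // 7 low bits plus continuation bit
            value >>>= 7;
        }
        out.write((int) value); // last byte: continuation bit clear
        return out.toByteArray();
    }

    static long decodeVarint(byte[] bytes) {
        long result = 0;
        int shift = 0;
        for (byte b : bytes) {
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) break;
            shift += 7;
        }
        return result;
    }

    // ZigZag: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, so small negatives stay short.
    static long zigZag(long n) {
        return (n << 1) ^ (n >> 63);
    }

    static long unZigZag(long n) {
        return (n >>> 1) ^ -(n & 1);
    }
}
```

For example, 300 encodes to the two bytes 0xAC 0x02, while a plain varint of -1 would take 10 bytes; after ZigZag it takes one.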

Use case: compacting price field disk representation: double, short, java.math.BigDecimal: an example of compacting your data:

Tags: high throughput, finance, memory optimization.

Use case: how to compact a long-to-long mapping: a use case where we try to identify some long-2-long mapping properties in order to represent it in the most compact form.

Tags: low latency, high throughput, CPU optimization, memory optimization, data compression.

String packing part 1: converting characters to bytes: we discuss Java object memory layout and consumption. After that we try to pack a String into a more compact representation, using as few Objects as possible.

Tags: low latency, high throughput, finance, CPU optimization, memory optimization, data compression.

String packing part 2: converting Strings to any other objects: we discuss how and when to convert a String into various more compact Java objects for temporary string representation.

Tags: low latency, high throughput, finance, CPU optimization, memory optimization, data compression.

Small tricks

I/O bound algorithms: SSD vs HDD: this article investigates the impact of modern SSDs on the I/O bound algorithms of the HDD era.

Tags: low latency, high throughput, finance, CPU optimization, hardware, Java IO.

Forbidden Java actions: object assignments, type conversions etc on the low level in Java: this article reveals a few details about the low level Java memory layout: we will see how to implement Object assignments using just primitive types. Then we will see what is hidden in the array header and convert an array of one type into an array of another type.

Tags: memory usage in Java, memory allocation in Java, unsafe memory access in Java.

Forbidden Java actions: updating final and static final fields: This article will discuss how you can update final or static final fields in Java using reflection and sun.misc.Unsafe.

Tags: memory usage in Java, memory allocation in Java, unsafe memory access in Java.

Forbidden Java actions: not declaring a checked exception; avoiding a constructor while creating an object: In this article we will see how to throw a checked exception in Java without declaring it in the method throws clause and how to create an object without calling any of its constructors.

Tags: memory usage in Java, memory allocation in Java, unsafe memory access in Java.
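The article may rely on a different mechanism; one widely known way to throw an undeclared checked exception that needs neither Unsafe nor reflection is the generic "sneaky throw" trick, which exploits type erasure. SneakyThrow and readConfig are hypothetical names.

```java
import java.io.IOException;

// Hypothetical demo class for the generic "sneaky throw" trick.
final class SneakyThrow {
    // The unchecked cast is erased at runtime; the compiler only sees "throws E".
    @SuppressWarnings("unchecked")
    static <E extends Throwable> void sneakyThrow(Throwable t) throws E {
        throw (E) t;
    }

    // No throws clause, yet a checked IOException escapes at runtime.
    static void readConfig() {
        // E is fixed to RuntimeException here, so the compiler requires no handling.
        SneakyThrow.<RuntimeException>sneakyThrow(new IOException("config not found"));
    }
}
```

Callers can only catch such an exception via catch (Exception e), because the compiler rejects a catch clause for a checked exception that the try block cannot declare.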

Static constructor code is not JIT-optimized in a lot of cases: static initializer code is generally executed in interpreted mode, even if it contains heavy calculations. But there is a way to force it to run in compiled mode:

Tags: Java pitfalls, avoid it.

Inefficient byte[] to String constructor: be careful when using the public String(byte bytes[], int offset, int length, Charset charset) constructor in Java 6:

Tags: Java pitfalls, avoid it.

Java varargs performance issues: a short review of the actual varargs implementation in Java 5+.
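The main varargs cost is that every call site allocates a new array for the arguments. A common mitigation, used for example by EnumSet.of, is to add fixed-arity overloads for the most frequent argument counts and keep varargs only as a fallback. VarargsDemo is a hypothetical name.

```java
// Hypothetical demo class.
final class VarargsDemo {
    // Each call creates a new Object[] holding the arguments.
    static int countVarargs(Object... args) {
        return args.length;
    }

    // Fixed-arity overloads for common cases need no array;
    // the varargs overload is only selected for 3 or more arguments.
    static int count(Object a) { return 1; }
    static int count(Object a, Object b) { return 2; }
    static int count(Object a, Object b, Object c, Object... rest) { return 3 + rest.length; }
}
```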

Primitive types to String conversion and String concatenation: a description of various types of string concatenation in Java as well as a few JVM options helping us to make the string concatenation even faster.
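A minimal illustration of the difference between repeated String concatenation in a loop and a single presized StringBuilder that appends primitives directly. ConcatDemo, the output format and the capacity estimate are hypothetical.

```java
// Hypothetical demo class: two ways to build "id:price,tag1,tag2,...".
final class ConcatDemo {
    // Each += creates a new String object (and, with javac 8, a new StringBuilder as well).
    static String describeSlow(int id, double price, String[] tags) {
        String s = "" + id + ":" + price;
        for (String t : tags) s += "," + t;
        return s;
    }

    // One builder; append(int) and append(double) convert primitives without a temporary String.
    static String describeFast(int id, double price, String[] tags) {
        StringBuilder sb = new StringBuilder(32 + tags.length * 8); // capacity is only an estimate
        sb.append(id).append(':').append(price);
        for (String t : tags) sb.append(',').append(t);
        return sb.toString();
    }
}
```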

Use cases

In this set of articles we try to apply principles discussed in the other articles to the "real world" problems.

Use case: FIX messages processing. Part 1: Writing a simple FIX parser and Use case: FIX messages processing. Part 2: Composing a message out of fields: possible gateway implementation: these two articles describe tag-based FIX message parsing and composing. In essence, we parse a 0x0001-separated string into a list of name=value tags, which are then converted to actual datatypes. In the second part we discuss the best way to compose these messages back into String format as part of a gateway implementation.

Tags: low latency, high throughput, finance, CPU optimization.
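The parsing step described above can be sketched as follows: split the message on the 0x0001 (SOH) separator, parse each numeric tag directly from the characters, and keep the value as a String for later conversion. FixParser is a hypothetical name, and repeating groups are deliberately not handled (a repeated tag overwrites the previous value).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical minimal FIX tag=value parser.
final class FixParser {
    private static final char SOH = '\u0001'; // FIX field separator

    static Map<Integer, String> parse(String message) {
        Map<Integer, String> fields = new LinkedHashMap<>(); // keeps field order
        int start = 0;
        while (start < message.length()) {
            int end = message.indexOf(SOH, start);
            if (end < 0) end = message.length();
            int eq = message.indexOf('=', start);
            if (eq <= start || eq > end) {
                throw new IllegalArgumentException("Malformed field at " + start);
            }
            // Parse the numeric tag from the chars to avoid a substring allocation.
            int tag = 0;
            for (int i = start; i < eq; i++) {
                char c = message.charAt(i);
                if (c < '0' || c > '9') throw new IllegalArgumentException("Bad tag at " + i);
                tag = tag * 10 + (c - '0');
            }
            fields.put(tag, message.substring(eq + 1, end));
            start = end + 1;
        }
        return fields;
    }
}
```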

Use case: Optimizing memory footprint of a read only csv file (Trove, Unsafe, ByteBuffer, data compression): we will see how to optimize memory consumption of a Java program which has to store a large set of readonly data in memory (using ByteBuffer or sun.misc.Unsafe). We will also try replacing index fields with their hash codes, still supporting the case of hash code collisions.

Tags: low latency, high throughput, finance, CPU optimization, memory optimization.

Single file vs multi file storage: a short study of the file system cache implementation in Windows 7 and Linux.

Tags: hardware, file system, CPU optimization.

Static code compilation in Groovy 2.0: we will see how static compilation in Groovy makes it as fast as Java.

Tags: Groovy, dynamic languages, CPU optimization.

Implementing a high performance Money class: how to implement an efficient Money class capable of dealing with arbitrary precision calculations.

Tags: money, monetary calculations, double, BigDecimal, finance, HFT, low latency.
