Escape Analysis in Java 6?
Last month we held our Speeding up Java applications course in the Dutch woods. When preparing for it, I discussed some of the new topics with my peer instructor and creator of the course Kirk Pepperdine. We explain new features of Java 6 and how they can help improve your performance. One of the more sophisticated features on the VM level is called escape analysis. The question is: does it really work?
We cover escape analysis not only in the course, but also in our Performance Top-10 blog and podcast, and in my J-Fall presentation. Brian Goetz wrote in September 2005: “Escape analysis is an optimization that has been talked about for a long time, and it is finally here — the current builds of Mustang (Java SE 6) can do escape analysis …” Furthermore, Wikipedia states: “Escape analysis is implemented in Java Standard Edition 6.” And several escape analysis JVM switches, like -XX:-DoEscapeAnalysis, are available. So, we can assume it works, right?
But let us not assume here, because assumption is the mother of all f*** ups. And as we will see, it turns out that it does not work! We need to measure, not make assumptions. I read an interview with Java specialist Heinz Kabutz in which he actually measures. He benchmarks various ways of String concatenation, using the thread-safe StringBuffer and the thread-unsafe StringBuilder; with Java 6 the latter turns out to be significantly faster than the former. He does not talk about escape analysis, but if escape analysis were working properly, using StringBuffer would be as fast as using StringBuilder, as we claim in our Top 10 blog! So, escape analysis is not working here. I’ll explain what is going on.
Escape Analysis explained
This analysis, performed by the runtime compiler, can conclude for example that an object on the heap is referenced only locally in a method and that no reference can escape from this scope. If so, Hotspot can apply runtime optimizations. It can allocate the object on the stack or in registers instead of on the heap. Or it can remove acquiring and releasing locks on the object altogether (lock elision), since no other thread can access the object anyway. Example:
public String concatBuffer(String s1, String s2, String s3) {
    StringBuffer sb = new StringBuffer();
    sb.append(s1);
    sb.append(s2);
    sb.append(s3);
    return sb.toString();
}
Here, sb is only used locally in the method and no reference to it can escape from this scope: only the String created by toString() is returned. It is therefore a candidate for stack allocation and lock elision.
With lock elision, objects that are thread-safe but do not need to be in the context where they are used no longer carry the overhead of thread safety. So, this is like no cure – no pay: no threads to stop – no overhead. The VM switch to enable this is: -XX:+DoEscapeAnalysis.
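To make the difference between a local and an escaping object concrete, here is a minimal sketch. The class EscapeDemo, its method names and the SHARED list are hypothetical names made up for illustration. Whether the VM actually applies an optimization to noEscape depends on the VM and its switches, and cannot be seen from the code itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; all names here are illustrative assumptions.
class EscapeDemo {
    // Hypothetical shared list, used only to let a reference escape.
    static final List<StringBuffer> SHARED = new ArrayList<StringBuffer>();

    // sb never leaves this method: only the String built from it is returned.
    // This is the kind of object that escape analysis can prove to be local.
    static String noEscape(String s1, String s2) {
        StringBuffer sb = new StringBuffer();
        sb.append(s1);
        sb.append(s2);
        return sb.toString();
    }

    // sb is handed back to the caller, so it escapes the method.
    static StringBuffer escapesByReturn(String s1, String s2) {
        StringBuffer sb = new StringBuffer();
        sb.append(s1);
        sb.append(s2);
        return sb;
    }

    // sb is stored in a static field that other threads can reach, so it escapes.
    static String escapesByField(String s1, String s2) {
        StringBuffer sb = new StringBuffer();
        sb.append(s1);
        sb.append(s2);
        SHARED.add(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(noEscape("Josh", "Duke"));
        System.out.println(escapesByReturn("Josh", "Duke"));
        System.out.println(escapesByField("Josh", "Duke"));
    }
}
```

In the last two methods the locks on sb have to stay, because another thread could get hold of the same StringBuffer.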
Biased locking explained
Biased locking is another threading optimization in Java 6. Since most objects are locked by at most one thread during their lifetime, this is a sensible case to optimize for. This is what biased locking does. It allows that thread to bias an object toward itself. Once biased, that thread can subsequently lock and unlock the object without resorting to expensive atomic instructions. The VM switch is: -XX:+UseBiasedLocking and it is on by default.
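The access pattern that biased locking targets looks roughly like the sketch below: one thread that locks and unlocks the same object over and over, without any contention. The class BiasedLockingPattern and its methods are hypothetical names for illustration. The code only shows the pattern; whether the VM biases the lock is not visible from Java code.

```java
// Hypothetical demo class; the names are illustrative assumptions.
class BiasedLockingPattern {
    private int count = 0;

    // Every call acquires and releases the monitor of this object.
    synchronized void increment() {
        count++;
    }

    synchronized int get() {
        return count;
    }

    // A single thread locks the same object 'iterations' times, uncontended.
    static int run(int iterations) {
        BiasedLockingPattern counter = new BiasedLockingPattern();
        for (int i = 0; i < iterations; i++) {
            counter.increment();
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(run(1000000));
    }
}
```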
Lock coarsening explained
Another threading optimization is lock coarsening or merging. Adjacent synchronized blocks are merged into one synchronized block, or multiple synchronized methods are joined into one. This only applies if they all use the same lock object. The lock is then acquired and released fewer times, which reduces the locking overhead. Example:
public static String concatToBuffer(StringBuffer sb, String s1, String s2, String s3) {
    sb.append(s1);
    sb.append(s2);
    sb.append(s3);
    return sb.toString();
}
In this example, the StringBuffer lock is not a candidate for lock elision because the StringBuffer is also used outside of the method. But the three acquire-and-release pairs on the lock can be reduced to one, after inlining the append methods. The VM switch is: -XX:+EliminateLocks and it is on by default.
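As a rough, hand-written approximation of what coarsening achieves, the sketch below puts one synchronized block around the calls. The class CoarseningSketch and its method names are hypothetical. It is not what the compiler literally generates: in the hand-written version the inner locks are still taken, reentrantly, while the compiler can drop them.

```java
// Hypothetical demo class; the names are illustrative assumptions.
class CoarseningSketch {
    // As written in the source: each call locks and unlocks sb separately.
    static String fineGrained(StringBuffer sb, String s1, String s2, String s3) {
        sb.append(s1);
        sb.append(s2);
        sb.append(s3);
        return sb.toString();
    }

    // Approximation of the coarsened form: the lock on sb is held once for all calls.
    // StringBuffer methods synchronize on the StringBuffer itself, so this is the
    // same lock object; the inner acquisitions are reentrant.
    static String coarsened(StringBuffer sb, String s1, String s2, String s3) {
        synchronized (sb) {
            sb.append(s1);
            sb.append(s2);
            sb.append(s3);
            return sb.toString();
        }
    }

    public static void main(String[] args) {
        System.out.println(fineGrained(new StringBuffer(), "Josh", "James", "Duke"));
        System.out.println(coarsened(new StringBuffer(), "Josh", "James", "Duke"));
    }
}
```

Both methods return the same result; the difference is only in how often the lock is acquired and released.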
Benchmarking – measuring if it works
So, to practice what we preach, I created a lock-intensive LockTest benchmark to test these three VM options. The code is shown below. I first want to run it with all mentioned options disabled, on my Vista laptop:
java -XX:-DoEscapeAnalysis -XX:-EliminateLocks -XX:-UseBiasedLocking LockTest
I get:
Unrecognized VM option '-DoEscapeAnalysis'
Could not create the Java virtual machine.
Hmmm, strange. I’m sure this is the correct option. After some digging, we found that it only works for the server VM, not the client VM, which is the default on 32-bit Windows. These VMs are currently two separate binaries. After contacting my valuable performance team source at Sun, I learned that escape analysis is not enabled by default, unlike the other locking optimizations. The other surprising thing he told me was that allocation optimization (using escape analysis) is not yet in the JDK. They are still working on it and expect it to be available in the spring 2008 JDK update. So this is disappointing: there has been a ‘little’ delay since the statement from Brian Goetz in 2005. However, he also told me that lock elision based on escape analysis actually is available in the latest JDK release (6_03). Let’s do the test. Here are my results:
>java -server -XX:-DoEscapeAnalysis -XX:-EliminateLocks -XX:-UseBiasedLocking LockTest
StringBuffer: 6553 ms.
StringBuilder: 1836 ms.
Thread safety overhead of StringBuffer: 256%
>java -server -XX:+DoEscapeAnalysis -XX:-EliminateLocks -XX:-UseBiasedLocking LockTest
StringBuffer: 6546 ms.
StringBuilder: 1872 ms.
Thread safety overhead of StringBuffer: 249%
>java -server -XX:-DoEscapeAnalysis -XX:+EliminateLocks -XX:-UseBiasedLocking LockTest
StringBuffer: 3101 ms.
StringBuilder: 1836 ms.
Thread safety overhead of StringBuffer: 68%
>java -server -XX:-DoEscapeAnalysis -XX:-EliminateLocks -XX:+UseBiasedLocking LockTest
StringBuffer: 2852 ms.
StringBuilder: 1855 ms.
Thread safety overhead of StringBuffer: 53%
>java -server -XX:-DoEscapeAnalysis -XX:+EliminateLocks -XX:+UseBiasedLocking LockTest
StringBuffer: 2645 ms.
StringBuilder: 1823 ms.
Thread safety overhead of StringBuffer: 45%
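The overhead percentages above follow directly from the two timings of each run. As a small check, the sketch below applies the same integer formula that LockTest uses to the five pairs of timings reported above. The class OverheadCheck and the method overheadPercent are hypothetical names for illustration.

```java
// Hypothetical helper class; the names are illustrative assumptions.
class OverheadCheck {
    // Same integer arithmetic as in LockTest: StringBuffer time as a percentage
    // of StringBuilder time, minus 100, truncated to a whole number.
    static long overheadPercent(long bufferCost, long builderCost) {
        return (bufferCost * 10000 / (builderCost * 100)) - 100;
    }

    public static void main(String[] args) {
        // Timings in ms, copied from the five runs above.
        long[][] runs = {
            {6553, 1836}, {6546, 1872}, {3101, 1836}, {2852, 1855}, {2645, 1823}
        };
        for (long[] run : runs) {
            System.out.println(run[0] + " ms vs " + run[1] + " ms: "
                    + overheadPercent(run[0], run[1]) + "%");
        }
    }
}
```

For the first run this gives 6553 * 10000 / 183600 = 356 (truncated), minus 100, so 256%.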
Conclusions
So, we clearly see that for this obvious case, escape analysis optimizations did escape from Java 6. It is not available for the client VM, it is disabled by default and if you enable it for the server VM, it does not help significantly: thread safety overhead here stays at about 250%. Ideally, thread safety overhead should go down to 0% with escape analysis lock elision. However, we fortunately see that lock coarsening and biased locking do help a lot and bring the overhead down to about 50%.
We see again in this exercise that we should question assumptions, ask the right questions and actually measure to get evidence.
Hopefully the spring 2008 update of Java 6 will bring the proper escape analysis optimizations which have been promised for such a long time! We’ll keep you posted.
LockTest code:
public class LockTest {
    private static final int MAX = 20000000; // 20 million

    public static void main(String[] args) throws InterruptedException {
        // warm up the method cache
        concatBuffer("Josh", "James", "Duke");
        concatBuilder("Josh", "James", "Duke");

        System.gc();
        Thread.sleep(1000);

        long start = System.currentTimeMillis();
        for (int i = 0; i < MAX; i++) {
            concatBuffer("Josh", "James", "Duke");
        }
        long bufferCost = System.currentTimeMillis() - start;
        System.out.println("StringBuffer: " + bufferCost + " ms.");

        System.gc();
        Thread.sleep(1000);

        start = System.currentTimeMillis();
        for (int i = 0; i < MAX; i++) {
            concatBuilder("Josh", "James", "Duke");
        }
        long builderCost = System.currentTimeMillis() - start;
        System.out.println("StringBuilder: " + builderCost + " ms.");
        System.out.println("Thread safety overhead of StringBuffer: "
                + ((bufferCost * 10000 / (builderCost * 100)) - 100) + "%\n");
    }

    public static String concatBuffer(String s1, String s2, String s3) {
        StringBuffer sb = new StringBuffer();
        sb.append(s1);
        sb.append(s2);
        sb.append(s3);
        return sb.toString();
    }

    public static String concatBuilder(String s1, String s2, String s3) {
        StringBuilder sb = new StringBuilder();
        sb.append(s1);
        sb.append(s2);
        sb.append(s3);
        return sb.toString();
    }
}
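One possible variation on this benchmark, as a sketch only: it prints which VM is running (the client/server difference mattered above), warms up both methods with a loop before timing, and uses the returned strings so the work cannot be thrown away. The class LockTestWarmup, the WARMUP count and the helper names are assumptions for illustration, not part of the test I ran. The numbers it prints will differ per machine and VM.

```java
// Hypothetical variation on LockTest; class name, WARMUP value and helper
// names are assumptions for illustration.
class LockTestWarmup {
    private static final int WARMUP = 100000; // assumed warm-up count, not a tuned value
    private static final int MAX = 20000000;  // 20 million, as in LockTest

    static String concatBuffer(String s1, String s2, String s3) {
        StringBuffer sb = new StringBuffer();
        sb.append(s1);
        sb.append(s2);
        sb.append(s3);
        return sb.toString();
    }

    static String concatBuilder(String s1, String s2, String s3) {
        StringBuilder sb = new StringBuilder();
        sb.append(s1);
        sb.append(s2);
        sb.append(s3);
        return sb.toString();
    }

    // Times 'iterations' calls in ms; summing the lengths keeps the results in use.
    static long timeBuffer(int iterations) {
        long total = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            total += concatBuffer("Josh", "James", "Duke").length();
        }
        long elapsedMs = (System.nanoTime() - start) / 1000000L;
        if (total != 13L * iterations) {
            throw new IllegalStateException("unexpected total length: " + total);
        }
        return elapsedMs;
    }

    static long timeBuilder(int iterations) {
        long total = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            total += concatBuilder("Josh", "James", "Duke").length();
        }
        long elapsedMs = (System.nanoTime() - start) / 1000000L;
        if (total != 13L * iterations) {
            throw new IllegalStateException("unexpected total length: " + total);
        }
        return elapsedMs;
    }

    public static void main(String[] args) {
        // Optional first argument overrides the iteration count.
        int iterations = args.length > 0 ? Integer.parseInt(args[0]) : MAX;
        System.out.println("VM: " + System.getProperty("java.vm.name"));

        // Warm up both methods so the timed loops are more likely to run compiled code.
        timeBuffer(WARMUP);
        timeBuilder(WARMUP);

        long bufferCost = timeBuffer(iterations);
        long builderCost = timeBuilder(iterations);
        System.out.println("StringBuffer: " + bufferCost + " ms.");
        System.out.println("StringBuilder: " + builderCost + " ms.");
        if (builderCost > 0) {
            System.out.println("Thread safety overhead of StringBuffer: "
                    + ((bufferCost * 100 / builderCost) - 100) + "%");
        }
    }
}
```

It can be run with the same VM switches as LockTest, for example java -server -XX:+DoEscapeAnalysis LockTestWarmup.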