Don’t we all remember the days when we programmed in C or C++? You had to use new and delete to explicitly create and destroy objects, and sometimes you even had to malloc() a block of memory. With all these constructs you had to take special care to clean up afterwards, or you would leak memory.
Now, in the days of Java, most people aren’t that concerned with memory leaks anymore. The common line of thought is that the Java garbage collector will clean up behind you. In normal cases that is true. But sometimes the garbage collector can’t clean up, because you are still holding a reference to an object without realizing it.
I stumbled across this small program while reading JavaPedia, which clearly shows that Java is also capable of inadvertent memory leaks.
    import java.util.ArrayList;

    public class TestGC {
      private String large = new String(new char[100000]);

      public String getSubString() {
        return this.large.substring(0,2);
      }

      public static void main(String[] args) {
        ArrayList<String> subStrings = new ArrayList<String>();
        for (int i = 0; i < 1000000; i++) {
          TestGC testGC = new TestGC();
          subStrings.add(testGC.getSubString());
        }
      }
    }
Now, if you run this, you’ll see that it crashes with something like the following stacktrace:
    Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at java.lang.String.<init>(String.java:174)
        at TestGC.<init>(TestGC.java:4)
        at TestGC.main(TestGC.java:13)
Why does this happen? We should only be storing 1,000,000 Strings of length 2, right? That would amount to about 40 MB, which should fit in the heap easily. So what happened here? Let’s have a look at the substring method in the String class.
    public class String {
      // Package private constructor which shares value array for speed.
      String(int offset, int count, char value[]) {
        this.value = value;
        this.offset = offset;
        this.count = count;
      }

      public String substring(int beginIndex, int endIndex) {
        if (beginIndex < 0) {
          throw new StringIndexOutOfBoundsException(beginIndex);
        }
        if (endIndex > count) {
          throw new StringIndexOutOfBoundsException(endIndex);
        }
        if (beginIndex > endIndex) {
          throw new StringIndexOutOfBoundsException(endIndex - beginIndex);
        }
        return ((beginIndex == 0) && (endIndex == count)) ? this :
            new String(offset + beginIndex, endIndex - beginIndex, value);
      }
      // ... rest of the class omitted
    }
We see that the substring call creates a new String using the package-private constructor. And the one-line comment immediately shows what the problem is: the character array is shared with the large string. So instead of storing very small substrings, we were storing the entire 100,000-character array every time, only with a different offset and count.
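To make the sharing visible without relying on JVM internals, here is a minimal sketch that models the old String layout. The `Slice` class and its `sharedSub`, `copiedSub` and `retainedChars` methods are hypothetical names invented for illustration; they are not part of `java.lang.String`, they only mimic the `value`/`offset`/`count` fields shown in the excerpt.

```java
import java.util.Arrays;

public class SharedSliceDemo {

    /** Hypothetical stand-in for the old String layout: a window onto a char array. */
    static final class Slice {
        final char[] value;  // backing array, possibly much larger than the window
        final int offset;    // start of the window inside value
        final int count;     // number of visible characters

        Slice(char[] value, int offset, int count) {
            this.value = value;
            this.offset = offset;
            this.count = count;
        }

        /** Mimics the substring shown in the excerpt: reuses the backing array. */
        Slice sharedSub(int begin, int end) {
            return new Slice(value, offset + begin, end - begin);
        }

        /** Mimics a trimming copy: keeps only the visible characters. */
        Slice copiedSub(int begin, int end) {
            char[] trimmed = Arrays.copyOfRange(value, offset + begin, offset + end);
            return new Slice(trimmed, 0, trimmed.length);
        }

        /** Number of chars kept reachable by this slice. */
        int retainedChars() {
            return value.length;
        }
    }

    public static void main(String[] args) {
        Slice large = new Slice(new char[100000], 0, 100000);
        Slice shared = large.sharedSub(0, 2);
        Slice copied = large.copiedSub(0, 2);
        // prints "2 visible, 100000 retained"
        System.out.println(shared.count + " visible, " + shared.retainedChars() + " retained");
        // prints "2 visible, 2 retained"
        System.out.println(copied.count + " visible, " + copied.retainedChars() + " retained");
    }
}
```

Under this model the arithmetic of the crash is straightforward: 1,000,000 substrings, each pinning its own 100,000-char array at 2 bytes per char, would need roughly 200 GB, so the heap is exhausted long before the loop finishes.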
This problem extends to other operations, like String.split() and java.util.regex.Matcher.group(). The problem can be easily avoided by adapting the program as follows:
    import java.util.ArrayList;

    public class TestGC {
      private String large = new String(new char[100000]);

      public String getSubString() {
        return new String(this.large.substring(0,2)); // <-- fixes leak!
      }

      public static void main(String[] args) {
        ArrayList<String> subStrings = new ArrayList<String>();
        for (int i = 0; i < 1000000; i++) {
          TestGC testGC = new TestGC();
          subStrings.add(testGC.getSubString());
        }
      }
    }
I have often heard, and have shared, the opinion that the String copy constructor is useless and causes problems because the copies it creates are not interned. But in this case it seems to have a reason to exist: it effectively trims the character array and keeps us from holding a reference to the very large String.
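The same defensive copy can be applied to the tokens returned by String.split(). The sketch below is one possible way to do that; `CompactTokens` and `splitCompact` are hypothetical names, and the input line is made up. It assumes a JDK where substrings share the parent's array, as described above. As far as I know, from JDK 7u6 onward substring() copies the characters itself, so the extra copy is harmless but no longer needed there.

```java
import java.util.ArrayList;
import java.util.List;

public class CompactTokens {

    /**
     * Hypothetical helper: splits a line and copies each token with the
     * String copy constructor, so that on JDKs with shared backing arrays
     * no token keeps the whole input line reachable.
     */
    static List<String> splitCompact(String line, String regex) {
        List<String> tokens = new ArrayList<String>();
        for (String token : line.split(regex)) {
            tokens.add(new String(token)); // trimming copy, same idea as the fix above
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<String> tokens = splitCompact("id=42;name=duke;role=mascot", ";");
        // prints "[id=42, name=duke, role=mascot]"
        System.out.println(tokens);
    }
}
```

The copy only pays off when a few small tokens outlive a large input; if the whole line stays referenced anyway, copying just costs extra allocations.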