Leaking Memory in Java

04 Oct, 2007

Don’t we all remember the days when we programmed in C or C++? You had to use new and delete to explicitly create and remove objects. Sometimes you even had to malloc() a block of memory. With all these constructs you had to take special care to clean up afterwards, or else you leaked memory.

Now however, in the days of Java, most people aren’t that concerned with memory leaks anymore. The common line of thought is that the Java Garbage Collector will take care of cleaning up behind you. This is of course totally true in all normal cases. But sometimes, the Garbage Collector can’t clean up, because you still have a reference, even though you didn’t know that.
I stumbled across this small program while reading JavaPedia, which clearly shows that Java is also capable of inadvertent memory leaks.

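import java.util.ArrayList;
import java.util.List;

public class TestGC {
  // Each instance owns a 100,000-character string.
  private String large = new String(new char[100000]);

  public String getSubString() {
    return this.large.substring(0, 2);
  }

  public static void main(String[] args) {
    List<String> subStrings = new ArrayList<String>();
    for (int i = 0; i < 1000000; i++) {
      TestGC testGC = new TestGC();
      // Only the two-character substring is kept; testGC itself becomes garbage.
      subStrings.add(testGC.getSubString());
    }
  }
}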

Now, if you run this, you’ll see that it crashes with something like the following stacktrace:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.lang.String.<init>(String.java:174)
at TestGC.<init>(TestGC.java:4)
at TestGC.main(TestGC.java:13)

Why does this happen? We should only be storing 1,000,000 Strings of length 2, right? That would amount to about 40 MB, which should fit in the heap easily. So what happened here? Let’s have a look at the substring method in the String class.

public class String {
  // Package private constructor which shares value array for speed.
  String(int offset, int count, char value[]) {
    this.value = value;
    this.offset = offset;
    this.count = count;
  }
  public String substring(int beginIndex, int endIndex) {
    if (beginIndex < 0) {
      throw new StringIndexOutOfBoundsException(beginIndex);
    }
    if (endIndex > count) {
      throw new StringIndexOutOfBoundsException(endIndex);
    }
    if (beginIndex > endIndex) {
      throw new StringIndexOutOfBoundsException(endIndex - beginIndex);
    }
    return ((beginIndex == 0) && (endIndex == count)) ? this :
      new String(offset + beginIndex, endIndex - beginIndex, value);
  }
  // ... rest of the class omitted
}

We see that the substring call creates a new String using the package-private constructor, and its one-line comment immediately shows what the problem is: the character array is shared with the large string. So instead of storing very small substrings, we were keeping the entire 100,000-character array of the large string alive every time, just with a different offset and length.
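To make the sharing visible, here is a minimal sketch that models the old String layout with a hypothetical Slice class (the names Slice, subSlice and trimmed are mine, not part of the JDK). It assumes the field layout from the source shown above; as far as I know, later JDKs (from 7u6 on) changed substring() to copy the characters, so the real String no longer behaves this way there.

```java
import java.util.Arrays;

public class SharedArrayDemo {
    /** Hypothetical stand-in for the old String layout: a view onto a char array. */
    static final class Slice {
        final char[] value;  // backing array, possibly much larger than the slice
        final int offset;    // index of the first visible character
        final int count;     // number of visible characters

        Slice(char[] value, int offset, int count) {
            this.value = value;
            this.offset = offset;
            this.count = count;
        }

        /** Shares the backing array, like the substring() shown above. */
        Slice subSlice(int beginIndex, int endIndex) {
            return new Slice(value, offset + beginIndex, endIndex - beginIndex);
        }

        /** Copies only the visible characters, so the big array is no longer referenced. */
        Slice trimmed() {
            char[] copy = Arrays.copyOfRange(value, offset, offset + count);
            return new Slice(copy, 0, count);
        }
    }

    public static void main(String[] args) {
        Slice large = new Slice(new char[100000], 0, 100000);
        Slice shared = large.subSlice(0, 2);
        Slice copied = shared.trimmed();

        // prints "2 visible chars, backing array of 100000"
        System.out.println(shared.count + " visible chars, backing array of " + shared.value.length);
        // prints "2 visible chars, backing array of 2"
        System.out.println(copied.count + " visible chars, backing array of " + copied.value.length);
    }
}
```

Each retained backing array is about 200 KB (100,000 chars at 2 bytes each), so keeping a million of them alive would need roughly 200 GB instead of the expected 40 MB.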
This problem extends to other operations, like String.split() and java.util.regex.Matcher.group(). The problem can be easily avoided by adapting the program as follows:

import java.util.ArrayList;
import java.util.List;

public class TestGC {
  // Each instance owns a 100,000-character string.
  private String large = new String(new char[100000]);

  public String getSubString() {
    // The copy constructor keeps only the two characters we need.
    return new String(this.large.substring(0, 2)); // <-- fixes leak!
  }

  public static void main(String[] args) {
    List<String> subStrings = new ArrayList<String>();
    for (int i = 0; i < 1000000; i++) {
      TestGC testGC = new TestGC();
      subStrings.add(testGC.getSubString());
    }
  }
}
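The same defensive copy can be applied to the pieces returned by String.split(). The following is only a sketch: splitAndCopy is a hypothetical helper name and the sample message is made up. On JVMs where substring() already copies, the extra new String(...) should be redundant but harmless.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitCopyDemo {
    /** Hypothetical helper: splits the input and copies each token into its own String. */
    static List<String> splitAndCopy(String input, String regex) {
        List<String> tokens = new ArrayList<String>();
        for (String token : input.split(regex)) {
            // The copy constructor detaches the token from the large input string.
            tokens.add(new String(token));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String message = "id=42;user=alice;payload=xyz";
        // prints "[id=42, user=alice, payload=xyz]"
        System.out.println(splitAndCopy(message, ";"));
    }
}
```

Copying inside the helper keeps the decision in one place, so callers that hold on to tokens for a long time do not have to remember it.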

I have often heard, and also shared, the opinion that the String copy constructor is useless and causes problems because the Strings it creates are not interned. But in this case it seems to have a reason to exist: it effectively trims the character array, and keeps us from holding a reference to the very large String.

12 Comments

Sherif Mansour
14 years ago

Hi There,
Thanks for the insightful article! I found this quite useful – especially in understanding why Java OutOfMemory’s work…
Sherif

Jos Hirth
14 years ago

Well, that’s not a memory leak. See:
http://en.wikipedia.org/wiki/Memory_leak
The behavior is intentional – it trades memory for performance. Like most things in the standard library (e.g. collections) it’s optimized for general usage and, well, generally it’s alright. But you certainly shouldn’t tokenize a really big string this way.
The classic type of memory leak doesn’t exist in managed languages. The only thing we can produce are so-called reference leaks. That is… referencing stuff (and thus preventing em from being GCed) for longer than necessary (or for all eternity).
Fortunately it’s easy to avoid – for the most part.
The important things to know:
Locally defined objects can be GCed as soon as there are no more references to them. Typically that’s the end of the block they are defined in (if you don’t store the reference anywhere). If you do store references, be sure to remove em if you don’t need em anymore.
If you overwrite a reference with a new object, the new object is first created and /then/ the reference is overwritten, which means the old object can only be GCed /after/ the new object has been created.
Usually this doesn’t matter. However, if you want to overwrite an object which is so big that it only fits once into memory, you’ll need to null the reference before creating/assigning the new instance.
E.g.:
// FatObject fits only once into memory
FatObject fatty;
fatty = new FatObject();
fatty = new FatObject();
Will bomb with OOME. Whereas…
FatObject fatty;
fatty = new FatObject();
fatty = null;
fatty = new FatObject();
Will be fine, because the second creation of the FatObject will trigger a full GC and the GC will be able to clear enough memory (since the old reference has been nulled).
Well, that rarely matters, but it’s good to know.

trackback

[…] Jos Hirth wrote this in response to this post by Jeroen van Erp. […]

James McInosh
14 years ago

I don’t know which version of the JVM you are running, but when it constructs a new string using this constructor:
String(char value[], int offset, int count)
It sets the value using this:
this.value = Arrays.copyOfRange(value, offset, offset+count);
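Note that the constructor quoted in this comment takes (char[], int, int), while the one in the post takes (int, int, char[]), so they appear to be two different constructors. A small sketch of how the public one behaves (the class and method names here are hypothetical): it copies the requested range, so later changes to the array do not show up in the String.

```java
public class CopyConstructorDemo {
    /** Builds a String from a char array, then mutates the array afterwards. */
    static String copyThenMutate() {
        char[] buffer = {'a', 'b', 'c', 'd'};
        String s = new String(buffer, 0, 2);  // public String(char[], int, int)
        buffer[0] = 'z';                      // does not affect s, which holds its own copy
        return s;
    }

    public static void main(String[] args) {
        System.out.println(copyThenMutate());  // prints "ab"
    }
}
```

This says nothing about substring() itself, which in the source shown in the post goes through the package-private constructor.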

creyle
14 years ago

To be more obvious, with the underlying big char array being referenced, all the TestGC objects created in the big for-loop could not be GCed. That’s the problem.
Thanks

Ryan
14 years ago

This is not a memory leak. As Jos Hirth said, this is trading memory for speed.
As soon as you remove the substrings from the ArrayList all the memory that has been allocated for it will be freed. No memory leak there.

Chris
13 years ago

I’m leaving this for those who google to find…
To those saying it’s not a memory leak, you are being very strict with the term. I, and others I know, have spent many man-months of effort tracking down leaks due to this “general” use case.
Well, unfortunately, it’s not very general. The problem is that any substring used (or split string used) will keep the whole block. When splitting up large JMS text messages (for example) this will leave the entire message in memory, for an unspecified time.
It is a real problem that for a general algorithm you will get a leak-like effect but not be warned of it in the javadocs.
User Beware.
