April 17, 2014

Understanding Automatic Garbage Collection

  • July 7, 2000
  • By David Reilly

"The automatic garbage collector of the JVM makes life much simpler for programmers by removing the need to explicitly de-allocate objects. "

One of Java's coolest features is automatic reclamation of memory space, a technique known as garbage collection.

Languages like C++ force developers to manually allocate and de-allocate memory for objects, which creates extra work and opens the door to memory leaks. A memory leak, for those who haven't experienced one, occurs when the memory for an object is never de-allocated by the programmer, leading to the gradual depletion of memory resources. Java takes all the pain out of memory management by automatically reclaiming memory. Understanding how automatic garbage collection works is still important, because programmers can influence when garbage collection occurs and which objects become eligible for it. Without a clear understanding of garbage collection, your software may not run at peak performance and may consume more memory than it needs.

Memory Management with the Heap

To understand how garbage collection works, you need to know a little about how the Java Virtual Machine (JVM) handles memory allocation. All objects, including arrays of primitive data types, are stored in the heap, a shared region of memory that all JVM threads have access to. When the JVM first starts, memory is allocated for the heap, and this memory may be contracted or expanded as required [1]. Whenever a new object is created, a portion of the heap is allocated for its storage.

Depending on the implementation, a JVM may give the user control over how much heap memory is allocated initially, through the use of command line parameters. Inevitably, however, memory will run short if objects are frequently allocated. Rather than forcing the programmer to decide which object's storage must be freed, and when, the JVM takes the choice away from us. There is no way to manipulate raw memory or to explicitly de-allocate an object, nor are there any pointers (direct memory references). This is in stark contrast to C++, which retains pointers as a hangover from the C language. Instead, the JVM offers automatic garbage collection.
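To make the heap figures concrete, here is a minimal sketch that reads them from inside a running program through java.lang.Runtime. The class name HeapReport is made up for this illustration. On many JVMs (HotSpot among them) the initial and maximum heap sizes are set with the -Xms and -Xmx command line options, but that is an implementation detail, so check the documentation for the JVM you use.

```java
// Sketch: reading the current heap figures from inside a running program.
// HeapReport is a hypothetical name used only for this example.
class HeapReport {
    /** Returns {free, total, max} heap sizes in bytes, as reported by the JVM. */
    static long[] snapshot() {
        Runtime rt = Runtime.getRuntime();
        long total = rt.totalMemory(); // heap currently reserved by the JVM
        long free = rt.freeMemory();   // unused part of that reserved heap
        long max = rt.maxMemory();     // ceiling the heap may expand to
        return new long[] { free, total, max };
    }

    public static void main(String[] args) {
        long[] s = snapshot();
        System.out.println("free  = " + s[0] / 1024 + " KB");
        System.out.println("total = " + s[1] / 1024 + " KB");
        System.out.println("max   = " + s[2] / 1024 + " KB");
    }
}
```

Running the program twice with different heap options (where your JVM supports them) is a quick way to see the total and max figures change.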

Automatic Garbage Collection

You're probably wondering how the JVM knows when to junk an object, and when not to. Fear not: the JVM will never de-allocate memory for an object that you're using. The way it decides which objects are no longer needed (and hence are garbage) is simple:

Any object which is not referenced by an active thread may be safely de-allocated.

Let's consider that statement very carefully, for its meaning is crucial to understanding how garbage collection works. If an active thread does not reference an object (either directly, or indirectly through an object that references another object, and so on), then it is fair game for the garbage collector. For an object to be useful, we must have a reference to it so that we can access member variables or invoke methods. If that reference is destroyed, and no other object has a reference to it, the object immediately becomes lost and so we can never access it. Anything we can never access is no longer needed, and the garbage collector can safely reclaim its memory. Note too that even if the current thread (for example, the primary thread running the main method of an application) does not have a reference to an object, other threads may have such a reference, and therefore it is not safe to destroy the object. Even if a thread becomes inactive, as long as a reference to it is held by an active thread, the objects that it references will remain safe.
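The rule can be sketched as a walk over a graph of references. The code below is a toy model only, not how any particular JVM implements its collector: Node stands in for an object, and the roots list stands in for the references held by active threads. All of the names are invented for this illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of the reachability rule. Real collectors are far more involved.
class ReachabilityModel {
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>(); // outgoing references
        Node(String name) { this.name = name; }
    }

    /** Returns every node reachable, directly or indirectly, from the roots. */
    static Set<Node> reachable(List<Node> roots) {
        Set<Node> seen = new HashSet<>();
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node n = pending.pop();
            if (seen.add(n)) {          // first visit: follow its references
                pending.addAll(n.refs);
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        Node a = new Node("a");
        Node b = new Node("b");
        Node c = new Node("c");
        a.refs.add(b);                  // a -> b, so b is indirectly reachable
        c.refs.add(c);                  // c refers only to itself
        Set<Node> live = reachable(Collections.singletonList(a));
        System.out.println("b live: " + live.contains(b));
        System.out.println("c live: " + live.contains(c));
    }
}
```

In this model b counts as live because it hangs off the root a, while c is garbage even though it holds a reference to itself: nothing a thread can reach leads to it.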

This logic is fairly simple to follow and sounds like common sense. Unfortunately, a lot of myths and misconceptions surround automatic garbage collection. As soon as you assign the null value to a reference, if no other thread contains a direct reference to the object or an indirect reference (a link to a link, and so on), then the object is fair game for the garbage collector. For example, if I wanted to clear an array of objects, I'd simply assign a null value to all of the references to the array (remembering that if I miss one, the array's contents will linger and keep hold of valuable memory real estate).


// Some array of objects, arr
Object[] arr = new Object[100];

// Drop the reference; if it was the only one, the array can now be
// reclaimed when needed by the gc
arr = null;

Let's examine some common misconceptions to help illustrate this concept.

Misconception Number One: A Reference to Oneself

One big misconception is that an object that maintains a reference to itself will be protected from being junked by the garbage collector. Some programmers store a reference to the object in one of its own member variables, hoping that it will count. Let me assure you that it does not. Remember that an object already has an implicit reference to itself via the this keyword. Unless an object is referenced by other objects, which are in turn referenced by an active thread, it does not satisfy the rule.

Figure 1. Member variable reference to an object doesn't protect against memory reclamation.

Misconception Number Two: Cyclical References

The second biggest misconception concerns cyclical references, whereby one object links to another, which links back again. Sometimes these cyclical relationships may be referenced by other objects, and thus protected, which reinforces the misconception. Take the object references shown in Figure 2, for example. If there are no other references to Obj1, Obj2, or Obj3, then all three can be reclaimed. The same is true for data structures like vectors, hash tables, and collections: if you don't maintain a reference to them, their contents are gone (provided nothing else references those contents). This is a handy tip to remember. I've seen people manually assign all references in a collection to null, rather than just making the reference to the collection itself null. Aside from making for simpler code, nulling only the collection reference is less computationally expensive as well.

Figure 2. Cyclical references are no protection.
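One way to watch this on a real JVM is to observe a member of the cycle through a java.lang.ref.WeakReference, which does not itself keep its target alive. The sketch below builds a three-object cycle like the one in Figure 2; the class and field names are invented for the example. Because System.gc() is only a request, the final result is not guaranteed by the specification, so treat the program as an experiment rather than a proof.

```java
import java.lang.ref.WeakReference;

// Sketch: a cycle of three objects with a single outside reference.
class CycleDemo {
    static final class Node {
        Node next;
    }

    /** Builds Obj1 -> Obj2 -> Obj3 -> Obj1 and returns Obj1. */
    static Node buildCycle() {
        Node obj1 = new Node();
        Node obj2 = new Node();
        Node obj3 = new Node();
        obj1.next = obj2;
        obj2.next = obj3;
        obj3.next = obj1;
        return obj1;
    }

    public static void main(String[] args) throws InterruptedException {
        Node head = buildCycle();
        // The weak reference watches Obj2 without protecting it.
        WeakReference<Node> probe = new WeakReference<>(head.next);
        System.out.println("still present while referenced: " + (probe.get() != null));

        head = null; // drop the only outside reference into the cycle

        // Ask for a collection a few times; the JVM is free to ignore us.
        for (int i = 0; i < 10 && probe.get() != null; i++) {
            System.gc();
            Thread.sleep(50);
        }
        System.out.println("still present after dropping head: " + (probe.get() != null));
    }
}
```

On typical desktop JVMs the second line reports that the object is gone, even though Obj1, Obj2, and Obj3 still reference one another.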

Garbage Collection Timing

You should be aware that objects are not immediately destroyed once they are no longer referenced; the thread responsible for garbage collection can reclaim their memory at any later time. This time, of course, may not always be opportune: a time-sensitive calculation or GUI update might be trying to complete when the garbage collector kicks in. The garbage collector thread normally runs as a very low priority thread, but once started, it might not be suspended until its task is complete. At the wrong time, garbage collection could introduce a performance degradation.

Maximizing Performance Through Garbage Collection

There are two ways to improve system performance through garbage collection. The first is to explicitly invoke the garbage collector during idle periods, or before creating a large number of objects. The java.lang.System class provides a method, gc(), that requests that the garbage collection thread start reclaiming memory. The exact delay between invoking and returning from the method will vary from JVM to JVM, and with the number of objects to be freed. For example, the following code snippet invokes the garbage collector:


// Invoke the garbage collector (may cause a delay)
System.gc();
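To get a feel for the delay on your own JVM, the request can be timed. The following is a minimal sketch, assuming a made-up class name IdleGc and an arbitrary 16 MB of throwaway data; the numbers it prints will differ between JVMs and runs, and the JVM is free to do little or nothing in response to the request.

```java
// Sketch: timing a System.gc() request made during an idle moment.
class IdleGc {
    /** Requests a collection and returns how long the call took, in milliseconds. */
    static long requestCollection() {
        long start = System.nanoTime();
        System.gc(); // a request only; the JVM decides what actually happens
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();

        byte[][] scratch = new byte[16][1024 * 1024]; // about 16 MB of short-lived data
        scratch = null;                               // nothing references it any more

        long usedBefore = rt.totalMemory() - rt.freeMemory();
        long millis = requestCollection();
        long usedAfter = rt.totalMemory() - rt.freeMemory();

        System.out.println("gc() returned after " + millis + " ms");
        System.out.println("used before: " + usedBefore / 1024 + " KB");
        System.out.println("used after:  " + usedAfter / 1024 + " KB");
    }
}
```

If the measured delay is noticeable, that is an argument for making the call only when the application has nothing time-sensitive to do.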

The second way to maximize performance is to always null out references to unwanted objects at the earliest moment possible. This will allow the garbage collector to free memory rather than expanding the size of the heap, which can be a particularly time-consuming task on slow systems that use virtual memory. Remember too that all references need to be removed: if a single reference exists, the memory will remain locked.
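A common place where a stray reference lingers is inside a container that manages its own array. The sketch below uses a hypothetical array-backed stack, SimpleStack, written only for this illustration: when an element is popped, the slot it occupied is set to null so the array no longer keeps the object reachable.

```java
import java.util.Arrays;

// Hypothetical array-backed stack, used only to illustrate the point.
class SimpleStack {
    private Object[] items = new Object[8];
    private int size = 0;

    void push(Object o) {
        if (size == items.length) {
            items = Arrays.copyOf(items, size * 2); // grow when full
        }
        items[size++] = o;
    }

    Object pop() {
        if (size == 0) {
            throw new IllegalStateException("stack is empty");
        }
        Object top = items[--size];
        items[size] = null; // without this, the array would keep the object reachable
        return top;
    }

    /** True if the backing array still refers to the given object. */
    boolean stillHolds(Object o) {
        for (Object item : items) {
            if (item == o) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        SimpleStack stack = new SimpleStack();
        Object payload = new Object();
        stack.push(payload);
        stack.pop();
        System.out.println("array still holds payload: " + stack.stillHolds(payload));
    }
}
```

With the null assignment in place the program prints false. Remove that line and the array quietly keeps a reference to every object ever popped, which is exactly the single surviving reference the paragraph above warns about.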

Summary

The automatic garbage collector of the JVM makes life much simpler for programmers by removing the need to explicitly de-allocate objects. However, this does not let us off the hook entirely: we must still remember to null out references to any unwanted objects, to allow the garbage collector to do its work.

References

[1] The Java Virtual Machine Specification (Second Edition), Lindholm & Yellin, Addison-Wesley, 1999.

About the Author

David Reilly is a software engineer and freelance technical writer living in Australia. A Sun Certified Java 1.1 Programmer, his research interests include the Java programming language, networking & distributed systems, and software agents. He can be reached via e-mail at java@davidreilly.com or through his personal site.






