What is garbage collection?
Garbage collection allows for automatic memory management. Instead of forcing the programmer to explicitly deallocate a dynamic object (for instance, using free in C or delete in C++), a "dead" object is automatically reclaimed when the region of memory it lives in is collected.
Why garbage collect?
As stated in, Pretenuring for Java, "Garbage collection removes two sources of programming error:
How does a garbage collector work?
Currently the most effective garbage collection algorithm is the generational copying collector. A generational collector groups objects by age into different regions of the heap. Age is determined by the "time elapsed" since allocation (time elapsed refers to the number of bytes of the heap allocated since an object was created). The youngest generation, called the nursery, is collected most often (following the weak generational hypothesis that "most objects die young"). Younger generations can be collected independently of older generations. As a generation is collected, the surviving objects are promoted to the next oldest generation and the "dead" objects are reclaimed (to be allocated at a later time).
Are there any problems with this collection scheme?
Having the ability to collect younger generations independently of older ones is an important benefit of generational copying collection. System pause times due to garbage collection are reduced (because the entire heap no longer has to be collected everytime) and the garbage collection itself becomes more efficient (because generations where dead objects are most likely to be found can be collected more often).
However, one important detail was left out of the answer to the above
question...the overhead involved with collecting generations
independently.
In order to collect younger generations independently,
inter-generational pointers (pointers whose source is in a different
generation than its target) must be checked. A write barrier is
implemented between generations to determine whether a pointer store
needs to be remembered, and if so, remembers it. Only old-young pointers
must be remembered - though all pointers are tested, causing a large
amount of unnecessary overhead.
Researching GC at UT...
What was my project for this summer?
Ultimately, Kathryn wanted Maria and I to optimize the Beltway collector. But in order to do that, we first had to test the optimization in an Appel generational copying collector.
What is Beltway?
Beltway is a generalized copying garbage collector. This collector is made up of belts and increments. The increments are independently collectible regions of memory. Belts are made up of increments. The youngest belt is called the nursery.
Beltway exploits the following five ideas:
What is Appel?
Appel is a generational copying collector, as the name suggests (see above for an explanation of how it works). Appel has only two generations, the nursery and the older generation.
What optimization did you make?
As explained above, the write barrier incurrs a lot of overhead. One of reasons Kathryn and Steve found to explain the high cost was because write barriers must check pointers to TIBs. TIB stands for type information block and holds information about the class. The header of every object allocated in a program, contains a pointer to its TIB.
A TIB is created when its class is loaded and is allocated into the nursery, like all objects. A TIB lives for the duration of the program. Since TIBs are always older than instances of the class, pointers to TIBs are young-old pointers and do not need to be remembered. However, they are still checked by the write barrier, creating the overhead.
Our optimization created an immortal space, in which TIBs could be
allocated at creation and kept there for the rest of the program's life.
The immortal space will never be collected. We also turned off the
TIB write barrier; thus pointers to TIBs are no longer checked by the
write barrier.
[1] Pretenuring for Java , Steve M. Blackburn, Sharad Singhai, Matthew Hertz, K. S. McKinley, and
J. Eliot B. Moss, OOPSLA 2001: ACM Conference on Object-Oriented Programming, Systems,
Languages and Applications, Tampa, FL, USA, Oct. 14 - 18, 2001.
[2] Beltway: Getting Around Garbage Collection Gridlock, Steve M. Blackburn, Richard Jones, K. S. McKinley, and J. Eliot B. Moss, ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), June 17-19, 2002, Berlin, Germany.