5.4 Contention

Contention, in this context, is the detrimental interaction between the threads of a process competing for a shared resource such as the heap. Programmatic methods such as critical regions are used to ensure the correctness of a multithreaded application. Performance losses ensue because a thread must sleep or spin when it needs a resource locked by another thread. Context switching, the halting of the running thread and the resumption of another, occurs every few milliseconds under Windows NT and is a necessary concomitant of its multitasking ability. However, the extra context switching provoked by contention can easily become uncontrolled and cause the machine to thrash.
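The locking discipline described above can be sketched portably. The following uses a POSIX mutex in place of a Windows critical region; the names (worker, run_contended) and the iteration count are illustrative only:

```c
#include <pthread.h>

/* Shared resource guarded by a critical region (a POSIX mutex here,
   standing in for a Windows CRITICAL_SECTION). */
static long counter;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        /* A thread arriving while another holds the lock must sleep or
           spin; the resulting context switches are the contention cost. */
        pthread_mutex_lock(&lock);
        counter++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

/* Run two contending threads; the mutex guarantees the final count. */
static long run_contended(void)
{
    pthread_t t1, t2;
    counter = 0;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return counter;
}
```

Without the mutex the increments would race and the final count would be unpredictable; with it, correctness is bought at the price of the blocking described above.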

There is a pernicious interaction between page faulting and contention. If a thread should trigger a page fault while inside a critical region, the vulnerable period will be increased a thousand-fold, and every other thread of the process is likely to end up queued on the critical region.

The Windows NT heap manager maintains a critical region around the code that manipulates free lists, so that while a thread is performing HeapAlloc or HeapFree, the whole heap is locked against other threads. The contention this causes is considerably ameliorated, in some versions, by the use of lookaside lists. Briefly touched on earlier, these are short linked lists to which elements may be added, or from which they may be removed, with a single bus-locking Pentium instruction. Lookaside lists work best when the allocations and frees for any particular sizeclass are intermingled; a run of frees without intervening allocations will fill the lookaside list and cause the global lock to be asserted.
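The single-instruction push and pop can be approximated in portable C11 with an atomic compare-and-swap loop. This is a sketch only, not the Windows implementation, which additionally packs a sequence counter alongside the head pointer to defeat the ABA problem under concurrent pops:

```c
#include <stdatomic.h>
#include <stddef.h>

/* Each free block's first word links it to the next block on the list. */
typedef struct Node {
    struct Node *next;
} Node;

typedef struct {
    _Atomic(Node *) head;
} Lookaside;

/* Push a freed block onto the list head with a compare-and-swap loop. */
static void lookaside_push(Lookaside *list, Node *node)
{
    Node *old = atomic_load(&list->head);
    do {
        node->next = old;
    } while (!atomic_compare_exchange_weak(&list->head, &old, node));
}

/* Pop the most recently freed block, or return NULL if the list is empty. */
static Node *lookaside_pop(Lookaside *list)
{
    Node *old = atomic_load(&list->head);
    while (old != NULL &&
           !atomic_compare_exchange_weak(&list->head, &old, old->next))
        ;
    return old;
}
```

No mutual exclusion is needed: a thread whose compare-and-swap fails simply retries with the fresh head value, so the cost of contention is a few wasted cycles rather than a context switch.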

Contention avoidance is the strong suit of LeapHeap. As is evident from sections 3 LeapHeap Algorithms and 4 The LeapHeap Implementation, LeapHeap routines never take a lock. Compartment area allocations do involve a window of vulnerability between setting a lower-tier bitmap bit and updating the upper tiers to record that a whole word has been filled. During the vulnerable period the bitmap is inconsistent, and other allocating threads may be led to the full word and become contention victims. All the same, the chance of a page fault extending the vulnerable period is effectively zero, because the memory holding the upper bitmap tiers has just been accessed by the thread on its way down. Moreover, LeapHeap contention victims merely consume extra cycles making their allocations; they do not cause context switches.
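A minimal two-tier sketch illustrates the cascading-bitmap allocation and its vulnerable window. The layout, sizes, and names here are illustrative rather than LeapHeap's actual structures, and __builtin_ctz assumes gcc or clang:

```c
#include <stdatomic.h>
#include <stdint.h>

#define WORDS 8  /* illustrative: 8 lower-tier words, 256 bits in all */

/* Two-tier cascading bitmap: one upper-tier word summarises which
   lower-tier words are completely full. */
static _Atomic uint32_t lower[WORDS];
static _Atomic uint32_t upper;   /* bit w set => lower[w] is full */

/* Claim one free bit; returns its index, or -1 if the bitmap is full. */
static int bitmap_alloc(void)
{
    for (int w = 0; w < WORDS; w++) {
        if (atomic_load(&upper) & (1u << w))
            continue;                       /* summary says word is full */
        uint32_t word = atomic_load(&lower[w]);
        while (word != UINT32_MAX) {
            uint32_t mask = 1u << __builtin_ctz(~word); /* first clear bit */
            if (atomic_compare_exchange_weak(&lower[w], &word, word | mask)) {
                /* Vulnerable window: the bit is taken, but until the next
                   line runs the summary may still advertise space in a
                   full word.  Misled threads merely retry; no lock is
                   held, so no thread is ever descheduled. */
                if ((word | mask) == UINT32_MAX)
                    atomic_fetch_or(&upper, 1u << w);
                return w * 32 + __builtin_ctz(mask);
            }
            /* CAS failure refreshed word with the current value; retry. */
        }
    }
    return -1;  /* bitmap exhausted */
}
```

A thread that reads the stale summary and lands on a full word loses only the cycles spent rescanning, which is the whole of the contention cost described above.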

On multiprocessor systems, the additional factor of cache contention comes into play. A write by a processor to a memory address causes any cache lines containing that address held by other processors to be invalidated. When a particular address, such as that of a lock variable, is frequently visited by all the processors, a performance-sapping oscillation known as cache ping-pong occurs. Under both the native heap and LeapHeap, the segregation of small blocks by size ameliorates the problem considerably, for concurrent heap routines are likely to be dealing with different allocation sizes, which have different control structures. LeapHeap encourages segregation by thread as well; consequently writes to the lower tiers of a cascading bitmap (which are the only frequent writes) generally fall on a different cache line for each thread. When dealing with big blocks, native heap threads contend for a single lock variable, whereas LeapHeap writes are distributed evenly across several kilobytes of descriptor memory.