
5.1 Intrinsic Efficiency

The term intrinsic efficiency refers to the speed with which an implementation performs a heap routine under conditions where all the memory pages accessed during the routine are already in RAM and there is no interference from other threads.

In this guide, intrinsic efficiency is estimated on the basis of instruction count. This is a crude metric where Pentium processors are concerned, but calculating clock cycles is impractical and in any case the results differ markedly between members of the Pentium family. Measurements for the Microsoft routines were taken from one particular release of Windows NT and are sensitive to the version in use.

For HeapAlloc, the fastest possible path through the native routine is 68 instructions, which is actually shorter than the LeapHeap fastest path of 91 (assuming a 4-tier cell bitmap). However, if the native heap is using lookaside lists, its shortest path rises to 109 instructions. LeapAlloc is highly deterministic: the longest path through its code is 143 instructions (excluding cases where fresh memory has to be committed, where the HEAP_ZERO_MEMORY flag is set, and so on). The native routine must under some circumstances process a list, in which case the instruction count depends on the size of the heap and is not bounded above.

For HeapFree the corresponding native heap figures are 58 instructions minimum without lookaside lists and 98 instructions minimum with them. The LeapHeap analogue can complete in 49 instructions and takes at most 88. The native HeapFree instruction count is likewise not bounded above, though naturally the native heap is designed so that list processing does not occur in the majority of cases.
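
To make the shape of these fast paths concrete, the sketch below shows a lookaside-style free list for a single block size: a hit on allocation is a short, fixed pop and a free is a short push, while a miss falls through to the general free-list search whose length is not bounded. This is an illustration only, written in C with invented names; it is not the NT heap's or LeapHeap's actual code, and it omits the serialisation a real multiprocessor build would need.

    #include <stddef.h>

    /* Illustrative sketch only: a lookaside-style free list for one block size.
       Names and layout are invented; single-threaded for clarity. */

    typedef struct LookasideEntry {
        struct LookasideEntry *next;   /* linked through the free blocks themselves */
    } LookasideEntry;

    typedef struct Lookaside {
        LookasideEntry *head;          /* top of the per-size free list */
        size_t          block_size;    /* every entry on this list has this size */
    } Lookaside;

    /* Allocation fast path: a hit costs only a load, a test and a relink. */
    void *lookaside_alloc(Lookaside *la)
    {
        LookasideEntry *e = la->head;
        if (e == NULL)
            return NULL;               /* miss: fall back to the general free list */
        la->head = e->next;            /* pop */
        return e;
    }

    /* Free fast path: push the block back; again a short, constant sequence. */
    void lookaside_free(Lookaside *la, void *block)
    {
        LookasideEntry *e = (LookasideEntry *)block;
        e->next = la->head;
        la->head = e;
    }

In a serialised heap the pop and push would be performed with interlocked instructions, which is where the LOCK prefix discussed below enters.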

The preceding paragraphs apply to small allocations. When big blocks are allocated, both the native heap and LeapHeap process lists, and LeapHeap is at a disadvantage because its list consists of all large free blocks plus all large allocated blocks, whereas the NT heap list contains only free blocks. That difference would be expected to make the LeapHeap list perhaps three times the length of the native heap list. Compensating for this is the fact that the LeapHeap big block threshold is 4 KB whereas the native heap threshold is 1 KB, so fewer blocks are classed as big in the first place.
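
The effect of mixing free and allocated blocks on one list can be seen in a small sketch. The structure and names below are invented for illustration and do not reproduce either heap's real layout; the point is simply that allocated entries lengthen the walk without ever being candidates, whereas a list holding only free blocks touches candidates alone.

    #include <stddef.h>

    /* Illustrative sketch only: first-fit search of a big block list that holds
       both free and allocated blocks, as the LeapHeap list is described above.
       Allocated entries must be stepped over, lengthening the walk. */

    typedef struct BigBlock {
        struct BigBlock *next;
        size_t           size;
        int              is_free;     /* 1 = free, 0 = allocated */
    } BigBlock;

    BigBlock *find_big_block(BigBlock *head, size_t wanted)
    {
        BigBlock *b;
        for (b = head; b != NULL; b = b->next) {
            if (b->is_free && b->size >= wanted)
                return b;             /* first free block large enough */
        }
        return NULL;                  /* nothing suitable: commit fresh memory */
    }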

It is worth mentioning the Intel LOCK prefix, applied to a handful of pivotal instructions in the multiprocessor implementations of both LeapHeap and the native heap. A locked instruction forces the processor pipeline to empty and causes a delay equivalent to many, perhaps a hundred, ordinary instructions. If the heap is a private heap created unserialised, the native heap does not employ the prefix at all. For other heaps it applies LOCK once if it scores a hit on the lookaside list, thrice if it misses. In the compartment area LeapHeap is likely to complete with a single locked instruction, but under certain allocation patterns will regularly have to lock at each bitmap tier, say four times. The locked-instruction factor is less important for the big-block area, as this is negotiated less frequently and involves list processing anyway; the figures are available in section 4.2 Big Block Area Implementation.
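
The sketch below shows the kind of operation involved: an interlocked compare-exchange, which the compiler emits as a LOCK CMPXCHG, used to claim one bit in a bitmap word. It is an assumption made for illustration, not LeapHeap's actual tier code; the point is only the cost of the locked instruction.

    #include <windows.h>

    /* Illustrative sketch only: atomically claim bit 'bit' in *word.  Returns 1
       on success, 0 if the bit was already set.  InterlockedCompareExchange
       compiles to a LOCK CMPXCHG, the expensive serialising instruction
       discussed above; an allocation that must do this at every bitmap tier
       pays the penalty several times over. */
    int claim_bit(volatile LONG *word, int bit)
    {
        for (;;) {
            LONG old = *word;
            if (old & (1L << bit))
                return 0;              /* already taken by another thread */
            if (InterlockedCompareExchange(word, old | (1L << bit), old) == old)
                return 1;              /* our compare-exchange won */
            /* the word changed underneath us; re-read and retry */
        }
    }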

Unless a large number of locked instructions are incurred, intrinsic efficiency is not a crucial contributor to overall performance. As long as routines complete in constant time they are likely to be adequately fast; both LeapHeap and the native heap can turn out a million allocations per second on an entry-level desktop machine. If heap manager A takes even twice as long as heap manager B to allocate and free, and a heap-intensive application running under manager A spends 20% of its time within the heap routines, then switching to manager B halves that 20% and so recovers only 10% of the total run time.
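
A figure of that order is easy to check with a crude harness such as the one below: it warms the pages first so the conditions match the definition of intrinsic efficiency, then times repeated allocate/free pairs on a private unserialised heap. The iteration count and 64-byte block size are arbitrary choices for the sketch, not values taken from this guide's measurements.

    #include <windows.h>
    #include <stdio.h>

    #define ITERATIONS 1000000
    #define ALLOC_SIZE 64                    /* arbitrary small-block size */

    int main(void)
    {
        HANDLE heap = HeapCreate(HEAP_NO_SERIALIZE, 0, 0);  /* private, unserialised */
        LARGE_INTEGER freq, start, end;
        void *p;
        int i;

        /* Warm-up pass: commit and touch the pages so the timed run meets the
           "all pages already in RAM" condition. */
        for (i = 0; i < ITERATIONS; i++) {
            p = HeapAlloc(heap, 0, ALLOC_SIZE);
            HeapFree(heap, 0, p);
        }

        QueryPerformanceFrequency(&freq);
        QueryPerformanceCounter(&start);
        for (i = 0; i < ITERATIONS; i++) {
            p = HeapAlloc(heap, 0, ALLOC_SIZE);
            HeapFree(heap, 0, p);
        }
        QueryPerformanceCounter(&end);

        printf("%.0f allocate/free pairs per second\n",
               ITERATIONS * (double)freq.QuadPart /
               (double)(end.QuadPart - start.QuadPart));

        HeapDestroy(heap);
        return 0;
    }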

In consequence, the performance of LeapHeap and the NT native heap is not expected to differ noticeably owing solely to their intrinsic efficiency save under two circumstances: