Improving Instruction Fetch Rate with Code Pattern Cache for Superscalar Architecture
Beg, Azam Muhammad
Younan, Nicolas H.
Bridges, Susan M.
In the past, instruction fetch speeds have been improved by using cache schemes that capture the actual program flow. In this proposal, we present the architecture of a new instruction cache named code pattern cache (CPC); the cache is used with superscalar processors. CPC?s operation is based on the fundamental principles that: common programs tend to repeat their execution patterns; and efficient storage of a program flow can enhance the performance of an instruction fetch mechanism. CPC saves basic blocks (sets of instructions separated by control instructions) and their boundary addresses while the code is running. Basic blocks and their addresses are stored in two separate structures, called block pointer cache (BPC) and basic block cache (BBC), respectively. Later, if the same basic block sequence is expected to execute, it is fetched from CPC, instead of the instruction cache; this mechanism results in higher likelihood of delivering a larger number of instructions in every clock cycle. We developed single and multi-threaded simulators for TC, BC, and CPC, and used them with 10 SPECint2000 benchmarks. The simulation results demonstrated CPC?s advantage over TC and BC, in terms of trace miss rate and average trace length. Additionally, we used cache models to quantify the timing, area, and power for the three cache schemes. Using an aggregate performance index that combined the simulation and modeling results, CPC was shown to perform better than both TC and BC. During our research, each of the TC-, BC-, or CPC- configurations took 4-6 hours to simulate, so performance comparison of these caches proved to be a very time-consuming process. Neural network models (NNM?s) can be time-efficient alternatives to simulations, so we studied their feasibility to represent the cache behavior. We developed two NNM's, one to predict the trace miss rate and the other to predict the average trace length for the three caches. The NNM's modeled the caches with reasonable accuracy, and produced results in a fraction of a second.