Power-Performance Trade-Offs in Nanometer-Scale Multi-Level Caches Considering Total Leakage

📝 Original Info

  • Title: Power-Performance Trade-Offs in Nanometer-Scale Multi-Level Caches Considering Total Leakage
  • ArXiv ID: 0710.4794
  • Date: 2007-10-25
  • Authors: Robert Bai, Nam-Sung Kim, Tae Ho Kgil, Dennis Sylvester, Trevor Mudge

📝 Abstract

In this paper, we investigate the impact of T_{ox} and V_{th} on power-performance trade-offs for on-chip caches. We start by examining the optimization of the various components of a single-level cache and then extend this to two-level cache systems. In addition to leakage, our studies also account for the dynamic power expended as a result of cache misses. Our results show that one can often reduce overall power by increasing the size of the L2 cache if we only allow one pair of V_{th}/T_{ox} in L2. However, if we allow the memory cells and the peripherals to have their own V_{th}'s and T_{ox}'s, we show that a two-level cache system with smaller L2's will yield less total leakage. We further show that two V_{th}'s and two T_{ox}'s are sufficient to get close to an optimal solution, and that V_{th} is generally a better design knob than T_{ox} for leakage optimization; thus it is better to restrict the number of T_{ox}'s rather than V_{th}'s if cost is a concern.

📄 Full Content

Leakage power is a problem for all microprocessor circuit components, but it is particularly important in on-chip caches, where large numbers of potentially high-leakage cross-coupled inverters (the storage elements of caches) are integrated. We can expect the leakage fraction of total power to exceed the dynamic fraction in future processor generations. Several previous studies have addressed cache leakage power reduction [1][2][3][4][5][6][7]; all of them focused on subthreshold leakage power. However, with aggressive T_{ox} scaling, gate leakage power can potentially surpass subthreshold leakage at low T_{ox}. In this paper, we investigate various techniques to minimize total (gate + subthreshold) leakage power plus dynamic power under delay constraints by systematically assigning values of T_{ox} and V_{th} for a single cache, a two-level cache system, and an entire microprocessor memory system consisting of L1 and L2 caches and main memory.

For our experiments, we used the technology files from the Berkeley Predictive Technology Model (BPTM) for the 65nm technology node [8]. We then characterized the technology files for a range of V_{th} and T_{ox} values. We let V_{th} vary from 0.2V to 0.5V, while allowing T_{ox} to scale from 10Å to 14Å. The lower limits of these ranges are chosen to reflect typical values for high-performance logic at the studied technology node; such transistors would be required for the non-memory portion of a processor or system. While there is no physical reason for a V_{th} upper bound, we expect that values above 0.5V are unlikely in 65nm technology with an approximately 1V supply. Increasing T_{ox} while maintaining the same drawn channel length may cause the gate terminal to lose control of the conduction state of the channel due to the DIBL effect [9]. Hence, when T_{ox} changes, the drawn channel length must be scaled appropriately. In addition, to maintain memory cell stability, the widths of the transistors in the memory cell must be adjusted in proportion to the change in drawn channel length. The impact of T_{ox} scaling on cell area must therefore be taken into account, as the cell will grow in both the horizontal and vertical dimensions.

First, we re-designed the cache netlists used in [7] to target the 65nm technology node. We assume that internally, the cache consists of four components: memory cell array and sense amplifier, decoder, address bus drivers, and data bus drivers. Second, extensive HSPICE simulation shows that the total leakage current of the memory cell array depends exponentially on T_{ox} and V_{th}. We then approximate the total leakage power as follows:

[equation not recoverable from the extracted text]

On the other hand, the delay of the array is shown to be linear in T_{ox}, and over the range of interest its dependence on V_{th} can be approximated by an exponential growth function with very small exponents, as follows:

[equation not recoverable from the extracted text]

Although these total leakage and delay trends are for the memory cell array, the same trends also hold for the remaining cache components (decoders and address/data bus drivers). Therefore, we can model the total leakage and delay of each component in the same way as for the memory cell array, assuming that the leakage and delay of each component are independent of those of the other components. Thus we can approximate both the total leakage and the delay of a cache system by summing the leakage and delay of each cache component.
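The additive component model described above can be sketched in a few lines. This is a minimal illustration, not the paper's fitted model: the functional forms are one choice consistent with the text (leakage falling exponentially with V_{th} and T_{ox}, delay linear in T_{ox} with a weak exponential dependence on V_{th}), and the class name `ComponentModel`, the helper `cache_totals`, and every constant are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ComponentModel:
    """Fitted constants for one cache component (all values hypothetical)."""
    a0: float      # leakage floor
    a1: float      # prefactor of the Vth-dependent leakage term
    alpha: float   # decay rate of leakage with Vth, in 1/V
    a2: float      # prefactor of the Tox-dependent leakage term
    beta: float    # decay rate of leakage with Tox, in 1/angstrom
    k0: float      # delay offset
    k1: float      # delay slope with Tox
    k2: float      # prefactor of the Vth-dependent delay term
    gamma: float   # small growth exponent of delay with Vth, in 1/V

    def leakage(self, vth: float, tox: float) -> float:
        # Assumed form: leakage decays exponentially with Vth and with Tox.
        return (self.a0
                + self.a1 * math.exp(-self.alpha * vth)
                + self.a2 * math.exp(-self.beta * tox))

    def delay(self, vth: float, tox: float) -> float:
        # Assumed form: linear in Tox, weak exponential growth in Vth.
        return self.k0 + self.k1 * tox + self.k2 * math.exp(self.gamma * vth)


def cache_totals(models, assignment):
    """Sum leakage and delay over independent components.

    models:     dict mapping component name -> ComponentModel
    assignment: dict mapping component name -> (vth, tox)
    Returns (total_leakage, total_delay).
    """
    total_leak = sum(m.leakage(*assignment[n]) for n, m in models.items())
    total_delay = sum(m.delay(*assignment[n]) for n, m in models.items())
    return total_leak, total_delay
```

Because the components are treated as independent, changing the V_{th}/T_{ox} pair of one component only changes that component's terms in the two sums, which is what makes per-component assignment tractable.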

To examine the dependence of leakage power on V_{th} and T_{ox} assignment, we study three different V_{th}/T_{ox} assignment schemes:

Scheme I: assign independent V_{th}'s and T_{ox}'s to each cache component.

Scheme II: assign a V_{th}/T_{ox} pair to the memory cell array and another pair to the remaining three cache components.

Scheme III: assign the same V_{th}/T_{ox} pair to all four cache components.
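The three schemes differ only in how many independent V_{th}/T_{ox} pairs are available, so every scheme III assignment is also a valid scheme II assignment, and every scheme II assignment is also a valid scheme I assignment. The sketch below makes that nesting concrete with a brute-force search for the lowest-leakage assignment under a delay budget. It is a toy under stated assumptions, not the authors' optimizer: the component weights, the leakage and delay formulas, the coarse grids, and the names `expand` and `best_assignment` are all hypothetical.

```python
import itertools
import math

COMPONENTS = ("array", "decoder", "addr_drivers", "data_drivers")

# Hypothetical (leakage weight, delay weight) per component; the array
# dominates leakage because it holds most of the transistors.
WEIGHTS = {
    "array": (100.0, 1.0),
    "decoder": (5.0, 1.0),
    "addr_drivers": (3.0, 0.8),
    "data_drivers": (3.0, 0.8),
}

# Coarse grids spanning the ranges quoted in the text (V and angstroms).
VTH_GRID = (0.2, 0.3, 0.4, 0.5)
TOX_GRID = (10.0, 12.0, 14.0)
PAIRS = tuple(itertools.product(VTH_GRID, TOX_GRID))

# Number of independent (Vth, Tox) pairs each scheme may choose.
FREE_PAIRS = {"I": 4, "II": 2, "III": 1}


def leakage(name, vth, tox):
    """Toy leakage: exponential decay in Vth and Tox (assumed constants)."""
    w, _ = WEIGHTS[name]
    return w * (math.exp(-20.0 * (vth - 0.2)) + 0.5 * math.exp(-(tox - 10.0)))


def delay(name, vth, tox):
    """Toy delay: linear in Tox, weak exponential in Vth (assumed constants)."""
    _, w = WEIGHTS[name]
    return w * (1.0 + 0.05 * (tox - 10.0) + 0.1 * math.exp(2.0 * (vth - 0.2)))


def expand(scheme, choice):
    """Map a tuple of chosen pairs to a per-component assignment."""
    if scheme == "I":
        return dict(zip(COMPONENTS, choice))
    if scheme == "II":
        cell, periph = choice
        return {c: (cell if c == "array" else periph) for c in COMPONENTS}
    if scheme == "III":
        (shared,) = choice
        return {c: shared for c in COMPONENTS}
    raise ValueError(f"unknown scheme: {scheme!r}")


def best_assignment(scheme, delay_budget):
    """Return (leakage, assignment) with minimum leakage, or None if infeasible."""
    best = None
    for choice in itertools.product(PAIRS, repeat=FREE_PAIRS[scheme]):
        assign = expand(scheme, choice)
        if sum(delay(c, *assign[c]) for c in COMPONENTS) > delay_budget:
            continue  # violates the delay constraint
        leak = sum(leakage(c, *assign[c]) for c in COMPONENTS)
        if best is None or leak < best[0]:
            best = (leak, assign)
    return best
```

Because the search spaces are nested, the minimum leakage found for scheme I can never exceed that of scheme II, which in turn can never exceed that of scheme III at the same delay budget. How large the gaps are depends entirely on the fitted models, which this toy does not reproduce.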

We formulate the problem of minimizing the leakage power given the delay constraint as the following optimization problem [10]:
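The formulation itself did not survive extraction. A plausible form, reconstructed only from the surrounding description (additive per-component leakage and delay, a delay target, and the stated V_{th} and T_{ox} ranges), is sketched below. The symbols P_{leak,i}, T_{d,i}, and T_{target} are assumed notation, and the exact statement in the paper may differ.

```latex
\begin{aligned}
\min_{\{V_{th,i},\,T_{ox,i}\}} \quad
  & \sum_{i=1}^{4} P_{leak,i}\!\left(V_{th,i},\, T_{ox,i}\right) \\
\text{subject to} \quad
  & \sum_{i=1}^{4} T_{d,i}\!\left(V_{th,i},\, T_{ox,i}\right) \le T_{target}, \\
  & 0.2\,\mathrm{V} \le V_{th,i} \le 0.5\,\mathrm{V}, \qquad
    10\,\text{\AA} \le T_{ox,i} \le 14\,\text{\AA}, \qquad i = 1,\dots,4 .
\end{aligned}
```

Here i indexes the four cache components. Under this reading, schemes II and III add equality constraints that tie the V_{th}/T_{ox} pairs of several components together.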

In our optimization process, we have chosen V_{th} and T_{ox} to take on discrete values with a small step size. The optimization shows that scheme III is the worst performer and scheme I is the best. However, scheme II is only slightly behind scheme I for the same delay constraint, and from a process standpoint scheme I is more costly than scheme II. Therefore, scheme II is the preferred scheme, as it is not only economically feasible but also achieves close to optimal leakage. It is worth noting that in schemes I and II, high values of V_{th} and thick T_{ox}'s are always assigned to the memory cell arrays, while V_{th}/T_{ox} in the peripheral components are set sufficiently low to help meet the delay target. To gain further insight into the selection of the decision variables during the optimization process, we perform an experiment in which for a 16KB cache we hold either V_{th} or T_{ox} constant, and at the same time observe how leakage power is impacted by the other decision variable indepen

…(Full text truncated)…
