Selecting Microarchitecture Configuration of Processors for Internet of Things
The Internet of Things (IoT) makes use of ubiquitous internet connectivity to form a network of everyday physical objects for purposes of automation, remote data sensing and centralized management/control. IoT objects need to be embedded with processing capabilities to fulfill these services. The design of processing units for IoT objects is constrained by various stringent requirements, such as performance, power, and thermal dissipation. In order to meet these diverse requirements, a multitude of processor design parameters need to be tuned accordingly. In this paper, we propose a temporally efficient design space exploration methodology which determines power and performance optimized microarchitecture configurations. We also discuss the possible combinations of these microarchitecture configurations to form an effective two-tiered heterogeneous processor for IoT applications. We evaluate our design space exploration methodology using a cycle-accurate simulator (ESESC) and a standard set of PARSEC and SPLASH2 benchmarks. The results show that our methodology determines microarchitecture configurations which are within 2.23%–3.69% of the configurations obtained from fully exhaustive exploration while only exploring 3%–5% of the design space. Our methodology achieves on average 24.16x speedup in design space exploration as compared to fully exhaustive exploration in finding power and performance optimized microarchitecture configurations for processors.
💡 Research Summary
The paper addresses the challenge of configuring micro‑architectural parameters for embedded processors that will be used in Internet‑of‑Things (IoT) devices. IoT objects must balance stringent, often conflicting requirements such as low power consumption, high performance, and limited thermal budget, while also needing to support a wide range of workloads (sensor data acquisition, local filtering, security processing, etc.). Traditional exhaustive design‑space exploration (DSE) quickly becomes infeasible because the number of possible configurations (core counts, pipeline depths, cache sizes, voltage/frequency settings, etc.) can reach billions, leading to prohibitive simulation time and delayed time‑to‑market.
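To see why exhaustive DSE becomes infeasible, consider a toy parameter space; the parameter names and per-knob value counts below are illustrative, not taken from the paper:

```python
from math import prod

# Hypothetical per-parameter value counts for a core's tunable knobs
# (illustrative numbers only).
design_space = {
    "core_count":     4,   # e.g. 1, 2, 4, 8 cores
    "pipeline_depth": 5,
    "l1_cache_size":  6,
    "l2_cache_size":  6,
    "issue_width":    4,
    "rob_entries":    8,
    "branch_pred":    3,
    "vf_setting":     10,  # voltage/frequency operating points
}

# The exhaustive design space is the product of all value counts.
total = prod(design_space.values())
print(f"Configurations to simulate exhaustively: {total:,}")
```

Even this modest eight-knob example yields hundreds of thousands of configurations, and each additional parameter multiplies the count; with cycle-accurate simulation taking minutes to hours per run, exhaustive exploration quickly stretches into years.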
To overcome this, the authors propose a “Temporally Efficient Design Space Exploration” (TE‑DSE) methodology that reduces the explored portion of the design space to only 3–5% yet still yields configurations within 2.23–3.69% of the true optimum obtained by full exhaustive search. The methodology consists of five sequential phases:
1. **One‑Shot Search & Parameter Significance Ordering** – A small set of representative configurations is generated from the full parameter space and evaluated using the cycle‑accurate simulator ESESC. From the resulting power and performance numbers, a sensitivity analysis ranks each parameter by its impact on the objectives, quickly identifying the most influential knobs.
2. **Set Partitioning** – Based on the importance ranking, parameters are split into two groups: an “exhaustive‑search set” whose combined configuration count stays below a designer‑specified threshold, and a “greedy‑search set” that would otherwise explode the space. The threshold lets the designer trade exploration time against solution quality.
3. **Exhaustive Search on the Small Set** – All possible combinations of the exhaustive‑search set are simulated. A weighted multi‑objective function (e.g., w₁·Power + w₂·Latency) is used to generate a Pareto front of candidate configurations.
4. **Greedy Search on the Remaining Parameters** – Starting from the best candidate found in phase 3, the parameters in the greedy set are tuned one by one. At each step the algorithm keeps a change only if it improves the objective; otherwise the parameter is frozen. This phase adds fine‑grained refinement without incurring exponential cost.
5. **Integration & Final Selection** – The best configurations from phases 3 and 4 are compared, and the final microarchitecture is selected according to the application‑specific weighting of power versus performance.
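The partition-then-search flow above can be sketched in a few dozen lines. This is an illustrative toy, not the paper's implementation: the parameter names, sensitivity scores, and the analytic `cost` function (standing in for an ESESC simulation run) are all hypothetical.

```python
from itertools import product

# Hypothetical parameter space; values are illustrative, not from the paper.
PARAMS = {
    "issue_width": [1, 2, 4],
    "rob_size":    [32, 64, 128],
    "l1_size_kb":  [16, 32, 64],
    "l2_size_kb":  [256, 512, 1024],
    "freq_mhz":    [400, 800, 1200],
}

def cost(cfg, w_power=0.5, w_perf=0.5):
    """Stand-in for an ESESC run: weighted power/latency objective.
    A real flow would simulate each configuration instead."""
    power = cfg["issue_width"] * cfg["freq_mhz"] / 100 + cfg["l1_size_kb"] / 16
    latency = 1e6 / (cfg["freq_mhz"] * cfg["issue_width"]) + 64 / cfg["rob_size"]
    return w_power * power + w_perf * latency

def partition(params, sensitivity, budget):
    """Phase 2: most influential knobs go to the exhaustive set until the
    combined configuration count would exceed `budget`; the rest go greedy."""
    exhaustive, greedy, count = {}, {}, 1
    for name in sorted(params, key=lambda n: -sensitivity[n]):
        if count * len(params[name]) <= budget:
            exhaustive[name] = params[name]
            count *= len(params[name])
        else:
            greedy[name] = params[name]
    return exhaustive, greedy

def te_dse(params, sensitivity, budget=30):
    exhaustive, greedy = partition(params, sensitivity, budget)
    defaults = {n: vs[0] for n, vs in greedy.items()}
    # Phase 3: exhaustive search over the small, high-impact set.
    best = min(
        ({**defaults, **dict(zip(exhaustive, combo))}
         for combo in product(*exhaustive.values())),
        key=cost,
    )
    # Phase 4: greedy one-at-a-time refinement of the remaining knobs,
    # keeping a change only if it improves the objective.
    for name, values in greedy.items():
        for v in values:
            trial = {**best, name: v}
            if cost(trial) < cost(best):
                best = trial
    return best

sensitivity = {"issue_width": 0.9, "freq_mhz": 0.8, "rob_size": 0.4,
               "l1_size_kb": 0.3, "l2_size_kb": 0.1}
print(te_dse(PARAMS, sensitivity))
```

The `budget` argument plays the role of the designer-specified threshold: raising it enlarges the exhaustively searched subset (slower but closer to optimal), while lowering it pushes more knobs into the cheap greedy phase.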
The authors evaluate TE‑DSE on a set of PARSEC and SPLASH‑2 benchmarks covering a spectrum of compute‑intensive and memory‑intensive workloads. Using the ESESC simulator, they demonstrate that the proposed method achieves an average speed‑up of 24.16× over full exhaustive search while staying within a few percent of the optimal power‑performance trade‑off. Moreover, the methodology is flexible: by adjusting the exploration threshold, designers can control the runtime to meet time‑to‑market constraints.
Beyond the DSE technique, the paper also proposes a two‑tier heterogeneous processor architecture for IoT devices. The “host” processor is a high‑performance core that remains in sleep mode most of the time and is awakened for heavyweight tasks such as data filtering, analytics, or cryptographic operations. The “interface” processors are low‑power cores dedicated to continuous sensor/actuator interfacing and lightweight control. This architectural split further reduces average power consumption while still providing the necessary performance when required.
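The host/interface split described above can be pictured as a simple dispatch policy. The sketch below is purely illustrative; the task names, cycle estimates, and wake threshold are hypothetical, not from the paper:

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    cycles: int  # estimated work for this task

HEAVY_THRESHOLD = 100_000  # hypothetical cutoff for waking the host core

def dispatch(tasks):
    """Route lightweight sensor/actuator work to the always-on interface
    cores; wake the high-performance host core only for heavyweight tasks
    such as filtering, analytics, or cryptography."""
    host_awake = False
    placement = {}
    for t in tasks:
        if t.cycles >= HEAVY_THRESHOLD:
            host_awake = True  # host leaves sleep mode for this task
            placement[t.name] = "host"
        else:
            placement[t.name] = "interface"
    return placement, host_awake

tasks = [Task("read_sensor", 2_000),
         Task("encrypt_batch", 500_000),
         Task("toggle_actuator", 1_000)]
print(dispatch(tasks))
```

Because the host core sleeps unless a heavyweight task arrives, average power tracks the low-power interface cores, while peak performance is still available on demand.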
Related work is surveyed, highlighting prior efforts from ARM, Synopsys, and academic groups that use exhaustive, genetic, or clustering‑based DSE. The authors argue that their approach improves on these by combining a fast sensitivity‑driven partitioning with a limited exhaustive search, thereby achieving both speed and accuracy.
Limitations noted include reliance on simulation results (real silicon may exhibit process variation, layout effects, and temperature‑dependent behavior not captured by ESESC) and the fact that only a single set of benchmark workloads was used for validation. Future directions suggested are hardware prototyping, incorporation of machine‑learning models to predict promising regions of the design space, and extension to more complex heterogeneous systems that integrate GPUs or FPGAs.
In summary, the paper delivers a practical, scalable DSE framework that dramatically cuts exploration time while preserving near‑optimal power‑performance configurations for IoT processors. Its combination of sensitivity analysis, threshold‑driven set partitioning, limited exhaustive search, and greedy refinement offers a template that can be adapted to other domains where design‑space size and time‑to‑market pressures are critical.