Thesis title: Workload-aware performance optimization under power constraints for parallel computing systems
Power consumption and heat dissipation are major concerns in the design of current and future computing systems. To cope with these issues, hardware manufacturers have introduced different power management mechanisms that can reduce system power consumption during operation, often at the expense of application performance. As the portion of the system that can be powered on at any given time is expected to decrease over the coming years, a proper allocation of the system power budget through a combined use of the available power management mechanisms is essential to match the ever-increasing performance requirements of modern applications. Unfortunately, selecting an optimal power budget allocation can be particularly complex, as the trade-offs between performance and power consumption offered by the different power management mechanisms change with the characteristics of the workload and the specific hardware.
Within this complex scenario, this thesis studies the problem of maximizing the performance of parallel applications running on power-constrained systems. A distinctive feature of our research is its focus on how different workloads respond differently, in terms of both power consumption and performance, to alternative power management settings, with particular attention to parallel scalability and synchronization.
First, we present an exploration-based and a model-based approach, each of which maximizes the performance of multi-threaded applications executed on power-constrained systems by dynamically selecting the optimal configuration of CPU frequency/voltage and number of threads executed in parallel.
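To make the selection problem concrete, the following is a minimal sketch of an exhaustive exploration over (frequency, thread count) pairs under a power cap. It is not the method developed in the thesis: the function names `measure` and `best_configuration` are hypothetical, and `measure` is a toy analytical model standing in for real power and throughput measurements.

```python
from itertools import product


def measure(freq_ghz, threads):
    """Hypothetical stand-in for a real measurement step.

    Returns (power_watts, throughput) from a toy model that is NOT taken
    from the thesis: power grows with the thread count and cubically with
    frequency, while throughput loses efficiency as threads are added
    (a crude proxy for synchronization overhead).
    """
    power = 10.0 + threads * 1.5 * freq_ghz ** 3
    throughput = freq_ghz * threads / (1.0 + 0.1 * (threads - 1))
    return power, throughput


def best_configuration(frequencies, thread_counts, power_cap):
    """Return (freq, threads, throughput) with the highest throughput
    among configurations whose power stays within power_cap,
    or None if no configuration is feasible."""
    best = None
    for freq, threads in product(frequencies, thread_counts):
        power, throughput = measure(freq, threads)
        if power > power_cap:
            continue  # configuration violates the power constraint
        if best is None or throughput > best[2]:
            best = (freq, threads, throughput)
    return best
```

Under this toy model, tightening the cap can shift the best choice toward a lower frequency with more threads, which illustrates why the two parameters have to be selected jointly rather than one at a time.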
Then, we explore the complications and the opportunities associated with applying power management techniques in Parallel Discrete Event Simulation run-time environments based on the Time Warp synchronization protocol. This class of applications is particularly interesting because, on the one hand, it relies on a complex synchronization scheme that limits the applicability of traditional power management techniques and, on the other hand, its speculative approach provides compelling opportunities for power efficiency improvements.
As a final contribution, we propose an alternative software design for parallel applications based on the concept of asymmetric threads, which we believe can unlock further opportunities for efficiency improvements in modern and future systems. Modern applications, and in particular those based on run-time environments, generally perform multiple different tasks, which are executed in an interleaved fashion by the same threads on the same cores. However, each task might be associated with a different workload, which may respond differently to alternative power management settings. In this context, the general idea of our proposed approach is to divide the threads of an application into different classes, where each class performs tasks associated with a similar workload and is scheduled on the same set of cores. This reorganization makes it possible to apply differentiated power management settings to each task of the application, which can result in a more effective and finer-grained allocation of the overall power budget.
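The partitioning idea can be illustrated with a small sketch, written under stated assumptions rather than as the design proposed in the thesis. The names `ThreadClass` and `assign_cores` are hypothetical, each thread is assumed to occupy one core, and the per-class frequency is assumed to be given as input rather than derived.

```python
from dataclasses import dataclass


@dataclass
class ThreadClass:
    name: str          # hypothetical label for a group of similar tasks
    num_threads: int   # threads in this class (one per core in this sketch)
    freq_ghz: float    # frequency assumed to be chosen for this class


def assign_cores(classes, core_ids):
    """Give each class a disjoint set of cores.

    Returns a dict mapping core id -> (class name, frequency), so that
    every core runs threads of a single class at that class's setting.
    """
    needed = sum(c.num_threads for c in classes)
    if needed > len(core_ids):
        raise ValueError("not enough cores for all thread classes")
    plan = {}
    next_core = 0
    for c in classes:
        for core in core_ids[next_core:next_core + c.num_threads]:
            plan[core] = (c.name, c.freq_ghz)
        next_core += c.num_threads
    return plan
```

For example, a class of three compute-heavy threads and a class with one housekeeping thread would receive disjoint cores, each set carrying its own frequency setting. Applying the plan to real hardware (thread pinning and frequency scaling) is platform-specific and left out of the sketch.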