Bull Dynamic Power Optimizer (BDPO)

BDPO is a lightweight daemon that aims to improve the energy efficiency of HPC application runs without requiring any knowledge of the application itself. It runs alongside the HPC application on the compute nodes of a supercomputer.

BDPO uses fine-grain monitoring (i.e. a sampling period between 10 and 100 milliseconds) of performance counters (e.g. the number of instructions retired by the compute cores) to dynamically characterize the workload being executed by the compute cores. When the execution of this workload consists mostly of stalled cycles, for instance while waiting for data to be moved from main memory to the registers of the processor, BDPO applies Dynamic Voltage and Frequency Scaling (DVFS), as illustrated in Figure 1.

In short, DVFS adapts the frequency of the compute cores: at lower frequencies, they deliver less computational power but consume less electrical power. Thus, when BDPO identifies a workload that can be executed at a lower frequency without significantly slowing down the application, it reduces the frequency of the compute cores. By doing so whenever possible, it increases the energy efficiency of the application run: the reduction in energy consumption outweighs the increase in execution time, and the latter is kept under an acceptable threshold.
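The decision described above can be sketched in a few lines. This is only an illustration, not BDPO's actual implementation: the `Sample` fields, the `choose_frequency` function and the two thresholds (`high`, `low`) are hypothetical names and values chosen for the example, and real counter collection and frequency enforcement are left out.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """Counter deltas over one sampling window (hypothetical layout)."""
    cycles: int          # total core cycles in the window
    stalled_cycles: int  # cycles during which the core was stalled
    instructions: int    # retired instructions in the window


def stall_ratio(sample: Sample) -> float:
    """Fraction of the window spent in stalled cycles."""
    if sample.cycles == 0:
        return 0.0
    return sample.stalled_cycles / sample.cycles


def choose_frequency(sample, freqs_khz, current_khz, high=0.6, low=0.3):
    """Pick the next core frequency from the available steps.

    Assumed policy: step down when the workload is mostly stalled,
    step up when it is mostly compute-bound, otherwise stay put.
    The thresholds are illustrative, not values used by BDPO.
    """
    freqs = sorted(freqs_khz)
    idx = freqs.index(current_khz)
    ratio = stall_ratio(sample)
    if ratio >= high and idx > 0:
        return freqs[idx - 1]
    if ratio <= low and idx < len(freqs) - 1:
        return freqs[idx + 1]
    return current_khz
```

For example, with available steps of 1.2, 1.8 and 2.4 GHz, a window with 80% stalled cycles at 2.4 GHz would move the cores to 1.8 GHz, while a window with 10% stalled cycles at 1.8 GHz would move them back up.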

The EcoHMEM framework optimizes applications by minimizing the CPU cycles lost to L3 cache misses. It consists of three main steps:

  1. Profiling Run: The Extrae and Paramedir tools collect profiling data at the granularity of objects.
  2. Data Placement: The HMEM Advisor component uses the profiling data to decide an initial data distribution.
  3. Production Run: The application is launched with the Flexmalloc Allocator preloaded into the execution environment. This allocator controls the allocation of dynamic objects, honoring the distribution recommended in step 2.

In the context of the DEEP-SEA project, several new features are being implemented, including three major ones:

  • A refinement of the core decision engine of BDPO;
  • Porting the approach implemented by BDPO for processors to GPUs;
  • Giving BDPO prediction capabilities, so that it can estimate the impact of a frequency change before enforcing it.
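To illustrate what estimating the impact of a frequency change could look like, here is a minimal sketch based on a common first-order model. It is not the model being developed for BDPO. It assumes that only the non-stalled fraction of the execution time scales inversely with the core frequency, while stalled time is unaffected; `predicted_slowdown`, `should_scale` and the 5% default threshold are hypothetical.

```python
def predicted_slowdown(stall_fraction, f_current, f_target):
    """Estimated execution time at f_target relative to f_current.

    Assumption: the busy (non-stalled) share of the time scales with
    f_current / f_target, and the stalled share does not change.
    """
    busy_fraction = 1.0 - stall_fraction
    return busy_fraction * (f_current / f_target) + stall_fraction


def should_scale(stall_fraction, f_current, f_target, max_slowdown=1.05):
    """Accept the change only if the estimated slowdown stays acceptable."""
    return predicted_slowdown(stall_fraction, f_current, f_target) <= max_slowdown
```

Under this model, lowering the frequency from 2400 MHz to 1800 MHz for a workload that is 90% stalled gives an estimated slowdown of about 3%, which passes a 5% threshold, whereas the same change for a workload that is 20% stalled gives about 27% and is rejected.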

Publication

M. Stoffel and A. Mazouz, "Improving Power Efficiency Through Fine-Grain Performance Monitoring in HPC Clusters," 2018 IEEE International Conference on Cluster Computing (CLUSTER), 2018, pp. 552-561.