Benchmarking and Tools
Benchmarking is an essential element in evaluating the success of a hardware prototyping project. In the DEEP projects we use the JUBE benchmarking environment to assess the performance of the DEEP system.
Benchmarking a computer system usually involves numerous tasks, including several runs of different applications. Configuring, compiling, and running a benchmark suite on several platforms, together with the accompanying tasks of result verification and analysis, requires considerable administrative work and produces a large amount of data, which has to be collected in a central database and analysed. Without a benchmarking environment, all these steps have to be performed by hand.
For each benchmark application, the benchmark data is written out in a format that enables the benchmarker to deduce the desired information. This data can be parsed by automatic pre- and post-processing scripts that extract the relevant information and store it in a more compact form for manual interpretation.
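As a minimal sketch of such a post-processing step, the following Python snippet extracts metrics from a benchmark's text output with a regular expression. The line format (`name: value unit`), the metric names, the sample output, and the function name `parse_metrics` are assumptions made for illustration; they are not taken from any specific DEEP benchmark.

```python
import re

# Hypothetical line format: "<metric name>: <number> <unit>"
METRIC_LINE = re.compile(
    r"^\s*(?P<name>[A-Za-z_ ]+?)\s*:\s*"
    r"(?P<value>[-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)\s*"
    r"(?P<unit>\S*)\s*$"
)


def parse_metrics(text):
    """Return {metric name: (value, unit)} for every line matching the format."""
    metrics = {}
    for line in text.splitlines():
        match = METRIC_LINE.match(line)
        if match:
            name = match.group("name").strip().lower()
            metrics[name] = (float(match.group("value")), match.group("unit"))
    return metrics


# Invented sample output; lines that do not match the format are ignored.
sample_output = """\
Starting benchmark run
write bandwidth: 812.5 MB/s
read bandwidth: 1033.0 MB/s
elapsed time: 12.75 s
Done.
"""

if __name__ == "__main__":
    for name, (value, unit) in parse_metrics(sample_output).items():
        print(f"{name:16s} {value:10.2f} {unit}")
```

Reducing the raw output to a small dictionary per run is one way to store the results "more densely" before they are collected and compared across runs.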
JUBE is actively developed by the Jülich Supercomputing Centre.
JUBE provides a script-based framework to easily create benchmark sets, run those sets on different computer systems, and evaluate the results. Within the DEEP projects, the main focus lies on collecting benchmarking tests to compare:
- The different I/O approaches used in DEEP projects.
- The I/O performance in DEEP projects with respect to production systems.
For this purpose, all DEEP co-design applications are integrated into the JUBE benchmarking environment.
The version used in DEEP projects is written in Python and is currently installed on the DEEP Cluster as well as on the DEEP-ER SDV.
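The idea of creating a benchmark set, running it, and evaluating the results can be illustrated with a short Python sketch. This is not JUBE's interface (JUBE is driven by its own input files); the function names `expand_parameters` and `run_sweep`, the stand-in workload `toy_io_benchmark`, and the parameter names are all hypothetical.

```python
import itertools
import time


def expand_parameters(parameter_space):
    """Yield one dict per combination of the listed parameter values."""
    names = sorted(parameter_space)
    for values in itertools.product(*(parameter_space[n] for n in names)):
        yield dict(zip(names, values))


def run_sweep(benchmark, parameter_space):
    """Run the benchmark once per parameter combination and record the results."""
    results = []
    for params in expand_parameters(parameter_space):
        start = time.perf_counter()
        output = benchmark(**params)
        elapsed = time.perf_counter() - start
        results.append({**params, "output": output, "seconds": elapsed})
    return results


def toy_io_benchmark(block_size, num_blocks):
    """Stand-in workload (assumption): build an in-memory buffer, return its size."""
    data = bytearray(block_size) * num_blocks
    return len(data)


# Hypothetical parameter space: every combination becomes one run.
space = {"block_size": [1024, 4096], "num_blocks": [1, 8]}

if __name__ == "__main__":
    for row in run_sweep(toy_io_benchmark, space):
        print(row["block_size"], row["num_blocks"], row["output"],
              f"{row['seconds']:.6f}s")
```

Keeping the parameter space as data, separate from the code that runs the workload, means the same sweep can be repeated unchanged on another system, which is what makes the results comparable.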
Extra-P is an automatic performance-modelling tool that supports the user in the identification of scalability bugs.
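To illustrate what empirical performance modelling means in general, the sketch below fits several candidate scaling functions of the form c0 + c1 * p^i * log2(p)^j to measured runtimes and keeps the best fit. It is not Extra-P's implementation or API; the candidate exponents, the function names, and the measurement data are assumptions chosen for the example.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = c0 + c1 * x; returns (c0, c1, residual sum)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    c1 = sxy / sxx
    c0 = mean_y - c1 * mean_x
    rss = sum((y - (c0 + c1 * x)) ** 2 for x, y in zip(xs, ys))
    return c0, c1, rss


# Hypothetical candidate exponents (i, j) for the term p^i * log2(p)^j.
CANDIDATES = [(i, j) for i in (0, 0.5, 1, 1.5, 2) for j in (0, 1, 2)
              if (i, j) != (0, 0)]


def best_model(procs, times):
    """Return the candidate with the smallest residual sum of squares."""
    best = None
    for i, j in CANDIDATES:
        xs = [p ** i * math.log2(p) ** j for p in procs]
        c0, c1, rss = fit_line(xs, times)
        if best is None or rss < best["rss"]:
            best = {"i": i, "j": j, "c0": c0, "c1": c1, "rss": rss}
    return best


# Synthetic measurements (assumption): runtime grows like p * log2(p).
procs = [2, 4, 8, 16, 32, 64]
times = [3.0 + 0.5 * p * math.log2(p) for p in procs]

if __name__ == "__main__":
    m = best_model(procs, times)
    print(f"t(p) ~ {m['c0']:.2f} + {m['c1']:.2f} * p^{m['i']} * log2(p)^{m['j']}")
```

A code region whose fitted term grows faster with the process count than expected is a candidate scalability bug, which is the kind of finding such a model is meant to surface.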
Hybrid memory systems are an emerging trend to provide larger RAM sizes at reasonable cost and energy consumption.
Bull Dynamic Power Optimizer (BDPO)
BDPO is a lightweight daemon aiming to increase the energy-efficiency associated with the execution of an HPC application, while being agnostic of the latter.
Automating benchmarks is important to guarantee reproducibility and comparability, which are the major goals when performing benchmarks.
PIM Simulations: ZSim and DRAMSim3
One of the objectives of the DEEP-SEA project is to explore various architectural proposals for processing in memory (PIM).
PARCOACH is a framework that aims to help users program MPI codes. It provides advanced support for detecting errors in the use of MPI collective communications and non-blocking communications (e.g., MPI_Isend/Irecv), and for checking the correct usage of MPI routines in programs that combine MPI and threads.
The MUlti-level Simulation Approach (MUSA) is a simulation methodology that employs different tools and abstraction levels to reduce simulation overhead while maintaining high precision.
Mitos is a tool for obtaining detailed information about samples of the memory operations that an application triggers.
MemAxes is a tool for interactive analysis and visualisation of data movements within a node.
The Extrae measurement infrastructure is an easy-to-use tool for event tracing and online analysis.
The PROFiling-based EsTimation of performance and energy (PROFET) tool profiles memory system performance, quantifies the application pressure to the memory system and estimates application performance on hardware platforms with novel memory systems.
Paraver is a performance analyser based on event traces. It offers great flexibility in exploring the collected data and supports a detailed analysis of the variability and distribution of multiple metrics, with the objective of understanding the application’s behaviour.
Scalasca Trace Tools
The Scalasca Trace Tools are a collection of trace-based performance analysis tools built on top of the Score-P instrumentation and measurement infrastructure.
Score-P is a community-maintained instrumentation and measurement infrastructure to collect performance data from HPC applications.
LinkTest is a communication benchmark that tests point-to-point connections. It is designed to scale up to a large number of processes and was validated using up to 1 800 000 MPI ranks.