Score-P is a community-maintained instrumentation and measurement infrastructure for collecting performance data from HPC applications. It is available as open source under the 3-clause BSD licence. Score-P is easy to use, highly scalable, and able to generate both summarised call-path profiles and detailed event traces.
By using open data formats – CUBE4 for profile data and the Open Trace Format 2 (OTF2) for event traces – Score-P provides the foundation for a number of well-established performance analysis tools. In particular, Score-P’s event traces can be manually examined using the Vampir and Ravel trace visualisers, or automatically analysed using the Scalasca Trace Tools. Likewise, the generated call-path profiles can be explored using the Cube performance report explorer as well as TAU ParaProf and PerfExplorer, including cross-experiment analyses using a performance database. In addition, they serve as input for generating empirical performance models with Extra-P.
To capture details of the application execution, Score-P relies mainly on instrumentation, i.e., the insertion of “hooks” into the application code that call into the Score-P run-time libraries at important points during the execution. This instrumentation can be added to the application executables in various ways, e.g., by leveraging compilers to instrument function entries and exits, by using standardised tools interfaces such as PMPI, the OpenMP, OpenCL, and OpenACC tools interfaces, and the CUDA Profiling Tools Interface (CUPTI), or even by source-to-source translation. Score-P also provides an instrumentation API for manually annotating the source code in case the automatically added instrumentation is not adequate. Depending on the configured measurement mode, the gathered data is then either summarised in a call-path profile or stored in a memory buffer that accumulates the event trace. Finally, the measured data is flushed to disk for further analysis with the aforementioned tools.
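The difference between the two measurement modes can be illustrated with a small, purely conceptual sketch. It is not the Score-P API: the `Measurement` class, the `instrument` decorator, and `run_demo` are hypothetical names chosen for this illustration, and a Python decorator stands in for the hooks that a compiler or tools interface would insert. In profile mode the enter/exit hooks update one summary record per call path, whereas in trace mode every event is appended to an in-memory buffer.

```python
import functools
import time
from collections import defaultdict


class Measurement:
    """Toy measurement core (hypothetical, not the Score-P run-time library)."""

    def __init__(self, mode):
        assert mode in ("profile", "trace")
        self.mode = mode
        self.stack = []    # regions currently active, i.e. the call path
        self.starts = []   # enter timestamps, parallel to self.stack
        # One summary record per call path (used in profile mode only).
        self.profile = defaultdict(lambda: {"visits": 0, "time": 0.0})
        # In-memory event buffer (used in trace mode only).
        self.trace = []

    def enter(self, region):
        now = time.perf_counter()
        self.stack.append(region)
        self.starts.append(now)
        if self.mode == "trace":
            self.trace.append(("ENTER", region, now))

    def exit(self, region):
        now = time.perf_counter()
        start = self.starts.pop()
        if self.mode == "profile":
            # Summarise: accumulate visits and inclusive time per call path.
            record = self.profile[tuple(self.stack)]
            record["visits"] += 1
            record["time"] += now - start
        else:
            self.trace.append(("EXIT", region, now))
        self.stack.pop()


def instrument(measurement):
    """Wrap a function with enter/exit hooks that call into `measurement`."""
    def wrap(func):
        @functools.wraps(func)
        def hooked(*args, **kwargs):
            measurement.enter(func.__name__)
            try:
                return func(*args, **kwargs)
            finally:
                measurement.exit(func.__name__)
        return hooked
    return wrap


def run_demo(mode):
    """Run a tiny instrumented program and return the collected data."""
    m = Measurement(mode)

    @instrument(m)
    def solve():
        return sum(range(1000))

    @instrument(m)
    def main():
        solve()
        solve()

    main()
    return m
```

Calling `run_demo("profile")` yields two records, one for the call path `main` and one for `main` → `solve` with two visits, no matter how long the program runs. Calling `run_demo("trace")` yields six individual events instead, which shows why traces carry more detail but grow with the length of the execution.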
Goals within DEEP-SEA
While Score-P can already be used to analyse the performance of many applications running on MSA systems, users often have to apply workarounds to bypass current limitations, and these workarounds come with various drawbacks. The limitations will be addressed within DEEP-SEA to improve both the ease of use and the general applicability of Score-P. For example, we will remove the restriction that the Score-P measurement libraries currently expect a homogeneous use of parallel programming models in MPMD applications. Likewise, we will allow the use of different hardware performance counters, and of low-overhead timers based on timestamp counter registers that run at different “ticks per second” rates, when using multiple MSA modules. In addition, we will implement measurement support for tracking MPI inter-communicators and for properly handling communication operations that use them.
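To see why differing “ticks per second” rates matter, consider the following minimal sketch of merging events recorded on two modules. It is an illustration under simplifying assumptions, not Score-P's implementation: the `Clock` type, `to_seconds`, `merge_events`, and all numeric values are hypothetical, and the sketch assumes that each module read its tick counter once at a common synchronisation point and that the clocks do not drift afterwards.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clock:
    """Per-module timer description (hypothetical)."""
    ticks_per_second: int  # timer resolution of this module
    epoch_ticks: int       # counter value read at the common sync point


def to_seconds(clock, ticks):
    """Convert a raw tick value to seconds since the common sync point."""
    return (ticks - clock.epoch_ticks) / clock.ticks_per_second


def merge_events(streams):
    """Merge per-module event streams onto a single time axis.

    `streams` is a list of (Clock, [(ticks, name), ...]) pairs.
    Returns a list of (seconds, name) tuples sorted by time.
    """
    merged = [
        (to_seconds(clock, ticks), name)
        for clock, events in streams
        for ticks, name in events
    ]
    return sorted(merged)


# Assumed example: one module ticks at 2 GHz, the other at 1 GHz.
cluster = Clock(ticks_per_second=2_000_000_000, epoch_ticks=1_000)
booster = Clock(ticks_per_second=1_000_000_000, epoch_ticks=500)

timeline = merge_events([
    (cluster, [(1_000 + 4_000_000_000, "send")]),
    (booster, [(500 + 2_500_000_000, "recv"),
               (500 + 1_000_000_000, "kernel")]),
])
```

Comparing the raw tick values directly would place `send` last, although it occurs before `recv` once each value is scaled by its own module's rate. A measurement system that assumes a single rate for all processes cannot order such events correctly.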