The DataCenter DataBase (DCDB) is a modular, continuous and holistic monitoring and data analytics framework targeted at HPC environments. It consists of three main components as depicted in Figure 1:
- Pusher: The main component for collecting data is the Pusher. It runs arbitrary data collection plugins and pushes the collected data to a Collect Agent via the MQTT protocol.
- Collect Agent: The Collect Agent acts as a data broker between one Storage Backend and one or more Pushers.
- Storage Backend: All collected data is stored here. By default, a Cassandra or ScyllaDB database is used, but the framework is intended to support other data storage solutions as well.
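The data flow between the three components can be illustrated with a small in-memory sketch. This is not DCDB's API: the class names, method names, and topic layout (`/rack0/node1/power`) are hypothetical, the MQTT hop is replaced by a direct method call, and a dictionary stands in for the Cassandra or ScyllaDB database.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Reading = Tuple[int, float]  # (timestamp, value)


@dataclass
class StorageBackend:
    """Hypothetical stand-in for the database: keeps readings per topic in memory."""
    rows: Dict[str, List[Reading]] = field(default_factory=dict)

    def insert(self, topic: str, reading: Reading) -> None:
        self.rows.setdefault(topic, []).append(reading)


@dataclass
class CollectAgent:
    """Broker: accepts readings from any number of Pushers, forwards to one backend."""
    backend: StorageBackend

    def on_message(self, topic: str, reading: Reading) -> None:
        self.backend.insert(topic, reading)


@dataclass
class Pusher:
    """Runs data collection plugins and pushes their readings to a Collect Agent."""
    prefix: str
    agent: CollectAgent
    plugins: Dict[str, Callable[[], float]] = field(default_factory=dict)

    def sample(self, timestamp: int) -> None:
        for name, read in self.plugins.items():
            # Assumption: a direct call replaces the MQTT publish used by DCDB.
            self.agent.on_message(f"{self.prefix}/{name}", (timestamp, read()))


backend = StorageBackend()
agent = CollectAgent(backend)

# Two Pushers with made-up plugins that return constant sensor values.
node1 = Pusher("/rack0/node1", agent, {"power": lambda: 215.0, "temp": lambda: 54.5})
node2 = Pusher("/rack0/node2", agent, {"power": lambda: 198.0})

for ts in (1, 2):
    node1.sample(ts)
    node2.sample(ts)
```

After the loop, `backend.rows` holds one time series per sensor topic, gathered from both Pushers through the single Collect Agent.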
DCDB is intended for holistic monitoring of HPC systems and their supporting infrastructures, such as system hardware or software, applications, I/O, power provisioning and cooling. On top of its monitoring capabilities, DCDB includes Wintermute, a plugin-based framework that enables Operational Data Analytics (ODA) on HPC systems. Wintermute is deeply integrated within DCDB: it ingests monitoring data as it is acquired and processes it with state-of-the-art techniques via operators, which in turn enables analysis to improve a system’s efficiency and effectiveness.
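The idea of an operator that processes monitoring data as it is acquired can be sketched as follows. This is only an illustration under stated assumptions: `MovingAverageOperator` and its `ingest` method are hypothetical names, not Wintermute's plugin interface, and a moving average is just one simple example of a derived metric.

```python
from collections import deque
from typing import Deque


class MovingAverageOperator:
    """Hypothetical streaming operator: emits the mean of the last `window` readings."""

    def __init__(self, window: int) -> None:
        if window < 1:
            raise ValueError("window must be at least 1")
        # deque with maxlen drops the oldest reading automatically.
        self.values: Deque[float] = deque(maxlen=window)

    def ingest(self, value: float) -> float:
        # Called once per incoming reading; returns the derived value immediately.
        self.values.append(value)
        return sum(self.values) / len(self.values)


op = MovingAverageOperator(window=3)
# Made-up power readings arriving one at a time.
outputs = [op.ingest(v) for v in (200.0, 210.0, 220.0, 260.0)]
```

Each call produces an output as soon as a reading arrives, so the derived series can be stored or analyzed alongside the raw sensor data.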
Goals within DEEP-SEA
In DEEP-SEA, DCDB will be integrated with LLview to provide telemetry for job-level monitoring, and new plugins will be developed to acquire monitoring data from vendor- and hardware-agnostic libraries such as libvariorum or LIKWID. In addition, capabilities to leverage source-level instrumentation of user applications will be added to collect application metadata.