LLview is a set of software components for monitoring clusters. Its Job Reporting module provides detailed information on all jobs running on the system. To achieve this, LLview connects to different data sources in the system and collects data to present to the user via a web portal. For example, the resource manager provides information about the jobs, while additional daemons may be used to acquire extra information from the compute nodes. Overhead is kept to a minimum because the metrics are collected only once per minute. The LLview portal links performance metrics to individual jobs to provide a comprehensive job reporting interface.
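The link between per-node metrics and individual jobs can be pictured as a join on node name and time window. The following is a minimal sketch of that idea only, not LLview's actual implementation: the `Job` and `Sample` records, their fields, and the helper functions are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    """Hypothetical job record, as a resource manager might report it."""
    job_id: str
    nodes: frozenset  # names of the nodes allocated to the job
    start: int        # start time, seconds since epoch
    end: int          # end time, seconds since epoch (exclusive)


@dataclass(frozen=True)
class Sample:
    """Hypothetical metric sample collected from one compute node."""
    node: str
    timestamp: int    # collection time, seconds since epoch
    cpu_load: float


def attribute_samples(jobs, samples):
    """Map each job id to the samples taken on its nodes while it was running."""
    by_job = {job.job_id: [] for job in jobs}
    for sample in samples:
        for job in jobs:
            on_job_node = sample.node in job.nodes
            in_job_window = job.start <= sample.timestamp < job.end
            if on_job_node and in_job_window:
                by_job[job.job_id].append(sample)
    return by_job


def mean_load(samples):
    """Average CPU load over a list of samples, or None if the list is empty."""
    if not samples:
        return None
    return sum(s.cpu_load for s in samples) / len(samples)
```

With samples arriving once per minute, a job that runs for three minutes on two nodes would receive about six samples under this scheme. A real system would typically perform this join inside the database instead of in nested loops.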
Goals within DEEP-SEA
In DEEP-SEA, LLview has been installed on the DEEP system and will provide its monitoring service throughout the project. Its database has been reworked to allow easier integration of new data sources. DCDB will be added as a data source to LLview, enriching the job monitoring data with energy metrics and, later, node-level performance counters. A REST API will also be added to LLview so that its database can be queried by other tools.