Nek5000 is a highly scalable open-source code for the simulation of incompressible flows in moderately complex geometries.
The discretisation is based on the spectral-element method (SEM), which combines the high-order accuracy of spectral methods with the geometric flexibility of finite-element methods. Such a scheme is inherently more accurate than the low-order finite-volume codes that are the present industry standard. The high order gives clear advantages in particular for high-quality Direct Numerical Simulation (DNS) of canonical turbulent flows and for flow stability and transition studies. Nek5000 includes extensive pre- and post-processing software, as well as filters to various other tools. Nek5000 is maintained at Argonne National Laboratory and is used at many universities in Europe, including KTH and ETHZ. From the outset, Nek5000 was designed to exploit large-scale parallelism. Its kernel arithmetic is dominated by matrix products and is well suited to anticipated exascale architectures.
The goal for Nek5000 in the DEEP-SEA consortium is to port the code to large-scale systems with heterogeneous memories and GPUs, and to optimise its main computational kernel (matrix-matrix multiplication) using the DEEP-SEA software stack. Within DEEP-SEA, we will demonstrate the use of the DaCe framework within a large-scale legacy Fortran application, enabling Nek5000 to run on and exploit different heterogeneous platforms while still achieving high performance in both computational time and scalability.
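To illustrate why matrix-matrix multiplication dominates the kernel arithmetic, the following minimal sketch applies a one-dimensional operator along the first index of a single tensor-product element by reshaping the element data into a matrix. It is plain Python for readability, not Nek5000's Fortran source and not DaCe code; the function names `matmul` and `apply_first_index`, the data layout and the two-point operator are assumptions made for this example.

```python
def matmul(a, b):
    """Dense product of a (n x m) and b (m x p), both given as nested lists."""
    m, p = len(b), len(b[0])
    return [[sum(row[l] * b[l][j] for l in range(m)) for j in range(p)]
            for row in a]


def apply_first_index(d, u):
    """Return v with v[i][j][k] = sum_l d[i][l] * u[l][j][k].

    The n x n x n element data is viewed as an n x (n*n) matrix, so the
    whole operation becomes one small matrix-matrix product.
    """
    n = len(u)
    # Flatten the two trailing indices: column index is j*n + k.
    flat = [[u[l][j][k] for j in range(n) for k in range(n)] for l in range(n)]
    out = matmul(d, flat)
    # Restore the three-index layout.
    return [[[out[i][j * n + k] for k in range(n)] for j in range(n)]
            for i in range(n)]


# Illustrative two-point differentiation matrix on the nodes -1 and +1.
nodes = [-1.0, 1.0]
d = [[-0.5, 0.5], [-0.5, 0.5]]
# Hypothetical field that is linear in the first coordinate with slope 3.
u = [[[3 * x + j + 10 * k for k in range(2)] for j in range(2)] for x in nodes]
v = apply_first_index(d, u)
```

In a production spectral-element code the same pattern is repeated for every element and every coordinate direction, which is why a fast small-matrix product is the natural optimisation target.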
Computational Fluid Dynamics in the first DEEP project
AVBP is a parallel Computational Fluid Dynamics (CFD) code used to study the combustion process in gas turbines, with the aim of optimising it to improve stability and reduce pollution. The AVBP code has been ported to all major systems (SGI Altix ICE, CRAY XT4, IBM BlueGene/Q) with excellent performance. However, maintaining good scalability and performance on upcoming HPC systems is challenging: CFD requires all computing cores to communicate frequently, and the physical models often require reductions of variables (max/min/mean) over the complete computing partition.
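The reductions mentioned above can be sketched as follows. This is a minimal, single-process illustration, not AVBP code: each inner list stands in for the values held by one rank, and the names `local_stats`, `combine` and `global_min_max_mean` are hypothetical. In an MPI code the combine step would be a collective operation across all ranks, which is what makes it a synchronisation point.

```python
from functools import reduce


def local_stats(values):
    """Partial result computed independently on one rank."""
    return (min(values), max(values), sum(values), len(values))


def combine(a, b):
    """Merge two partial results; associative, so any reduction tree works."""
    return (min(a[0], b[0]), max(a[1], b[1]), a[2] + b[2], a[3] + b[3])


def global_min_max_mean(partitions):
    """Reduce per-rank partial results to global min, max and mean."""
    lo, hi, total, count = reduce(combine, map(local_stats, partitions))
    return lo, hi, total / count


# Three hypothetical ranks holding different numbers of values.
partitions = [[1.0, 4.0], [2.0], [8.0, 5.0, 4.0]]
result = global_min_max_mean(partitions)
```

The mean is carried as a sum and a count because averaging the per-rank means would give the wrong answer when ranks hold different numbers of values.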
In the DEEP project, CERFACS (the European Centre for Research and Advanced Training in Scientific Computation) aimed to improve AVBP’s scalability by taking advantage of the DEEP Cluster-Booster architecture. As a first step, the bottlenecks caused by the original master/slave approach were removed. The next step was to migrate from a pure MPI approach to a hybrid MPI+OmpSs approach. The task-based OmpSs model exposes additional parallelism, which made it possible to implement a version of the application that outperformed and outscaled the previous one. Loop refactoring and compiler hints gave a further performance gain by making more efficient use of the vector units. Lastly, the I/O operations and the costly global reductions that hindered scalability were offloaded to the Cluster, where they were performed overlapped with the simulation, which continued on the Booster.
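The overlap described in the last step can be illustrated with a small sketch. It uses a single Python worker thread as a stand-in for the Cluster side and the main loop as a stand-in for the Booster side; it is not OmpSs, MPI or AVBP code, and `advance`, `reduce_and_record` and `run` are hypothetical names with a made-up update rule.

```python
from concurrent.futures import ThreadPoolExecutor


def advance(state):
    """Stand-in for one solver step (hypothetical update rule)."""
    return [0.5 * x + 1.0 for x in state]


def reduce_and_record(step, snapshot, log):
    """Stand-in for the offloaded work: a global reduction plus output."""
    log.append((step, max(snapshot)))


def run(state, n_steps):
    log = []
    pending = []
    # One worker, so offloaded tasks are processed in submission order.
    with ThreadPoolExecutor(max_workers=1) as offload:
        for step in range(n_steps):
            state = advance(state)
            # Hand over a copy of the data and continue with the next step
            # without waiting for the reduction or the output to finish.
            pending.append(
                offload.submit(reduce_and_record, step, list(state), log))
        for future in pending:
            future.result()  # re-raise any error from the offloaded work
    return state, log


final_state, log = run([0.0, 4.0], 3)
```

The key design point is that the solver loop only waits for the offloaded work at the very end, so reductions and output no longer sit on the critical path of each step.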