LinkTest
LinkTest is a communication benchmark that tests point-to-point connections and is designed to scale to a large number of processes (it has been validated with up to 1 800 000 MPI ranks). Processes can be on the same node or on different nodes, as long as a usable link between the nodes exists. It supports benchmarking the following communication APIs: MPI, TCP, UCP, IB Verbs, PSM2, and NVLink bridges through CUDA and CUDA-aware MPI. The program outputs a full communication matrix of the message-transmission times for all pairs of processes. The data is written in parallel to a SIONlib file, and a standard output log with a summary of the results is also created. Additional Python tools are provided that read the generated SION file and generate PDF reports.
LinkTest performs both serial and parallel tests. In serial mode, all (N-1)*N/2 pairs of processes are tested sequentially. In parallel mode, N/2 pairs are tested at once. Thus, the parallel test exposes the usable bandwidth of a system under maximum communication stress.
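The pair counts above can be illustrated with a short Python sketch. It is not LinkTest's own scheduling code: the function names serial_pairs and parallel_rounds are hypothetical, and the round-robin "circle method" used here is only one assumed way to arrange the (N-1)*N/2 pairs into steps of N/2 simultaneous pairs (for even N).

```python
from itertools import combinations


def serial_pairs(n):
    """All (n-1)*n/2 unordered pairs, in the order a serial test could visit them."""
    return list(combinations(range(n), 2))


def parallel_rounds(n):
    """Arrange all pairs into n-1 rounds of n/2 disjoint pairs (circle method).

    Assumption: n is even. This is an illustrative schedule, not necessarily
    the one LinkTest uses.
    """
    if n % 2 != 0:
        raise ValueError("this sketch assumes an even number of processes")
    ranks = list(range(n))
    rounds = []
    for _ in range(n - 1):
        # Pair the i-th rank from the front with the i-th rank from the back.
        rounds.append(
            [tuple(sorted((ranks[i], ranks[n - 1 - i]))) for i in range(n // 2)]
        )
        # Keep the first rank fixed and rotate all other ranks by one position.
        ranks = [ranks[0], ranks[-1]] + ranks[1:-1]
    return rounds


if __name__ == "__main__":
    n = 8
    print(len(serial_pairs(n)), "pairs tested one after another in serial mode")
    rounds = parallel_rounds(n)
    print(len(rounds), "parallel steps with", len(rounds[0]), "pairs each")
```

In this sketch every process communicates in every parallel step, which is the situation the text describes as maximum communication stress.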
LinkTest is used in all three SEA projects. In DEEP-SEA and IO-SEA, it is a key synthetic benchmark for testing the system and module interconnect speeds. MSA runs are conducted on the DEEP system using the CM, DAM and ESB modules, as shown in Figure 1.
For DEEP-SEA and the other SEA projects, a new version of LinkTest has been released to the public. In addition to MPI, this version now supports UCX, InfiniBand Verbs, PSM2, NVLink and TCP as transport layers.
Another major addition is the set of new kernel modes. The kernel in LinkTest defines the communication within a pair of processes. By default, one process sends a message to its partner, which sends the same message right back (semi-directional, ping-pong). Alternatively, both processes can transmit the message at the same time (bidirectional, ping-ping, which profits from symmetrical bidirectional bandwidth). Thirdly, one process can send a number of unacknowledged messages and receive only one confirmation message at the very end (unidirectional, which allows direct comparisons with the OSU benchmarks).
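The difference between the three kernel modes can be made concrete by listing which messages are in flight at each step. The following Python sketch does only that; it sends no data and is not taken from LinkTest. The function kernel_schedule, the mode strings and the n_messages parameter are hypothetical names chosen for illustration.

```python
def kernel_schedule(mode, n_messages=1):
    """Return the communication steps for one pair of processes (ranks 0 and 1).

    Each step is a list of (sender, receiver) tuples that are in flight at the
    same time. This is an illustrative model of the message pattern only.
    """
    steps = []
    if mode == "semi-directional":
        # Ping-pong: rank 0 sends, then rank 1 sends the message back.
        for _ in range(n_messages):
            steps.append([(0, 1)])
            steps.append([(1, 0)])
    elif mode == "bidirectional":
        # Ping-ping: both ranks transmit at the same time.
        for _ in range(n_messages):
            steps.append([(0, 1), (1, 0)])
    elif mode == "unidirectional":
        # Rank 0 sends several unacknowledged messages,
        # rank 1 confirms once at the very end.
        for _ in range(n_messages):
            steps.append([(0, 1)])
        steps.append([(1, 0)])
    else:
        raise ValueError(f"unknown kernel mode: {mode!r}")
    return steps


if __name__ == "__main__":
    for mode in ("semi-directional", "bidirectional", "unidirectional"):
        print(mode, kernel_schedule(mode, n_messages=3))
```

For n_messages payload messages, the model gives 2 * n_messages sequential steps in semi-directional mode, n_messages steps in bidirectional mode, and n_messages + 1 steps in unidirectional mode.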
We also added a new parallel mode that introduces pseudo-randomness into the algorithm for selecting communication pairs. This is important because the parallel bandwidth depends strongly on the exact pairing in each step. The number of possible pairings is far too large to be tested exhaustively. With the new mode, the influence of pairing on the bandwidth can at least be evaluated statistically.
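To give a feeling for why exhaustive testing is out of reach, the sketch below counts the ways N processes can be split into N/2 disjoint pairs and draws one such pairing pseudo-randomly. It assumes that a "pairing" means a perfect matching of all processes in one step. The function names count_pairings and random_pairing are hypothetical, and the shuffle-based sampling is an assumed approach, not necessarily the algorithm LinkTest uses.

```python
import math
import random


def count_pairings(n):
    """Number of ways to split n processes (n even) into n/2 unordered pairs.

    This equals n! / (2**(n/2) * (n/2)!), also written (n-1)!!.
    """
    if n % 2 != 0:
        raise ValueError("this sketch assumes an even number of processes")
    return math.factorial(n) // (2 ** (n // 2) * math.factorial(n // 2))


def random_pairing(n, rng):
    """Draw one pairing by shuffling the ranks and pairing neighbours."""
    if n % 2 != 0:
        raise ValueError("this sketch assumes an even number of processes")
    ranks = list(range(n))
    rng.shuffle(ranks)
    return [tuple(sorted((ranks[i], ranks[i + 1]))) for i in range(0, n, 2)]


if __name__ == "__main__":
    for n in (4, 16, 64):
        print(n, "processes:", count_pairings(n), "possible pairings per step")
    rng = random.Random(12345)  # fixed seed so that a run can be repeated
    print(random_pairing(8, rng))
```

Using a seeded generator keeps the pseudo-random pairings reproducible, so a statistical evaluation over many seeds can be repeated later with the same pairings.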
Lastly, the overhauled Python analysis tool supports all of these added options with new graphics and shows considerably more detail about the internals of LinkTest, which helps with interpreting the data.
In RED-SEA, LinkTest and ParaStation MPI will both add support for the BullSequana eXascale Interconnect (BXI), which uses the Portals low-level communication API. The DEEP system now includes a new partition with four nodes connected by BXI. Studies of this interconnect with LinkTest and comparisons with the InfiniBand available on the other modules are ongoing.