
Airplane Simulations on Heterogeneous Pre-Exascale Architectures


Abstract


High-fidelity Computational Fluid Dynamics simulations are generally associated with large computing requirements, which become more acute with each new generation of supercomputers. However, significant research effort is required to unlock the computing power of leading-edge systems based on increasingly complex architectures. We can affirm with near certainty that future Exascale systems will be heterogeneous, including accelerators such as GPUs. We can also expect higher variability in the performance of the various computing devices engaged in a simulation, due to the explosion of parallelism and to technical issues such as the hardware-enforced mechanisms that preserve thermal design limits. In this context, dynamic load balancing (DLB) becomes a must for the parallel efficiency of any simulation code. Within the Center of Excellence for Engineering EXCELLERAT, the CFD code Alya has been provisioned with a distributed-memory DLB mechanism, complementary to the node-level load balancing strategy already in place. The core components of the method are an efficient in-house SFC-based mesh partitioner and an online redistribution module that migrates the simulation between two different partitions. These are used to correct the partition according to runtime measurements. We have focused on maximizing the parallel performance of the mesh partitioning process to minimize the overhead of the load balancing. Our software can partition a 250M-element mesh for an airplane simulation in 0.08 seconds using 128 nodes (6,144 CPU cores) of the MareNostrum supercomputer. We then applied this technology to perform simulations on the heterogeneous POWER9 cluster installed at the Barcelona Supercomputing Center, whose architecture is very similar to that of the Summit supercomputer at Oak Ridge National Laboratory, ranked second in the Top500 list. On the BSC POWER9 cluster, which has 4 NVIDIA P100 GPUs per node, we assessed the performance of Alya using up to 40 nodes for simulations of airplane aerodynamics. We demonstrated that we could perform a well-balanced co-execution using the CPUs and GPUs simultaneously, which was 23% faster than using only the GPUs. In practice, this represents a performance boost equivalent to attaching an additional GPU per node.
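
To illustrate the general idea behind the SFC-based, measurement-driven partitioning described above, the following is a minimal sketch (not the Alya/EXCELLERAT implementation): elements are ordered along a Morton space-filling curve and the ordered list is cut into contiguous chunks of roughly equal measured cost, so that devices that process elements faster (e.g. GPU-backed ranks) can be assigned proportionally more work. The function names morton3d and sfc_partition, the choice of a Morton curve, and the per-element weight model are assumptions made for this example only.

import numpy as np

def morton3d(ix, iy, iz, bits=10):
    # Interleave the bits of integer grid coordinates into a Morton (Z-order) code.
    code = np.zeros_like(ix)
    for b in range(bits):
        code |= ((ix >> b) & 1) << (3 * b)
        code |= ((iy >> b) & 1) << (3 * b + 1)
        code |= ((iz >> b) & 1) << (3 * b + 2)
    return code

def sfc_partition(centroids, weights, n_parts, bits=10):
    # centroids : (n, 3) element centroid coordinates
    # weights   : (n,) measured cost per element (stand-in for runtime measurements)
    # n_parts   : number of MPI ranks / devices
    # Returns an (n,) array giving the owning partition of every element.
    lo, hi = centroids.min(axis=0), centroids.max(axis=0)
    span = np.maximum(hi - lo, 1e-30)
    grid = ((centroids - lo) / span * (2**bits - 1)).astype(np.int64)
    order = np.argsort(morton3d(grid[:, 0], grid[:, 1], grid[:, 2], bits))

    # Cut the SFC-ordered element list into chunks of roughly equal total weight.
    cum = np.cumsum(weights[order])
    targets = cum[-1] * np.arange(1, n_parts) / n_parts
    cuts = np.searchsorted(cum, targets)
    part = np.empty(len(weights), dtype=np.int32)
    for p, (a, b) in enumerate(zip(np.r_[0, cuts], np.r_[cuts, len(weights)])):
        part[order[a:b]] = p
    return part

# Example: partition 100k randomly placed elements with synthetic per-element
# costs into 8 chunks; each chunk ends up with a roughly equal total cost.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    xyz = rng.random((100_000, 3))
    w = rng.uniform(0.5, 1.5, 100_000)
    part = sfc_partition(xyz, w, n_parts=8)
    print([round(w[part == p].sum(), 1) for p in range(8)])

A property of this family of methods worth noting: because each partition is a contiguous segment of the curve, the resulting subdomains stay spatially compact, and rebalancing only moves the cut positions along the curve, which keeps the repartitioning and redistribution cost low.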

Document Details

Reference: NWC21-424-c
Author: Borrell, R.
Language: English
Type: Presentation Recording
Date: 26th October 2021
Organisation: Barcelona Supercomputing Center
Region: Global
