Chinese algorithm boosts Nvidia GPU performance up to 800-fold in scientific computing

Chinese researchers have developed a high-performance algorithm that can solve complex material design problems on consumer GPUs, running up to 800 times faster than traditional serial methods.

Developed by a research team at Shenzhen MSU-BIT University, which was co-founded by Lomonosov Moscow State University and Beijing Institute of Technology, the new algorithm improves the computational efficiency of peridynamics (PD), a cutting-edge, non-local theory used to model difficult physical problems such as cracks, damage and fractures.
It opens up new possibilities for solving complex mechanical problems across various industries, including aerospace and military applications, on widely available chips that are low-cost and not subject to US sanctions.

Peridynamics has proven advantageous in modelling material damage, but its high computational complexity has traditionally made large-scale simulations inefficient, with issues such as high memory usage and slow processing speeds.
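
To see where that cost comes from, consider a minimal sketch of the neighbour search underlying a peridynamic model, in which every particle interacts with all other particles within a finite distance, known as the horizon. The function name, grid layout, spacing and horizon ratio below are illustrative assumptions and are not taken from the PD-General paper; the brute-force search is used only for clarity.

```python
import math
from itertools import combinations

def count_bonds(n_side, spacing=1.0, horizon_ratio=3.0):
    """Count bonds on a hypothetical n_side x n_side grid of particles.

    A bond links two particles whose distance does not exceed the horizon
    (horizon_ratio * spacing). This is a brute-force O(N^2) search, used
    here for illustration only.
    """
    horizon = horizon_ratio * spacing
    points = [(i * spacing, j * spacing)
              for i in range(n_side) for j in range(n_side)]
    tol = 1e-9  # guard against floating-point round-off at the horizon edge
    return sum(1 for p, q in combinations(points, 2)
               if math.dist(p, q) <= horizon + tol)

n_side = 20
n_particles = n_side * n_side
bonds = count_bonds(n_side)
# Each bond is shared by two particles, hence the factor of 2.
print(f"{n_particles} particles, {bonds} bonds, "
      f"{2 * bonds / n_particles:.1f} neighbours per particle on average")
```

With a horizon of three grid spacings, an interior particle in this two-dimensional layout has 28 neighbours, compared with the handful of adjacent nodes in a typical local method. Every bond has to be stored and revisited at each time step, which is where the memory use and processing time go as models grow to millions of particles.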

To address these challenges, Yang Yang, an associate professor, leveraged Nvidia’s CUDA programming technology to create the PD-General framework. Through an in-depth analysis of the chip’s architecture, her team optimised the algorithm design and memory management, which led to a remarkable performance boost. Their research was published in the Chinese Journal of Computational Mechanics on January 8.

“This efficient computational power allows researchers to reduce calculations that would typically take days to just a few hours – or even minutes – using an ordinary home-level GPU, which is a significant advancement for PD research,” Yang wrote in the paper.

The PD-General framework achieved a speed increase of up to 800 times on a consumer-grade Nvidia GeForce RTX 4070, compared with traditional serial programs. Even when compared with widely used OpenMP parallel programs, the algorithm showed a 100-fold speed increase.
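
A quick back-of-the-envelope check shows how these ratios relate to Yang's remark about reducing days to minutes. The two-day serial runtime below is a hypothetical input chosen for illustration; only the 800-fold and 100-fold factors come from the reported results, and both are peak figures.

```python
def accelerated_minutes(serial_days, speedup):
    """Runtime in minutes after dividing a serial runtime by a speedup factor."""
    return serial_days * 24 * 60 / speedup

SPEEDUP_VS_SERIAL = 800  # reported peak, versus traditional serial programs
SPEEDUP_VS_OPENMP = 100  # reported, versus OpenMP parallel programs

# Implied advantage of the OpenMP baseline over the serial one,
# assuming both figures refer to the same problem.
openmp_vs_serial = SPEEDUP_VS_SERIAL / SPEEDUP_VS_OPENMP

# Hypothetical two-day serial job at the peak speedup.
print(f"{accelerated_minutes(2, SPEEDUP_VS_SERIAL):.1f} minutes")
print(f"OpenMP baseline is about {openmp_vs_serial:.0f}x faster than serial")
```

Under these assumptions, a job that takes two days in serial would finish in under four minutes, and the OpenMP baseline would itself be roughly eight times faster than the serial code.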

In simulations involving millions of particles, PD-General can complete 4,000 iterative steps in just five minutes. In the two-dimensional uniaxial tensile problems – considered to be the largest in computational scale – the framework processed 69.85 million iterations in under two minutes using single precision.

Peridynamics plays a crucial role in the analysis of material fracture and damage. In the aerospace field, it is used to model crack propagation in aircraft materials during impact, providing more accurate safety predictions for aircraft structures.

In civil engineering, it is used to simulate damage evolution and failure patterns of bridges or buildings during seismic events, supporting earthquake-resistant design.

Additionally, in ballistics and explosive research, PD models the crack and material fragmentation processes under high-impact conditions, assisting in the development of military equipment.

With ongoing improvements in GPU hardware, the PD-General framework is expected to achieve even greater computational efficiency in the future, further enhancing its potential to address more complex mechanical problems and expand the applications of computational mechanics.

 
