

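1. Second Monitoring Center, China Earthquake Administration, Xi'an, Shaanxi 710054
2. Key Laboratory of Intelligent Perception and Image Understanding of the Ministry of Education, Xidian University, Xi'an, Shaanxi 710071
3. School of Artificial Intelligence, Xidian University, Xi'an, Shaanxi 710071
4. Hangzhou Institute of Technology, Xidian University, Hangzhou, Zhejiang 311231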
Received: 12 September 2024
Revised: 10 January 2025
Published: 15 February 2025
ZHOU Hui, ZHU Huming, GAO Tianqi, et al. Performance Analysis and Optimization of Seismic Simulation Computation Based on Domestic Accelerator Cards [J]. Journal of Disaster Prevention and Mitigation Engineering, 2025, 45(01): 21-33. DOI: 10.13409/j.cnki.jdpme.20240912002.
AWP-ODC is a software package that performs large-scale three-dimensional seismic simulation using the finite-difference method. Because of foreign export restrictions on high-performance computing chips to China, China urgently needs to develop its own high-performance computing chips and the associated software ecosystem. Early acceleration of AWP-ODC was designed and optimized mainly for the NVIDIA GPU hardware and software architecture. In recent years, a variety of heterogeneous computing platforms have developed rapidly, which makes accelerating AWP-ODC on these new heterogeneous hardware and software platforms a problem of significant research value. To this end, AWP-ODC was ported to a domestic accelerator card. For the time-consuming kernel function dstrqc, GPU memory-access optimization and grid-parameter optimization were applied to reduce its execution time. Finally, performance tests were run on one and two cards of the domestic GPU-like accelerator, using the Fréchet Kernels seismic dataset and the 8·3 Ludian earthquake dataset. The experimental results show that on a single card the FLOPS for the two datasets increased by 30.51% and 25.21%, respectively, and on two cards by 9.42% and 23.6%, respectively.
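The FLOPS gains reported above can also be read as runtime savings. The sketch below is a worked illustration, not the authors' method: it assumes that FLOPS is computed as a fixed floating-point operation count divided by wall-clock time, so that the memory-access and grid-parameter optimizations change only how fast the same arithmetic runs. Under that assumption a FLOPS gain g shortens runtime by g / (1 + g). The function name `runtime_reduction` is hypothetical.

```python
# Worked arithmetic, ASSUMING FLOPS = (fixed floating-point operation count) / runtime,
# i.e. the optimizations do not change the amount of arithmetic performed.

def runtime_reduction(flops_gain: float) -> float:
    """Fractional runtime reduction implied by a fractional FLOPS gain.

    If FLOPS rises by a factor (1 + g) while the operation count W is fixed,
    runtime T = W / FLOPS falls to T / (1 + g), i.e. a reduction of g / (1 + g).
    """
    if flops_gain <= -1.0:
        raise ValueError("FLOPS gain must be greater than -100%")
    return flops_gain / (1.0 + flops_gain)


# FLOPS gains reported in the abstract, expressed as fractions.
reported = {
    ("single card", "Fréchet Kernels"): 0.3051,
    ("single card", "8·3 Ludian"): 0.2521,
    ("dual card", "Fréchet Kernels"): 0.0942,
    ("dual card", "8·3 Ludian"): 0.236,
}

for (setup, dataset), gain in reported.items():
    print(f"{setup:11s} {dataset:16s} FLOPS +{gain:.2%} -> runtime -{runtime_reduction(gain):.2%}")
```

Under this assumption the reported gains correspond to runtime reductions of roughly 23.4% and 20.1% on one card and 8.6% and 19.1% on two cards. If the operation count itself differed between the baseline and optimized versions, this conversion would not hold.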