QUICK REVIEW

[Paper Review] Performance Portable Monte Carlo Particle Transport on Intel, NVIDIA, and AMD GPUs

John Tramm, Paul Romano|arXiv (Cornell University)|Mar 19, 2024
Advanced Data Storage Technologies|Cited by 5
One-Sentence Summary

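This paper uses OpenMP target offloading to achieve performance-portable GPU acceleration of OpenMC on Intel, NVIDIA, and AMD GPUs, demonstrates strong scaling on Frontier, Polaris, and Aurora, and shows that the GPU port outperforms CPU baselines and that some GPU platforms outperform others.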

ABSTRACT

OpenMC is an open source Monte Carlo neutral particle transport application that has recently been ported to GPU using the OpenMP target offloading model. We examine the performance of OpenMC at scale on the Frontier, Polaris, and Aurora supercomputers, demonstrating that performance portability has been achieved by OpenMC across all three major GPU vendors (AMD, NVIDIA, and Intel). OpenMC's GPU performance is compared to both the traditional CPU-based version of OpenMC as well as several other state-of-the-art CPU-based Monte Carlo particle transport applications. We also provide historical context by analyzing OpenMC's performance on several legacy GPU and CPU architectures. This work includes some of the first published results for a scientific simulation application at scale on a supercomputer featuring Intel's Max series "Ponte Vecchio" GPUs. It is also one of the first demonstrations of a large scientific production application using the OpenMP target offloading model to achieve high performance on all three major GPU platforms.

Motivation and Objectives

  • Demonstrate performance portability of OpenMC across AMD, NVIDIA, and Intel GPUs using OpenMP target offloading.
  • Evaluate multi-node scalability of OpenMC with MPI domain replication on GPU-accelerated supercomputers.
  • Compare GPU-accelerated OpenMC performance against CPU baselines and other state-of-the-art Monte Carlo codes.
  • Characterize historical trends in GPU versus CPU performance for Monte Carlo transport applications.

Proposed Method

  • Use OpenMC with OpenMP target offloading to run on GPUs from AMD, NVIDIA, and Intel within a single codebase.
  • Adopt event-based parallelism on GPUs and history-based parallelism on CPUs for particle transport.
  • Apply particle sorting by material type and energy to improve cross-section lookup kernel efficiency, using vendor sorting libraries.
  • Tune in-flight particle limits and batch sizing to maximize GPU memory and throughput.
  • Run multiple MPI ranks per GPU to enable concurrency across streams, and use vendor-specific libraries (NVIDIA Thrust, AMD rocThrust, Intel oneDPL) for sorting and memory operations.
  • Leverage OpenMC’s MPI domain replication to study scalability on large GPU-based systems.
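
The sorting and event-based steps listed above can be illustrated with a minimal C++ sketch. It is not OpenMC's code: the `Particle` struct, `sort_queue`, `toy_xs`, and `xs_lookup_event` are hypothetical names invented for illustration, `std::sort` on the host stands in for the vendor sorting libraries, and `toy_xs` is a made-up formula rather than a real cross-section lookup. The only real API used is the OpenMP `target teams distribute parallel for` construct; without an offload-capable compiler the loop simply runs on the host.

```cpp
#include <algorithm>
#include <cstdint>
#include <tuple>
#include <vector>

// Hypothetical, simplified particle record (assumption: OpenMC's layout differs).
struct Particle {
    std::int32_t material;  // index of the material the particle is currently in
    double energy;          // particle energy
    double xs;              // cross section filled in by the lookup kernel
};

// Sort the in-flight queue by (material, energy) so that neighbouring
// threads read neighbouring regions of the cross-section data.
// std::sort is a host-side stand-in for a vendor GPU sort.
inline void sort_queue(std::vector<Particle>& queue) {
    std::sort(queue.begin(), queue.end(),
              [](const Particle& a, const Particle& b) {
                  return std::tie(a.material, a.energy) <
                         std::tie(b.material, b.energy);
              });
}

#pragma omp declare target
// Made-up stand-in for a cross-section lookup (illustration only).
inline double toy_xs(std::int32_t material, double energy) {
    return (material + 1) / (1.0 + energy);
}
#pragma omp end declare target

// Event-based kernel: one loop iteration per queued particle, all
// performing the same event (a cross-section lookup) in lockstep.
inline void xs_lookup_event(std::vector<Particle>& queue) {
    Particle* p = queue.data();
    const long n = static_cast<long>(queue.size());
    #pragma omp target teams distribute parallel for map(tofrom: p[0:n])
    for (long i = 0; i < n; ++i) {
        p[i].xs = toy_xs(p[i].material, p[i].energy);
    }
}
```

Calling `sort_queue(queue)` followed by `xs_lookup_event(queue)` mirrors the order described in the list: group similar particles first, then process the whole queue as one event. A history-based CPU version would instead follow each particle through all of its events before moving to the next one.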

Experimental Results

Research Questions

  • RQ1: Can OpenMC running with OpenMP target offloading achieve true performance portability across AMD, NVIDIA, and Intel GPUs?
  • RQ2: How does OpenMC scale on multi-GPU, multi-node systems (Frontier, Polaris, Aurora) using MPI domain replication?
  • RQ3: What are the key architectural optimizations (event-based parallelism, sorting, in-flight particle tuning) that maximize GPU performance in OpenMC?
  • RQ4: How does GPU performance compare against CPU baselines and other state-of-the-art Monte Carlo codes for a depleted SMR pincell problem?

Key Findings

  • OpenMC delivers substantial per-node speedups over CPUs on A100, MI250X, and PVC GPUs.
  • Weak scaling efficiencies on Aurora, Polaris, and Frontier are 97%, 96%, and 99%, respectively, for large-scale runs.
  • The study demonstrates OpenMC exceeding a 50× speedup over a Titan baseline using only 325 Aurora nodes or 1024 Frontier nodes.
  • OpenMC achieves approximately 1 billion particles per second on large-scale GPU runs for depleted fuel scenarios.
  • Intel Ponte Vecchio (PVC) GPUs outperform NVIDIA A100, NVIDIA GH200, and AMD MI250X by factors of about 2.3×, 1.2×, and 1.8×, respectively, on the test problems.
  • On a single Intel-based GPU node (Aurora), OpenMC is over 17× faster than a state-of-the-art 96-core Sapphire Rapids CPU node.
  • OpenMC successfully demonstrates performance portability across three major GPU vendors using the OpenMP target offloading model.
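
The weak-scaling efficiencies quoted above can be read against the conventional definition, sketched below. This review does not state the formula the paper uses, so the definition and the symbols are assumptions: $R(N)$ is the aggregate throughput in particles per second on $N$ nodes with the work per node held fixed, and $N_0$ is the baseline node count.

```latex
E_{\mathrm{weak}}(N) \;=\; \frac{R(N)/N}{R(N_0)/N_0}
```

Under this reading, an efficiency of 97% means each node in the large run sustains about 0.97 times the per-node throughput of the baseline run.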

This review was generated by AI and checked by a human editor.