QUICK REVIEW

[Paper Review] From Dark Matter to Galaxies with Convolutional Networks

Xinyue Zhang, Yanfang Wang | arXiv (Cornell University) | Feb 15, 2019

Galaxies: Formation, Evolution, Phenomena | References: 29 | Cited by: 40

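One-Sentence Summary

This paper trains a two-phase convolutional neural network that maps 3D dark matter fields from N-body simulations to 3D galaxy distributions from hydrodynamic simulations. On several summary statistics it matches or outperforms the traditional HOD method, while keeping training and generation time competitive.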

ABSTRACT

Cosmological surveys aim at answering fundamental questions about our Universe, including the nature of dark matter or the reason of unexpected accelerated expansion of the Universe. In order to answer these questions, two important ingredients are needed: 1) data from observations and 2) a theoretical model that allows fast comparison between observation and theory. Most of the cosmological surveys observe galaxies, which are very difficult to model theoretically due to the complicated physics involved in their formation and evolution; modeling realistic galaxies over cosmological volumes requires running computationally expensive hydrodynamic simulations that can cost millions of CPU hours. In this paper, we propose to use deep learning to establish a mapping between the 3D galaxy distribution in hydrodynamic simulations and its underlying dark matter distribution. One of the major challenges in this pursuit is the very high sparsity in the predicted galaxy distribution. To this end, we develop a two-phase convolutional neural network architecture to generate fast galaxy catalogues, and compare our results against a standard cosmological technique. We find that our proposed approach either outperforms or is competitive with traditional cosmological techniques. Compared to the common methods used in cosmology, our approach also provides a nice trade-off between time-consumption (comparable to fastest benchmark in the literature) and the quality and accuracy of the predicted simulation. In combination with current and upcoming data from cosmological observations, our method has the potential to answer fundamental questions about our Universe with the highest accuracy.

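Motivation and Objectives

Motivation: generate galaxy catalogues from dark matter distributions quickly and accurately, so that theory can be compared efficiently with observations.
Address the challenge of predicting a highly sparse 3D galaxy field from a dense dark matter field.
Develop and evaluate a two-phase CNN architecture that improves on standard methods in predictive accuracy.
Evaluate the model on several cosmological statistics (power spectrum, bispectrum, voids) and compare it with HOD.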

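Proposed Method

Galaxy prediction is framed as a supervised learning task: a CNN maps the 3D dark matter density to 3D galaxy counts.
A two-phase architecture is introduced. Phase 1 is a binary classifier trained with weighted cross-entropy to predict whether each voxel contains any galaxy. Phase 2 is a regressor that, on the voxels predicted to contain galaxies, predicts the number of galaxies with an L2 loss.
Several 3D CNN variants are tested (U-Net, Recurrent Residual U-Net, Inception), with a modified skip-connection strategy intended to reduce over-reliance on the high-resolution input.
The 1024^3-voxel density field of the simulation is divided into 32^3 sub-cubes (each 2.3 Mpc/h on a side), which are split into training (62.6%), validation (19.63%), and test (17.76%) sets.
The baseline is a Halo Occupation Distribution (HOD) model with three parameters (M_min, M1, α), tuned to match the galaxy number density and power spectrum.
Evaluation uses the power spectrum, transfer function, bispectrum, void statistics, and visual inspection to assess scale-dependent accuracy.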
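
The two-phase objective described above can be illustrated with a minimal sketch. It is not the authors' implementation: a real model would apply these losses to 3D tensors inside a deep learning framework, whereas this version uses plain Python lists over a flattened toy sub-cube. The function names, the `pos_weight` value, the 0.5 threshold, and all numbers are assumptions made for illustration only.

```python
import math


def weighted_bce(probs, labels, pos_weight):
    """Phase 1: weighted binary cross-entropy averaged over voxels.

    probs      -- predicted probability that each voxel holds a galaxy
    labels     -- 1 if the voxel holds at least one galaxy, else 0
    pos_weight -- hypothetical up-weighting of the rare occupied class
    """
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for p, y in zip(probs, labels):
        total -= pos_weight * y * math.log(p + eps)
        total -= (1 - y) * math.log(1.0 - p + eps)
    return total / len(probs)


def masked_l2(pred_counts, true_counts, mask):
    """Phase 2: mean squared error restricted to voxels flagged by phase 1."""
    selected = [
        (p - t) ** 2
        for p, t, m in zip(pred_counts, true_counts, mask)
        if m
    ]
    return sum(selected) / len(selected) if selected else 0.0


# Toy flattened sub-cube of 8 voxels; only two of them contain galaxies.
true_counts = [0, 0, 3, 0, 0, 0, 1, 0]
labels = [1 if c > 0 else 0 for c in true_counts]

# Made-up phase-1 probabilities and a hypothetical 0.5 decision threshold.
probs = [0.1, 0.2, 0.9, 0.1, 0.6, 0.1, 0.7, 0.1]
mask = [p > 0.5 for p in probs]

# Made-up phase-2 count predictions.
pred_counts = [0.0, 0.0, 2.5, 0.0, 0.4, 0.0, 1.2, 0.0]

phase1_loss = weighted_bce(probs, labels, pos_weight=5.0)
phase2_loss = masked_l2(pred_counts, true_counts, mask)
print(f"phase 1 loss: {phase1_loss:.4f}, phase 2 loss: {phase2_loss:.4f}")
```

Up-weighting the occupied class keeps the classifier from minimizing its loss by predicting "empty" everywhere, and masking the regression loss means the many empty voxels do not dominate the count prediction.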

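Experimental Results

Research Questions

RQ1: Can a CNN-based model learn the mapping from the dark matter density field to the galaxy distribution across multiple scales better than the traditional HOD method?
RQ2: Can a two-phase training scheme mitigate the extreme class imbalance caused by the sparsity of galaxies among voxels?
RQ3: Which CNN architectures and configurations best capture the multi-scale spatial information needed for this mapping?
RQ4: How does the ML-based approach compare with HOD on higher-order statistics such as the bispectrum and void abundance?

Key Findings

The two-phase Inception+R2Unet model matches or exceeds HOD on large scales and performs better on the small scales of the power spectrum.
The two-phase approach achieves a lower mean squared error on predicted voxel counts (MSE ~ 0.00308) than HOD (MSE ~ 0.01007).
In the binary phase, Inception achieves the highest recall (95.72%), indicating that voxels containing galaxies are identified well.
The bispectrum residuals of Inception+R2Unet relative to the target are 2.7% and 5.0% at the two tested scales (k1 = 0.5, k2 = 0.6 h/Mpc). On smaller scales the model clearly outperforms HOD, with a relative residual of 0.68% versus 1193% at k1 = 1.2, k2 = 1.3 h/Mpc.
The void abundance (void size function) of both Inception+R2Unet and HOD agrees with the target, indicating that both are competitive on large-scale structure statistics.
Training and generation speed: the full Illustris hydrodynamic simulation takes about 19 million CPU hours and HOD about 8 CPU hours, while Inception+R2Unet needs about 3 GPU hours to train and generate one simulation.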
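
The two error measures quoted in these findings can be sketched as follows. This assumes the conventional definitions (mean squared error over voxel counts, and relative residual as |prediction - target| / |target|); the summary does not spell out the exact formulas the paper uses, and the function names and toy numbers below are hypothetical.

```python
def mean_squared_error(pred, target):
    """Voxel-wise mean squared error between predicted and target counts."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def relative_residual_percent(pred, target):
    """Relative residual |pred - target| / |target|, in percent.

    Assumed definition; intended for a single summary-statistic value,
    for example the bispectrum at one (k1, k2) configuration.
    """
    if target == 0:
        raise ValueError("target must be non-zero")
    return abs(pred - target) / abs(target) * 100.0


# Toy values, not taken from the paper.
target_counts = [0, 0, 2, 0, 1, 0, 0, 0]
model_counts = [0, 0, 2, 0, 0, 0, 0, 0]

mse = mean_squared_error(model_counts, target_counts)
residual = relative_residual_percent(pred=102.0, target=100.0)
print(f"MSE: {mse}, relative residual: {residual:.2f}%")
```

Because almost all voxels are empty, a voxel-wise MSE is small in absolute terms for any reasonable predictor, so the ratio between two methods' MSE values is more informative than either value alone.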

This review was generated by AI and reviewed by a human editor.