QUICK REVIEW

[Paper Review] A Survey of Deep Learning for Scientific Discovery

Maithra Raghu, E. Schmidt|arXiv (Cornell University)|Mar 26, 2020
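Machine Learning and Data Classification | References: 249 | Cited by: 58
One-Sentence Summary

This survey reviews how deep learning models across data modalities can support scientific discovery, with emphasis on data efficiency, interpretability, and practical implementation resources.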

ABSTRACT

Over the past few years, we have seen fundamental breakthroughs in core problems in machine learning, largely driven by advances in deep neural networks. At the same time, the amount of data collected in a wide array of scientific domains is dramatically increasing in both size and complexity. Taken together, this suggests many exciting opportunities for deep learning applications in scientific settings. But a significant challenge to this is simply knowing where to start. The sheer breadth and diversity of different deep learning techniques makes it difficult to determine what scientific problems might be most amenable to these methods, or which specific combination of methods might offer the most promising first approach. In this survey, we focus on addressing this central issue, providing an overview of many widely used deep learning models, spanning visual, sequential and graph structured data, associated tasks and different training methods, along with techniques to use deep learning with less data and better interpret these complex models --- two central considerations for many scientific use cases. We also include overviews of the full design process, implementation tips, and links to a plethora of tutorials, research summaries and open-sourced deep learning pipelines and pretrained models, developed by the community. We hope that this survey will help accelerate the use of deep learning across different scientific domains.

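Motivation and Objectives

  • Provide a broad, accessible overview of deep learning concepts as applied to scientific problems.
  • Highlight data-efficient training methods (self-supervised and semi-supervised learning) and interpretability techniques relevant to science.
  • Outline the end-to-end deep learning workflow in a scientific setting, covering the data, learning, and validation stages.
  • Offer implementation guidance, tutorials, and open-source resources to accelerate adoption in scientific domains.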

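Proposed Approach

  • Surveys a range of deep learning models (CNNs, GNNs, RNNs, Transformers) and their typical tasks in scientific domains (classification, segmentation, registration).
  • Discusses training methods, including supervised, self-supervised, semi-supervised, and transfer learning.
  • Provides templates for applying DL in science (prediction, understanding, complex transformations).
  • Describes data-efficiency strategies (data augmentation, denoising) and interpretability / representation analysis techniques.
  • Offers implementation tips and lists community resources, tutorials, and pretrained models.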
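
To make the data augmentation strategy named in the list above more concrete, here is a minimal sketch in plain Python. It is not taken from the survey: the helper names `augment` and `augmented_dataset` are hypothetical, images are assumed to be small 2-D grids stored as lists of rows, and the labels are assumed to be unaffected by flips and 90-degree rotations (which is not true for every scientific dataset).

```python
import random


def augment(image, rng):
    """Return a randomly flipped and rotated copy of a 2-D image (list of rows)."""
    out = [row[:] for row in image]           # copy so the input is left untouched
    if rng.random() < 0.5:
        out = [row[::-1] for row in out]      # horizontal flip
    if rng.random() < 0.5:
        out = out[::-1]                       # vertical flip
    for _ in range(rng.randrange(4)):         # rotate by 0, 90, 180 or 270 degrees
        out = [list(col) for col in zip(*out[::-1])]  # one 90-degree clockwise turn
    return out


def augmented_dataset(images, labels, copies, seed=0):
    """Keep every original example and add `copies` augmented variants of it."""
    rng = random.Random(seed)                 # seeded for reproducibility
    new_images, new_labels = [], []
    for image, label in zip(images, labels):
        new_images.append(image)
        new_labels.append(label)
        for _ in range(copies):
            new_images.append(augment(image, rng))
            new_labels.append(label)          # assumption: label is invariant to the transform
    return new_images, new_labels
```

Calling `augmented_dataset(images, labels, copies=3)` would yield four examples per original image. In practice the set of transforms should be chosen per domain, since a flip that is harmless for one imaging modality may change the meaning of another.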

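Results

Research Questions

  • RQ1: Which deep learning models and tasks are best suited to particular scientific problems?
  • RQ2: How can data-efficient training and reliable interpretability be achieved in scientific DL applications?
  • RQ3: Which practical resources (code, tutorials, pretrained models) best support adoption in scientific domains?
  • RQ4: What is the end-to-end workflow for designing, validating, and deploying DL systems in science?
  • RQ5: How do alternative ML methods compare with DL across different scientific settings?

Key Findings

  • Provides a structured overview of models, tasks, and training methods for visual, sequential, and graph data.
  • Highlights data-efficient methods (self-supervised learning, semi-supervised learning, data augmentation) and the interpretability techniques that are essential for scientific insight.
  • Outlines an end-to-end DL design process built around iterative data, learning, and validation loops.
  • Provides curated tutorials, open-source code, pretrained models, and community resources to accelerate adoption.
  • Notes that DL is powerful for complex transformations and prediction but is not always the best first tool, and recommends considering alternative ML methods where appropriate.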
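
The last finding, together with the data, learning, and validation loop, can be illustrated by checking a simple non-deep baseline on held-out data before investing in a neural network. The sketch below is an assumption-laden illustration, not a method from the survey: it uses a nearest-centroid classifier as the stand-in "alternative ML method", and the function names (`train_val_split`, `fit_centroids`, `predict`, `accuracy`) are hypothetical.

```python
import random


def train_val_split(xs, ys, val_fraction=0.25, seed=0):
    """Shuffle indices and hold out a fraction of the data for validation."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    n_val = max(1, int(len(idx) * val_fraction))
    val, train = idx[:n_val], idx[n_val:]
    return ([xs[i] for i in train], [ys[i] for i in train],
            [xs[i] for i in val], [ys[i] for i in val])


def fit_centroids(xs, ys):
    """Learning step: compute the mean feature vector of each class."""
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def predict(centroids, x):
    """Assign x to the class whose centroid is closest in squared distance."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda y: sq_dist(centroids[y]))


def accuracy(centroids, xs, ys):
    """Validation step: fraction of held-out examples classified correctly."""
    return sum(predict(centroids, x) == y for x, y in zip(xs, ys)) / len(xs)
```

A typical use would be to split the data once, fit the centroids on the training part, and record the validation accuracy as the number a deep model has to beat. If the baseline is already adequate for the scientific question, the extra cost of a deep model may not be justified.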


This review was generated by AI and checked by a human editor.