QUICK REVIEW

[Paper Review] Understanding Biology in the Age of Artificial Intelligence

Elsa Lawrence, A. R. Elshazly|arXiv (Cornell University)|Mar 6, 2024

Genetics, Bioinformatics, and Biomedical Research|Cited by 9

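One-Sentence Summary

This paper analyzes, from an epistemological perspective, how machine learning is reshaping the understanding of biology, and proposes principles to guide the design and application of ML in biology, focusing on protein structure prediction and single-cell RNA sequencing.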

ABSTRACT

Modern life sciences research is increasingly relying on artificial intelligence approaches to model biological systems, primarily centered around the use of machine learning (ML) models. Although ML is undeniably useful for identifying patterns in large, complex data sets, its widespread application in biological sciences represents a significant deviation from traditional methods of scientific inquiry. As such, the interplay between these models and scientific understanding in biology is a topic with important implications for the future of scientific research, yet it is a subject that has received little attention. Here, we draw from an epistemological toolkit to contextualize recent applications of ML in biological sciences under modern philosophical theories of understanding, identifying general principles that can guide the design and application of ML systems to model biological phenomena and advance scientific knowledge. We propose that conceptions of scientific understanding as information compression, qualitative intelligibility, and dependency relation modelling provide a useful framework for interpreting ML-mediated understanding of biological systems. Through a detailed analysis of two key application areas of ML in modern biological research - protein structure prediction and single cell RNA-sequencing - we explore how these features have thus far enabled ML systems to advance scientific understanding of their target phenomena, how they may guide the development of future ML models, and the key obstacles that remain in preventing ML from achieving its potential as a tool for biological discovery. Consideration of the epistemological features of ML applications in biology will improve the prospects of these methods to solve important problems and advance scientific understanding of living systems.

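Research Motivation and Objectives

Motivate a philosophical and epistemological examination of machine learning in modern biology.
Propose a framework that uses information compression, qualitative intelligibility, and dependency relation modelling to interpret ML-mediated biological insight.
Analyze how ML applications in protein structure prediction and single-cell RNA sequencing affect scientific understanding.
Identify the obstacles that prevent ML from achieving its full potential for biological discovery.
Propose design principles to improve future ML models in biology.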

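Proposed Method

Survey existing ML applications in biology using an epistemological toolkit.
Apply modern theories of scientific understanding to interpret ML-driven insights.
Develop a framework that evaluates ML outputs in terms of information compression, intelligibility, and dependency relation modelling.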

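Results

Research Questions

RQ1: How do the concepts of information compression, qualitative intelligibility, and dependency relations illuminate ML-mediated understanding of biology?
RQ2: To what extent do current ML applications in biology (such as protein structure prediction and single-cell RNA sequencing) advance or limit scientific understanding?
RQ3: Which design considerations should guide future ML models in order to promote biological discovery?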

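Key Findings

The scientific value of ML applications in biology can be interpreted through information compression, qualitative intelligibility, and dependency relation modelling.
Protein structure prediction and single-cell RNA sequencing serve as focused case studies for examining how ML advances understanding.
Obstacles remain that prevent ML from achieving its full potential as a tool for biological discovery.
The epistemological features of ML applications can guide future model development and improve the ability to solve problems in biology.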

This review was generated by AI and checked by a human editor.