QUICK REVIEW

[Paper Review] BioNeMo Framework: a modular, high-performance library for AI model development in drug discovery

Peter C. St. John, Dejun Lin|arXiv (Cornell University)|Nov 15, 2024

Computational Drug Discovery Methods|Cited by 11

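One-sentence summary

BioNeMo Framework is an open-source, modular library built on NVIDIA NeMo Megatron for training and scaling biomolecular AI models across hundreds of GPUs, with demonstrated gains in throughput and memory efficiency.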

ABSTRACT

Artificial Intelligence models encoding biology and chemistry are opening new routes to high-throughput and high-quality in-silico drug development. However, their training increasingly relies on computational scale, with recent protein language models (pLM) training on hundreds of graphical processing units (GPUs). We introduce the BioNeMo Framework to facilitate the training of computational biology and chemistry AI models across hundreds of GPUs. Its modular design allows the integration of individual components, such as data loaders, into existing workflows and is open to community contributions. We detail technical features of the BioNeMo Framework through use cases such as pLM pre-training and fine-tuning. On 256 NVIDIA A100s, BioNeMo Framework trains a three billion parameter BERT-based pLM on over one trillion tokens in 4.2 days. The BioNeMo Framework is open-source and free for everyone to use.

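Motivation and objectives

Enable efficient training and fine-tuning of biomolecular AI models at scale.
Provide modular components (data loaders, models, and tools) that integrate with existing workflows.
Demonstrate throughput and scalability gains over baseline PyTorch implementations.
Support specialized data loading (protein sequences, single-cell data) and memory-aware grouping and batching.
Encourage community contributions and cloud-scale deployment for drug discovery use cases.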

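Proposed method

Built on PyTorch and Lightning, with core interfaces in bionemo-core.
Uses NVIDIA NeMo Megatron to build large BERT-style biomolecular models (ESM-2, Geneformer).
Provides modular sub-packages (for example, bionemo-esm2 and bionemo-geneformer) for out-of-the-box training, fine-tuning, and inference.
Implements high-performance data loaders for protein sequences and single-cell data (BioNeMo-SCDL).
Incorporates size-aware grouping (a size-aware batcher and a bucket batch sampler) to optimize memory usage for graphs and variable-length inputs.
Provides WebDataModule, which integrates WebDataset with LightningDataModule to streamline data handling.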
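
The size-aware grouping idea can be illustrated with a small, dependency-free sketch. This is not BioNeMo's API: the function names `bucket_batches` and `padding_fraction`, the bucket edges, and the per-batch token budget are all hypothetical choices made for illustration. The sketch assumes that the cost of a batch is its padded size (number of samples times the longest sample), places samples of similar length in the same bucket, and closes a batch when adding another sample would exceed the budget.

```python
from typing import Dict, List, Sequence


def bucket_batches(
    lengths: Sequence[int],
    bucket_boundaries: Sequence[int],
    max_tokens_per_batch: int,
) -> List[List[int]]:
    """Group sample indices into batches of similar length (illustrative only).

    bucket_boundaries are ascending edges; bucket b covers
    [bucket_boundaries[b], bucket_boundaries[b + 1]).
    """
    # Assign each sample index to the bucket whose range contains its length.
    buckets: Dict[int, List[int]] = {}
    for idx, length in enumerate(lengths):
        for b in range(len(bucket_boundaries) - 1):
            if bucket_boundaries[b] <= length < bucket_boundaries[b + 1]:
                buckets.setdefault(b, []).append(idx)
                break
        else:
            raise ValueError(f"length {length} falls outside all buckets")

    batches: List[List[int]] = []
    for b in sorted(buckets):
        current: List[int] = []
        current_max = 0
        for idx in buckets[b]:
            new_max = max(current_max, lengths[idx])
            # Padded cost = (number of samples) * (longest sample in the batch).
            if current and new_max * (len(current) + 1) > max_tokens_per_batch:
                batches.append(current)
                current, new_max = [], lengths[idx]
            current.append(idx)
            current_max = new_max
        if current:
            batches.append(current)
    return batches


def padding_fraction(lengths: Sequence[int], batches: List[List[int]]) -> float:
    """Fraction of padded batch elements that are padding rather than data."""
    padded = sum(max(lengths[i] for i in batch) * len(batch) for batch in batches)
    real = sum(lengths[i] for batch in batches for i in batch)
    return 1.0 - real / padded
```

With mixed short and long sequences such as `[10, 500, 12, 480, 11, 510, 9, 490]`, bucket edges `[0, 128, 1025]`, and a budget of 1024 padded tokens, the short samples end up in one batch and the long ones in pairs, so very little of each batch is padding. Batching the same samples in their original order, four at a time, would pad every short sequence up to roughly 500 tokens. A sample longer than the budget is placed in a batch of its own in this sketch.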

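Experimental results

Research questions

RQ1: By how much does BioNeMo improve training throughput over standard PyTorch/Transformers implementations?
RQ2: How well does BioNeMo scale when training large biomolecular models on multiple GPUs?
RQ3: In practice, can BioNeMo efficiently handle diverse data types (protein sequences, single-cell data) and memory-aware grouping?
RQ4: What practical gains in memory utilization and data-loading performance come from BioNeMo's specialized loaders and grouping strategies?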

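Key findings

On 256 NVIDIA A100 GPUs, a 3B-parameter ESM-2-style pLM was trained on over one trillion tokens in 4.2 days.
For a 650M-parameter model on a single A100, BioNeMo delivers up to 1.47x higher single-device throughput than Hugging Face Transformers, with a model FLOPs utilization (MFU) of 59.2% versus 40.1% for the baseline.
In distributed training, the 3B-parameter model reaches 96.9% of the extrapolated single-node throughput at 256 GPUs (40% MFU on 16 A100s, 60% MFU on 256 GPUs).
BioNeMo SCDL loads data 1.1–2.75x faster than comparable AnnData loaders, without loading the data into memory.
Bucket size-aware batching yields a near-uniform distribution of data sizes and almost no padding compared with MiDi/baseline approaches, reducing the memory spent on padding.
Demonstrates community-driven contributions and cloud-scale deployment (AWS), enabling faster inference and larger-scale exploratory workflows.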
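
As a rough sanity check, the headline figures can be related to each other with a few lines of arithmetic. The sketch below rests on two assumptions that are not stated in this summary: training compute is approximated by the common 6 x parameters x tokens rule of thumb for dense transformers, and the per-GPU peak is taken to be the A100's dense BF16 tensor-core figure of 312 TFLOP/s. The function names are illustrative.

```python
SECONDS_PER_DAY = 86_400

# Assumption: dense BF16 tensor-core peak of one NVIDIA A100, in FLOP/s.
A100_PEAK_BF16 = 312e12


def training_flops(n_params: float, n_tokens: float) -> float:
    # Assumption: the ~6 * N * D approximation for forward + backward passes.
    return 6.0 * n_params * n_tokens


def model_flops_utilization(
    n_params: float,
    n_tokens: float,
    days: float,
    n_gpus: int,
    peak_flops_per_gpu: float,
) -> float:
    """Achieved FLOP/s divided by the aggregate hardware peak."""
    achieved = training_flops(n_params, n_tokens) / (days * SECONDS_PER_DAY)
    return achieved / (n_gpus * peak_flops_per_gpu)


# 3B parameters, one trillion tokens, 4.2 days, 256 A100s (figures from the summary).
mfu_3b = model_flops_utilization(3e9, 1e12, 4.2, 256, A100_PEAK_BF16)

# Ratio of the reported single-GPU MFU values for the 650M-parameter model.
mfu_ratio = 59.2 / 40.1

print(f"estimated MFU for the 3B run: {mfu_3b:.2f}")
print(f"MFU ratio, BioNeMo vs. baseline: {mfu_ratio:.3f}")
```

Under these assumptions the 4.2-day run works out to an MFU of roughly 0.62, in line with the approximately 60% quoted for 256 GPUs, and the ratio of the two single-GPU MFU values is about 1.476, in line with the reported speedup of up to 1.47x. Because the run used more than one trillion tokens, the first estimate is a lower bound.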

This review was generated by AI and reviewed by human editors.