QUICK REVIEW

[Paper Review] Full-Stack Domain Enhancement for Combustion LLMs: Construction and Optimization

Quanjia Xiao, Weimin Ouyang | arXiv (Cornell University) | Feb 27, 2026

Machine Learning in Materials Science | Cited by 0

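One-Sentence Summary

The paper proposes an end-to-end workflow that builds a dedicated combustion-domain corpus, applies multi-stage model adaptation (CPT, SFT, RLVR), and introduces FlameBench, reaching state-of-the-art combustion reasoning performance that surpasses general-purpose LLMs and RAG baselines.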

ABSTRACT

Large language models (LLMs) demonstrate significant application potential for task adaptation and capability enhancement in professional fields. Nevertheless, for complex physical systems such as combustion science, general-purpose LLMs often generate severe hallucinations due to insufficient domain knowledge and the inability to adhere to physical conservation laws. To address this issue, we propose the first full-stack domain-enhanced LLM workflow tailored for the field of combustion science, which integrates automated domain corpus construction, incremental pre-training, instruction fine-tuning, and verifiable reward-based reinforcement learning. This workflow ensures that the model truly internalizes physical laws rather than merely learning textual statistical patterns. We also release FlameBench, a standardized evaluation benchmark specifically designed for complex reasoning tasks in combustion science. Experimental results demonstrate that the model developed in this work significantly outperforms state-of-the-art general-purpose closed-source models and traditional retrieval-augmented generation methods on combustion science reasoning tasks. This work lays a solid technical and resource foundation for the subsequent development of domain-specific scientific research agents with reliable scientific reasoning capabilities.

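Research Motivation and Objectives

General-purpose LLMs lack combustion domain knowledge and fail to respect physical constraints, so combustion science urgently needs domain-specific LLMs.
Propose a full-stack pipeline that integrates corpus construction, incremental pre-training, supervised fine-tuning, and reinforcement learning with verifiable rewards to enforce physical consistency.
Introduce FlameBench as a standard benchmark for evaluating combustion-domain reasoning.
Demonstrate that the proposed workflow yields stronger domain reasoning and is competitive against RAG and general-purpose LLM baselines.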

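Proposed Method

Build a large-scale combustion-domain corpus from English and Chinese journals plus physics and chemistry resources (about 5B domain tokens within about 30B total tokens).
Perform continued pre-training (CPT) on the mixed corpus to inject domain knowledge while preserving general language ability.
Run two-stage supervised fine-tuning (SFT-General, then SFT-Combustion) to align the model with instructions and with domain-specific reasoning patterns.
Apply reinforcement learning with verifiable rewards (RLVR) under a KL constraint to improve physically consistent multi-parameter reasoning.
Develop FlameBench, a set of 436 domain-specific questions for evaluating domain knowledge and constrained reasoning.
Compare end-to-end training (CPT → SFT → RLVR) against CPT, SFT, and RAG baselines as well as general-purpose LLMs.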
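The RLVR step listed above combines a reward that a program can check with a KL constraint that keeps the policy close to a reference model. The sketch below shows one common way to wire those two pieces together; it is not the authors' implementation. The last-number answer extraction, the 2% tolerance `rel_tol`, the penalty weight `beta`, the example temperatures, and the helper names `verifiable_reward` and `kl_shaped_reward` are all hypothetical, and the paper's actual reward for multi-parameter combustion questions may be structured differently.

```python
import math
import re
from typing import Sequence

# Hypothetical extraction rule: integers, decimals, or scientific notation ("1.2e-3").
_NUMBER = re.compile(r"-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?")


def verifiable_reward(response: str, reference: float, rel_tol: float = 0.02) -> float:
    """Binary, rule-checked reward (illustrative scheme, not the paper's).

    Treats the last number in the response as the final answer and returns
    1.0 if it matches the reference within a relative tolerance, else 0.0.
    Responses containing no number earn nothing.
    """
    numbers = _NUMBER.findall(response)
    if not numbers:
        return 0.0
    answer = float(numbers[-1])
    return 1.0 if math.isclose(answer, reference, rel_tol=rel_tol) else 0.0


def kl_shaped_reward(
    task_reward: float,
    logp_policy: Sequence[float],
    logp_ref: Sequence[float],
    beta: float = 0.05,
) -> float:
    """Subtract a KL penalty that keeps the policy near the reference model.

    logp_policy and logp_ref are per-token log-probabilities of the sampled
    response under the current policy and the frozen reference model. Their
    summed difference is a single-sample estimate of KL(policy || reference).
    """
    kl_estimate = sum(lp - lr for lp, lr in zip(logp_policy, logp_ref))
    return task_reward - beta * kl_estimate


# Usage with illustrative numbers: a correct answer (within 2%) whose tokens
# are slightly more likely under the policy than under the reference model.
r = verifiable_reward("The flame temperature is about 1850 K", reference=1840.0)
shaped = kl_shaped_reward(r, logp_policy=[-0.2, -1.0], logp_ref=[-0.7, -1.0], beta=0.1)
print(r, round(shaped, 3))  # 1.0 0.95
```

Keeping the task reward binary and rule-checked is what makes it "verifiable": there is no learned reward model for the policy to exploit, while the KL term discourages drift away from the reference model (assumed here to be the preceding SFT checkpoint).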

Figure 1: Dataset token distribution by category.
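The method list reports about 5B combustion tokens within roughly 30B total tokens, a domain share of about one sixth. Below is a minimal sketch of size-proportional sampling over such a mixture. Treating the remaining ~25B tokens as a single "general" source and sampling in proportion to token counts are assumptions for illustration; the per-category breakdown in Figure 1 and the paper's actual mixing ratios may differ. The helper names `mixture_weights` and `sample_sources` are hypothetical.

```python
import random


def mixture_weights(token_counts: dict[str, float]) -> dict[str, float]:
    """Turn per-source token counts into sampling probabilities (proportional to size)."""
    total = sum(token_counts.values())
    return {source: count / total for source, count in token_counts.items()}


def sample_sources(weights: dict[str, float], k: int, seed: int = 0) -> list[str]:
    """Draw k source labels, e.g. to decide which corpus each training document comes from."""
    rng = random.Random(seed)
    return rng.choices(list(weights), weights=list(weights.values()), k=k)


# Approximate totals from the text: ~5B combustion tokens out of ~30B overall.
counts = {"combustion": 5e9, "general": 25e9}  # "general" = 30B - 5B, an assumption
weights = mixture_weights(counts)
print(round(weights["combustion"], 3))  # 0.167

draws = sample_sources(weights, k=60_000)
print(draws.count("combustion") / len(draws))  # roughly 1/6
```

Proportional sampling is only the simplest baseline; domain-adaptation recipes often up- or down-weight sources to trade domain gain against retention of general ability, which is the balance CPT is described as striking here.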

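Experimental Results

Research Questions

RQ1: Does the end-to-end full-stack domain enhancement workflow retain combustion domain knowledge better than general-domain pre-training?
RQ2: Does multi-stage adaptation (CPT, SFT, RLVR) enforce physical consistency and improve multi-physics reasoning on combustion tasks?
RQ3: How does the end-to-end domain-adapted model perform on the combustion benchmark compared with RAG-based methods and other closed-source models?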

Key Findings

Model (training stage)	FlameBench accuracy (%)
Qwen3-8B-Base	26.8
CPT	33.3
SFT-General	33.5
SFT-Combustion	35.1
RLVR-Opt	43.8
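Reading the table as a sequence of training stages makes the findings below easy to check. The sketch is only arithmetic on the reported numbers; `stage_gains` is a hypothetical helper, and attributing each difference to a single stage assumes every row continues training from the row before it.

```python
# FlameBench accuracies (%) copied from the table above, in training order.
stages = [
    ("Qwen3-8B-Base", 26.8),
    ("CPT", 33.3),
    ("SFT-General", 33.5),
    ("SFT-Combustion", 35.1),
    ("RLVR-Opt", 43.8),
]


def stage_gains(results):
    """Percentage-point gain of each stage over the stage before it."""
    return {
        name: round(acc - prev_acc, 1)
        for (_, prev_acc), (name, acc) in zip(results, results[1:])
    }


gains = stage_gains(stages)
print(gains)
# {'CPT': 6.5, 'SFT-General': 0.2, 'SFT-Combustion': 1.6, 'RLVR-Opt': 8.7}

base, final = stages[0][1], stages[-1][1]
print(round(final - base, 1), "pp absolute")            # 17.0 pp absolute
print(round((final / base - 1) * 100, 1), "% relative")  # 63.4 % relative
```

CPT and RLVR account for most of the 17.0-point total, while the two SFT stages together add 1.8 points.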

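CPT gives a substantial improvement over the base model, raising FlameBench accuracy from 26.8% to 33.3%.
SFT-General adds only a marginal gain over CPT (33.5%), while SFT-Combustion raises accuracy to 35.1%.
RLVR optimization raises accuracy substantially, to 43.8%, with more stable output lengths and higher average reward under a low-entropy policy.
RLVR-Opt outperforms the strongest RAG baseline (RAG + GLM-4) on FlameBench by 11.71 percentage points.
With SFT-Combustion and RLVR, the model approaches closed-source model performance in some subfields and is roughly competitive with GLM-4.
Compared with RAG methods, RLVR-Opt achieves higher accuracy without any retrieval overhead, indicating stronger internalized domain knowledge and reasoning ability.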
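Because FlameBench has 436 questions, accuracies and percentage-point gaps can be translated into question counts. This sketch assumes each accuracy is a single pass over all 436 questions; the review does not say whether scores are averaged over multiple samples or subsets, in which case the counts would only be approximate. The helper names `approx_correct` and `pp_to_questions` are hypothetical.

```python
FLAMEBENCH_SIZE = 436  # number of questions reported for FlameBench


def approx_correct(accuracy_pct: float, n: int = FLAMEBENCH_SIZE) -> int:
    """Approximate number of correctly answered questions for a given accuracy."""
    return round(accuracy_pct / 100 * n)


def pp_to_questions(delta_pp: float, n: int = FLAMEBENCH_SIZE) -> float:
    """Convert a percentage-point gap into an equivalent number of questions."""
    return delta_pp / 100 * n


print(approx_correct(43.8))              # 191  (RLVR-Opt)
print(approx_correct(26.8))              # 117  (Qwen3-8B-Base)
print(round(pp_to_questions(11.71), 1))  # 51.1 (RLVR-Opt vs. RAG + GLM-4)
print(round(100 / FLAMEBENCH_SIZE, 2))   # 0.23 pp per question
```

Under that assumption, the 11.71-point margin over RAG + GLM-4 corresponds to roughly 51 additional questions answered correctly, and one question is worth about 0.23 points.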

Figure 2: Data processing pipeline for the combustion-specific pre-training corpus.

This review was generated by AI and reviewed by human editors.