QUICK REVIEW

[Paper Review] From Memorization to Creativity: LLM as a Designer of Novel Neural-Architectures

Waqas Khalid, Dmitry I. Ignatov|arXiv (Cornell University)|Jan 6, 2026

Machine Learning in Materials Science|Cited by 0

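One-sentence summary

This paper studies how a code-capable LLM evolves into an autonomous neural architecture designer through a 22-cycle closed-loop generate–evaluate–select–fine-tune process, driven by low-fidelity performance signals and MinHash–Jaccard novelty filtering.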

ABSTRACT

Large language models (LLMs) excel in program synthesis, yet their ability to autonomously navigate neural architecture design--balancing syntactic reliability, performance, and structural novelty--remains underexplored. We address this by placing a code-oriented LLM within a closed-loop synthesis framework, analyzing its evolution over 22 supervised fine-tuning cycles. The model synthesizes PyTorch convolutional networks which are validated, evaluated via low-fidelity performance signals (single-epoch accuracy), and filtered using a MinHash-Jaccard criterion to prevent structural redundancy. High-performing, novel architectures are converted into prompt-code pairs for iterative fine-tuning via parameter-efficient LoRA adaptation, initialized from the LEMUR dataset. Across cycles, the LLM internalizes empirical architectural priors, becoming a robust generator. The valid generation rate stabilizes at 50.6 percent (peaking at 74.5 percent), while mean first-epoch accuracy rises from 28.06 percent to 50.99 percent, and the fraction of candidates exceeding 40 percent accuracy grows from 2.04 percent to 96.81 percent. Analyses confirm the model moves beyond replicating existing motifs, synthesizing 455 high-performing architectures absent from the original corpus. By grounding code synthesis in execution feedback, this work provides a scalable blueprint for transforming stochastic generators into autonomous, performance-driven neural designers, establishing that LLMs can internalize empirical, non-textual rewards to transcend their training data.
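
The selection part of one cycle described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: `run_cycle`, `is_valid_python`, and the callables passed in are hypothetical names, the validity check is reduced to a syntax parse, and using 40% accuracy as the acceptance threshold is an assumption (the paper reports that figure as a metric, not necessarily as the selection rule). The LoRA fine-tuning step that would follow is left as a comment.

```python
import ast
from typing import Callable, List


def is_valid_python(code: str) -> bool:
    """Cheap syntactic check only (assumption: a real pipeline would also build and run the model)."""
    try:
        ast.parse(code)
        return True
    except SyntaxError:
        return False


def run_cycle(
    generate: Callable[[], List[str]],
    evaluate: Callable[[str], float],
    is_novel: Callable[[str, List[str]], bool],
    corpus: List[str],
    acc_threshold: float = 0.40,  # assumed threshold, for illustration
) -> List[str]:
    """One generate-evaluate-select pass; returns the newly accepted candidates."""
    accepted: List[str] = []
    for code in generate():
        if not is_valid_python(code):
            continue  # reject code that does not parse
        if evaluate(code) < acc_threshold:
            continue  # reject weak low-fidelity (single-epoch) accuracy
        if not is_novel(code, corpus + accepted):
            continue  # reject structural duplicates
        accepted.append(code)
    # Fine-tuning on prompt-code pairs built from `accepted` would happen here.
    return accepted
```

Passing the generator, evaluator, and novelty test as callables keeps the sketch independent of any particular model or training setup.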

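Motivation and objectives

Motivate and evaluate whether an LLM, fine-tuned on its own successful generations, can autonomously design new neural architectures.
Balance three criteria: syntactic validity of the generated PyTorch code, the early learning signal given by single-epoch CIFAR-10 accuracy, and structural novelty that avoids duplicate structures.
Demonstrate a closed-loop framework that internalizes empirical architectural priors and expands a diverse, high-quality design corpus.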

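Proposed method

Treat the LLM as a stochastic generator of PyTorch architectures under a fixed interface contract.
Run a 22-cycle generate–evaluate–select–fine-tune loop comprising validity checks, single-epoch CIFAR-10 training, and MinHash–Jaccard novelty filtering.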
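Fine-tune with LoRA on the selected self-generated architectures, converted into prompt-code pairs, with the training corpus initialized from the LEMUR dataset.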
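Evaluate each generated architecture with a low-fidelity proxy (first-epoch accuracy) and a novelty criterion before adding it to the training corpus.
Keep the prompt, decoding, and training protocols fixed to isolate the effects of iterative fine-tuning and data growth.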
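
The MinHash–Jaccard novelty filter named above could look roughly like the following standard-library sketch. The tokenization into 3-token shingles, the 128 hash functions, the 0.9 similarity threshold, and all function names are assumptions made for illustration; the summary does not state the paper's actual settings.

```python
import hashlib
import re
from typing import List, Set


def shingles(code: str, k: int = 3) -> Set[str]:
    """Set of k-token shingles; tokens are identifiers/numbers or single punctuation marks."""
    tokens = re.findall(r"\w+|[^\w\s]", code)
    if len(tokens) < k:
        return {" ".join(tokens)}
    return {" ".join(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def minhash(items: Set[str], num_perm: int = 128) -> List[int]:
    """MinHash signature: for each salted hash function, the minimum hash over the set."""
    signature = []
    for seed in range(num_perm):
        salt = seed.to_bytes(8, "little")
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(s.encode(), digest_size=8, salt=salt).digest(),
                "little",
            )
            for s in items
        ))
    return signature


def estimated_jaccard(a: List[int], b: List[int]) -> float:
    """Fraction of matching signature positions, an estimate of Jaccard similarity."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def is_novel(code: str, corpus: List[str], threshold: float = 0.9) -> bool:
    """Novel if no corpus entry reaches the (assumed) similarity threshold."""
    sig = minhash(shingles(code))
    return all(estimated_jaccard(sig, minhash(shingles(c))) < threshold for c in corpus)
```

In practice the corpus signatures would be computed once and cached instead of being recomputed on every call; the sketch recomputes them for brevity.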

Experimental results

Research questions

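RQ1: Does iterative fine-tuning on its own successful designs improve the LLM's ability to generate valid, high-quality, and structurally novel neural architectures?
RQ2: Does grounding code synthesis in execution feedback and novelty filtering yield robust architectural priors within a scalable loop?
RQ3: How do validity, early-epoch performance, and design diversity evolve over successive synthesis cycles?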

Key findings

Cycle	Valid (%)	Best (%)	Mean (%)	≥40% (%)	Unique Models	Total Train Prompts
1	44.0	47.78	28.06	2.04	1	1698
5	32.0	49.13	29.88	6.82	9	1724
10	53.8	55.48	37.70	38.04	18	1785
15	66.8	58.60	47.40	80.70	34	1911
18	59.1	63.98	50.99	96.81	38	2025
22	41.8	57.62	49.48	92.86	30	2154

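Across the 22 cycles, the valid generation rate averaged 50.6% (Wilson confidence interval [45.0%, 56.1%]).
Mean first-epoch CIFAR-10 accuracy rose from 28.06% in cycle 1 to 50.99% in cycle 18.
The fraction of candidates reaching ≥40% accuracy was 92.86% in cycle 22 (peak 96.81%).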
Across the cycles, 455 high-performing, structurally novel architectures absent from the original corpus were discovered and added to the training corpus.
The closed loop improved reliability and learning efficiency while preserving architectural diversity.
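
For readers unfamiliar with the Wilson interval quoted for the valid generation rate, the sketch below shows how such an interval is computed for a binomial proportion. The counts in the usage lines are hypothetical; the summary does not give the number of generations behind the reported interval, so this does not attempt to reproduce it.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return center - half, center + half


# Hypothetical counts, not taken from the paper: 50 valid out of 100 generated.
low, high = wilson_interval(50, 100)
print(f"[{low:.3f}, {high:.3f}]")
```

Unlike the simple normal-approximation interval, the Wilson interval stays within [0, 1] and behaves sensibly for small samples, which is one plausible reason to prefer it for per-cycle validity rates.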

This review was generated by AI and checked by a human editor.