QUICK REVIEW

[Paper Review] From Memorization to Creativity: LLM as a Designer of Novel Neural-Architectures

Waqas Khalid, Dmitry I. Ignatov|arXiv (Cornell University)|Jan 6, 2026

Machine Learning in Materials Science | Cited by: 0

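One-line summary

Summary: This paper studies how a code-capable LLM develops into an autonomous neural architecture designer through a 22-cycle closed-loop generate–evaluate–select–fine-tune process that uses a low-fidelity performance signal (single-epoch accuracy) and a MinHash–Jaccard novelty filter.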

ABSTRACT

Large language models (LLMs) excel in program synthesis, yet their ability to autonomously navigate neural architecture design--balancing syntactic reliability, performance, and structural novelty--remains underexplored. We address this by placing a code-oriented LLM within a closed-loop synthesis framework, analyzing its evolution over 22 supervised fine-tuning cycles. The model synthesizes PyTorch convolutional networks which are validated, evaluated via low-fidelity performance signals (single-epoch accuracy), and filtered using a MinHash-Jaccard criterion to prevent structural redundancy. High-performing, novel architectures are converted into prompt-code pairs for iterative fine-tuning via parameter-efficient LoRA adaptation, initialized from the LEMUR dataset. Across cycles, the LLM internalizes empirical architectural priors, becoming a robust generator. The valid generation rate stabilizes at 50.6 percent (peaking at 74.5 percent), while mean first-epoch accuracy rises from 28.06 percent to 50.99 percent, and the fraction of candidates exceeding 40 percent accuracy grows from 2.04 percent to 96.81 percent. Analyses confirm the model moves beyond replicating existing motifs, synthesizing 455 high-performing architectures absent from the original corpus. By grounding code synthesis in execution feedback, this work provides a scalable blueprint for transforming stochastic generators into autonomous, performance-driven neural designers, establishing that LLMs can internalize empirical, non-textual rewards to transcend their training data.

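Motivation and objectives

Motivate and evaluate whether an LLM can autonomously design original neural architectures by iteratively training on its own successful generations.
Balance three objectives: syntactic validity of the generated PyTorch code, an early learning signal from single-epoch CIFAR-10 accuracy, and structural novelty that avoids duplicated motifs.
Present a closed-loop framework in which the model internalizes empirical architectural priors and expands a diverse, high-quality design vocabulary.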

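Proposed method

Treat the LLM as a stochastic generator of PyTorch architectures under a fixed API contract.
Run a 22-cycle generate–evaluate–select–fine-tune loop with validity checks, single-epoch CIFAR-10 training, and MinHash–Jaccard novelty filtering.
Fine-tune the LLM with LoRA on its own generated architectures, starting from a training corpus initialized from the LEMUR dataset.
Evaluate each generated architecture with a low-fidelity proxy (first-epoch accuracy) and a novelty measure to decide whether it is added to the training corpus.
Keep the prompts, decoding settings, and training protocol fixed to isolate the effect of iterative fine-tuning and data growth.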
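The selection step (first-epoch accuracy plus a MinHash–Jaccard novelty check) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the tokenizer, the shingle size, the number of hash functions, the similarity threshold `MAX_SIMILARITY`, and the use of 40% as an acceptance cutoff are all assumptions made here, and every function name is hypothetical.

```python
import hashlib
import re

NUM_PERM = 64          # number of hash functions per signature (assumed)
SHINGLE_SIZE = 3       # tokens per shingle (assumed)
MAX_SIMILARITY = 0.8   # hypothetical novelty threshold, not from the paper
MIN_ACCURACY = 40.0    # assumed cutoff on first-epoch accuracy, in percent


def shingles(code: str, k: int = SHINGLE_SIZE) -> set[str]:
    """Split source code into overlapping k-token shingles."""
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", code)
    if len(tokens) < k:
        return {" ".join(tokens)}
    return {" ".join(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def _hash(seed: int, item: str) -> int:
    """One member of a seeded hash family, built on BLAKE2b."""
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "little")


def minhash_signature(items: set[str], num_perm: int = NUM_PERM) -> tuple[int, ...]:
    """For each seeded hash function, keep the minimum hash over the set."""
    return tuple(min(_hash(seed, s) for s in items) for seed in range(num_perm))


def estimated_jaccard(a: tuple[int, ...], b: tuple[int, ...]) -> float:
    """The fraction of matching signature slots estimates Jaccard similarity."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def is_novel(code: str, corpus_signatures: list[tuple[int, ...]],
             max_similarity: float = MAX_SIMILARITY) -> bool:
    """True if the code is not too similar to anything already in the corpus."""
    sig = minhash_signature(shingles(code))
    return all(estimated_jaccard(sig, other) < max_similarity
               for other in corpus_signatures)


def select(candidates: list[tuple[str, float]],
           corpus_signatures: list[tuple[int, ...]],
           min_accuracy: float = MIN_ACCURACY) -> list[str]:
    """Keep candidates that are both accurate enough and structurally novel."""
    signatures = list(corpus_signatures)
    accepted = []
    for code, accuracy in candidates:
        if accuracy >= min_accuracy and is_novel(code, signatures):
            accepted.append(code)
            # Later candidates are also compared against this accepted one.
            signatures.append(minhash_signature(shingles(code)))
    return accepted
```

Calling `select` with a list of `(code, first_epoch_accuracy)` pairs and the signatures of the current corpus returns the code strings that would be turned into new prompt-code pairs. Comparing fixed-length signatures instead of full shingle sets keeps each novelty check cheap as the corpus grows.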

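Experimental results

Research questions

RQ1: Does iterative fine-tuning on its own successful designs improve the LLM's ability to generate valid, high-quality, and structurally novel neural architectures?
RQ2: Does code synthesis coupled with execution feedback and novelty filtering yield robust architectural priors within a scalable loop?
RQ3: How do validity, early-epoch performance, and design diversity evolve over multiple synthesis cycles?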

Key findings

Cycle	Valid (%)	Best (%)	Mean (%)	≥40% (%)	Unique Models	Total Train Prompts
1	44.0	47.78	28.06	2.04	1	1698
5	32.0	49.13	29.88	6.82	9	1724
10	53.8	55.48	37.70	38.04	18	1785
15	66.8	58.60	47.40	80.70	34	1911
18	59.1	63.98	50.99	96.81	38	2025
22	41.8	57.62	49.48	92.86	30	2154

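Across the 22 cycles, the valid generation rate averages 50.6% (Wilson confidence interval [45.0%, 56.1%]).
Mean first-epoch CIFAR-10 accuracy rises from 28.06% in cycle 1 to a peak of 50.99% in cycle 18, and is 49.48% in cycle 22.
The fraction of candidates with at least 40% accuracy grows from 2.04% in cycle 1 to 92.86% by cycle 22, peaking at 96.81% in cycle 18.
Across the cycles, 455 structurally novel, high-performing architectures absent from the original corpus were discovered and added to the training corpus.
The loop improves reliability and learning efficiency while largely preserving architectural diversity.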
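For readers unfamiliar with the Wilson interval quoted for the valid generation rate, the standard Wilson score formula can be sketched as below. The function name and the example counts are hypothetical; the review does not state the sample size or confidence level behind the reported interval, so this does not attempt to reproduce those numbers.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to a 95% level; this default is an assumption here.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half_width = z * math.sqrt(p * (1 - p) / trials
                               + z * z / (4 * trials * trials)) / denom
    return center - half_width, center + half_width


# Hypothetical counts for illustration only: 50 valid generations out of 100.
low, high = wilson_interval(50, 100)
print(f"[{low:.3f}, {high:.3f}]")
```

Unlike the simpler normal-approximation interval, the Wilson interval stays inside [0, 1] and behaves sensibly for small samples or rates near 0% or 100%, which is one reason it is commonly used for rates such as a valid generation rate.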

This review was generated by AI and checked by a human editor.