QUICK REVIEW

[Paper Review] From Memorization to Creativity: LLM as a Designer of Novel Neural-Architectures

Waqas Khalid, Dmitry I. Ignatov|arXiv (Cornell University)|2026. 01. 06.

Machine Learning in Materials Science | Citations: 0

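One-Line Summary

This paper studies how a code-capable LLM develops into an autonomous neural architecture designer through a 22-cycle closed-loop generate-evaluate-select-fine-tune process that uses low-fidelity performance signals and a MinHash-Jaccard novelty filter.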

ABSTRACT

Large language models (LLMs) excel in program synthesis, yet their ability to autonomously navigate neural architecture design--balancing syntactic reliability, performance, and structural novelty--remains underexplored. We address this by placing a code-oriented LLM within a closed-loop synthesis framework, analyzing its evolution over 22 supervised fine-tuning cycles. The model synthesizes PyTorch convolutional networks which are validated, evaluated via low-fidelity performance signals (single-epoch accuracy), and filtered using a MinHash-Jaccard criterion to prevent structural redundancy. High-performing, novel architectures are converted into prompt-code pairs for iterative fine-tuning via parameter-efficient LoRA adaptation, initialized from the LEMUR dataset. Across cycles, the LLM internalizes empirical architectural priors, becoming a robust generator. The valid generation rate stabilizes at 50.6 percent (peaking at 74.5 percent), while mean first-epoch accuracy rises from 28.06 percent to 50.99 percent, and the fraction of candidates exceeding 40 percent accuracy grows from 2.04 percent to 96.81 percent. Analyses confirm the model moves beyond replicating existing motifs, synthesizing 455 high-performing architectures absent from the original corpus. By grounding code synthesis in execution feedback, this work provides a scalable blueprint for transforming stochastic generators into autonomous, performance-driven neural designers, establishing that LLMs can internalize empirical, non-textual rewards to transcend their training data.

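Motivation and Objectives

Investigate whether an LLM can autonomously design new neural architectures when it is repeatedly trained on its own successful generations.
Balance three objectives: syntactic validity of the generated PyTorch code, an early learning signal from single-epoch CIFAR-10 accuracy, and structural novelty that avoids redundant motifs.
Demonstrate a closed-loop framework that internalizes empirical architectural priors and expands a diverse, high-quality design corpus.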

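Proposed Method

Treat the LLM as a stochastic generator of PyTorch architectures under a fixed API contract.
Run a 22-cycle generate-evaluate-select-fine-tune loop comprising validity checks, single-epoch CIFAR-10 training, and MinHash-Jaccard novelty filtering.
Fine-tune the LLM with LoRA on its self-generated architectures, starting from an initialization on the LEMUR dataset.
Evaluate generated architectures with a low-fidelity proxy (first-epoch accuracy) and a novelty criterion before adding them to the training corpus.
Keep the prompts, decoding settings, and training protocol fixed to isolate the effect of iterative fine-tuning and data growth.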
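
The MinHash-Jaccard novelty filter named above can be illustrated with a minimal standard-library sketch. The review does not give the paper's tokenization, signature length, or similarity threshold, so `SHINGLE_SIZE`, `NUM_PERM`, `MAX_SIMILARITY`, and every function name below are hypothetical choices made only for illustration, not the authors' implementation.

```python
import hashlib
import re

NUM_PERM = 128        # assumed number of hash functions in a signature
SHINGLE_SIZE = 5      # assumed width of token n-grams (shingles)
MAX_SIMILARITY = 0.9  # assumed threshold; more similar than this counts as redundant


def shingles(code, k=SHINGLE_SIZE):
    """Return the set of k-token shingles of a source string."""
    tokens = re.findall(r"\w+|[^\w\s]", code)
    if len(tokens) < k:
        return {" ".join(tokens)}
    return {" ".join(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def _hash(item, seed):
    """Seeded 64-bit hash; each seed acts as one MinHash hash function."""
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(items, num_perm=NUM_PERM):
    """Keep the minimum hash value of the set under each seeded hash."""
    return [min(_hash(x, seed) for x in items) for seed in range(num_perm)]


def estimated_jaccard(sig_a, sig_b):
    """The fraction of matching signature slots estimates Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def is_novel(code, corpus_signatures, max_similarity=MAX_SIMILARITY):
    """Accept a candidate only if it is not too similar to any stored signature."""
    sig = minhash_signature(shingles(code))
    return all(estimated_jaccard(sig, s) <= max_similarity
               for s in corpus_signatures)
```

In a loop like the one described, a candidate that passes the validity and accuracy checks would be compared against the signatures of everything already in the corpus, and its own signature stored only if `is_novel` returns `True`.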

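Experimental Results

Research Questions

RQ1: Can iterative fine-tuning on its own successful designs improve an LLM's ability to generate valid, high-quality, and structurally novel neural architectures?
RQ2: Does grounding code synthesis in execution feedback and novelty filtering yield robust architectural priors within a scalable loop?
RQ3: How do validity, first-epoch performance, and design diversity evolve across synthesis cycles?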

Key Results

Cycle	Valid (%)	Best (%)	Mean (%)	≥40% (%)	Unique Models	Total Train Prompts
1	44.0	47.78	28.06	2.04	1	1698
5	32.0	49.13	29.88	6.82	9	1724
10	53.8	55.48	37.70	38.04	18	1785
15	66.8	58.60	47.40	80.70	34	1911
18	59.1	63.98	50.99	96.81	38	2025
22	41.8	57.62	49.48	92.86	30	2154

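Across the 22 cycles, the valid generation rate averages 50.6%, with a Wilson CI of [45.0%, 56.1%].
Mean first-epoch CIFAR-10 accuracy rises from 28.06% in cycle 1 to 50.99% in cycle 18, and stands at 49.48% in cycle 22.
The fraction of candidates with at least 40% accuracy grows from 2.04% in cycle 1 to 92.86% in cycle 22 (96.81% in cycle 18).
Across the cycles, 455 high-performing, structurally novel architectures that were absent from the original corpus are discovered and added to the training corpus.
The loop improves reliability and learning efficiency while still maintaining substantial architectural diversity.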
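
For readers unfamiliar with the Wilson interval quoted for the valid generation rate, the following sketch shows how such an interval is computed from a success count. The review does not report the underlying counts or the confidence level, so the default `z=1.96` (roughly 95%) and the function name `wilson_interval` are assumptions, and the example is not meant to reproduce the paper's reported bounds.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion.

    z=1.96 corresponds to roughly 95% confidence (an assumed level).
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return center - half, center + half
```

For instance, `wilson_interval(50, 100)` gives an interval centered on 0.5. Unlike the simple normal approximation, the Wilson form stays inside [0, 1] even when the observed rate is close to 0 or 1.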

This review was generated by AI and reviewed by a human editor.