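[Paper Review] Design-MLLM: A Reinforcement Alignment Framework for Verifiable and Aesthetic Interior Design
Design-MLLM is a reinforcement-learning-based alignment framework that decouples spatial feasibility from aesthetic preference to generate interior designs that are both executable and aesthetically coherent.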
Interior design is a requirements-to-visual-plan generation process that must simultaneously satisfy verifiable spatial feasibility and comparative aesthetic preferences. While recent multimodal large language models (MLLMs) offer a unified foundation for interpreting user intent and producing design rationales, our empirical analysis reveals a persistent contradiction in real-world deployment: MLLMs often produce layouts that are unbuildable and aesthetically inconsistent. These findings indicate that simply adding in-domain text is insufficient; effective interior design requires an alignment mechanism that separates hard constraints from soft preferences and coordinates them during optimization. To address this, we propose Design-MLLM, a reinforcement alignment framework that optimizes a feasibility-first preference objective via a dual-branch, aesthetic-oriented reward. Specifically, Design-MLLM (i) explicitly evaluates spatial feasibility using programmatic constraint checks, (ii) assesses aesthetic preference only among feasible candidates to avoid visually appealing but unexecutable shortcuts, and (iii) performs group-relative optimization to obtain stable preference signals. Through this process, Design-MLLM learns a controllable policy that consistently selects and generates solutions that are both executable and aesthetically coherent, rather than occasionally producing visually appealing but infeasible designs. Extensive experiments on various benchmark datasets demonstrate the advantages of Design-MLLM.
Motivation and Objectives
- Diagnose why generic multimodal LLMs struggle with deployable interior design due to feasibility–aesthetics conflicts.
- Propose a reinforcement alignment framework that decouples hard spatial constraints from soft aesthetic preferences.
- Develop a feasibility-guided generation pipeline and a dual-branch reward to learn a controllable, policy-based design generator.
- Demonstrate improvements in both spatial executability and aesthetic alignment across multiple benchmarks.
Proposed Method
- Feasibility-guided candidate generation that produces a group of design candidates and verifies geometric feasibility via constraint checks.
- A dual-branch aesthetic-oriented reward that separately evaluates spatial feasibility and aesthetic preference among feasible candidates.
- GRPO-style group-relative policy optimization to learn more aesthetic solutions within the feasible domain.
- Layout-to-image realization that translates optimized structural plans into high-fidelity renderings.
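The feasibility-first reward described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the `Box` footprint representation, the two checks (inside the room, no overlaps), the `aesthetic_score` callable standing in for the aesthetic branch, and the way the two branches are combined into one scalar.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Box:
    """Axis-aligned furniture footprint in metres (hypothetical representation)."""
    x: float
    y: float
    w: float
    h: float


def inside_room(box: Box, room_w: float, room_h: float) -> bool:
    """True if the footprint lies entirely within the room rectangle."""
    return (box.x >= 0 and box.y >= 0
            and box.x + box.w <= room_w and box.y + box.h <= room_h)


def overlaps(a: Box, b: Box) -> bool:
    """True if two footprints share interior area (touching edges are allowed)."""
    return (a.x < b.x + b.w and b.x < a.x + a.w
            and a.y < b.y + b.h and b.y < a.y + a.h)


def is_feasible(layout: List[Box], room_w: float, room_h: float) -> bool:
    """Programmatic constraint check: every item in bounds, no pairwise overlap."""
    if not all(inside_room(b, room_w, room_h) for b in layout):
        return False
    return not any(
        overlaps(layout[i], layout[j])
        for i in range(len(layout))
        for j in range(i + 1, len(layout))
    )


def dual_branch_reward(
    layout: List[Box],
    room_w: float,
    room_h: float,
    aesthetic_score: Callable[[List[Box]], float],
    infeasible_reward: float = 0.0,
) -> float:
    """Feasibility gates aesthetics: infeasible layouts never reach the scorer.

    Assumed combination: 1.0 for passing the feasibility branch plus the
    aesthetic score; the paper's actual weighting is not given in this review.
    """
    if not is_feasible(layout, room_w, room_h):
        return infeasible_reward
    return 1.0 + aesthetic_score(layout)
```

Because the aesthetic scorer is only called on layouts that pass the constraint checks, a visually appealing but overlapping layout cannot earn more reward than a plain feasible one, which is the shortcut the method aims to close.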
Experimental Results
Research Questions
- RQ1: What are the fundamental obstacles that prevent MLLMs from reliably generating verifiable interior design layouts?
- RQ2: Can an alignment framework decouple hard spatial constraints from soft aesthetic preferences to improve both feasibility and aesthetics?
- RQ3: How can policy optimization leverage groupwise comparisons to learn designs that are executable and aesthetically coherent?
- RQ4: Does a feasibility-first, dual-branch reward approach outperform single-branch or naive prompting strategies in interior design tasks?
Key Results
- Design-MLLM consistently improves spatial executability by enforcing feasibility checks before aesthetics.
- Aesthetic evaluation is constrained to feasible designs to avoid shortcuts that are visually appealing but unbuildable.
- Group-relative policy optimization (GRPO) provides stable learning signals by normalizing rewards within candidate groups.
- A dual-branch reward enables learning a controllable policy that favors more aesthetic designs within the feasible region.
- Ablations confirm the necessity of decoupling feasibility and aesthetics and of using a group-based optimization signal.
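The within-group reward normalization mentioned above can be sketched as follows. This uses the commonly cited GRPO form, in which each candidate's advantage is its reward minus the group mean, divided by the group standard deviation. Whether Design-MLLM uses exactly this variant is not stated in this review, and the function name and `eps` parameter are hypothetical.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize rewards within one candidate group (assumed GRPO-style form).

    Each advantage is (r_i - group mean) / (group std + eps). The small eps
    keeps the division defined when every candidate receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Example group: two infeasible candidates (reward 0.0) and two feasible ones.
advantages = group_relative_advantages([0.0, 1.5, 1.2, 0.0])
```

With this normalization, only the ranking and spread inside a group matter, so feasible candidates with higher aesthetic scores receive positive advantages and infeasible ones negative advantages, regardless of the absolute reward scale.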
This review was generated by AI and reviewed by a human editor.