QUICK REVIEW

[Paper Digest] Design-MLLM: A Reinforcement Alignment Framework for Verifiable and Aesthetic Interior Design

Yuxuan Yang, Xiaotong Mao | arXiv (Cornell University) | Mar 4, 2026

3D Shape Modeling and Analysis | Cited by 0

One-Sentence Summary

TL;DR: Design-MLLM introduces a reinforcement learning–based alignment framework that decouples spatial feasibility from aesthetic preferences to generate interior designs that are both executable and aesthetically coherent.

ABSTRACT

Interior design is a requirements-to-visual-plan generation process that must simultaneously satisfy verifiable spatial feasibility and comparative aesthetic preferences. While recent multimodal large language models (MLLMs) offer a unified foundation for interpreting user intent and producing design rationales, our empirical analysis reveals a persistent contradiction in real-world deployment: MLLMs often produce layouts that are unbuildable and aesthetically inconsistent. These findings indicate that simply adding in-domain text is insufficient; effective interior design requires an alignment mechanism that separates hard constraints from soft preferences and coordinates them during optimization. To address this, we propose Design-MLLM, a reinforcement alignment framework that optimizes a feasibility-first preference objective via a dual-branch, aesthetic-oriented reward. Specifically, Design-MLLM (i) explicitly evaluates spatial feasibility using programmatic constraint checks, (ii) assesses aesthetic preference only among feasible candidates to avoid visually appealing but unexecutable shortcuts, and (iii) performs group-relative optimization to obtain stable preference signals. Through this process, Design-MLLM learns a controllable policy that consistently selects and generates solutions that are both executable and aesthetically coherent, rather than occasionally producing visually appealing but infeasible designs. Extensive experiments on various benchmark datasets demonstrate the advantages of Design-MLLM.

Research Motivation and Objectives

Diagnose why generic multimodal LLMs struggle with deployable interior design due to feasibility–aesthetics conflicts.
Propose a reinforcement alignment framework that decouples hard spatial constraints from soft aesthetic preferences.
Develop a feasibility-guided generation pipeline and a dual-branch reward to learn a controllable, policy-based design generator.
Demonstrate improvements in both spatial executability and aesthetic alignment across multiple benchmarks.

Proposed Method

Feasibility-guided candidate generation that produces a group of design candidates and verifies geometric feasibility via constraint checks.
A dual-branch aesthetic-oriented reward that separately evaluates spatial feasibility and aesthetic preference among feasible candidates.
GRPO-style group-relative policy optimization to learn more aesthetic solutions within the feasible domain.
Layout-to-image realization that translates optimized structural plans into high-fidelity renderings.
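As a rough illustration of the feasibility-first idea in the list above, the sketch below gates an aesthetic score behind a programmatic feasibility check. Everything in it is an assumption made for illustration: the `Box` layout format, the two constraints (stay inside the room, no overlapping footprints), the `gated_reward` name, and the choice of a base reward of 1.0 for feasible layouts are hypothetical and are not taken from the paper, whose actual constraint set and reward shaping are not given in this digest.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Box:
    """Axis-aligned furniture footprint (hypothetical layout format)."""
    x: float
    y: float
    w: float
    h: float


def inside_room(b: Box, room_w: float, room_h: float) -> bool:
    # The footprint must lie entirely within the room rectangle.
    return b.x >= 0 and b.y >= 0 and b.x + b.w <= room_w and b.y + b.h <= room_h


def overlaps(a: Box, b: Box) -> bool:
    # Two rectangles overlap when they intersect on both axes;
    # boxes that merely touch along an edge do not count as overlapping.
    return (a.x < b.x + b.w and b.x < a.x + a.w
            and a.y < b.y + b.h and b.y < a.y + a.h)


def is_feasible(layout: List[Box], room_w: float, room_h: float) -> bool:
    # Hard-constraint branch: every box in bounds, no pair of boxes overlapping.
    if not all(inside_room(b, room_w, room_h) for b in layout):
        return False
    return not any(
        overlaps(layout[i], layout[j])
        for i in range(len(layout))
        for j in range(i + 1, len(layout))
    )


def gated_reward(
    layout: List[Box],
    room_w: float,
    room_h: float,
    aesthetic_score: Callable[[List[Box]], float],
    infeasible_reward: float = 0.0,
) -> float:
    # Soft-preference branch is consulted only for feasible layouts, so an
    # attractive but unbuildable layout can never outscore a buildable one
    # (assuming aesthetic_score returns a non-negative value).
    if not is_feasible(layout, room_w, room_h):
        return infeasible_reward
    return 1.0 + aesthetic_score(layout)
```

With a 5 x 4 room, a layout with two overlapping boxes receives `infeasible_reward` regardless of how high its aesthetic score would be, while a non-overlapping layout receives the base reward plus its aesthetic score.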

Experimental Results

Research Questions

RQ1: What are the fundamental obstacles that prevent MLLMs from reliably generating verifiable interior design layouts?
RQ2: Can an alignment framework decouple hard spatial constraints from soft aesthetic preferences to improve both feasibility and aesthetics?
RQ3: How can policy optimization leverage groupwise comparisons to learn designs that are executable and aesthetically coherent?
RQ4: Does a feasibility-first, dual-branch reward approach outperform single-branch or naive prompting strategies in interior design tasks?

Key Findings

Design-MLLM consistently improves spatial executability by enforcing feasibility checks before aesthetics.
Aesthetic evaluation is constrained to feasible designs to avoid shortcuts that are visually appealing but unbuildable.
Group-relative policy optimization (GRPO) provides stable learning signals by normalizing rewards within candidate groups.
A dual-branch reward enables learning a controllable policy that favors more aesthetic designs within the feasible region.
Ablations confirm the necessity of decoupling feasibility and aesthetics and of using a group-based optimization signal.
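The within-group reward normalization mentioned in the findings can be sketched as follows. This is the commonly used GRPO-style advantage, where each candidate's reward is centered on the group mean and scaled by the group standard deviation. The function name, the use of the population standard deviation, and the `eps` stabilizer are assumptions for illustration; the digest does not specify the exact form used in Design-MLLM.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize rewards within one candidate group (hypothetical helper)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption, not from the paper
    # eps keeps the division defined when every candidate has the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Example group: two infeasible candidates (reward 0.0) and two feasible ones.
advantages = group_relative_advantages([0.0, 1.5, 1.9, 0.0])
```

In this example the infeasible candidates receive negative advantages and the feasible candidate with the higher reward receives the largest positive one, so the learning signal depends on how candidates rank within their own group rather than on the absolute reward scale.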


This digest was generated by AI and reviewed by a human editor.