QUICK REVIEW

[Paper Review] Discrete World Models via Regularization

Davide Bizzaro, Luciano Serafini|arXiv (Cornell University)|2026. 03. 02.

AI-based Problem Solving and Planning | Citations: 0

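One-Line Summary

DWMR learns Boolean world models without supervision and without pixel reconstruction, using regularizers that maximize bit entropy and independence while favoring sparse, local transitions. As a result, it outperforms reconstruction-based baselines in discrete, combinatorial environments.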

ABSTRACT

World models aim to capture the states and dynamics of an environment in a compact latent space. Moreover, using Boolean state representations is particularly useful for search heuristics and symbolic reasoning and planning. Existing approaches keep latents informative via decoder-based reconstruction, or instead via contrastive or reward signals. In this work, we introduce Discrete World Models via Regularization (DWMR): a reconstruction-free and contrastive-free method for unsupervised Boolean world-model learning. In particular, we introduce a novel world-modeling loss that couples latent prediction with specialized regularizers. Such regularizers maximize the entropy and independence of the representation bits through variance, correlation, and coskewness penalties, while simultaneously enforcing a locality prior for sparse action changes. To enable effective optimization, we also introduce a novel training scheme improving robustness to discrete roll-outs. Experiments on two benchmarks with underlying combinatorial structure show that DWMR learns more accurate representations and transitions than reconstruction-based alternatives. Finally, DWMR can also be paired with an auxiliary reconstruction decoder, and this combination yields additional gains.
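As a brief aside on why variance and dependence penalties relate to entropy, the standard facts for Bernoulli variables can be written out. This is textbook material, not a formula taken from the paper; the symbols b, p, and n are introduced here for illustration only. For a single bit b ~ Bernoulli(p) and a code of n bits:

```latex
\begin{align}
\operatorname{Var}(b) &= p(1-p) \le \tfrac{1}{4}, \\
H(b) &= -p\log_2 p - (1-p)\log_2(1-p) \le 1, \\
H(b_1,\dots,b_n) &\le \sum_{i=1}^{n} H(b_i) \le n .
\end{align}
```

Both single-bit bounds are attained only at p = 1/2, and the joint bound is attained only when the bits are also mutually independent. Zero pairwise correlation and zero coskewness are necessary (but not sufficient) conditions for that independence, which is presumably why they can serve as tractable surrogates.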

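Motivation and Objectives

Motivates learning compact Boolean latent representations for world models that support planning and symbolic reasoning.
Proposes a reconstruction-free training objective tailored to Boolean latent spaces.
Shows that carefully designed regularizers prevent latent collapse and promote informative, mutually independent bit codes.
Shows that DWMR achieves superior state representations and transitions on combinatorial benchmarks without reconstruction.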

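Proposed Method

Encodes observations into Boolean latent vectors through a sigmoid-based encoder that outputs per-bit probabilities.
Uses a predictor network that predicts the next latent state conditioned on the current latent state and the action.
Optimizes a joint loss that combines prediction accuracy with regularizers on variance, correlation, coskewness, and locality of change.
Stabilizes training with an EMA target encoder and a two-step update scheme that separates learning on discrete inputs from joint continuous updates.
Optionally augments DWMR with a reconstruction decoder for additional performance gains.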
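To make the regularizers listed above more concrete, here is a minimal sketch of what variance, correlation, coskewness, and locality penalties could look like on a batch of bit probabilities. The exact functional forms, weights, and normalizations used by DWMR are not given in this review, so everything below (the function names `bit_regularizers` and `locality_penalty`, the squared-error form, the `eps` constant) is an assumption for illustration, written in plain Python without a deep learning framework.

```python
from itertools import combinations
from statistics import fmean


def _mean_or_zero(values):
    """Average a list, returning 0.0 when it is empty (e.g. fewer than 3 bits)."""
    return fmean(values) if values else 0.0


def _standardize(column, eps):
    """Return (z-scored column, biased variance) for one bit across the batch."""
    mu = fmean(column)
    var = fmean([(x - mu) ** 2 for x in column])
    sd = (var + eps) ** 0.5
    return [(x - mu) / sd for x in column], var


def bit_regularizers(probs, eps=1e-8):
    """Hypothetical penalties on a batch of bit probabilities (rows = samples).

    Returns (variance, correlation, coskewness) penalties; all three are 0
    for uniformly distributed, mutually independent bits.
    """
    n_bits = len(probs[0])
    columns = [[row[j] for row in probs] for j in range(n_bits)]
    stats = [_standardize(col, eps) for col in columns]
    z = [s[0] for s in stats]
    variances = [s[1] for s in stats]

    # Variance: a Bernoulli bit has variance at most 0.25 (reached at p = 0.5).
    var_pen = _mean_or_zero([(0.25 - v) ** 2 for v in variances])
    # Correlation: squared Pearson correlation over all pairs of bits.
    corr_pen = _mean_or_zero([
        fmean([a * b for a, b in zip(z[i], z[j])]) ** 2
        for i, j in combinations(range(n_bits), 2)
    ])
    # Coskewness: squared third-order cross moment over all triples of bits.
    cosk_pen = _mean_or_zero([
        fmean([a * b * c for a, b, c in zip(z[i], z[j], z[k])]) ** 2
        for i, j, k in combinations(range(n_bits), 3)
    ])
    return var_pen, corr_pen, cosk_pen


def locality_penalty(z_now, z_next):
    """Mean L1 change per transition, i.e. the (soft) number of flipped bits."""
    return fmean([
        sum(abs(a - b) for a, b in zip(row_now, row_next))
        for row_now, row_next in zip(z_now, z_next)
    ])
```

In this sketch a collapsed code (every sample mapped to the same bits) is caught by the variance term, duplicated bits are caught by the correlation term, and the locality term grows with the number of bits an action changes. In an actual training loop these terms would be computed on differentiable tensors and combined with the prediction loss using weights that this review does not specify.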

Figure 1: Overview of the model architecture and of the loss function. Encoders map successive observations into a shared Boolean latent space, and a predictor transforms the current latent state into the next, given the action. We illustrate and evaluate this setup on an 8-puzzle benchmark with MNI

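Experimental Results

Research Questions

RQ1: Can regularization specialized for Boolean latent spaces produce informative, non-collapsed representations without pixel-level reconstruction?
RQ2: Is enforcing bit-wise entropy, independence, and a locality prior sufficient to model discrete world dynamics?
RQ3: How does DWMR compare with reconstruction-based baselines in combinatorial environments in terms of encoding and imagined-rollout performance?
RQ4: Does adding an auxiliary decoder further improve performance once the Boolean latent space is already well structured?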

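Key Results

DWMR achieves strong, stable encoding and imagined-rollout performance without pixel reconstruction.
Baselines that rely on reconstruction (AE, β-VAE, DeepCubeAI) underperform DWMR.
Augmenting DWMR with an auxiliary decoder (DWMR+AE) yields additional gains, suggesting that reconstruction can help on top of a well-regularized latent space.
Ablations show that the variance, correlation, coskewness, and locality terms, as well as EMA, contribute critically to performance and robustness.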
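For readers unfamiliar with the EMA component named in the ablation, the following is a generic sketch of an exponential-moving-average target update, the common pattern in which a slowly moving copy of the encoder supplies prediction targets. The parameter layout (a dict of float lists), the function name `ema_update`, and the decay value are assumptions for illustration and are not taken from the paper.

```python
def ema_update(target, online, decay=0.99):
    """Return new target parameters: decay * target + (1 - decay) * online.

    Both arguments are dicts mapping a parameter name to a list of floats;
    this layout is a stand-in for real network weights.
    """
    return {
        name: [decay * t + (1.0 - decay) * o
               for t, o in zip(target[name], online[name])]
        for name in target
    }
```

Called once per training step, this keeps the target encoder close to a smoothed history of the online encoder. A decay near 1 makes the targets change slowly, which is the usual motivation for using such a copy to stabilize training.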

Figure 2: Example transition in IceSlider.


This review was generated by AI and checked by a human editor.