QUICK REVIEW

[Paper Review] Discrete World Models via Regularization

Davide Bizzaro, Luciano Serafini|arXiv (Cornell University)|Mar 2, 2026

AI-based Problem Solving and Planning | Citations: 0

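One-line summary

DWMR learns an unsupervised Boolean world model without pixel reconstruction. Its regularizers maximize bit entropy and independence while favoring local, sparse transitions, and it outperforms reconstruction-based baselines on discrete, combinatorial environments.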

ABSTRACT

World models aim to capture the states and dynamics of an environment in a compact latent space. Moreover, using Boolean state representations is particularly useful for search heuristics and symbolic reasoning and planning. Existing approaches keep latents informative via decoder-based reconstruction, or instead via contrastive or reward signals. In this work, we introduce Discrete World Models via Regularization (DWMR): a reconstruction-free and contrastive-free method for unsupervised Boolean world-model learning. In particular, we introduce a novel world-modeling loss that couples latent prediction with specialized regularizers. Such regularizers maximize the entropy and independence of the representation bits through variance, correlation, and coskewness penalties, while simultaneously enforcing a locality prior for sparse action changes. To enable effective optimization, we also introduce a novel training scheme improving robustness to discrete roll-outs. Experiments on two benchmarks with underlying combinatorial structure show that DWMR learns more accurate representations and transitions than reconstruction-based alternatives. Finally, DWMR can also be paired with an auxiliary reconstruction decoder, and this combination yields additional gains.
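
As a side note on why variance and correlation penalties relate to entropy and independence: the following is a standard fact about Bernoulli variables, not the paper's actual loss, whose exact form this review does not state. The symbols $b$, $p$, and $n$ are introduced here for illustration only.

```latex
% One Boolean latent bit b with P(b = 1) = p.
\[
  \operatorname{Var}(b) = p(1-p), \qquad
  H(b) = -p \log p - (1-p) \log (1-p).
\]
% Both are maximized at p = 1/2, where Var(b) = 1/4 and H(b) = log 2.
% For n bits, the joint entropy is bounded by the sum of marginal entropies:
\[
  H(b_1, \dots, b_n) \;\le\; \sum_{i=1}^{n} H(b_i) \;\le\; n \log 2,
\]
% with equality throughout iff the bits are mutually independent and unbiased.
```

Zero pairwise correlation is necessary but not sufficient for independence, which is presumably why a third-order (coskewness) term is used alongside the correlation term.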

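Motivation and objectives

Motivate learning compact Boolean latent representations for world models that support planning and symbolic reasoning.
Propose a reconstruction-free training objective tailored to Boolean latents.
Show the effect of regularizers designed to prevent latent collapse and to encourage informative, disentangled bit codes.
Show that, without pixel reconstruction, DWMR learns more accurate state representations and transitions than reconstruction-based alternatives on combinatorial benchmarks.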

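Proposed method

A sigmoid-based encoder maps each observation to a Boolean latent vector by outputting per-bit probabilities.
A predictor network predicts the next latent state conditioned on the current latent state and the action.
Training optimizes a combined loss that couples prediction accuracy with regularizers on variance, correlation, coskewness, and locality of change.
An EMA target encoder stabilizes training, and a two-stage update scheme separates training on discrete inputs from the coupled continuous updates.
Optionally, a reconstruction decoder can be added to DWMR for additional performance gains.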
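
To make the regularizers above more concrete, here is a minimal sketch in plain Python of three of them: a variance term, a correlation term, and a locality term. The review does not give the paper's exact formulas, so the function names, the hinge at 0.25, the squared-correlation form, and the L1 locality measure are all assumptions chosen for illustration. The coskewness term is omitted.

```python
from itertools import combinations


def bit_means(batch):
    """Per-bit mean over a batch of bit-probability vectors."""
    n = len(batch)
    return [sum(row[j] for row in batch) / n for j in range(len(batch[0]))]


def variance_penalty(batch):
    """Assumed form: hinge on each bit's batch variance.

    0.25 is the largest variance a Bernoulli bit can have (at p = 0.5),
    so the penalty is zero only when every bit is maximally spread out.
    """
    n = len(batch)
    means = bit_means(batch)
    total = 0.0
    for j, m in enumerate(means):
        var = sum((row[j] - m) ** 2 for row in batch) / n
        total += max(0.0, 0.25 - var)
    return total / len(means)


def correlation_penalty(batch, eps=1e-8):
    """Assumed form: mean squared Pearson correlation over distinct bit pairs."""
    n = len(batch)
    d = len(batch[0])
    means = bit_means(batch)
    centered = [[row[j] - means[j] for j in range(d)] for row in batch]
    var = [sum(c[j] ** 2 for c in centered) / n for j in range(d)]
    pairs = list(combinations(range(d), 2))
    if not pairs:  # a single bit has nothing to be correlated with
        return 0.0
    total = 0.0
    for i, j in pairs:
        cov = sum(c[i] * c[j] for c in centered) / n
        total += cov ** 2 / ((var[i] + eps) * (var[j] + eps))
    return total / len(pairs)


def locality_penalty(z_now, z_next):
    """Assumed form: average L1 change per transition (bits flipped for 0/1 codes)."""
    changed = sum(
        abs(a - b) for r0, r1 in zip(z_now, z_next) for a, b in zip(r0, r1)
    )
    return changed / len(z_now)


# Usage: a batch covering all four 2-bit codes is balanced and uncorrelated.
balanced = [[0, 0], [0, 1], [1, 0], [1, 1]]
collapsed = [[1, 1], [1, 1]]
print(variance_penalty(balanced), variance_penalty(collapsed))
```

In an actual training loop these terms would be computed on differentiable tensors and weighted against the prediction loss. The weights and the way the terms are combined are not specified in this review.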

Figure 1: Overview of the model architecture and of the loss function. Encoders map successive observations into a shared Boolean latent space, and a predictor transforms the current latent state into the next, given the action. We illustrate and evaluate this setup on an 8-puzzle benchmark with MNI

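Experimental results

Research questions

RQ1: Can regularization tailored to a Boolean latent space produce informative, non-collapsing representations without pixel-level reconstruction?
RQ2: Is imposing bitwise entropy, independence, and locality priors alone sufficient to model discrete world dynamics?
RQ3: How does DWMR compare with reconstruction-based baselines on combinatorial environments in terms of encoding and imagined-rollout performance?
RQ4: Once the Boolean latent space is sufficiently structured, does adding an auxiliary decoder further improve performance?

Key findings

DWMR achieves strong, stable encoding and imagined-rollout performance without pixel reconstruction.
Reconstruction-dependent baselines such as AE, β-VAE, and DeepCubeAI fall short of DWMR.
DWMR+AE, which adds an auxiliary decoder to DWMR, yields further gains, indicating that reconstruction can be useful on top of a regularized latent space.
Ablation experiments show that variance, correlation, coskewness, locality, and EMA each contribute substantially to performance and robustness.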
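
For readers unfamiliar with the EMA component listed among the ablated parts, the sketch below shows a generic exponential-moving-average update of target-encoder parameters. It is the standard technique, not code from the paper. The function name `ema_update`, the flat list of parameters, and the default decay of 0.99 are assumptions made for illustration.

```python
def ema_update(target, online, decay=0.99):
    """Return new target parameters as decay * target + (1 - decay) * online.

    `target` and `online` are flat lists of floats standing in for the
    weights of the target and online encoders (a simplification).
    """
    if not 0.0 <= decay <= 1.0:
        raise ValueError("decay must be in [0, 1]")
    if len(target) != len(online):
        raise ValueError("parameter lists must have the same length")
    return [decay * t + (1.0 - decay) * o for t, o in zip(target, online)]


# Usage: the target drifts slowly toward the online weights.
target = [0.0, 1.0]
online = [1.0, 1.0]
for _ in range(3):
    target = ema_update(target, online, decay=0.9)
print(target)
```

Because the target changes slowly, the prediction target stays stable while the online encoder is being updated, which is the usual motivation for this design.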

Figure 2: Example transition in IceSlider.

This review was generated by AI and checked by a human editor.