QUICK REVIEW

[Paper Review] MoE3D: A Mixture-of-Experts Module for 3D Reconstruction

Zichen Wang, Ang Cao|arXiv (Cornell University)|Jan 8, 2026

Optical measurement and interference techniques · Cited by 0

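One-Line Summary

MoE3D adds a lightweight Mixture-of-Experts head to the VGGT 3D reconstruction model. The head generates multiple depth hypotheses and fuses them with per-pixel gating, which handles depth uncertainty at boundaries and produces sharper depth edges.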

ABSTRACT

We propose a simple yet effective approach to enhance the performance of feed-forward 3D reconstruction models. Existing methods often struggle near depth discontinuities, where standard regression losses encourage spatial averaging and thus blur sharp boundaries. To address this issue, we introduce a mixture-of-experts formulation that handles uncertainty at depth boundaries by combining multiple smooth depth predictions. A softmax weighting head dynamically selects among these hypotheses on a per-pixel basis. By integrating our mixture model into a pre-trained state-of-the-art 3D model, we achieve a substantial reduction of boundary artifacts and gains in overall reconstruction accuracy. Notably, our approach is highly compute efficient, delivering generalizable improvements even when fine-tuned on a small subset of training data while incurring only negligible additional inference computation, suggesting a promising direction for lightweight and accurate 3D reconstruction.

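Motivation and Goals

Reduce boundary blur and flying-point artifacts in feed-forward 3D reconstruction models.
Model depth uncertainty at boundaries with a Mixture-of-Experts design.
Attach a compact MoE head to a pre-trained VGGT and fine-tune it to improve depth prediction.
Use entropy regularization to encourage expert specialization, sharpening boundaries while keeping the model efficient.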

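Proposed Method

Model p(D|I) as a K-component mixture with per-pixel gating: $p(D \mid I) = \sum_{k=1}^{K} w_k(I)\, p_k(D \mid I)$.
Each of the K depth experts predicts a per-pixel mean depth $\mu_k(I)$; the final depth is the sum of the $\mu_k$ weighted by $w_k$.
Train with the negative log-likelihood of the depth map so the model can capture multimodality at depth discontinuities.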
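Modify VGGT by replacing its DPT head with K parallel expert blocks plus a gating network that outputs per-pixel weights.
Apply an entropy penalty to the gating weights to encourage near one-hot per-pixel expert assignments.
Keep the VGGT backbone unfrozen so that backbone and experts are fine-tuned end-to-end, jointly driving expert specialization and performance gains.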
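The training objective described in the list above can be sketched in a few lines. This is a minimal NumPy sketch under stated assumptions, not the authors' code: the review does not give the form of each component density p_k, so it assumes a Gaussian with a fixed, hypothetical standard deviation `sigma`; the entropy weight `lam` and the function names are also hypothetical. It computes the mixture negative log-likelihood with log-sum-exp, adds an entropy penalty on the gating weights, and returns the fused depth $\sum_k w_k \mu_k$.

```python
import numpy as np


def softmax(logits: np.ndarray, axis: int = 0) -> np.ndarray:
    """Numerically stable softmax over the expert axis."""
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def moe_depth_loss(mu, gate_logits, depth_gt, sigma=0.05, lam=1e-4, eps=1e-12):
    """Mixture NLL plus an entropy penalty on per-pixel gating weights.

    mu:          (K, H, W) per-expert depth means mu_k(I)
    gate_logits: (K, H, W) gating logits; softmax over K gives w_k(I)
    depth_gt:    (H, W) ground-truth depth D
    sigma, lam:  hypothetical fixed Gaussian std and entropy weight
    """
    w = softmax(gate_logits, axis=0)  # w_k(I), sums to 1 at every pixel
    # Assumed component density: p_k(D | I) = N(D; mu_k, sigma^2)
    sq = (depth_gt[None] - mu) ** 2
    log_pk = -0.5 * sq / sigma**2 - np.log(sigma * np.sqrt(2.0 * np.pi))
    # log sum_k w_k p_k via log-sum-exp over the expert axis
    a = np.log(w + eps) + log_pk
    amax = a.max(axis=0)
    log_mix = amax + np.log(np.exp(a - amax).sum(axis=0))
    nll = -log_mix.mean()
    # Mean per-pixel gate entropy; penalizing it pushes gates toward one-hot
    entropy = -(w * np.log(w + eps)).sum(axis=0).mean()
    fused_depth = (w * mu).sum(axis=0)  # final depth: sum_k w_k mu_k
    return nll + lam * entropy, fused_depth, entropy
```

With uniform gates over two experts at depths 1 and 3, the fused depth is 2.0, which is exactly the boundary averaging the method tries to avoid. A confident gate returns a single expert's depth and, in this toy case, also yields a lower total loss.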

Figure 2 : Architecture Overview. We extend the VGGT backbone with a Mixture-of-Experts (MoE) head for depth estimation. The MoE head replaces the DPT head with $K$ expert branches and a gating network that dynamically routes features across experts, improving boundary sharpness and reducing flying-
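The routing in Figure 2 can be illustrated with a toy per-pixel head. In this hypothetical sketch, which is not the authors' code, each expert is a 1×1 linear projection from backbone features to one depth value, and the gate is a 1×1 projection to K logits followed by a softmax over experts. The paper's expert blocks are more elaborate and are not reproduced here. The class name `PixelMoEHead`, the random initialization, and the assumption that features are already at pixel resolution are all illustrative.

```python
import numpy as np


class PixelMoEHead:
    """Toy per-pixel MoE depth head (hypothetical illustration only).

    Experts: K independent 1x1 linear maps C -> 1 (depth hypotheses).
    Gate:    one 1x1 linear map C -> K, softmax over K at each pixel.
    """

    def __init__(self, channels: int, num_experts: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.expert_w = rng.normal(0.0, 0.1, size=(num_experts, channels))
        self.expert_b = np.ones(num_experts)  # hypothetical init near depth 1
        self.gate_w = rng.normal(0.0, 0.1, size=(num_experts, channels))
        self.gate_b = np.zeros(num_experts)

    def __call__(self, feats: np.ndarray):
        """feats: (C, H, W) pixel-aligned backbone features."""
        # mu_k(I): one depth hypothesis per expert and pixel -> (K, H, W)
        mu = np.einsum("kc,chw->khw", self.expert_w, feats) + self.expert_b[:, None, None]
        # Gating logits -> numerically stable softmax over experts -> w_k(I)
        logits = np.einsum("kc,chw->khw", self.gate_w, feats) + self.gate_b[:, None, None]
        logits -= logits.max(axis=0, keepdims=True)
        w = np.exp(logits)
        w /= w.sum(axis=0, keepdims=True)
        depth = (w * mu).sum(axis=0)   # fused depth, shape (H, W)
        assignment = w.argmax(axis=0)  # hard expert map per pixel
        return depth, w, mu, assignment
```

Because the fused depth is a convex combination of the expert means, it always lies between the smallest and largest hypothesis at each pixel. The `assignment` output is the per-pixel argmax of the gate, the kind of map that Figure 3 visualizes.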

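Experimental Results

Research Questions

RQ1: Can a per-pixel mixture of depth experts sharpen depth boundaries without heavy computational cost?
RQ2: Does entropy regularization promote effective expert specialization on geometric substructures (edges vs. surfaces)?
RQ3: How does attaching MoE3D to a strong pre-trained 3D backbone (VGGT) affect monocular and multi-view reconstruction?
RQ4: How much does MoE3D improve boundary accuracy and reduce flying-point artifacts compared with state-of-the-art feed-forward depth models?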

Key Findings

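Method	Dataset	Acc Mean ↓	Acc Med ↓	Comp Mean ↓	Comp Med ↓	NC Mean ↑	NC Med ↑
Ours	NRGBD	0.055	0.015	0.061	0.017	0.913	0.995
Ours	7Scenes	0.035	0.015	0.045	0.017	0.800	0.914
Acc = accuracy, Comp = completeness (lower is better); NC = normal consistency (higher is better).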

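MoE3D sharpens boundaries and reduces flying-point artifacts compared with VGGT.
For multi-view 3D reconstruction, MoE3D improves 3D prediction accuracy on indoor scenes by more than 30% (Table 1).
For monocular depth estimation, MoE3D improves boundary sharpness while maintaining VGGT's accuracy (Table 2).
The computational overhead is modest: about 7% extra inference compute, about 0.79% more parameters, and about 5% more GFLOPs.
Ablations show that a pixel-space MoE combined with end-to-end fine-tuning and entropy regularization performs best (Figs. 3–7, Table 4).
Boundary evaluations on NYU-v2, Sintel, and NRGBD show that MoE3D produces sharper, better-aligned depth edges (Table 2).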

Figure 3 : Effect of Entropy Regularization. Visualization of gating assignments (argmax) for four experts (red, blue, green, yellow). Without entropy regularization, the experts exhibit weak specialization. Large regularization values ( $\lambda\!\geq\!10^{-3}$ ) cause premature collapse to one or


This review was generated by AI and checked by a human editor.