QUICK REVIEW

[Paper Review] MicroVerse: A Preliminary Exploration Toward a Micro-World Simulation

Rongsheng Wang, Minghao Wu|arXiv (Cornell University)|Feb 28, 2026

Computer Graphics and Visualization Techniques|Citations: 0

One-Sentence Summary

The paper introduces MicroWorldBench, a rubric-based benchmark for microscale video simulation; builds MicroSim-10K, an expert-verified dataset; and presents MicroVerse, a microscale video generator that improves scientific fidelity and subcellular-level task performance over its base model.

ABSTRACT

Recent advances in video generation have opened new avenues for macroscopic simulation of complex dynamic systems, but their application to microscopic phenomena remains largely unexplored. Microscale simulation holds great promise for biomedical applications such as drug discovery, organ-on-chip systems, and disease mechanism studies, while also showing potential in education and interactive visualization. In this work, we introduce MicroWorldBench, a multi-level rubric-based benchmark for microscale simulation tasks. MicroWorldBench enables systematic, rubric-based evaluation through 459 unique expert-annotated criteria spanning multiple microscale simulation tasks (e.g., organ-level processes, cellular dynamics, and subcellular molecular interactions) and evaluation dimensions (e.g., scientific fidelity, visual quality, instruction following). MicroWorldBench reveals that current SOTA video generation models fail in microscale simulation, showing violations of physical laws, temporal inconsistency, and misalignment with expert criteria. To address these limitations, we construct MicroSim-10K, a high-quality, expert-verified simulation dataset. Leveraging this dataset, we train MicroVerse, a video generation model tailored for microscale simulation. MicroVerse can accurately reproduce complex microscale mechanisms. Our work is the first to introduce the concept of Micro-World Simulation and presents a proof of concept, paving the way for applications in biology, education, and scientific visualization. Our work demonstrates the potential of educational microscale simulations of biological mechanisms. Our data and code are publicly available at https://github.com/FreedomIntelligence/MicroVerse

Motivation and Objectives

Define the Micro-World Simulation concept and establish a proof of concept for microscale video generation.
Develop MicroWorldBench, a rubric-based benchmark with expert-annotated criteria for microscale simulations.
Construct MicroSim-10K, a large expert-verified microscale video dataset to ground models in physics and biology.
Train and evaluate MicroVerse on MicroWorldBench, demonstrating improvements in scientific fidelity and subcellular-level tasks.

Proposed Method

Build MicroWorldBench, a rubric-based benchmark with 459 expert-annotated criteria spanning organ-, cellular-, and subcellular-level microscale processes.
Use GPT-4o and biology experts to generate and refine task rubrics and evaluation criteria.
Assemble MicroSim-10K by collecting microscale videos from YouTube, segmenting them into clips, filtering the clips with a VideoMAE classifier, and having experts validate the result.
Fine-tune Wan2.1-1.3B (and scale to Wan2.1-14B with mixed-domain data) on MicroSim-10K using a diffusion-based video generation framework with classifier-free guidance.
Evaluate models with a rubric-guided LLM judge (GPT-5) and human evaluators, reporting scientific fidelity, visual quality, and instruction following.
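
The rubric-guided evaluation in the last step can be pictured with a small sketch. This review does not say how individual criterion verdicts are combined into a score, so the weighted pass rate below is an assumption, as are the names `Criterion` and `score_video`, the example criteria, and the hand-written verdicts that stand in for the judge's output.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    """One rubric item (hypothetical structure, not the benchmark's schema)."""
    text: str
    dimension: str      # e.g. "scientific_fidelity", "visual_quality"
    weight: float = 1.0


def score_video(criteria, verdicts):
    """Return a 0-100 score per dimension as a weighted pass rate.

    verdicts[i] is True when the judge decided that criterion i is met.
    This aggregation rule is an assumption made for illustration.
    """
    if len(criteria) != len(verdicts):
        raise ValueError("need exactly one verdict per criterion")
    earned = defaultdict(float)
    possible = defaultdict(float)
    for criterion, passed in zip(criteria, verdicts):
        if criterion.weight <= 0:
            raise ValueError("criterion weights must be positive")
        possible[criterion.dimension] += criterion.weight
        if passed:
            earned[criterion.dimension] += criterion.weight
    return {dim: 100.0 * earned[dim] / possible[dim] for dim in possible}


# Hypothetical rubric for a prompt about mitosis.
rubric = [
    Criterion("Chromosomes align at the cell equator before separating",
              "scientific_fidelity", 2.0),
    Criterion("Sister chromatids move toward opposite poles",
              "scientific_fidelity", 1.0),
    Criterion("No flicker or abrupt scene changes", "visual_quality", 1.0),
    Criterion("The video shows a single dividing cell as requested",
              "instruction_following", 1.0),
]
judge_verdicts = [True, False, True, True]  # stand-in for LLM or human judgments

for dimension, score in score_video(rubric, judge_verdicts).items():
    print(f"{dimension}: {score:.1f}")
```

Keeping the verdicts separate from the aggregation means the same scoring function can be fed by an LLM judge or by human evaluators, which is how the two evaluation routes described above could be compared on equal terms.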

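Experimental Results

Research Questions

RQ1: How feasible and valid is micro-world simulation for educational and biomedical visualization?
RQ2: Can a rubric-based benchmark (MicroWorldBench) reliably evaluate microscale video generation across organ-level, cellular-level, and subcellular-level tasks?
RQ3: Does training on a microscale-specific dataset (MicroSim-10K) improve the scientific fidelity and biological plausibility of generated videos?
RQ4: How does MicroVerse perform relative to open-source and commercial baselines, particularly on subcellular-level tasks?

Key Findings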

Scores by task level

Model	Average	Organ-level	Cellular-level	Subcellular-level
HunyuanVideo	23.2	23.1	23.8	19.4
CogVideoX-5B	43.5	39.9	47.0	38.6
Wan2.1-T2V-1.3B	49.4	45.9	51.7	52.4
Wan2.2-TI2V-5B	51.6	46.6	53.9	49.5
Wan2.1-T2V-14B	54.8	55.7	54.4	52.8
Wan2.2-T2V-A14B	53.8	56.3	52.0	53.3
MicroVerse-1.3B (Ours)	50.2	47.6	51.7	53.3
Sora	50.7	55.9	46.1	55.0
Veo3	77.2	77.5	76.9	78.2

Scores by evaluation dimension

Model	Average	Scientific Fidelity	Visual Quality	Instruction Following
HunyuanVideo	23.2	15.6	48.2	23.4
CogVideoX-5B	43.5	37.4	64.1	38.6
Wan2.1-T2V-1.3B	49.4	40.3	71.8	50.1
Wan2.2-TI2V-5B	51.6	40.7	82.7	47.0
Wan2.1-T2V-14B	54.8	42.7	86.0	53.8
Wan2.2-T2V-A14B	53.8	37.8	92.8	55.4
MicroVerse-1.3B (Ours)	50.2	43.0	68.5	49.3
Sora	50.7	35.3	96.4	37.9
Veo3	77.2	65.7	97.0	77.0
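
One way to cross-check the headline comparisons is to query the reported values directly. The sketch below copies the Scientific Fidelity and Subcellular-level columns from the results above; treating Sora and Veo3 as the closed-source systems is an assumption based on how those models are distributed, and `top_models` is a hypothetical helper.

```python
# (scientific_fidelity, subcellular, is_open_source), copied from the results.
SCORES = {
    "HunyuanVideo": (15.6, 19.4, True),
    "CogVideoX-5B": (37.4, 38.6, True),
    "Wan2.1-T2V-1.3B": (40.3, 52.4, True),
    "Wan2.2-TI2V-5B": (40.7, 49.5, True),
    "Wan2.1-T2V-14B": (42.7, 52.8, True),
    "Wan2.2-T2V-A14B": (37.8, 53.3, True),
    "MicroVerse-1.3B": (43.0, 53.3, True),
    "Sora": (35.3, 55.0, False),   # assumed closed-source
    "Veo3": (65.7, 78.2, False),   # assumed closed-source
}
COLUMNS = {"scientific_fidelity": 0, "subcellular": 1}


def top_models(metric, open_source_only=True):
    """Return every model sharing the best score on `metric`, sorted by name."""
    column = COLUMNS[metric]
    pool = {
        name: row[column]
        for name, row in SCORES.items()
        if row[2] or not open_source_only
    }
    best = max(pool.values())
    return sorted(name for name, value in pool.items() if value == best)


print(top_models("scientific_fidelity"))
print(top_models("subcellular"))
print(top_models("subcellular", open_source_only=False))
```

Returning a list rather than a single name matters here: on these numbers the open-source subcellular query yields two models, because MicroVerse-1.3B and Wan2.2-T2V-A14B both score 53.3.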

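MicroVerse achieves higher scientific fidelity (43.0) than every open-source baseline on MicroWorldBench.
MicroVerse records the top subcellular-level score among open-source models (53.3), tied with Wan2.2-T2V-A14B.
Full model scaling (Wan2.1-14B) combined with mixed-domain training surpasses previous open-source results on organ-, cellular-, and subcellular-level tasks.
MicroWorldBench shows that many models maintain visual coherence yet struggle with microscale physical and biological plausibility.
MicroSim-10K contains 9,601 expert-verified microscale video clips with captions; its FVD against real microscopy imagery is 123.9, indicating close distributional alignment.
Human evaluation shows MicroVerse outperforming the Wan2.1-1.3B baseline on scientific fidelity, with high inter-rater agreement (Cohen's kappa > 0.80).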
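
Cohen's kappa, the agreement statistic cited for the human evaluation, corrects raw agreement for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement rate and p_e the chance agreement rate. A minimal sketch follows; the function name and the example labels are illustrative, and nothing here reproduces the paper's annotation data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty item list")
    n = len(rater_a)
    # Observed agreement: fraction of items given identical labels.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used one identical label throughout; treat as full agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical pairwise preferences from two evaluators over ten prompts.
rater_1 = ["microverse", "microverse", "baseline", "tie", "microverse",
           "baseline", "microverse", "tie", "microverse", "baseline"]
rater_2 = ["microverse", "microverse", "baseline", "tie", "microverse",
           "baseline", "microverse", "microverse", "microverse", "baseline"]

print(f"kappa = {cohens_kappa(rater_1, rater_2):.2f}")
```

A kappa of 1 means perfect agreement and 0 means agreement no better than chance, which is why a value above 0.80 is read as strong support for the human ratings.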

This review was generated by AI and checked by a human editor.