QUICK REVIEW

[Paper Review] Learning and Leveraging World Models in Visual Representation Learning

Quentin Garrido, Mahmoud Assran|arXiv (Cornell University)|Mar 1, 2024

Advanced Image and Video Retrieval Techniques | Cited by 7

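One-Sentence Summary

Introduces Image World Models (IWM), built on the Joint-Embedding Predictive Architecture (JEPA), which learn a reusable latent world model that predicts the effect of transformations; demonstrates predictor finetuning and controllable representation abstraction, bridging contrastive learning and masked image modeling.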

ABSTRACT

Joint-Embedding Predictive Architecture (JEPA) has emerged as a promising self-supervised approach that learns by leveraging a world model. While previously limited to predicting missing parts of an input, we explore how to generalize the JEPA prediction task to a broader set of corruptions. We introduce Image World Models, an approach that goes beyond masked image modeling and learns to predict the effect of global photometric transformations in latent space. We study the recipe of learning performant IWMs and show that it relies on three key aspects: conditioning, prediction difficulty, and capacity. Additionally, we show that the predictive world model learned by IWM can be adapted through finetuning to solve diverse tasks; a fine-tuned IWM world model matches or surpasses the performance of previous self-supervised methods. Finally, we show that learning with an IWM allows one to control the abstraction level of the learned representations, learning invariant representations such as contrastive methods, or equivariant representations such as masked image modelling.

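Motivation and Objectives

Study how to learn a reusable Image World Model (IWM) within the JEPA framework.
Identify the key factors behind a successful IWM: conditioning, transformation complexity, and predictor capacity.
Present a finetuning protocol for downstream discriminative tasks and show its multitask efficiency.
Show how world-model capacity shapes representation abstraction (invariance versus equivariance).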

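Proposed Method

Extend JEPA to learn a latent-space world model that predicts the representation of a transformed input.
Train on a source view x and a target view y produced by the selected augmentations; the predictor p_phi takes z_x as input and is trained with a squared-error loss so that its output matches z_y.
Condition the predictor on the transformation information through sequence conditioning or feature conditioning (default: feature conditioning).
Evaluate world-model quality with mean reciprocal rank (MRR) over a set of augmented targets.
Evaluate downstream transfer through predictor finetuning (compared with encoder finetuning) and multitask finetuning; study the invariant and equivariant regimes.
Describe how world-model capacity and transformation strength affect performance and representation abstraction.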
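
The training objective described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the linear `encode` function stands in for the real encoder, the predictor is a single linear map, feature conditioning is reduced to concatenating the transformation parameters to every token, and all names and sizes (`D_IN`, `D_LAT`, `D_ACT`, `action`) are hypothetical. The target encoder is passed in as separate weights; how it is maintained is not covered in this review.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: input features, latent features, transformation parameters.
D_IN, D_LAT, D_ACT = 32, 16, 4


def init_linear(d_in, d_out):
    """Random weights for a toy linear layer."""
    return rng.normal(scale=d_in ** -0.5, size=(d_in, d_out))


def encode(weights, tokens):
    """Toy stand-in for the encoder: one linear map applied to each token."""
    return tokens @ weights


def predict(pred_weights, z_x, action):
    """Toy feature conditioning: append the transformation parameters to
    every token's features, then apply the (linear) predictor."""
    tiled = np.broadcast_to(action, (z_x.shape[0], action.shape[0]))
    return np.concatenate([z_x, tiled], axis=1) @ pred_weights


def iwm_loss(enc_w, target_w, pred_w, x, y, action):
    """Squared-error loss between the predicted and the actual target latents."""
    z_x = encode(enc_w, x)        # representation of the source view
    z_y = encode(target_w, y)     # representation of the target view
    z_hat = predict(pred_w, z_x, action)
    return float(np.mean(np.sum((z_hat - z_y) ** 2, axis=1)))


# Toy usage: 8 tokens, a made-up "brightness-like" corruption of the target.
y = rng.normal(size=(8, D_IN))
action = np.array([0.3, -0.1, 0.5, 0.0])   # hypothetical transformation parameters
x = y * (1.0 + action[0])
enc_w = init_linear(D_IN, D_LAT)
pred_w = init_linear(D_LAT + D_ACT, D_LAT)
loss = iwm_loss(enc_w, enc_w.copy(), pred_w, x, y, action)
```

In a real setup the loss would be minimized with respect to the encoder and predictor parameters; the sketch only shows how the conditioned prediction and the squared-error target fit together.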

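Experimental Results

Research Questions

RQ1: Can the learned latent world model be reused to improve downstream discriminative vision tasks?
RQ2: How do predictor conditioning, transformation complexity, and model capacity affect the quality and usefulness of an image world model?
RQ3: How do invariance and equivariance in the world model affect downstream performance and the abstraction level of the representations?
RQ4: Can IWMs support efficient multitask finetuning across several vision tasks?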

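Key Findings

Conditioning the predictor on the transformation information is essential: without conditioning the MRR is 0, whereas sequence or feature conditioning reaches a high MRR (about 0.8).
Stronger, more complex transformations and deeper predictors improve world-model fidelity (higher MRR).
Equivariant IWMs make predictor finetuning more effective, with clear gains over a randomly initialized predictor, matching or exceeding encoder finetuning in many settings.
Invariant IWMs perform better under linear evaluation, whereas equivariant IWMs perform better with predictor finetuning and in multitask settings.
Predictor finetuning with IWMs is more parameter-efficient than encoder finetuning, and multitask finetuning shows efficiency gains across tasks.
IWMs offer a spectrum of representation abstraction between the contrastive-style (invariant) and MIM-style (equivariant) regimes, allowing a controllable trade-off.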
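
Several of the findings above are stated in terms of MRR, so a small sketch of how a mean reciprocal rank can be computed may help. This is a generic retrieval-style MRR under stated assumptions, not necessarily the paper's exact protocol: `predictions[i]` is assumed to be the predictor's output for the i-th transformation, `bank[i]` the true latent of the i-th augmented target, and squared Euclidean distance is assumed as the ranking criterion.

```python
import numpy as np


def mean_reciprocal_rank(predictions, bank):
    """MRR of retrieving bank[i] from predictions[i]; both are (n, d) arrays."""
    # Pairwise squared Euclidean distances, shape (n, n).
    diffs = predictions[:, None, :] - bank[None, :, :]
    dists = np.sum(diffs ** 2, axis=-1)
    # Distance from each prediction to its own (correct) target.
    correct = np.diag(dists)
    # Rank = 1 + number of candidates strictly closer than the correct target.
    ranks = 1 + np.sum(dists < correct[:, None], axis=1)
    return float(np.mean(1.0 / ranks))


# Toy bank of four 1-D "latents" (hypothetical values).
bank = np.array([[0.0], [1.0], [2.0], [3.0]])

# A perfect world model predicts each target exactly.
perfect = mean_reciprocal_rank(bank, bank)  # 1.0

# A predictor that ignores the transformation outputs one vector for all targets.
constant = mean_reciprocal_rank(np.full((4, 1), -1.0), bank)
```

A predictor that ignores the transformation returns the same vector for every target, so it can only rank the correct target first by chance; its MRR shrinks as the bank of candidates grows, which is consistent with the near-zero MRR reported for the unconditioned predictor.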

This review was generated by AI and checked by human editors.