QUICK REVIEW

[Paper Review] Solving ARC visual analogies with neural embeddings and vector arithmetic: A generalized method

Luca H. Thoms, Karel Veldkamp | arXiv (Cornell University) | Jan 1, 2023

Topic Modeling | Cited by 1

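One-Sentence Summary

This paper proposes a generalized deep learning method for solving visual analogies in the Abstraction and Reasoning Corpus (ARC): a variational autoencoder (VAE) encodes images as low-dimensional latent vectors, and vector arithmetic on those vectors infers the missing output. The method reaches 2% accuracy on ARC and 8.8% on ConceptARC, demonstrating the generalization ability of a simple connectionist framework without hard-coded rules on abstract visual reasoning tasks.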

ABSTRACT

Analogical reasoning derives information from known relations and generalizes this information to similar yet unfamiliar situations. One of the first generalized ways in which deep learning models were able to solve verbal analogies was through vector arithmetic of word embeddings, essentially relating words that were mapped to a vector space (e.g., king – man + woman = __?). In comparison, most attempts to solve visual analogies are still predominantly task-specific and less generalizable. This project focuses on visual analogical reasoning and applies the initial generalized mechanism used to solve verbal analogies to the visual realm. Taking the Abstraction and Reasoning Corpus (ARC) as an example to investigate visual analogy solving, we use a variational autoencoder (VAE) to transform ARC items into low-dimensional latent vectors, analogous to the word embeddings used in the verbal approaches. Through simple vector arithmetic, underlying rules of ARC items are discovered and used to solve them. Results indicate that the approach works well on simple items with fewer dimensions (i.e., few colors used, uniform shapes), similar input-to-output examples, and high reconstruction accuracy on the VAE. Predictions on more complex items showed stronger deviations from expected outputs, although, predictions still often approximated parts of the item's rule set. Error patterns indicated that the model works as intended. On the official ARC paradigm, the model achieved a score of 2% (cf. current world record is 21%) and on ConceptARC it scored 8.8%. Although the methodology proposed involves basic dimensionality reduction techniques and standard vector arithmetic, this approach demonstrates promising outcomes on ARC and can easily be generalized to other abstract visual reasoning tasks.
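
The verbal analogy mentioned in the abstract (king – man + woman) can be illustrated with a minimal sketch. The two-dimensional embeddings below are hand-made for illustration (one axis loosely standing for royalty, the other for gender) and are not taken from the paper or from any trained model; the function name `solve_analogy` is likewise hypothetical.

```python
import numpy as np

# Hand-made toy embeddings (an assumption for illustration only):
# axis 0 ~ "royalty", axis 1 ~ "gender".
embeddings = {
    "king": np.array([1.0, 1.0]),
    "queen": np.array([1.0, -1.0]),
    "man": np.array([0.0, 1.0]),
    "woman": np.array([0.0, -1.0]),
    "prince": np.array([0.8, 1.0]),
}

def solve_analogy(a, b, c, vocab):
    """Return the word whose vector is nearest to vocab[a] - vocab[b] + vocab[c].

    The three query words are excluded from the candidates, as is common
    when evaluating word analogies.
    """
    target = vocab[a] - vocab[b] + vocab[c]
    candidates = {w: v for w, v in vocab.items() if w not in (a, b, c)}
    # Euclidean distance is used here; cosine similarity is another common choice.
    return min(candidates, key=lambda w: np.linalg.norm(candidates[w] - target))

answer = solve_analogy("king", "man", "woman", embeddings)
print(answer)  # queen
```

The paper's premise is that the same arithmetic can be carried over to image embeddings, with the VAE latent vectors playing the role of the word vectors.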

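Research Motivation and Goals

Develop a generalized, connectionist approach to visual analogical reasoning that avoids task-specific engineering and symbolic rule design.
Transfer the success of word-embedding vector arithmetic on verbal analogies to the visual domain by means of neural network embeddings.
Assess whether dimensionality reduction and vector arithmetic can capture and generalize the abstract visual rules in ARC-style tasks.
Evaluate the model on complex, few-shot visual reasoning problems whose outputs are open-ended and generative.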

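Proposed Method

Train a custom variational autoencoder (VAE) that encodes ARC input-output pairs as low-dimensional latent vectors preserving structural and attribute-level information.
Compute a rule vector from the latent vectors of the input and output examples using simple vector arithmetic (e.g., output - input).
Apply the rule vector to a new, unsolved ARC item by adding it to the latent representation of that item's input grid.
Reconstruct the predicted output from the resulting latent vector with the decoder network, and rescale it to match the expected grid size.
During inference, use a multilayer perceptron (MLP) to combine the input representation with the learned rule vector.
The method is fully differentiable and end-to-end trainable, and involves no hard-coded rules or symbolic program induction.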
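
The rule-vector steps listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `encode` and `decode` are hypothetical stand-ins for the trained VAE (here they simply flatten and un-flatten the grid), averaging the differences over all demonstration pairs is an assumption, and the MLP and rescaling steps are omitted.

```python
import numpy as np

def encode(grid):
    # Hypothetical stand-in for the VAE encoder: flatten the grid to a float vector.
    return np.asarray(grid, dtype=float).ravel()

def decode(z, shape):
    # Hypothetical stand-in for the VAE decoder: round to the nearest color
    # index, clip to ARC's ten colors (0-9), and restore the grid shape.
    return np.clip(np.rint(z), 0, 9).astype(int).reshape(shape)

def rule_vector(demo_pairs):
    # Rule = latent(output) - latent(input), averaged over the demonstrations
    # (the averaging is an assumption of this sketch).
    diffs = [encode(out) - encode(inp) for inp, out in demo_pairs]
    return np.mean(diffs, axis=0)

def solve(demo_pairs, test_input):
    # Add the rule vector to the test input's latent vector, then decode.
    z_pred = encode(test_input) + rule_vector(demo_pairs)
    return decode(z_pred, np.asarray(test_input).shape)

# Toy item: every cell's color index increases by 2.
demos = [
    ([[1, 1], [1, 1]], [[3, 3], [3, 3]]),
    ([[0, 1], [1, 0]], [[2, 3], [3, 2]]),
]
prediction = solve(demos, [[1, 0], [0, 1]])
print(prediction)  # [[3 2]
                   #  [2 3]]
```

With a flattening encoder the rule is only a per-cell offset, so the toy item is deliberately trivial; a learned latent space is what would allow the same arithmetic to express less local transformations.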

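Experimental Results

Research Questions

RQ1: Can vector arithmetic on learned visual embeddings generalize to solve the abstract visual analogies in the ARC benchmark?
RQ2: Can a VAE-based latent space effectively capture the underlying rules of visual transformations in few-shot, open-ended reasoning tasks?
RQ3: How does a purely connectionist, non-symbolic approach perform on ARC compared with current state-of-the-art symbolic or hybrid models?
RQ4: How do reconstruction accuracy and input-output similarity affect the model's ability to infer the correct visual analogy?
RQ5: Can the method be extended to abstract visual reasoning tasks beyond ARC?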

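Key Findings

The model achieves 2% test accuracy on the official ARC benchmark, well below the 21% of the current state of the art.
On the ConceptARC benchmark the model scores 8.8%, indicating some ability to generalize to related but distinct visual reasoning tasks.
Performance is strongest on simple tasks with few colors, uniform shapes, and high VAE reconstruction accuracy.
On complex tasks, predictions deviate more strongly from the expected outputs but often approximate parts of the rule set, suggesting that the model captures underlying structural patterns.
Error analysis confirms that the model works as intended: deviations are consistent and track rule complexity and the degree of difference between input and output.
When reconstruction quality is high, the method is robust to variation in the input, and rescaling improves the visual plausibility of the predictions.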
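
The gap between "approximating parts of the rule set" and the low benchmark scores is easier to see with a scoring sketch. The review does not spell out the scoring rule; the code below assumes exact-match scoring, where a prediction counts only if the whole grid is correct, and adds a per-cell accuracy as a softer, reconstruction-style measure. All function names are hypothetical.

```python
import numpy as np

def exact_match(pred, target):
    # Assumed scoring rule: the full grid (shape and every cell) must match.
    return bool(np.array_equal(np.asarray(pred), np.asarray(target)))

def task_accuracy(preds, targets):
    # Fraction of items whose prediction matches the target exactly.
    hits = [exact_match(p, t) for p, t in zip(preds, targets)]
    return sum(hits) / len(hits)

def cell_accuracy(pred, target):
    # Fraction of cells that agree; defined as 0.0 when the shapes differ.
    pred, target = np.asarray(pred), np.asarray(target)
    if pred.shape != target.shape:
        return 0.0
    return float(np.mean(pred == target))

preds = [[[1, 2], [3, 4]], [[0, 0], [0, 1]]]
targets = [[[1, 2], [3, 4]], [[0, 0], [0, 0]]]

print(task_accuracy(preds, targets))         # 0.5
print(cell_accuracy(preds[1], targets[1]))   # 0.75
```

In this toy case the second prediction gets three of four cells right yet contributes nothing to the exact-match score, which is the kind of near miss the findings describe.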

This review was generated by AI and reviewed by human editors.