QUICK REVIEW

[Paper Review] Improving neural network representations using human similarity judgments

Lukas Muttenthaler, Lorenz Linhardt|arXiv (Cornell University)|Jun 7, 2023

Domain Adaptation and Few-Shot Learning | Cited by 10

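One-Sentence Summary

This paper introduces the gLocal transform, which aligns the global structure of neural network representations with human similarity judgments while preserving their local neighborhood structure, improving few-shot learning and anomaly detection.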

ABSTRACT

Deep neural networks have reached human-level performance on many computer vision tasks. However, the objectives used to train these networks enforce only that similar images are embedded at similar locations in the representation space, and do not directly constrain the global structure of the resulting space. Here, we explore the impact of supervising this global structure by linearly aligning it with human similarity judgments. We find that a naive approach leads to large changes in local representational structure that harm downstream performance. Thus, we propose a novel method that aligns the global structure of representations while preserving their local structure. This global-local transform considerably improves accuracy across a variety of few-shot learning and anomaly detection tasks. Our results indicate that human visual representations are globally organized in a way that facilitates learning from few examples, and incorporating this global structure into neural network representations improves performance on downstream tasks.

Motivation and Objectives

Investigate whether explicit global alignment to human similarity improves downstream transfer.
Develop a transform that combines global alignment with local structure preservation.
Evaluate how global-local alignment affects few-shot learning and anomaly detection across diverse models and datasets.
Assess whether the gLocal transform maintains alignment with human similarity judgments while enhancing task performance.

Proposed Method

Define a global alignment loss that matches model similarities to human triplet-based judgments via a softmax likelihood over triplets.
Compare a naive linear transform that only maximizes global alignment with a global transform whose matrix is regularized toward a scaled identity.
Introduce a local loss that preserves the original space's neighborhood structure using a contrastive objective between untransformed and transformed spaces.
Combine global alignment and locality-preserving loss into the gLocal objective with a regularization term on the transformation matrix.
Extract penultimate-layer representations of ImageNet images and optimize W, b to minimize a weighted sum of the global and local losses.
Evaluate hyperparameters (alpha, lambda, tau) via grid search to balance alignment and local structure.
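
The steps above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation. The function names, the row-vector convention (X @ W + b), the triplet layout (first two indices are the pair humans judged most similar, the third is the odd one out), the use of cosine similarity in the local loss, the exact form of the scaled-identity regularizer, and the (1 - alpha) / alpha weighting are all assumptions made for this sketch.

```python
import numpy as np


def log_softmax(z, axis=-1):
    """Numerically stable log-softmax along `axis`."""
    z = z - z.max(axis=axis, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))


def global_loss(X, W, b, triplets):
    """Negative log-likelihood of human triplet choices under a softmax.

    X: (n, p) representations; triplets: (t, 3) integer indices into X.
    Assumed layout: columns 0 and 1 are the human-chosen similar pair.
    """
    Z = X @ W + b                      # transformed representations
    S = Z @ Z.T                        # dot-product similarity matrix
    i, j, k = triplets.T
    # One logit per candidate pair; the human-chosen pair comes first.
    logits = np.stack([S[i, j], S[i, k], S[j, k]], axis=1)
    return -log_softmax(logits, axis=1)[:, 0].mean()


def local_loss(X, W, b, tau):
    """Cross-entropy between neighbor distributions before and after the transform."""
    def neighbor_log_probs(A):
        A = A / np.linalg.norm(A, axis=1, keepdims=True)
        sim = A @ A.T / tau            # temperature-scaled cosine similarity
        np.fill_diagonal(sim, -np.inf) # an item is not its own neighbor
        return log_softmax(sim, axis=1)

    log_p = neighbor_log_probs(X)          # original space (target)
    log_q = neighbor_log_probs(X @ W + b)  # transformed space
    off_diag = ~np.eye(len(X), dtype=bool)
    return -(np.exp(log_p[off_diag]) * log_q[off_diag]).sum() / len(X)


def identity_regularizer(W):
    """Squared Frobenius distance between W and the closest scaled identity."""
    p = W.shape[0]
    scale = np.trace(W) / p
    return np.sum((W - scale * np.eye(p)) ** 2)


def glocal_objective(X_human, triplets, X_local, W, b, alpha, lam, tau):
    """Weighted sum of global and local losses plus the regularizer on W."""
    return ((1.0 - alpha) * global_loss(X_human, W, b, triplets)
            + alpha * local_loss(X_local, W, b, tau)
            + lam * identity_regularizer(W))
```

In this sketch, alpha trades off global alignment against locality preservation, lam controls how strongly W is pulled toward a scaled identity, and tau is the softmax temperature of the local loss. A gradient-based optimizer over W and b, plus the grid search over (alpha, lam, tau), would sit on top of `glocal_objective`.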

Experimental Results

Research Questions

RQ1: Does aligning the global structure of representations with human similarity judgments improve downstream task performance?
RQ2: Can a regularized transformation preserve local structure while achieving global alignment?
RQ3: How does gLocal perform on few-shot learning and anomaly detection compared to naive and original representations?
RQ4: Do gLocal-aligned representations maintain alignment with human similarity judgments across multiple human datasets?

Key Findings

The gLocal transform preserves local neighborhood structure while incorporating global human-aligned structure.
Naive global alignment can harm downstream performance; gLocal mitigates this by adding locality regularization.
gLocal consistently improves few-shot learning and anomaly detection across several CLIP-based models and datasets.
Representational alignment to human judgments using gLocal is comparable to naive alignment, despite preserving local structure.
gLocal gains are robust across multiple human similarity datasets and do not incur large losses in human-alignment metrics.
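
One simple way to check the claim that local neighborhoods are preserved is to compare each item's k nearest neighbors before and after the transform. The sketch below is an illustrative diagnostic only. The names `knn_indices` and `knn_overlap`, the choice of cosine similarity, and the overlap measure itself are assumptions of this example and are not taken from the paper.

```python
import numpy as np


def knn_indices(A, k):
    """Indices of the k nearest neighbors of each row under cosine similarity."""
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    sim = A @ A.T
    np.fill_diagonal(sim, -np.inf)     # exclude each item from its own neighbors
    return np.argsort(-sim, axis=1)[:, :k]


def knn_overlap(X, Z, k=5):
    """Mean fraction of k nearest neighbors shared between spaces X and Z."""
    nn_x = knn_indices(X, k)
    nn_z = knn_indices(Z, k)
    shared = [len(set(a) & set(b)) for a, b in zip(nn_x, nn_z)]
    return float(np.mean(shared)) / k
```

Here X would hold the original representations and Z the transformed ones (same rows, same order). A value near 1 means the transform left neighborhoods largely intact, while a value near 0 means they were reshuffled.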

[Figure, panel (b): Downstream task performance vs. human alignment.]

This review was generated by AI and checked by a human editor.