QUICK REVIEW

[Paper Review] Rethinking Importance Weighting for Deep Learning under Distribution Shift

Tongtong Fang, Nan Lu|arXiv (Cornell University)|Jun 8, 2020

Domain Adaptation and Few-Shot Learning | Cited by 34

ONE-LINE SUMMARY

This paper introduces Dynamic Importance Weighting (DIW) to address distribution shift in deep learning. DIW iteratively estimates weights that align the weighted training data with validation data through a learned feature transformation, and trains the classifier on the weighted data, optimizing both end to end.

ABSTRACT

Under distribution shift (DS) where the training data distribution differs from the test one, a powerful technique is importance weighting (IW) which handles DS in two separate steps: weight estimation (WE) estimates the test-over-training density ratio and weighted classification (WC) trains the classifier from weighted training data. However, IW cannot work well on complex data, since WE is incompatible with deep learning. In this paper, we rethink IW and theoretically show it suffers from a circular dependency: we need not only WE for WC, but also WC for WE where a trained deep classifier is used as the feature extractor (FE). To cut off the dependency, we try to pretrain FE from unweighted training data, which leads to biased FE. To overcome the bias, we propose an end-to-end solution dynamic IW that iterates between WE and WC and combines them in a seamless manner, and hence our WE can also enjoy deep networks and stochastic optimizers indirectly. Experiments with two representative types of DS on three popular datasets show that our dynamic IW compares favorably with state-of-the-art methods.
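For reference, the identity behind the two-step IW pipeline described in the abstract can be written out as below. This is the standard importance-weighting derivation rather than notation copied from the paper. It assumes p_tr(x, y) > 0 wherever p_te(x, y) > 0, uses a generic loss ℓ, and writes w*(x, y) for the density ratio as in the method summary.

```latex
R_{\mathrm{te}}(f)
  = \mathbb{E}_{p_{\mathrm{te}}(x,y)}\big[\ell(f(x),y)\big]
  = \int \frac{p_{\mathrm{te}}(x,y)}{p_{\mathrm{tr}}(x,y)}\,\ell(f(x),y)\,p_{\mathrm{tr}}(x,y)\,\mathrm{d}x\,\mathrm{d}y
  = \mathbb{E}_{p_{\mathrm{tr}}(x,y)}\big[w^*(x,y)\,\ell(f(x),y)\big],
\qquad
w^*(x,y) = \frac{p_{\mathrm{te}}(x,y)}{p_{\mathrm{tr}}(x,y)},
\qquad
\widehat{R}_w(f) = \frac{1}{n}\sum_{i=1}^{n} w_i\,\ell(f(x_i),y_i).
```

WE supplies the weights w_i ≈ w*(x_i, y_i), and WC minimizes the empirical weighted risk on the right. The circular dependency arises when the features used to estimate the w_i come from the classifier f that WC is still training.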

MOTIVATION AND OBJECTIVES

Address distribution shift (DS) in deep learning, where the training and test distributions differ.
Identify the circular dependency between weight estimation and classifier learning in traditional importance weighting.
Propose a dynamic, end-to-end solution that jointly updates weights and model parameters.
Show that DIW can leverage deep feature extraction to improve robustness under DS across datasets and shift types.

PROPOSED METHOD

Formulate weighted classification with the density ratio w*(x, y) = p_te(x, y) / p_tr(x, y), which reweights the training risk so that it matches the test risk.
Introduce a non-linear data transformation pi(x, y) that simplifies weight estimation while preserving information.
Adopt kernel mean matching (KMM), which minimizes the maximum mean discrepancy (MMD), to compute weights by matching the transformed training and validation distributions.
Propose DIW, which alternates in each mini-batch between updating the weights W via WE and updating the classifier f_theta via WC.
Instantiate pi as either hidden-layer outputs or loss values to obtain practical, invertible transformations.
Perform distribution matching within each mini-batch using kernel embeddings, and solve a constrained quadratic program (QP) to obtain W.
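The WE/WC alternation listed above can be illustrated with a small NumPy sketch. It is not the authors' implementation: it replaces the deep network with a logistic-regression classifier, takes pi(x, y) to be the per-example loss value (one of the two options listed), uses a Gaussian kernel, and solves the KMM quadratic program by projected gradient descent under the simplified constraints 0 <= w_i <= upper and mean(w) = 1. The helper names (rbf_kernel, project_box_sum, kmm_weights, sigmoid_losses, diw_sketch) and all hyperparameters are hypothetical.

```python
import numpy as np


def rbf_kernel(a, b, gamma=1.0):
    # Gaussian kernel matrix k(u, v) = exp(-gamma * ||u - v||^2) between rows of a and b.
    sq = (a ** 2).sum(1)[:, None] + (b ** 2).sum(1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def project_box_sum(v, total, upper, iters=60):
    # Euclidean projection onto {w : 0 <= w_i <= upper, sum(w) = total}.
    # The projection has the form clip(v - tau, 0, upper); tau is found by bisection.
    lo, hi = v.min() - upper, v.max()
    for _ in range(iters):
        tau = 0.5 * (lo + hi)
        if np.clip(v - tau, 0.0, upper).sum() > total:
            lo = tau
        else:
            hi = tau
    return np.clip(v - 0.5 * (lo + hi), 0.0, upper)


def kmm_weights(z_tr, z_val, gamma=1.0, upper=10.0, steps=200):
    # KMM on transformed samples z = pi(x, y): minimize the squared MMD
    #   || (1/n) sum_i w_i phi(z_tr_i) - (1/m) sum_j phi(z_val_j) ||^2
    #   = (1/n^2) w^T K w - (2/(n m)) kappa^T w + const,
    # subject to 0 <= w_i <= upper and mean(w) = 1 (feasible when upper >= 1).
    n, m = len(z_tr), len(z_val)
    K = rbf_kernel(z_tr, z_tr, gamma)
    kappa = rbf_kernel(z_tr, z_val, gamma).sum(1)
    step = n ** 2 / (2.0 * np.linalg.eigvalsh(K)[-1])  # 1 / Lipschitz constant of the gradient
    w = np.ones(n)
    for _ in range(steps):
        grad = (2.0 / n ** 2) * (K @ w) - (2.0 / (n * m)) * kappa
        w = project_box_sum(w - step * grad, total=n, upper=upper)
    return w


def sigmoid_losses(theta, X, y):
    # Per-example logistic loss and predicted probabilities for a linear classifier.
    p = 1.0 / (1.0 + np.exp(-X @ theta))
    eps = 1e-12
    return -(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps)), p


def diw_sketch(X_tr, y_tr, X_val, y_val, epochs=5, batch=32, lr=0.5, seed=0):
    # Toy DIW-style alternation for logistic regression (hypothetical, not the paper's code).
    # Assumption: pi(x, y) is the per-example loss under the current model.
    rng = np.random.default_rng(seed)
    theta = np.zeros(X_tr.shape[1])
    n_batches = max(1, len(X_tr) // batch)
    for _ in range(epochs):
        for idx in np.array_split(rng.permutation(len(X_tr)), n_batches):
            vidx = rng.choice(len(X_val), size=min(batch, len(X_val)), replace=False)
            l_tr, p_tr = sigmoid_losses(theta, X_tr[idx], y_tr[idx])
            l_val, _ = sigmoid_losses(theta, X_val[vidx], y_val[vidx])
            # WE: weights that match the transformed training and validation mini-batches.
            w = kmm_weights(l_tr[:, None], l_val[:, None])
            # WC: one gradient step on the weighted logistic loss, with w held fixed.
            theta = theta - lr * X_tr[idx].T @ (w * (p_tr - y_tr[idx])) / len(idx)
    return theta
```

Usage: `kmm_weights(z_tr, z_val)` returns one nonnegative weight per training row, larger for rows whose transformed values resemble the validation sample. In `diw_sketch`, the weights are treated as constants during the WC step, so no gradient flows through the QP; this mirrors the alternating structure described above but is only a toy stand-in for the deep, mini-batch setting.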

EXPERIMENTAL RESULTS

RESEARCH QUESTIONS

RQ1: How can we perform importance weighting effectively with deep models under distribution shift?
RQ2: Can end-to-end dynamic updating of weights stabilize and improve deep learning under DS compared to static IW?
RQ3: What role does a non-linear transformation pi play in enabling accurate weight estimation for deep networks?
RQ4: How does DIW perform under covariate shift, class-prior shift, and label noise?

KEY FINDINGS

DIW outperforms baselines (IW, Reweight, Uniform, Random, and Clean/Truth) under label noise and class-prior shift on Fashion-MNIST and CIFAR datasets.
The learned weights in DIW closely approximate true/optimal weights, reducing bias and improving transferability.
DIW-derived embeddings show a more compact, discriminative structure than those of IW or static IW (SIW), indicating better denoising and a clearer separation of intact and mislabeled data.
Combining a pretrained FE with dynamic weight updates (DIW) yields better performance than static pipelines or single-pass reweighting.
DIW demonstrates robustness to label noise, maintaining higher accuracy as noise rates increase.

This review was generated by AI and checked by a human editor.