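[Paper Review] Self-Supervised Generalisation with Meta Auxiliary Learning
MAXL automatically learns auxiliary labels, using a meta-learned label generator and a multi-task learner, to improve generalisation on the primary task without any additional data.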
Learning with auxiliary tasks can improve the ability of a primary task to generalise. However, this comes at the cost of manually labelling auxiliary data. We propose a new method which automatically learns appropriate labels for an auxiliary task, such that any supervised learning task can be improved without requiring access to any further data. The approach is to train two neural networks: a label-generation network to predict the auxiliary labels, and a multi-task network to train the primary task alongside the auxiliary task. The loss for the label-generation network incorporates the loss of the multi-task network, and so this interaction between the two networks can be seen as a form of meta learning with a double gradient. We show that our proposed method, Meta AuXiliary Learning (MAXL), outperforms single-task learning on 7 image datasets, without requiring any additional data. We also show that MAXL outperforms several other baselines for generating auxiliary labels, and is even competitive when compared with human-defined auxiliary labels. The self-supervised nature of our method leads to a promising new direction towards automated generalisation. Source code can be found at https://github.com/lorenmt/maxl.
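Research Motivation and Goals
- Motivates auxiliary learning as a way to improve generalisation without manually labelled auxiliary data.
- Proposes a self-supervised framework that generates auxiliary labels automatically.
- Shows that MAXL improves primary-task accuracy on several image datasets.
Proposed Method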
- Two-network MAXL architecture: a multi-task network for primary and auxiliary tasks, and a label-generation network for auxiliary labels.
- Hierarchical auxiliary label structure per primary class with masked SoftMax (Mask SoftMax) to enforce class-wise auxiliary mapping.
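- Meta-learning gradient flow: the label generator is trained on the primary-task loss of the multi-task network, so its update differentiates through the multi-task network's own gradient step (a second-derivative, or double-gradient, computation).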
- Entropy regularisation on auxiliary label distribution to avoid collapsing auxiliary labels.
- Focal loss used for both primary and auxiliary tasks to focus on hard examples.
- Training alternates between updating the multi-task network with generated auxiliary labels and updating the label-generator via the primary-task performance.
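Three of the pieces named in the list above (Mask SoftMax, focal loss, and the entropy term) can be sketched in plain Python. This is an illustrative sketch, not the authors' implementation. The function names, the example hierarchy `psi = [2, 2, 2]`, the default `gamma = 2.0`, the choice to give masked-out auxiliary classes exactly zero probability, and computing the entropy on the batch-averaged prediction are all assumptions made here.

```python
import math

def build_mask(primary_label, psi):
    """Return a 0/1 mask selecting the auxiliary classes owned by primary_label.

    psi[k] is the number of auxiliary classes assigned to primary class k
    (hypothetical layout: auxiliary classes are stored contiguously per primary class).
    """
    start = sum(psi[:primary_label])
    end = start + psi[primary_label]
    return [1 if start <= j < end else 0 for j in range(sum(psi))]

def mask_softmax(logits, mask):
    """SoftMax restricted to positions where mask == 1; other positions get 0."""
    kept = [x for x, keep in zip(logits, mask) if keep]
    shift = max(kept)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) if keep else 0.0 for x, keep in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]

def focal_loss(probs, target, gamma=2.0):
    """Focal loss for one example: down-weights targets that are already confident."""
    p = probs[target]
    return -((1.0 - p) ** gamma) * math.log(p)

def batch_entropy(batch_probs):
    """Entropy of the batch-averaged distribution; near zero when labels collapse."""
    n = len(batch_probs)
    mean = [sum(col) / n for col in zip(*batch_probs)]
    return -sum(p * math.log(p) for p in mean if p > 0.0)

# Three primary classes, two auxiliary classes each (assumed hierarchy).
psi = [2, 2, 2]
logits = [5.0, 1.0, 0.0, 0.0, 3.0, 2.0]
mask = build_mask(1, psi)            # primary class 1 owns auxiliary slots 2 and 3
probs = mask_softmax(logits, mask)
print(mask)   # [0, 0, 1, 1, 0, 0]
print(probs)  # [0.0, 0.0, 0.5, 0.5, 0.0, 0.0]
```

Even though slot 0 has the largest logit, it receives zero probability because it belongs to a different primary class. With `gamma = 0.0`, `focal_loss` reduces to ordinary cross-entropy, which makes the effect of the focusing term easy to compare.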
Experimental Results
Research Questions
- RQ1: Can a self-generated auxiliary label space improve primary-task generalisation without any extra data?
- RQ2: How effective are automatically generated auxiliary labels compared to random, unsupervised clustering, or human-defined auxiliaries?
- RQ3: Does incorporating a hierarchical auxiliary label structure help or hinder performance across datasets?
- RQ4: What are the dynamics of gradient similarity between auxiliary and primary losses when using MAXL?
- RQ5: Can MAXL approach or match the performance of human-defined auxiliary labels without supervision?
Key Results
| Dataset | Backbone | Single Task | MAXL psi=2 | MAXL psi=3 | MAXL psi=5 | MAXL psi=10 |
|---|---|---|---|---|---|---|
| MNIST | 4-layer ConvNet | 99.57 ± 0.02 | 99.56 ± 0.04 | 99.71 ± 0.02 | 99.59 ± 0.03 | 99.57 ± 0.02 |
| SVHN | 4-layer ConvNet | 94.05 ± 0.07 | 94.39 ± 0.08 | 94.38 ± 0.07 | 94.59 ± 0.12 | 94.41 ± 0.09 |
| CIFAR-10 | VGG-16 | 92.77 ± 0.13 | 93.27 ± 0.09 | 93.47 ± 0.08 | 93.49 ± 0.05 | 93.10 ± 0.08 |
| ImageNet | VGG-16 | 46.67 ± 0.12 | 46.82 ± 0.14 | 46.97 ± 0.10 | 47.02 ± 0.11 | 46.85 ± 0.11 |
| CINIC-10 | ResNet-32 | 85.12 ± 0.08 | 85.66 ± 0.07 | 85.72 ± 0.07 | 85.83 ± 0.08 | 85.80 ± 0.10 |
| UCF-101 | ResNet-32 | 53.15 ± 0.12 | 54.19 ± 0.18 | 55.39 ± 0.16 | 54.70 ± 0.12 | 54.32 ± 0.18 |
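To read the table at a glance, the short script below picks the best psi for each dataset and reports the gain over the single-task baseline. The numbers are the mean values copied from the table (the ± terms are dropped). The names `PSI`, `RESULTS`, and `best_setting` are introduced here for illustration only.

```python
# Mean scores copied from the table above: (single task, MAXL at psi = 2, 3, 5, 10).
PSI = (2, 3, 5, 10)
RESULTS = {
    "MNIST":    (99.57, (99.56, 99.71, 99.59, 99.57)),
    "SVHN":     (94.05, (94.39, 94.38, 94.59, 94.41)),
    "CIFAR-10": (92.77, (93.27, 93.47, 93.49, 93.10)),
    "ImageNet": (46.67, (46.82, 46.97, 47.02, 46.85)),
    "CINIC-10": (85.12, (85.66, 85.72, 85.83, 85.80)),
    "UCF-101":  (53.15, (54.19, 55.39, 54.70, 54.32)),
}

def best_setting(single, maxl):
    """Return (best psi, gain over single task in points, rounded to 2 decimals)."""
    best_idx = max(range(len(maxl)), key=lambda i: maxl[i])
    return PSI[best_idx], round(maxl[best_idx] - single, 2)

for name, (single, maxl) in RESULTS.items():
    psi, gain = best_setting(single, maxl)
    print(f"{name}: best psi={psi}, gain={gain:+.2f} points")
```

On these numbers the largest gain is on UCF-101 (+2.24 points at psi=3) and the smallest on MNIST (+0.14 points at psi=3). The best psi is 5 for the other four datasets.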
- MAXL outperforms single-task learning on seven image datasets using the same labelled data.
- MAXL surpasses baseline auxiliary-label generation methods (Random, K-Means) and is competitive with Human-defined auxiliary labels on CIFAR-100.
- On CIFAR-100, across the tested hierarchies, the auxiliary-task gradients under MAXL keep a positive cosine similarity with the primary-task gradients throughout training, unlike the fixed-label baselines.
- MAXL yields improved primary-task separability in t-SNE visualisations compared to Single Task, and approaches the separation achieved with human auxiliary labels.
- The method remains robust across a range of hierarchies (psi values) without needing dataset-specific tuning.
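The gradient-similarity result above relies on the cosine similarity between the primary-task and auxiliary-task gradients. A minimal sketch of that measure follows, assuming both gradients have already been flattened into equal-length lists over the shared parameters. The function name and the zero-norm convention are choices made here, not taken from the paper.

```python
import math

def cosine_similarity(g_primary, g_auxiliary):
    """Cosine of the angle between two flattened gradient vectors.

    Positive: the auxiliary update also moves the shared weights in a direction
    that reduces the primary loss. Negative: the two tasks conflict.
    Returns 0.0 if either gradient is all zeros (assumed convention).
    """
    dot = sum(a * b for a, b in zip(g_primary, g_auxiliary))
    norm_p = math.sqrt(sum(a * a for a in g_primary))
    norm_a = math.sqrt(sum(b * b for b in g_auxiliary))
    if norm_p == 0.0 or norm_a == 0.0:
        return 0.0
    return dot / (norm_p * norm_a)

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))   # 1.0  (aligned)
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))   # 0.0  (orthogonal)
print(cosine_similarity([1.0, 0.0], [-1.0, 0.0]))  # -1.0 (conflicting)
```

Tracking this value over training steps gives the kind of curve the review describes: a value that stays above zero indicates the auxiliary task keeps helping the primary task.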
This review was generated by AI and checked by a human editor.