QUICK REVIEW

[Paper Review] Relational Multi-Task Learning: Modeling Relations between Data and Tasks

Kaidi Cao, Jiaxuan You | arXiv (Cornell University) | Mar 14, 2023

Machine Learning in Bioinformatics | Cited by 8

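One-Sentence Summary

MetaLink builds a heterogeneous knowledge graph over data points and task heads so that labels from auxiliary tasks can be transferred to the target task. It formulates relational multi-task learning as link label prediction on this graph and solves it with a GNN.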

ABSTRACT

A key assumption in multi-task learning is that at the inference time the multi-task model only has access to a given data point but not to the data point's labels from other tasks. This presents an opportunity to extend multi-task learning to utilize data point's labels from other auxiliary tasks, and this way improves performance on the new task. Here we introduce a novel relational multi-task learning setting where we leverage data point labels from auxiliary tasks to make more accurate predictions on the new task. We develop MetaLink, where our key innovation is to build a knowledge graph that connects data points and tasks and thus allows us to leverage labels from auxiliary tasks. The knowledge graph consists of two types of nodes: (1) data nodes, where node features are data embeddings computed by the neural network, and (2) task nodes, with the last layer's weights for each task as node features. The edges in this knowledge graph capture data-task relationships, and the edge label captures the label of a data point on a particular task. Under MetaLink, we reformulate the new task as a link label prediction problem between a data node and a task node. The MetaLink framework provides flexibility to model knowledge transfer from auxiliary task labels to the task of interest. We evaluate MetaLink on 6 benchmark datasets in both biochemical and vision domains. Experiments demonstrate that MetaLink can successfully utilize the relations among different tasks, outperforming the state-of-the-art methods under the proposed relational multi-task learning setting, with up to 27% improvement in ROC AUC.
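The graph described in the abstract can be sketched as a small data structure. This is a minimal illustration, not the authors' implementation: the class name `KnowledgeGraph`, its methods, and the toy feature values are all hypothetical. It shows data nodes carrying embeddings, task nodes carrying last-layer weights, labeled edges for known data-task labels, and the missing edges that become link label prediction queries.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeGraph:
    """Bipartite data-task graph (illustrative names, not the paper's API)."""
    data_feats: dict = field(default_factory=dict)   # data id -> embedding z^(i)
    task_feats: dict = field(default_factory=dict)   # task id -> last-layer weights w_j
    edge_labels: dict = field(default_factory=dict)  # (data id, task id) -> known label

    def add_data(self, i, z):
        self.data_feats[i] = list(z)

    def add_task(self, j, w):
        self.task_feats[j] = list(w)

    def add_label(self, i, j, y):
        # An edge exists only where the label of data point i on task j is known.
        self.edge_labels[(i, j)] = y

    def missing_links(self):
        # Data-task pairs without an edge are candidates for link label prediction.
        return sorted(
            (i, j)
            for i in self.data_feats
            for j in self.task_feats
            if (i, j) not in self.edge_labels
        )


kg = KnowledgeGraph()
kg.add_data("x1", [0.2, -0.1])
kg.add_data("x2", [0.7, 0.3])
kg.add_task("t1", [1.0, 0.5])
kg.add_task("t2", [-0.4, 0.9])
kg.add_label("x1", "t1", 1)
kg.add_label("x1", "t2", 0)
kg.add_label("x2", "t1", 1)   # auxiliary label available for x2
queries = kg.missing_links()  # the only unlabeled pair is ("x2", "t2")
```

Here the label of `x2` on the auxiliary task `t1` is present in the graph, so a model operating on it can use that label when predicting the query link `("x2", "t2")`.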

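Motivation and Objectives

Motivate a multi-task learning setting in which data points may come with auxiliary-task labels at test time.
Propose a relational multi-task setting that exploits auxiliary-task labels at inference time.
Introduce MetaLink, which models data-task, data-data, and task-task relations through a knowledge graph.
Reformulate prediction for a data point on a new task as a link label prediction problem on the graph.
Demonstrate data-efficient improvements on biochemical and computer vision benchmarks.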

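Proposed Method

Build a knowledge graph with data nodes (embeddings z^(i)) and task nodes (weights w_j), connected by edges labeled with y_j^(i).
Treat a task head as an input alongside the data embedding, so that a prediction takes the form f_phi(w_j, z^(i)).
Apply a heterogeneous graph neural network (GNN) to the data-task graph, generating node embeddings h_v^(l) through GraphConv layers with type-aware messages and edge features.
Use the edge labels y_v^(u) as trainable edge features during message passing to capture task-specific label information.
After aggregation through multiple GNN layers, form the final prediction with EdgePred(h_i^(L), h_j^(L)).
Apply MetaLink to the relational, meta, and relational meta settings, including a relabeling strategy for unseen tasks and inductive initialization of new task nodes.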
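The message-passing and prediction steps above can be sketched in plain Python. This is a simplified stand-in for the paper's GraphConv layers, not a reproduction of them. Several choices are assumptions made for illustration: the mean aggregator, the residual-style update, the one-hot encoding of edge labels, the random untrained weights, and the use of a sigmoid over a dot product as `edge_pred`.

```python
import math
import random

random.seed(0)
DIM = 4


def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def relu(v):
    return [max(0.0, a) for a in v]


def one_hot(y):
    return [1.0, 0.0] if y == 0 else [0.0, 1.0]


# Node features: data embeddings z^(i) and task-head weights w_j (toy values).
h = {
    ("data", 0): [0.2, -0.1, 0.4, 0.0],
    ("data", 1): [0.7, 0.3, -0.2, 0.1],
    ("task", 0): [1.0, 0.5, 0.0, -0.3],
    ("task", 1): [-0.4, 0.9, 0.2, 0.6],
}
# Observed edges (data i, task j) -> binary label; (1, 1) is the unlabeled query link.
edges = {(0, 0): 1, (0, 1): 0, (1, 0): 1}
# Type-aware weights: one matrix per receiving node type; input is [h_u ; one_hot(y)].
W = {"data": rand_matrix(DIM, DIM + 2), "task": rand_matrix(DIM, DIM + 2)}


def message_passing_layer(h, edges, W):
    incoming = {v: [] for v in h}
    for (i, j), y in edges.items():
        d, t = ("data", i), ("task", j)
        # Each labeled edge sends a message in both directions, with the
        # edge label concatenated to the sender's state as an edge feature.
        incoming[d].append(relu(matvec(W["data"], h[t] + one_hot(y))))
        incoming[t].append(relu(matvec(W["task"], h[d] + one_hot(y))))
    new_h = {}
    for v, msgs in incoming.items():
        mean = [sum(col) / len(msgs) for col in zip(*msgs)] if msgs else [0.0] * DIM
        # Assumed residual-style update: own state plus the mean incoming message.
        new_h[v] = [a + b for a, b in zip(h[v], mean)]
    return new_h


def edge_pred(h_data, h_task):
    # Assumed scoring function: sigmoid of the dot product of the two embeddings.
    score = sum(a * b for a, b in zip(h_data, h_task))
    return 1.0 / (1.0 + math.exp(-score))


h1 = message_passing_layer(h, edges, W)
h2 = message_passing_layer(h1, edges, W)          # L = 2 layers
p = edge_pred(h2[("data", 1)], h2[("task", 1)])   # predicted P(y = 1) for the query link
```

After two layers, the embedding of data node 1 has absorbed its auxiliary label on task 0, and the embedding of task node 1 has absorbed the label of data node 0, so the query prediction depends on both. In a real model the weights would be trained end to end against held-out edge labels.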

Figure 1: In the relational multi-task setting, the model learns to incorporate auxiliary knowledge in making predictions to achieve data efficiency. Concretely, given observations $\mathbf{x}^{(i)}$ and their labels $\{y^{(i)}_{j}\}$ (0/1 in this example) on subsets of tasks $\{t_{j}\}$ , the goal

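Experimental Results

Research Questions

RQ1: Do auxiliary-task labels available at inference time improve target-task prediction in the multi-task setting?
RQ2: How can data points and tasks be modeled in a single unified graph so that a GNN can exploit data-task relations to improve performance?
RQ3: Can a heterogeneous GNN with edge features and type-aware message passing effectively exploit cross-task information?
RQ4: How does MetaLink perform on biochemical and vision benchmarks under the relational and meta settings, including on unseen tasks?
RQ5: How does the proportion of auxiliary-task labels affect the prediction gains?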

Key Findings

Method	Setting	Tox21 (12 tasks)	Sider (27 tasks)	ToxCast (617 tasks)
MPNN (Gilmer et al., 2017)	Standard	80.8 ± 2.4	59.5 ± 3.0	69.1 ± 1.3
DMPNN (Yang et al., 2019)	Standard	82.6 ± 2.3	63.2 ± 2.3	71.8 ± 1.1
MGCN (Lu et al., 2019)	Standard	70.7 ± 1.6	55.2 ± 1.8	66.3 ± 0.9
AttentiveFP (Xiong et al., 2019)	Standard	80.7 ± 2.0	60.5 ± 6.0	57.9 ± 1.0
GROVER(48M) (Rong et al., 2020)	Standard	81.9 ± 2.0	65.6 ± 0.6	72.3 ± 1.0
GROVER(100M) (Rong et al., 2020)	Standard	83.1 ± 2.5	65.8 ± 2.3	73.7 ± 1.0
MetaLink	Relational	83.7 ± 1.9	76.8 ± 3.0	79.4 ± 1.0
MetaLink	Meta	77.5 ± 2.1	57.9 ± 5.0	71.3 ± 2.2
MetaLink	Relational +Meta	79.2 ± 2.9	65.4 ± 4.3	84.3 ± 1.2
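To read the table more easily, the short script below computes how far MetaLink in the relational setting sits above the best standard-setting baseline on each dataset. It assumes the table reports mean ROC AUC in percent, as the abstract suggests. The numbers are copied from the rows above and the variable names are illustrative.

```python
# Mean scores copied from the table (standard-setting baselines).
baselines = {
    "MPNN": {"Tox21": 80.8, "Sider": 59.5, "ToxCast": 69.1},
    "DMPNN": {"Tox21": 82.6, "Sider": 63.2, "ToxCast": 71.8},
    "MGCN": {"Tox21": 70.7, "Sider": 55.2, "ToxCast": 66.3},
    "AttentiveFP": {"Tox21": 80.7, "Sider": 60.5, "ToxCast": 57.9},
    "GROVER(48M)": {"Tox21": 81.9, "Sider": 65.6, "ToxCast": 72.3},
    "GROVER(100M)": {"Tox21": 83.1, "Sider": 65.8, "ToxCast": 73.7},
}
metalink_relational = {"Tox21": 83.7, "Sider": 76.8, "ToxCast": 79.4}

gains = {}
for dataset, score in metalink_relational.items():
    # Strongest baseline on this dataset, by mean score.
    best_name, best_score = max(
        ((name, scores[dataset]) for name, scores in baselines.items()),
        key=lambda pair: pair[1],
    )
    # Absolute gain in percentage points, rounded to the table's precision.
    gains[dataset] = (best_name, round(score - best_score, 1))

for dataset, (name, gain) in gains.items():
    print(f"{dataset}: +{gain} points over {name}")
# Tox21: +0.6 points over GROVER(100M)
# Sider: +11.0 points over GROVER(100M)
# ToxCast: +5.7 points over GROVER(100M)
```

These are absolute differences between means for the three datasets in this table only. The Tox21 gap of 0.6 points is smaller than the reported standard deviations, while the Sider and ToxCast gaps are well outside them.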

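On biochemical and vision benchmarks, MetaLink outperforms existing baselines in the relational multi-task setting, with ROC AUC gains of up to 27%.
The relational and relational + meta settings bring significant improvements, whereas the standard setting, in which no auxiliary labels are provided, may yield limited gains.
Ablation analysis shows that MetaLink achieves larger gains on tasks that are more strongly correlated with other tasks, suggesting that it learns meaningful cross-task relations.
Increasing the proportion of available auxiliary-task labels generally improves performance across datasets.
On few-shot learning benchmarks (mini-ImageNet, tiered-ImageNet), MetaLink with KG layers outperforms the baselines, with the best performance reached using at most 2 KG layers.
On MS-COCO, MetaLink improves consistently in the relational setting, most notably when multiple KG layers are used.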
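The finding on auxiliary label proportion implies an experiment in which only a fraction of the auxiliary-task edges is kept in the graph. One way such a sweep could be set up is sketched below. The function `keep_auxiliary_labels`, its parameters, and the toy edge dictionary are hypothetical, and the paper's actual sampling protocol may differ.

```python
import random


def keep_auxiliary_labels(edges, target_task, ratio, seed=0):
    """Return a copy of `edges` that keeps a `ratio` share of auxiliary-task labels.

    `edges` maps (data id, task id) -> label. Edges on `target_task` are
    always kept; only edges on the other (auxiliary) tasks are subsampled.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    rng = random.Random(seed)
    aux = sorted(k for k in edges if k[1] != target_task)
    kept = set(rng.sample(aux, round(ratio * len(aux))))
    return {k: y for k, y in edges.items() if k[1] == target_task or k in kept}


# Toy graph: 5 data points, 3 tasks, every label observed.
edges = {(i, j): (i + j) % 2 for i in range(5) for j in range(3)}
half = keep_auxiliary_labels(edges, target_task=0, ratio=0.5)
```

With task 0 as the target there are 10 auxiliary edges, so `half` keeps 5 of them plus the 5 target-task edges. Repeating this over several ratios and seeds, and evaluating the model on each reduced graph, would trace how performance changes with auxiliary label availability.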

Figure 2: Our MetaLink framework allows for modeling four different multi-task learning settings: $\bigcirc$ represent data nodes and $\square$ represent task nodes. Blue represents the data/tasks seen in the training stage and white denotes the data/tasks seen only in the test stage. During model i

This review was generated by AI and reviewed by a human editor.