QUICK REVIEW

[Paper Review] Multiple instance learning with graph neural networks

Ming Tu, Jing Huang|arXiv (Cornell University)|Jun 12, 2019

Image Retrieval and Classification Techniques|15 references|55 citations

TL;DR

The paper introduces a novel end-to-end graph neural network (GNN) approach for MIL by treating each bag as a graph, learning bag embeddings via GNNs, and using differentiable pooling or attention to obtain fixed-size representations for bag-level classification, achieving state-of-the-art results on several MIL benchmarks while maintaining interpretability.

ABSTRACT

Multiple instance learning (MIL) aims to learn the mapping between a bag of instances and the bag-level label. In this paper, we propose a new end-to-end graph neural network (GNN) based algorithm for MIL: we treat each bag as a graph and use GNN to learn the bag embedding, in order to explore the useful structural information among instances in bags. The final graph representation is fed into a classifier for label prediction. Our algorithm is the first attempt to use GNN for MIL. We empirically show that the proposed algorithm achieves the state of the art performance on several popular MIL data sets without losing model interpretability.

Motivation & Objective

Frame MIL as learning a bag-level label from a set of instances, and exploit the relational structure among the instances within a bag.
Propose converting each MIL bag into a graph and learning bag embeddings with GNNs.
Develop end-to-end architectures using differentiable pooling (and an attention variant) to produce fixed-size graph embeddings for classification.
Demonstrate superior performance on standard MIL datasets and show interpretability via assignment matrices identifying decisive instances.

Proposed method

Convert each bag of instances into an undirected graph using a distance threshold to form edges.
Apply a GNN to compute node embeddings within the graph (GNN_embd).
Use differentiable pooling (GNN_cluster) to coarsen the graph to a fixed number of clusters, so that bags of any size yield a fixed-size representation that can serve as a bag embedding.
Optionally apply a second GNN layer on the coarsened graph (GNN_embd2) followed by pooling (max or concatenation) to form the final graph embedding.
Feed the graph embedding into an MLP classifier for bag-level prediction; employ deep supervision by adding auxiliary losses at intermediate stages.
Provide a baseline attention-based graph aggregation variant that attends over the node embeddings Z_i to form a bag embedding.
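
The steps above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: Euclidean distance for the edge threshold, mean-aggregation graph convolutions, random untrained weights, and the helper names (build_bag_graph, propagate, gnn_layer, diff_pool, bag_embedding) are all choices made here for illustration. The MLP classifier and the deep-supervision losses are omitted.

```python
import numpy as np

def build_bag_graph(X, threshold):
    """Adjacency matrix for one bag: link instances closer than `threshold`."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))  # pairwise Euclidean (assumed metric)
    A = (dist < threshold).astype(float)
    np.fill_diagonal(A, 1.0)  # self-loops, so every node has degree >= 1
    return A

def propagate(A, H, W):
    """Average each node's neighbourhood features, then apply a linear map."""
    deg = A.sum(axis=1, keepdims=True)
    return (A / deg) @ H @ W

def gnn_layer(A, H, W):
    """One graph convolution with ReLU (assumed form of the GNN layers)."""
    return np.maximum(propagate(A, H, W), 0.0)

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def diff_pool(A, X, W_embd, W_cluster):
    """Differentiable pooling: N nodes -> C clusters, for any N."""
    Z = gnn_layer(A, X, W_embd)                       # GNN_embd: N x d
    S = softmax(propagate(A, X, W_cluster), axis=1)   # GNN_cluster: N x C, rows sum to 1
    X_coarse = S.T @ Z                                # C x d cluster features
    A_coarse = S.T @ A @ S                            # C x C cluster adjacency
    return X_coarse, A_coarse, S

def bag_embedding(X, threshold, params, readout="max"):
    """Bag -> graph -> pooled graph -> fixed-size vector."""
    A = build_bag_graph(X, threshold)
    X_c, A_c, S = diff_pool(A, X, params["W_embd"], params["W_cluster"])
    H = gnn_layer(A_c, X_c, params["W_embd2"])        # GNN_embd2 on the coarse graph
    emb = H.max(axis=0) if readout == "max" else H.reshape(-1)
    return emb, S

# Two bags of different sizes map to embeddings of the same size.
rng = np.random.default_rng(0)
d_in, d_hid, n_clusters = 4, 8, 3
params = {
    "W_embd": rng.normal(size=(d_in, d_hid)),
    "W_cluster": rng.normal(size=(d_in, n_clusters)),
    "W_embd2": rng.normal(size=(d_hid, d_hid)),
}
small_bag = rng.normal(size=(5, d_in))
large_bag = rng.normal(size=(9, d_in))
emb_small, S_small = bag_embedding(small_bag, 2.0, params)
emb_large, S_large = bag_embedding(large_bag, 2.0, params)
print(emb_small.shape, emb_large.shape)  # (8,) (8,)
```

The point of the sketch is the shape bookkeeping: the assignment matrix S has one row per instance and one column per cluster, so the pooled graph always has n_clusters nodes regardless of bag size.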

Experimental results

Research questions

RQ1: Can treating MIL bags as graphs and applying GNNs improve bag-level classification accuracy over traditional i.i.d. instance approaches?
RQ2: Does differentiable pooling provide superior bag representations and allow interpretable identification of decisive instances within bags?
RQ3: How does an attention-based graph aggregation compare to differentiable pooling in MIL with graphs?
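
For RQ3, the attention side of the comparison can be sketched as a weighted sum over node embeddings. The scoring function below is an assumption: it borrows the tanh attention commonly used in attention-based MIL, and the review does not state the exact form used in the paper. The names attention_pool, V, and w are hypothetical.

```python
import numpy as np

def attention_pool(Z, V, w):
    """Collapse N node embeddings (N x d) into one d-dimensional bag embedding."""
    scores = np.tanh(Z @ V) @ w      # one scalar score per node, shape (N,)
    scores = scores - scores.max()   # shift for numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()  # weights sum to 1
    return alpha @ Z, alpha

rng = np.random.default_rng(0)
Z = rng.normal(size=(6, 8))   # toy node embeddings, made up for illustration
V = rng.normal(size=(8, 4))
w = rng.normal(size=4)
bag_vec, alpha = attention_pool(Z, V, w)
print(bag_vec.shape, alpha.shape)  # (8,) (6,)
```

Unlike differentiable pooling, this produces a single weight per node rather than a node-to-cluster assignment, and the result does not depend on the order of the nodes.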

Key findings

The proposed GNN-based MIL method achieves higher average accuracy than several baselines (mi-Graph, MI-Net, MI-Net with DS, Attention-MIL, and Attention-MIL with gating) across five MIL benchmarks, scoring 0.917±0.048 on MUSK1, 0.892±0.011 on MUSK2, 0.679±0.007 on FOX, 0.876±0.015 on TIGER, and 0.903±0.010 on ELEPHANT.
On text categorization tasks, the method is competitive with mi-Graph and the MI-Net variants and improves on them on average across datasets.
In retinal image (Messidor) experiments, the method with graph input (Ours-DP) achieves 74.2% accuracy and 0.77 F1, outperforming several non-graph MIL methods.
The differentiable pooling approach provides heat maps via learned assignment matrices, enabling identification of decisive instances and preserving interpretability.
Graph-based MIL consistently benefits from incorporating structure among instances within bags, supporting the claim that non-iid within-bag relationships improve performance.
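
One way to read an assignment matrix for interpretability is to list, for each cluster, the instances with the largest assignment weights. The sketch below is an assumption about how such a reading could be done, not the paper's procedure; the function name top_instances_per_cluster and the toy matrix are made up for illustration.

```python
import numpy as np

def top_instances_per_cluster(S, k=2):
    """For each cluster (column of S), return the k instance indices with the largest weight."""
    order = np.argsort(-S, axis=0)  # sort each column in descending order
    return {c: order[:k, c].tolist() for c in range(S.shape[1])}

# Toy assignment matrix: 4 instances (rows), 2 clusters (columns).
S = np.array([[0.9, 0.1],
              [0.2, 0.8],
              [0.7, 0.3],
              [0.4, 0.6]])
print(top_instances_per_cluster(S))  # {0: [0, 2], 1: [1, 3]}
```

Plotting S directly as a heat map gives the same information visually: bright cells mark the instances that dominate each cluster.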

This review was created by AI and reviewed by human editors.