QUICK REVIEW

[Paper Review] Can neural networks learn persistent homology features?

Guido Montúfar, Nina Otter | arXiv (Cornell University) | Nov 30, 2020

Topological and Geometric Data Analysis | 4 citations

TL;DR

This paper investigates whether neural networks can learn to approximate persistent homology features, specifically tropical coordinates and binary indicators of points near the diagonal, from raster images. Using the MNIST and CIFAR-10 datasets, the authors train CNNs and GNNs to predict these features from images or from the cubical complexes built on them. The reported accuracy is high on several tasks, and inference is faster than computing persistence diagrams directly, which suggests that trained networks can serve as surrogates for expensive topological computations in downstream tasks.

ABSTRACT

Topological data analysis uses tools from topology -- the mathematical area that studies shapes -- to create representations of data. In particular, in persistent homology, one studies one-parameter families of spaces associated with data, and persistence diagrams describe the lifetime of topological invariants, such as connected components or holes, across the one-parameter family. In many applications, one is interested in working with features associated with persistence diagrams rather than the diagrams themselves. In our work, we explore the possibility of learning several types of features extracted from persistence diagrams using neural networks.

Motivation & Objective

To investigate whether neural networks can learn to predict persistent homology features directly from images, bypassing traditional, computationally expensive diagram computation.
To evaluate the performance of CNNs and GNNs in approximating key topological features such as tropical coordinates and binary indicators of points near the diagonal in persistence diagrams.
To compare the inference speed of learned features against standard persistence diagram computation, assessing practical utility for downstream machine learning tasks.
To explore the feasibility of using neural networks as a surrogate for topological feature extraction in data analysis pipelines.

Proposed method

The method uses a pipeline: input images → cubical complex construction → filtered cubical complex → boundary matrix → persistence diagram computation via matrix reduction.
Persistent homology features are extracted from the diagrams, including tropical coordinates (e.g., the sum of the distances from the diagonal of the two furthest points) and binary indicators for points within a certain distance of the diagonal.
Neural networks (CNNs and GNNs) are trained to map raw images or intermediate representations (e.g., cubical complexes) directly to these topological features.
For image-to-feature regression, CNNs are trained with early stopping and learning rate reduction; for graph-based tasks, GCN and GIN models are used on filtered cubical complex matrices.
Loss functions include multi-class cross-entropy for classification tasks and mean squared error for regression of tropical coordinates.
Computational efficiency is evaluated by comparing wall-clock time of traditional GUDHI-based diagram computation versus inference time using the trained neural network.
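
To make the diagram-computation step of the pipeline above concrete, here is a minimal sketch of 0-dimensional sublevel-set persistence on a small grayscale array. It is not the paper's code and does not use GUDHI. It assumes pixels are vertices joined by 4-neighbour edges (a simplification of a cubical complex), and it uses union-find in place of boundary matrix reduction, which yields the same pairs only in dimension 0. The function name `h0_sublevel_persistence` is hypothetical.

```python
import numpy as np


def h0_sublevel_persistence(image):
    """Return (birth, death) pairs of connected components in the
    sublevel-set filtration of a 2D array.

    Assumption: pixels are vertices with 4-neighbour adjacency.
    Zero-length pairs are dropped; the oldest component never dies.
    """
    img = np.asarray(image, dtype=float)
    rows, cols = img.shape
    flat = img.ravel()
    order = np.argsort(flat, kind="stable")      # pixels by increasing value
    parent = np.full(flat.size, -1, dtype=int)   # -1 means "not yet added"
    birth = np.zeros(flat.size)                  # birth value kept at each root

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    pairs = []
    for idx in order:
        parent[idx] = idx
        birth[idx] = flat[idx]
        r, c = divmod(int(idx), cols)
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if not (0 <= rr < rows and 0 <= cc < cols):
                continue
            n = rr * cols + cc
            if parent[n] == -1:
                continue                          # neighbour not in the sublevel set yet
            a, b = find(idx), find(n)
            if a == b:
                continue
            # Elder rule: the component born later dies at the current value.
            young, old = (a, b) if birth[a] > birth[b] else (b, a)
            if flat[idx] > birth[young]:
                pairs.append((float(birth[young]), float(flat[idx])))
            parent[young] = old
    # The component born at the global minimum persists forever.
    pairs.append((float(flat[order[0]]), float("inf")))
    return pairs


# Four local minima (0, 1, 2, 3) separated by a ridge of value 5.
example = [[0, 5, 1],
           [5, 5, 5],
           [2, 5, 3]]
example_pairs = h0_sublevel_persistence(example)
```

On this example the three components born at 1, 2 and 3 all die when the ridge at value 5 joins them to the component born at 0, which survives as the single infinite bar.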

Experimental results

Research questions

RQ1: Can neural networks effectively learn to predict persistent homology features such as tropical coordinates and binary indicators from raw images?
RQ2: How do the performance and inference speed of neural network-based feature prediction compare to traditional persistence diagram computation?
RQ3: Does using raw images as input yield better feature prediction than using intermediate representations like cubical complexes?
RQ4: To what extent are the learned features stable and suitable for downstream tasks such as classification?

Key findings

The GCN model achieved test accuracy of approximately 75% on both CC-MNIST and FCC-MNIST for classifying whether a persistence diagram contains a bar of length between 0.1 and 0.3, outperforming the GIN model (51–53%).
For tropical coordinate regression on MNIST, the relative test MSE for the five coordinates ranged from 0.000465 to 0.048468, with the lowest error for the sum of distances of the two furthest points from the diagonal.
On CIFAR-10, the relative test MSE for tropical coordinates ranged from 0.00218928 to 0.00636546, indicating strong regression performance.
The neural network inference time was significantly faster than traditional GUDHI-based persistence diagram computation, offering a practical acceleration for topological feature extraction.
The CNN model for bar prediction on CIFAR-10 achieved test accuracy above 80% for thresholds of 0.15 and 0.3, demonstrating strong generalization for binary classification tasks.
The results show that neural networks can learn stable and informative topological features, suggesting their viability as a surrogate for persistent homology computation in data science pipelines.
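
The two kinds of targets named in the findings above can be sketched as simple functions of a persistence diagram. This is an illustration under stated assumptions, not the authors' implementation: bar length (death minus birth) stands in for distance from the diagonal, to which it is proportional by a factor of 1/√2; the interval is treated as closed; infinite bars are ignored; and the function names are hypothetical.

```python
import numpy as np


def bar_lengths(diagram):
    """Lengths (death - birth) of the finite bars in a diagram given as
    an iterable of (birth, death) pairs. Infinite bars are dropped."""
    d = np.asarray(diagram, dtype=float).reshape(-1, 2)
    finite = d[np.isfinite(d).all(axis=1)]
    return finite[:, 1] - finite[:, 0]


def has_bar_in_range(diagram, low=0.1, high=0.3):
    """Binary label: does the diagram contain a bar whose length lies in
    [low, high]? The closed interval is an assumption."""
    lengths = bar_lengths(diagram)
    return bool(np.any((lengths >= low) & (lengths <= high)))


def sum_of_two_longest(diagram):
    """Tropical-style coordinate: sum of the two largest bar lengths
    (or of fewer, if the diagram has fewer than two finite bars)."""
    lengths = np.sort(bar_lengths(diagram))[::-1]
    return float(lengths[:2].sum())


# Hypothetical diagram with three finite bars and one infinite bar.
toy_diagram = [(0.0, 0.5), (0.1, 0.3), (0.2, 0.25), (0.0, float("inf"))]
label = has_bar_in_range(toy_diagram)
coordinate = sum_of_two_longest(toy_diagram)
```

A network trained as described would regress values like `coordinate` with a mean squared error loss and classify labels like `label` with cross-entropy.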

This review was created by AI and reviewed by human editors.