QUICK REVIEW

[Paper Review] Concrete Autoencoders for Differentiable Feature Selection and Reconstruction

Abubakar Abid, Muhammad Fatih Balin|arXiv (Cornell University)|Jan 27, 2019

Gene expression and cancer classification|24 references|76 citations

TL;DR

Introduces a differentiable framework (concrete autoencoder) for unsupervised global feature selection using a Concrete selector layer, optimizing reconstruction from a reduced feature set. Demonstrates improved reconstruction and imputation performance, including a large L1000 gene expression case study.

ABSTRACT

We introduce the concrete autoencoder, an end-to-end differentiable method for global feature selection, which efficiently identifies a subset of the most informative features and simultaneously learns a neural network to reconstruct the input data from the selected features. Our method is unsupervised, and is based on using a concrete selector layer as the encoder and using a standard neural network as the decoder. During the training phase, the temperature of the concrete selector layer is gradually decreased, which encourages a user-specified number of discrete features to be learned. During test time, the selected features can be used with the decoder network to reconstruct the remaining input features. We evaluate concrete autoencoders on a variety of datasets, where they significantly outperform state-of-the-art methods for feature selection and data reconstruction. In particular, on a large-scale gene expression dataset, the concrete autoencoder selects a small subset of genes whose expression levels can be used to impute the expression levels of the remaining genes. In doing so, it improves on the current widely-used expert-curated L1000 landmark genes, potentially reducing measurement costs by 20%. The concrete autoencoder can be implemented by adding just a few lines of code to a standard autoencoder.

Motivation & Objective

Identify a subset of informative features in an unsupervised setting while enabling reconstruction of the full data.
Develop an end-to-end differentiable method that selects discrete features via a relaxed, differentiable layer.
Minimize reconstruction error for a user-specified number of selected features across diverse datasets.
Showcase scalability and interpretability benefits, including gene expression inference.

Proposed method

Use a concrete selector layer as the encoder to select k input features through Concrete random variables.
Train with a temperature parameter T that anneals over time to converge from soft to discrete feature selection.
Employ a standard (potentially deep) decoder to reconstruct the full input from the selected features.
Leverage the reparameterization trick to backpropagate through the stochastic feature selection.
Optionally compare with linear or non-linear decoders to assess reconstruction performance.
Provide implementation guidance and public code for reproducibility.
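The selector layer described above can be sketched as a forward pass in plain Python. This is a minimal sketch, assuming one learnable logit vector (log_alpha) of length d per selected feature, so k vectors in total, and Gumbel-softmax sampling as the Concrete relaxation; all function names are hypothetical and not taken from the authors' code. It computes no gradients, so actual training through the reparameterization trick would need an autodiff framework.

```python
import math
import random

def gumbel_noise(rng):
    # Sample g ~ Gumbel(0, 1) by inverse CDF: g = -log(-log(u)), u ~ Uniform(0, 1)
    u = max(rng.random(), 1e-12)  # guard against u == 0, where log(u) is undefined
    return -math.log(-math.log(u))

def concrete_sample(log_alpha, temperature, rng):
    """Draw one relaxed one-hot weight vector over the d input features."""
    logits = [(la + gumbel_noise(rng)) / temperature for la in log_alpha]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def concrete_selector(x, log_alphas, temperature, rng):
    """Map a d-dim input x to k outputs; output i is the soft pick m_i . x."""
    out = []
    for log_alpha in log_alphas:  # one hypothetical parameter vector per selected feature
        m = concrete_sample(log_alpha, temperature, rng)
        out.append(sum(mj * xj for mj, xj in zip(m, x)))
    return out

def selected_indices(log_alphas):
    """At test time, each selector node keeps only its highest-logit feature."""
    return [max(range(len(la)), key=la.__getitem__) for la in log_alphas]
```

When one logit dominates and the temperature is low, each output collapses to a single input feature (for example, log_alphas = [[0, 50, 0, 0], [0, 0, 0, 50]] picks features 1 and 3). That hard selection is what the temperature annealing aims to reach by the end of training.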

Experimental results

Research questions

RQ1: Can a differentiable, end-to-end model identify a subset of input features that minimizes reconstruction error?
RQ2: How does annealing the temperature in the Concrete selector layer affect feature selection quality and reconstruction performance?
RQ3: Do the selected features generalize across datasets and reconstruction architectures (linear vs. non-linear decoders)?
RQ4: Can the method scale to high-dimensional data and large sample sizes (e.g., gene expression)?
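To make RQ2 concrete, here is one possible annealing schedule: an exponential decay from a start temperature to an end temperature over the training epochs. This is a sketch under assumptions; the exponential form, the defaults t_start=10.0 and t_end=0.01, and the function names are hypothetical rather than confirmed by this review. The second function applies a noise-free softmax to fixed logits to show why lowering the temperature pushes the selector toward a discrete, one-hot choice.

```python
import math

def temperature(epoch, num_epochs, t_start=10.0, t_end=0.01):
    """Exponential decay: T = t_start * (t_end / t_start) ** (epoch / (num_epochs - 1))."""
    if num_epochs <= 1:
        return t_end
    frac = epoch / (num_epochs - 1)  # 0.0 at the first epoch, 1.0 at the last
    return t_start * (t_end / t_start) ** frac

def relaxed_weights(logits, temp):
    """Noise-free softmax of logits / temp; becomes sharper as temp -> 0."""
    scaled = [z / temp for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Temperature for each of 100 hypothetical training epochs
schedule = [temperature(e, 100) for e in range(100)]
```

For logits [0, 1, 2], the weights at T = 10 stay close to uniform (the largest is about 0.37), while at T = 0.01 almost all of the weight lands on the largest logit. Decaying geometrically rather than linearly spends more epochs at low temperatures, where the selection is nearly discrete.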

Key findings

The concrete autoencoder consistently outperformed other feature selection methods on multiple datasets in reconstruction tasks.
With a non-linear decoder, it achieves lower reconstruction error and higher classification accuracy than competing methods across ISOLET and other datasets.
Using a linear decoder, the concrete autoencoder still yields the lowest reconstruction error on most datasets.
In a large-scale gene expression case study with a linear decoder, about 750 genes selected by the concrete autoencoder impute the remaining genes with accuracy comparable to or better than the ~943 expert-curated L1000 landmark genes.
The approach identifies related feature groups (e.g., localized pixel groups in MNIST) and yields interpretable feature clusters.
Code and experiments are publicly available for reproducibility.
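As a quick check, the gene counts in the case study (~943 landmark genes versus ~750 selected genes) line up with the roughly 20% cost reduction stated in the abstract, assuming measurement cost scales linearly with the number of genes measured:

```latex
1 - \frac{750}{943} = \frac{193}{943} \approx 0.205 \approx 20\%
```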


This review was created by AI and reviewed by human editors.