QUICK REVIEW

[Paper Review] Non-Euclidean Universal Approximation

Anastasis Kratsios, Ievgen Bilokopytov|arXiv (Cornell University)|Jun 3, 2020

Morphological variations and asymmetry | 6 citations

TL;DR

This paper establishes general conditions under which modifying a neural network's input and output layers, for example to handle non-Euclidean spaces or to emit binary outputs, preserves its universal approximation capability. It proves that a universal architecture whose final layer is modified to produce binary values can deterministically approximate any classifier, with guarantees for deep CNNs and deep feed-forward networks. It also shows that networks between Cartan-Hadamard manifolds, including hyperbolic feed-forward networks, are universal, and that randomizing all but the last two layers of a DNN still yields a universal family of functions with probability one.

ABSTRACT

Modifications to a neural network's input and output layers are often required to accommodate the specificities of most practical learning tasks. However, the impact of such changes on architecture's approximation capabilities is largely not understood. We present general conditions describing feature and readout maps that preserve an architecture's ability to approximate any continuous functions uniformly on compacts. As an application, we show that if an architecture is capable of universal approximation, then modifying its final layer to produce binary values creates a new architecture capable of deterministically approximating any classifier. In particular, we obtain guarantees for deep CNNs and deep feed-forward networks. Our results also have consequences within the scope of geometric deep learning. Specifically, when the input and output spaces are Cartan-Hadamard manifolds, we obtain geometrically meaningful feature and readout maps satisfying our criteria. Consequently, commonly used non-Euclidean regression models between spaces of symmetric positive definite matrices are extended to universal DNNs. The same result allows us to show that the hyperbolic feed-forward networks, used for hierarchical learning, are universal. Our result is also used to show that the common practice of randomizing all but the last two layers of a DNN produces a universal family of functions with probability one. We also provide conditions on a DNN's first (resp. last) few layer's connections and activation function which guarantee that these layers can have a width equal to the input (resp. output) space's dimension while not negatively affecting the architecture's approximation capabilities.

Motivation & Objective

To understand how modifications to input and output layers affect a neural network's universal approximation capability.
To identify general conditions on feature and readout maps that preserve universal approximation in deep architectures.
To extend universal approximation guarantees to non-Euclidean spaces such as symmetric positive definite matrices and hyperbolic spaces.
To analyze how architectural modifications, such as a binary output layer or randomizing all but the last two layers, affect approximation power.
To provide theoretical justification for common deep learning practices, such as randomizing all but the last two layers.

Proposed method

Derives general conditions on feature and readout maps that preserve universal approximation in deep neural networks.
Applies these conditions to show that modifying the final layer to produce binary outputs maintains universal approximation for classifiers.
Uses geometric analysis on Cartan-Hadamard manifolds to construct valid non-Euclidean feature and readout maps.
Demonstrates that commonly used non-Euclidean regression models on symmetric positive definite matrices can be extended to universal deep networks.
Applies the framework to hyperbolic feed-forward networks, proving their universality.
Establishes that randomizing all but the last two layers of a DNN yields a universal family of functions with probability one.
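The feature/readout construction listed above can be sketched as a composition rho(f(phi(x))), where f is an ordinary Euclidean network. The example below is a minimal illustration for symmetric positive definite (SPD) inputs and outputs, and it rests on several assumptions that do not come from the paper summary: the matrix logarithm is used as the feature map, the matrix exponential of a symmetrized output as the readout map, and a one-hidden-layer tanh network (`euclidean_net`, a hypothetical stand-in) plays the role of the universal architecture. All function names are illustrative.

```python
import numpy as np

def sym_fn(S, fn):
    """Apply a scalar function to a symmetric matrix through its eigenvalues."""
    w, V = np.linalg.eigh(S)
    return (V * fn(w)) @ V.T

def feature_map(P):
    """phi (assumed): SPD matrix -> flat Euclidean vector via the matrix logarithm."""
    return sym_fn(P, np.log).ravel()

def readout_map(v, n):
    """rho (assumed): flat vector -> SPD matrix via symmetrize, then exponentiate."""
    A = v.reshape(n, n)
    return sym_fn(0.5 * (A + A.T), np.exp)

def euclidean_net(x, W1, b1, W2, b2):
    """Hypothetical stand-in for a universal Euclidean network (one tanh layer)."""
    return W2 @ np.tanh(W1 @ x + b1) + b2

def spd_model(P, params, n):
    """Composite model rho(f(phi(P))), mapping SPD matrices to SPD matrices."""
    return readout_map(euclidean_net(feature_map(P), *params), n)

rng = np.random.default_rng(0)
n, hidden = 3, 16
params = (
    rng.normal(size=(hidden, n * n)),        # W1
    rng.normal(size=hidden),                 # b1
    0.1 * rng.normal(size=(n * n, hidden)),  # W2, scaled to keep outputs moderate
    0.1 * rng.normal(size=n * n),            # b2
)

B = rng.normal(size=(n, n))
P = B @ B.T + n * np.eye(n)   # a well-conditioned SPD input
Q = spd_model(P, params, n)   # the output is SPD by construction of the readout
```

Because the readout exponentiates a symmetric matrix, every output is SPD regardless of the weights, so the Euclidean network in the middle can be trained without constraints.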

Experimental results

Research questions

RQ1: Under what conditions do modified input and output layers preserve universal approximation in deep neural networks?
RQ2: Can deep networks with non-Euclidean input or output spaces, such as symmetric positive definite matrices or hyperbolic spaces, still achieve universal approximation?
RQ3: Does randomizing all but the last two layers of a deep network preserve its universal approximation capability?
RQ4: What architectural constraints on early or late layers allow them to have width equal to input/output dimension without harming approximation power?
RQ5: How can geometrically meaningful feature and readout maps be constructed for Cartan-Hadamard manifolds to ensure universal approximation?

Key findings

Modifying a universal architecture's final layer to produce binary outputs results in a new architecture capable of deterministically approximating any classifier; in particular, this yields guarantees for deep CNNs and deep feed-forward networks.
Commonly used non-Euclidean regression models between spaces of symmetric positive definite matrices extend to universal DNNs, using the feature and readout maps constructed for Cartan-Hadamard manifolds.
Hyperbolic feed-forward networks are proven to be universal approximators under the proposed framework.
Randomizing all but the last two layers of a deep neural network results in a universal family of functions with probability one.
A DNN's first few layers can have width equal to the input space's dimension, and its last few layers width equal to the output space's dimension, without compromising universal approximation, provided the activation functions and connection patterns satisfy the derived conditions.
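The binary-output finding can be illustrated with a thresholding readout. The sketch below is an idealized toy, not the paper's construction: it assumes a hypothetical classifier (`target_classifier`, the indicator of the unit disc) and simulates a real-valued approximator as the true 0/1 label plus an error strictly below 0.5. The 0.5 threshold and all names are assumptions made for illustration.

```python
import numpy as np

def binary_readout(y, threshold=0.5):
    """Map real-valued scores to {0, 1} labels by thresholding (assumed readout)."""
    return (np.asarray(y) > threshold).astype(int)

def target_classifier(x):
    """Hypothetical ground-truth classifier: 1 inside the unit disc, else 0."""
    return (np.sum(x**2, axis=-1) < 1.0).astype(int)

rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(1000, 2))
labels = target_classifier(X)

# Simulated approximator output: the true 0/1 value plus an error whose
# magnitude stays strictly below 0.5 at every sample point.
scores = labels + rng.uniform(-0.4, 0.4, size=labels.shape)

# Thresholding removes the error entirely, so the labels are recovered exactly.
predicted = binary_readout(scores)
```

Note that a continuous network cannot stay within 0.5 of a discontinuous classifier everywhere near the decision boundary, so the uniform error bound here holds only on the sampled points. This summary does not say how the paper treats that region.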

This review was created by AI and reviewed by human editors.