QUICK REVIEW

[Paper Review] From data to functa: Your data point is a function and you can treat it like one

Emilien Dupont, Hyunjik Kim|arXiv (Cornell University)|Jan 28, 2022

3D Shape Modeling and Analysis | Cited by 23

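One-sentence summary

The paper proposes functa, a framework that treats each data point as an implicit neural representation (INR) parameterized by a small modulation vector. This enables scalable deep learning directly on functions across modalities, with a meta-learning pipeline used to build large functa datasets (functasets).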

ABSTRACT

It is common practice in deep learning to represent a measurement of the world on a discrete grid, e.g. a 2D grid of pixels. However, the underlying signal represented by these measurements is often continuous, e.g. the scene depicted in an image. A powerful continuous alternative is then to represent these measurements using an implicit neural representation, a neural function trained to output the appropriate measurement value for any input spatial location. In this paper, we take this idea to its next level: what would it take to perform deep learning on these functions instead, treating them as data? In this context we refer to the data as functa, and propose a framework for deep learning on functa. This view presents a number of challenges around efficient conversion from data to functa, compact representation of functa, and effectively solving downstream tasks on functa. We outline a recipe to overcome these challenges and apply it to a wide range of data modalities including images, 3D shapes, neural radiance fields (NeRF) and data on manifolds. We demonstrate that this approach has various compelling properties across data modalities, in particular on the canonical tasks of generative modeling, data imputation, novel view synthesis and classification. Code: https://github.com/deepmind/functa

Research motivation and goals

Motivate representing signals as continuous functions via implicit neural representations (INRs) instead of discrete arrays.
Introduce functa, data points represented as INRs, and decouple data-to-function conversion from downstream tasks.
Propose a scalable method using modulations and meta-learning to create large functa datasets across modalities.
Demonstrate downstream tasks on functa, including generative modeling, data imputation, novel view synthesis, and classification.
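
The contrast between a discrete array and a coordinate-to-value function can be made concrete with a small sketch. It turns a pixel grid into the (coordinate, feature) pairs that an INR would be trained on. The function name `grid_to_pairs` and the choice to normalize coordinates to [-1, 1] are assumptions made for illustration; the source does not specify them.

```python
def _normalize(index, size):
    """Map an integer index in [0, size - 1] to [-1, 1] (assumed convention)."""
    return 0.0 if size == 1 else -1.0 + 2.0 * index / (size - 1)


def grid_to_pairs(image):
    """Convert a 2D grid (list of rows) into coordinate/feature pairs.

    Returns two lists of equal length: (x, y) coordinates and the pixel
    value stored at each coordinate, in row-major order.
    """
    height, width = len(image), len(image[0])
    coords, feats = [], []
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            coords.append((_normalize(c, width), _normalize(r, height)))
            feats.append(value)
    return coords, feats


# A toy 2x2 grayscale "image".
image = [[0.0, 0.5],
         [1.0, 0.25]]
coords, feats = grid_to_pairs(image)
for xy, f in zip(coords, feats):
    print(xy, f)
```

Once data is in this form, the grid resolution only determines how many pairs are sampled, not the shape of the function being fitted.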

Proposed method

Represent each data point as an INR f_theta trained to minimize mean squared error over coordinates and features (Equation 1).
Parameterize INRs via low-dimensional modulations (shift modulations and latent modulations) applied to a shared base network (SIREN) to obtain functa.
Use meta-learning (MAML/CAVIA-style) to learn a base network initialization, then fit per-datapoint modulations in a few gradient steps, producing large datasets of functa (functasets).
Train downstream models directly on the modulation vectors using normalizing flows and diffusion models for generative tasks, and simple MLPs for classification.
Apply MAP inference over the learned modulation prior combined with reconstruction loss for imputation and novel view synthesis (Equation 2).
Demonstrate across modalities: images (CelebA-HQ 64×64), voxels (ShapeNet 64^3), NeRF scenes (SRN Cars), and manifold data (ERA5 temperature).
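
The core of the steps above, a shared sine network whose hidden units receive a per-datapoint shift and a short gradient loop that fits only those shifts, can be sketched in plain Python. This is a toy illustration under stated assumptions, not the authors' implementation: the base weights are hand-picked rather than meta-learned, the network has a single hidden layer, and the frequency scale `W0`, the step count, and the learning rate are hypothetical values chosen so that the example is stable. Latent modulations (a lower-dimensional vector mapped to the shifts) are omitted.

```python
import math

W0 = 5.0  # toy frequency scale for the sine activation (hypothetical value)

# Hypothetical shared base network: one sine layer with 4 hidden units.
# In the paper the base network is meta-learned; here it is fixed by hand.
BASE_W = [1.0, 2.0, 3.0, 4.0]    # input weights
BASE_B = [0.0, 0.5, 1.0, 1.5]    # biases
BASE_V = [0.5, -0.5, 0.5, -0.5]  # output weights


def inr(x, shifts):
    """Evaluate the shift-modulated network at a scalar coordinate x."""
    return sum(
        v * math.sin(W0 * (w * x + b + s))
        for w, b, v, s in zip(BASE_W, BASE_B, BASE_V, shifts)
    )


def mse(coords, feats, shifts):
    """Mean squared error between network outputs and target features."""
    return sum((inr(x, shifts) - f) ** 2 for x, f in zip(coords, feats)) / len(coords)


def grad_wrt_shifts(coords, feats, shifts):
    """Analytic gradient of the MSE with respect to the shift modulations."""
    grads = [0.0] * len(shifts)
    for x, f in zip(coords, feats):
        residual = inr(x, shifts) - f
        for j, (w, b, v, s) in enumerate(zip(BASE_W, BASE_B, BASE_V, shifts)):
            d_out = v * W0 * math.cos(W0 * (w * x + b + s))
            grads[j] += 2.0 * residual * d_out / len(coords)
    return grads


def fit_modulation(coords, feats, steps=3, lr=0.005):
    """Fit shifts from zero with a few gradient steps; base weights stay fixed."""
    shifts = [0.0] * len(BASE_W)
    for _ in range(steps):
        grads = grad_wrt_shifts(coords, feats, shifts)
        shifts = [s - lr * g for s, g in zip(shifts, grads)]
    return shifts


# Toy data point: a 1D signal generated by the same network with known shifts.
coords = [-1.0 + 2.0 * i / 15 for i in range(16)]
true_shifts = [0.3, -0.2, 0.1, 0.4]
feats = [inr(x, true_shifts) for x in coords]

fitted = fit_modulation(coords, feats)
loss_before = mse(coords, feats, [0.0] * len(BASE_W))
loss_after = mse(coords, feats, fitted)
print("loss before:", loss_before, "loss after:", loss_after)
```

The fitted `shifts` vector is what would be stored as the data point's representation; downstream models then consume these vectors instead of the original array. With a hand-picked base network the loss only drops modestly in three steps, which is the gap the meta-learned initialization is meant to close.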

Experimental results

Research questions

RQ1: Can data points across diverse modalities be effectively represented as INRs and stored as compact modulations (functa)?
RQ2: Does decoupling data-to-functa creation from downstream learning enable scalable generative modeling, imputation, view synthesis, and classification across modalities?
RQ3: What are the trade-offs of modulation-based representations versus full INR parameters in terms of reconstruction, scalability, and downstream task performance?
RQ4: How does meta-learning facilitate rapid fitting of functa and what are its limitations for large-scale datasets?

Main findings

Dataset/Array size	Split	Modulation dimensionality	64	128	256	512	1024
ShapeNet Chairs, 64^3	Test	64	99.11	99.28	99.38	99.46	99.51
ShapeNet Chairs, 64^3	Train	64	99.43	99.49	99.49	99.51	99.53
ShapeNet 10 Classes, 64^3	Train	64	99.36	99.44	99.47	99.52	99.56
ShapeNet 10 Classes, 64^3	Test	64	99.30	99.40	99.44	99.50	99.55
CelebA-HQ, 64×64	Test	64	21.6	23.5	25.6	28.0	30.7
CelebA-HQ, 64×64	Train	64	24.3	24.2	24.6	24.6	24.4
SRN Cars, 128×128	Test	64	22.4	23.0	23.1	23.2	23.1
SRN Cars, 128×128	Train	64	43.2	43.7	43.8	44.0	44.1
ERA5, 181×360	Test	64	43.2	43.6	43.8	43.9	44.0
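
To put the modulation dimensionalities in the table in context, the short calculation below compares the number of scalar values in each array with the length of a modulation vector. It is a back-of-the-envelope sketch: the three-channel (RGB) assumption for CelebA-HQ and the single-channel assumption for the voxel and ERA5 grids are mine, the count is in scalar values rather than bytes, and the shared base network, whose cost is amortized over the whole dataset, is not included.

```python
from math import prod

# Array shapes taken from the dataset names; channel counts are assumptions.
ARRAY_SHAPES = {
    "CelebA-HQ 64x64 (RGB assumed)": (64, 64, 3),
    "ShapeNet 64^3 voxels": (64, 64, 64),
    "ERA5 181x360 grid": (181, 360),
}


def compression_ratio(shape, modulation_dim):
    """Number of array entries divided by the modulation vector length."""
    return prod(shape) / modulation_dim


for name, shape in ARRAY_SHAPES.items():
    for dim in (256, 512):
        ratio = compression_ratio(shape, dim)
        print(f"{name}: {prod(shape)} values vs {dim} modulations, ratio {ratio:.1f}")
```

Under these assumptions a 512-dimensional modulation holds 24 times fewer values than a 64×64 RGB image and 512 times fewer than a 64^3 voxel grid.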

Functa scale with resolution more gracefully than discrete arrays, enabling efficient handling of high-resolution and variable-shape data.
Modulations with 256–512 dimensions provide visually close reconstructions while being much smaller than array representations.
Meta-learning the base network and fitting per-datapoint modulations in a few steps yields accurate functa across images, voxels, NeRFs, and manifolds.
Generative models trained on modulations (via normalizing flows and diffusion) produce coherent samples across modalities; NeRF-like scenes can be inferred and interpolated from modulations with prior-guided inference.
Classification on modulations achieves competitive accuracy with far fewer parameters than an equivalent 3D CNN on voxel data (e.g., ShapeNet 10 Classes: 93.6% vs 93.3% with 83k vs 550k params).
The framework enables flexible downstream tasks such as imputation and novel view synthesis by optimizing over the modulation prior plus reconstruction loss (MAP inference).
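
The MAP-style inference mentioned above can be illustrated with a deliberately small example: a modulation vector is fitted to a few observed points by minimizing reconstruction error plus a negative log-prior term, and the fitted function is then queried at an unobserved coordinate. Everything here is a stand-in chosen for clarity: `decode` is a hypothetical two-parameter decoder rather than a modulated SIREN, the prior is a standard Gaussian rather than a learned flow or diffusion model, and the weighting `weight` does not reproduce the exact form of the paper's Equation 2.

```python
import math


def decode(x, z):
    """Hypothetical stand-in for a modulated INR; linear in the modulation z."""
    return z[0] * math.sin(x) + z[1] * math.cos(x)


def neg_log_prior(z):
    """Standard Gaussian negative log-density, up to a constant (stand-in prior)."""
    return 0.5 * sum(v * v for v in z)


def objective(z, coords, feats, weight):
    """Reconstruction error on observed points plus weighted prior term."""
    recon = sum((decode(x, z) - f) ** 2 for x, f in zip(coords, feats)) / len(coords)
    return recon + weight * neg_log_prior(z)


def map_fit(coords, feats, weight=0.1, steps=200, lr=0.1):
    """Gradient descent on the objective above, starting from z = 0."""
    z = [0.0, 0.0]
    for _ in range(steps):
        grad = [weight * z[0], weight * z[1]]  # gradient of the prior term
        for x, f in zip(coords, feats):
            r = 2.0 * (decode(x, z) - f) / len(coords)
            grad[0] += r * math.sin(x)
            grad[1] += r * math.cos(x)
        z = [z[0] - lr * grad[0], z[1] - lr * grad[1]]
    return z


# Observe the signal at four coordinates only, then impute it at x = 2.0.
true_z = [1.0, -0.5]
observed_coords = [0.0, 0.5, 1.0, 1.5]
observed_feats = [decode(x, true_z) for x in observed_coords]

z_map = map_fit(observed_coords, observed_feats)
start_value = objective([0.0, 0.0], observed_coords, observed_feats, 0.1)
final_value = objective(z_map, observed_coords, observed_feats, 0.1)
print("imputed value at x=2.0:", decode(2.0, z_map))
print("true value at x=2.0:   ", decode(2.0, true_z))
```

The prior term pulls the estimate toward regions the prior considers likely, so the imputed value is biased toward zero relative to the true one; with a learned prior over modulations the same trade-off steers partial observations toward plausible completions.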

This review was generated by AI and reviewed by a human editor.