QUICK REVIEW

[Paper Digest] Optimal Bayes Classifiers for Functional Data and Density Ratios

Xiongtao Dai, Hans‐Georg Müller|arXiv (Cornell University)|May 12, 2016

Gene expression and cancer classification | References: 50 | Cited by: 28

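One-Sentence Summary

This paper proposes a nonparametric Bayes classifier for functional data that projects curves onto eigenfunctions common to the groups and estimates the density ratios of the resulting scores. By reducing the infinite-dimensional problem to a sequence of one-dimensional density estimates, the method avoids the curse of dimensionality, achieves asymptotically perfect classification under regularity conditions, and shows strong finite-sample performance in simulations and real-data applications, including fMRI and gene expression data.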

ABSTRACT

Bayes classifiers for functional data pose a challenge. This is because probability density functions do not exist for functional data. As a consequence, the classical Bayes classifier using density quotients needs to be modified. We propose to use density ratios of projections on a sequence of eigenfunctions that are common to the groups to be classified. The density ratios can then be factored into density ratios of individual functional principal components whence the classification problem is reduced to a sequence of nonparametric one-dimensional density estimates. This is an extension to functional data of some of the very earliest nonparametric Bayes classifiers that were based on simple density ratios in the one-dimensional case. By means of the factorization of the density quotients the curse of dimensionality that would otherwise severely affect Bayes classifiers for functional data can be avoided. We demonstrate that in the case of Gaussian functional data, the proposed functional Bayes classifier reduces to a functional version of the classical quadratic discriminant. A study of the asymptotic behavior of the proposed classifiers in the large sample limit shows that under certain conditions the misclassification rate converges to zero, a phenomenon that has been referred to as "perfect classification". The proposed classifiers also perform favorably in finite sample applications, as we demonstrate in comparisons with other functional classifiers in simulations and various data applications, including wine spectral data, functional magnetic resonance imaging (fMRI) data for attention deficit hyperactivity disorder (ADHD) patients, and yeast gene expression data.

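Research Motivation and Objectives

Address the challenge of constructing optimal Bayes classifiers for functional data, where classical density-based methods fail because probability density functions do not exist.
Overcome the severe curse of dimensionality inherent in infinite-dimensional functional data by projecting onto a common orthonormal eigenfunction basis.
Develop a nonparametric Bayes classifier that retains the optimality of minimizing the misclassification rate while remaining feasible to estimate in practice.
Establish conditions under which the proposed classifier achieves asymptotically perfect classification, that is, the misclassification rate converges to zero as the sample size increases.
Demonstrate the method's finite-sample performance in simulations and real-world applications, including fMRI data, wine spectral data, and yeast gene expression data.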

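Proposed Method

Project the functional observations onto a common orthonormal eigenfunction basis derived from the pooled covariance structure of the groups.
Estimate the one-dimensional density ratio of each functional principal component score using nonparametric kernel density estimation.
Factor the overall density ratio into a product of the individual component-wise density ratios, which reduces the dimension and avoids high-dimensional density estimation.
Construct the Bayes classifier by comparing the product of the estimated density ratios with a threshold, assigning class membership by maximum posterior probability.
Provide an alternative implementation based on nonparametric regression, which sometimes outperforms the direct density-ratio approach in small samples.
Use asymptotic theory to establish consistency and convergence rates for the estimated density ratios and for the classifier's performance.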
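
The projection, per-component density estimation, and thresholding steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the common eigenfunctions are taken as a known cosine basis rather than estimated from a pooled covariance, the bandwidth is a rule-of-thumb choice, the toy data model uses independent Gaussian scores, and all function names (`cosine_basis`, `project`, `kde`, `fit`, `classify`, `demo`) are hypothetical.

```python
import math
import random

def cosine_basis(n_comp, n_grid):
    """Assumed common eigenfunctions sqrt(2)*cos(j*pi*t) on a midpoint grid of [0, 1]."""
    grid = [(i + 0.5) / n_grid for i in range(n_grid)]
    return [[math.sqrt(2.0) * math.cos(j * math.pi * t) for t in grid]
            for j in range(1, n_comp + 1)]

def project(curve, basis):
    """Projection scores <curve, phi_j>, approximated by a midpoint Riemann sum."""
    dt = 1.0 / len(curve)
    return [dt * sum(x * p for x, p in zip(curve, phi)) for phi in basis]

def kde(x, sample, h):
    """One-dimensional Gaussian kernel density estimate at x with bandwidth h."""
    norm = len(sample) * h * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in sample) / norm

def bandwidth(sample):
    """Rule-of-thumb bandwidth (an assumption, not the paper's selector)."""
    n = len(sample)
    mean = sum(sample) / n
    sd = math.sqrt(sum((s - mean) ** 2 for s in sample) / (n - 1))
    return 1.06 * sd * n ** (-0.2)

def fit(scores0, scores1):
    """Store per-component training scores, bandwidths, and the log prior ratio."""
    n_comp = len(scores0[0])
    cols0 = [[s[j] for s in scores0] for j in range(n_comp)]
    cols1 = [[s[j] for s in scores1] for j in range(n_comp)]
    return {"cols0": cols0, "cols1": cols1,
            "h0": [bandwidth(c) for c in cols0],
            "h1": [bandwidth(c) for c in cols1],
            "log_prior": math.log(len(scores1) / len(scores0))}

def log_ratio(model, score, eps=1e-300):
    """Log prior ratio plus the sum of component-wise log density ratios."""
    total = model["log_prior"]
    for j, xi in enumerate(score):
        f1 = kde(xi, model["cols1"][j], model["h1"][j])
        f0 = kde(xi, model["cols0"][j], model["h0"][j])
        # eps guards against log(0) when a score lies far from the training data.
        total += math.log(max(f1, eps)) - math.log(max(f0, eps))
    return total

def classify(model, score):
    """Assign group 1 when the estimated log ratio is positive, else group 0."""
    return 1 if log_ratio(model, score) > 0 else 0

def demo(seed=0, n_train=200, n_test=100, n_comp=3, n_grid=50):
    """Simulate two groups of curves and return the test accuracy."""
    rng = random.Random(seed)
    basis = cosine_basis(n_comp, n_grid)

    def draw(group):
        # Toy model (assumption): independent Gaussian scores, differing by group.
        mean, sd = (0.0, 1.0) if group == 0 else (2.0, 1.5)
        xi = [rng.gauss(mean, sd) for _ in range(n_comp)]
        curve = [sum(s * phi[i] for s, phi in zip(xi, basis)) for i in range(n_grid)]
        return project(curve, basis)

    model = fit([draw(0) for _ in range(n_train)],
                [draw(1) for _ in range(n_train)])
    tests = [(g, draw(g)) for g in (0, 1) for _ in range(n_test)]
    return sum(classify(model, s) == g for g, s in tests) / len(tests)

if __name__ == "__main__":
    print("test accuracy:", demo())
```

Working on the log scale turns the product of density ratios into a sum, which is numerically more stable when many components are used; the threshold comparison then becomes a sign check.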

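Experimental Results

Research Questions

RQ1: Can a nonparametric Bayes classifier be constructed for functional data even though probability density functions do not exist?
RQ2: How can the curse of dimensionality in functional data classification be mitigated while retaining the optimality of minimizing the misclassification rate?
RQ3: Under what conditions does the proposed classifier achieve asymptotically perfect classification, that is, a misclassification rate that converges to zero?
RQ4: How does the proposed classifier compare with existing functional classifiers in small-sample settings?
RQ5: Does the alternative implementation based on nonparametric regression outperform the direct density-ratio approach in practice?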

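Key Findings

Under regularity conditions, the proposed classifier achieves asymptotically perfect classification: the misclassification rate converges to zero as the sample size increases.
For Gaussian functional data, the classifier reduces to a functional version of quadratic discriminant analysis, linking it to the classical multivariate method.
Dimension reduction through functional principal components reduces the problem to one-dimensional density estimation and thereby avoids the curse of dimensionality.
Small-sample simulations and real-data applications, including fMRI data for attention deficit hyperactivity disorder and yeast gene expression data, show that the method compares favorably with existing functional classifiers.
The implementation based on nonparametric regression often outperforms the direct density-ratio approach in small samples.
The convergence rate of the estimated density ratios is given as $ O_P(h + (nh / \log n)^{-1/2} + (m^{2/5} h^2)^{-1}) $, where $ m $ is the number of observation points and $ h $ is the bandwidth.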
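
The reduction to a quadratic discriminant in the Gaussian case can be made explicit with a short worked derivation. The notation here is an assumption introduced for illustration: $ \xi_j $ is the $ j $-th projection score, $ f_{jk} $ its density in group $ k \in \{0, 1\} $, $ \pi_k $ the prior probability of group $ k $, and $ J $ the number of components used. If the scores are independent across $ j $ with $ \xi_j \sim N(\mu_{jk}, \lambda_{jk}) $ in group $ k $, then

$$
\log \frac{\pi_1}{\pi_0} + \sum_{j=1}^{J} \log \frac{f_{j1}(\xi_j)}{f_{j0}(\xi_j)}
= \log \frac{\pi_1}{\pi_0} + \frac{1}{2} \sum_{j=1}^{J} \left[ \log \frac{\lambda_{j0}}{\lambda_{j1}} + \frac{(\xi_j - \mu_{j0})^2}{\lambda_{j0}} - \frac{(\xi_j - \mu_{j1})^2}{\lambda_{j1}} \right].
$$

The right-hand side is quadratic in the scores, and an observation is assigned to group 1 when it is positive. When $ \lambda_{j0} = \lambda_{j1} $ for every $ j $, the squared terms in $ \xi_j $ cancel and the rule becomes linear in the scores.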


This digest was generated by AI and reviewed by human editors.