QUICK REVIEW

[Paper Review] Autoencoded UMAP-Enhanced Clustering for Unsupervised Learning

Malihehsadat Chavooshi, A. Mamonov | ArXiv.org | Jan 13, 2025

Advanced Clustering Algorithms Research | Cited by 3

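ONE-SENTENCE SUMMARY

This paper proposes Autoencoded UMAP-Enhanced Clustering (AUEC), a three-stage unsupervised framework that combines a clustering-promoting autoencoder with a UMAP refinement step and reaches 97.52% clustering accuracy on MNIST training data when paired with MDBSCAN.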

ABSTRACT

We propose a novel approach to unsupervised learning by constructing a non-linear embedding of the data into a low-dimensional space followed by any conventional clustering algorithm. The embedding promotes clusterability of the data and is comprised of two mappings: the encoder of an autoencoder neural network and the output of UMAP algorithm. The autoencoder is trained with a composite loss function that incorporates both a conventional data reconstruction as a regularization component and a clustering-promoting component built using the spectral graph theory. The two embeddings and the subsequent clustering are integrated into a three-stage unsupervised learning framework, referred to as Autoencoded UMAP-Enhanced Clustering (AUEC). When applied to MNIST data, AUEC significantly outperforms the state-of-the-art techniques in terms of clustering accuracy.

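RESEARCH MOTIVATION AND GOALS

Improve clustering by applying a non-linear embedding that exposes the topology of the data before clustering.
Develop a three-stage framework in which learning a clustering-friendly embedding is combined with UMAP refinement.
Decouple feature learning from the final clustering step, so that the downstream clustering algorithm can be chosen freely.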

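PROPOSED METHOD

Stage I trains an autoencoder with a composite loss that combines a clustering-promoting component with a reconstruction term acting as a regularizer.
The clustering loss is built on the relative spectral gap (RSG) from spectral graph theory and is meant to increase cluster separability in the latent space.
Stage II applies UMAP to the encoder's compressed embedding to obtain a refined low-dimensional representation.
Stage III applies a conventional clustering algorithm (such as K-means or a DBSCAN variant) to the refined embedding.
To stabilize training, the autoencoder can optionally be pre-trained with the reconstruction loss alone.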
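As a rough illustration of the Stage I loss described above, the sketch below computes one possible relative spectral gap on a batch of latent codes and folds it into a composite loss. Everything specific here is an assumption rather than the paper's formulation: the Gaussian affinity graph, the bandwidth `sigma`, the definition RSG = (λ_{k+1} - λ_k) / λ_{k+1} on the symmetric normalized Laplacian, the weight `alpha`, and all function names. It uses NumPy only and is not differentiable as written; real training would need an autodiff framework.

```python
import numpy as np


def normalized_laplacian(z, sigma=1.0):
    """Symmetric normalized Laplacian of a Gaussian-affinity graph on the rows of z.

    The fully connected Gaussian graph is an assumption made for illustration.
    """
    sq_dists = np.sum((z[:, None, :] - z[None, :, :]) ** 2, axis=-1)
    w = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(w, 0.0)  # no self-loops
    deg = w.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg + 1e-12)
    return np.eye(len(z)) - d_inv_sqrt[:, None] * w * d_inv_sqrt[None, :]


def relative_spectral_gap(z, k, sigma=1.0):
    """Assumed definition: (lambda_{k+1} - lambda_k) / lambda_{k+1}, eigenvalues ascending."""
    eigvals = np.linalg.eigvalsh(normalized_laplacian(z, sigma))
    lam_k, lam_k1 = eigvals[k - 1], eigvals[k]
    return (lam_k1 - lam_k) / (lam_k1 + 1e-12)


def composite_loss(x, x_hat, z, k, alpha=1.0, sigma=1.0):
    """Hypothetical composite loss: clustering term plus weighted reconstruction MSE."""
    reconstruction = np.mean((x - x_hat) ** 2)
    clustering = 1.0 - relative_spectral_gap(z, k, sigma)
    return clustering + alpha * reconstruction


# Toy latent codes: two well-separated blobs versus a single blob.
rng = np.random.default_rng(0)
separated = np.vstack([rng.normal(0.0, 0.1, (20, 2)), rng.normal(5.0, 0.1, (20, 2))])
merged = rng.normal(0.0, 0.1, (40, 2))
print("RSG, two blobs:", relative_spectral_gap(separated, k=2))
print("RSG, one blob: ", relative_spectral_gap(merged, k=2))
```

With k = 2, the two-blob case should yield a much larger gap than the single-blob case, which is the behavior a clustering-promoting loss would reward.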

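EXPERIMENTAL RESULTS

Research Questions

RQ1 Can the three-stage framework, which combines an autoencoder-based embedding with UMAP refinement, outperform conventional dimensionality reduction (DR) + clustering pipelines?
RQ2 Does using the spectral-graph-theory clustering loss (RSG) during autoencoder training improve the clusterability of the latent space?
RQ3 On MNIST, how does AUEC compare with recent unsupervised methods in ACC, NMI, and ARI?
RQ4 Is the AUEC framework robust when applied to test data without retraining Stage I?
RQ5 Does the choice of downstream clustering algorithm (e.g., K-means versus DBSCAN-based variants) affect the gains delivered by the AUEC pipeline?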
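Clustering accuracy (ACC) is commonly defined in the deep clustering literature as the fraction of samples labeled correctly under the best one-to-one mapping between cluster IDs and class labels. The paper's own evaluation code is not shown here, so the sketch below is an assumption about that convention, with a hypothetical function name and toy labels. It brute-forces all mappings, which is only practical for a handful of clusters; for the 10 MNIST classes one would normally solve the assignment with the Hungarian algorithm, for example via `scipy.optimize.linear_sum_assignment`.

```python
from itertools import permutations


def clustering_accuracy(true_labels, cluster_ids):
    """ACC under the best one-to-one cluster-to-class mapping (brute force)."""
    if len(true_labels) != len(cluster_ids):
        raise ValueError("label sequences must have the same length")
    classes = sorted(set(true_labels))
    clusters = sorted(set(cluster_ids))
    size = max(len(classes), len(clusters))  # pad to a square count matrix
    class_index = {c: i for i, c in enumerate(classes)}
    cluster_index = {c: i for i, c in enumerate(clusters)}

    # counts[i][j] = number of samples in cluster i whose true class is j
    counts = [[0] * size for _ in range(size)]
    for t, c in zip(true_labels, cluster_ids):
        counts[cluster_index[c]][class_index[t]] += 1

    # Try every assignment of clusters to classes and keep the best match count.
    best = max(
        sum(counts[i][perm[i]] for i in range(size))
        for perm in permutations(range(size))
    )
    return best / len(true_labels)


truth = [0, 0, 0, 1, 1, 1, 2, 2, 2]
found = [2, 2, 2, 0, 0, 1, 1, 1, 1]  # cluster IDs are arbitrary; one sample is misplaced
print(clustering_accuracy(truth, found))
```

In this toy case eight of the nine samples agree with the truth under the best mapping, so the score is 8/9.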

Main Findings

Method	ACC	NMI	ARI
KMS	59.07%	50.95%	40.47%
UMAP+KMS	86.59%	85.73%	80.41%
DEC	84.30%	-	-
DCN	83%	81%	75%
FCAE-KMS	79.4%	69.8%	-
AUEC-MDBSCAN	97.52%	93.46%	94.64%

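AUEC with MDBSCAN reaches ACC 97.52%, NMI 93.46%, and ARI 94.64% on the MNIST training data.
UMAP+KMS, which uses no autoencoder, reaches ACC 86.59%, NMI 85.73%, and ARI 80.41% on MNIST; AUEC-MDBSCAN improves on this by 10.93, 7.73, and 14.23 percentage points respectively.
Stage I uses a clustering loss based on the relative spectral gap (RSG) to improve cluster separability relative to reconstruction-only training.
The UMAP refinement in Stage II further strengthens the clusterable structure and leaves the choice of downstream clustering algorithm flexible.
The robustness study shows that, without retraining Stage I, the metrics on test data drop only slightly (ACC falls by roughly 2%) and remain high, which indicates practical robustness.
Compared with DEC, DCN, and FCAE-KMS, AUEC-MDBSCAN scores higher on every reported metric; its ACC exceeds the best of these baselines (DEC, 84.30%) by 13.22 percentage points.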
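For readers unfamiliar with the ARI column, the adjusted Rand index can be computed from the contingency table between true classes and cluster IDs. The sketch below follows the standard pair-counting formula using only the Python standard library. The function name and toy labels are illustrative and not taken from the paper, whose evaluation code is not shown here.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(true_labels, cluster_ids):
    """Adjusted Rand index via the standard pair-counting formula."""
    n = len(true_labels)
    if n != len(cluster_ids) or n < 2:
        raise ValueError("need two label sequences of equal length >= 2")

    cell_counts = Counter(zip(true_labels, cluster_ids))  # contingency cells
    class_counts = Counter(true_labels)                   # row sums
    cluster_counts = Counter(cluster_ids)                 # column sums

    pairs_together = sum(comb(v, 2) for v in cell_counts.values())
    class_pairs = sum(comb(v, 2) for v in class_counts.values())
    cluster_pairs = sum(comb(v, 2) for v in cluster_counts.values())

    expected = class_pairs * cluster_pairs / comb(n, 2)  # chance-level agreement
    maximum = 0.5 * (class_pairs + cluster_pairs)
    if maximum == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (pairs_together - expected) / (maximum - expected)


print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # same partition, relabeled
print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))  # partition that splits every class
```

The first call gives 1.0 because ARI ignores how clusters are named, and the second gives -0.5, a below-chance score.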

This review was generated by AI and checked by a human editor.