QUICK REVIEW

[Paper Review] Unsupervised Alignment of Embeddings with Wasserstein Procrustes

Édouard Grave, Armand Joulin|arXiv (Cornell University)|May 29, 2018

Topic: Advanced Neural Network Applications | References: 54 | Cited by: 38

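One-Sentence Summary

The paper proposes an unsupervised method that aligns high-dimensional word embeddings by jointly estimating an orthogonal transformation matrix and a permutation matrix, using a stochastic optimization framework based on Wasserstein Procrustes. The method achieves state-of-the-art performance on unsupervised word translation, outperforming adversarial methods and matching ICP-based methods, while requiring substantially less computation and only a single run from one initialization.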

ABSTRACT

We consider the task of aligning two sets of points in high dimension, which has many applications in natural language processing and computer vision. As an example, it was recently shown that it is possible to infer a bilingual lexicon, without supervised data, by aligning word embeddings trained on monolingual data. These recent advances are based on adversarial training to learn the mapping between the two embeddings. In this paper, we propose to use an alternative formulation, based on the joint estimation of an orthogonal matrix and a permutation matrix. While this problem is not convex, we propose to initialize our optimization algorithm by using a convex relaxation, traditionally considered for the graph isomorphism problem. We propose a stochastic algorithm to minimize our cost function on large scale problems. Finally, we evaluate our method on the problem of unsupervised word translation, by aligning word embeddings trained on monolingual data. On this task, our method obtains state of the art results, while requiring less computational resources than competing approaches.
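
The formulation named in the abstract can be written out explicitly. The notation below is assumed here for illustration and is not quoted from the abstract: $X, Y \in \mathbb{R}^{n \times d}$ hold the two embedding sets row-wise, $\mathcal{O}_d$ is the set of $d \times d$ orthogonal matrices, $\mathcal{P}_n$ the set of $n \times n$ permutation matrices, and $\mathcal{B}_n$ the set of doubly stochastic matrices (the convex hull of $\mathcal{P}_n$). The last line is one standard form of a graph-matching relaxation, shown as a sketch of how such an initialization could be set up.

```latex
% Joint estimation of an orthogonal map Q and a permutation P
\min_{Q \in \mathcal{O}_d} \; \min_{P \in \mathcal{P}_n} \; \lVert XQ - PY \rVert_F^2

% For fixed P this is orthogonal Procrustes, solved in closed form:
Q^\star = U V^\top, \qquad U \Sigma V^\top = \operatorname{SVD}\!\left(X^\top P Y\right)

% For fixed Q the inner problem is (n times) a squared 2-Wasserstein distance
% between the uniform empirical measures on the rows of XQ and Y:
\min_{P \in \mathcal{P}_n} \lVert XQ - PY \rVert_F^2 = n \, W_2^2(XQ,\, Y)

% Convex relaxation for initialization: if XQ = PY with Q orthogonal, then
% X X^\top P = P Y Y^\top, which suggests (with K_X = X X^\top, K_Y = Y Y^\top)
\min_{P \in \mathcal{B}_n} \; \lVert K_X P - P K_Y \rVert_F^2
```

The relaxed problem no longer involves $Q$; once a (near-)permutation $P$ is obtained from it, an initial $Q$ follows from the Procrustes formula above.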

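Motivation and Objectives

Address the challenge of aligning two sets of high-dimensional embeddings without supervision, in low-resource or zero-shot translation settings.
Develop a scalable and stable optimization method for jointly estimating the orthogonal transformation matrix and the permutation matrix in embedding alignment.
Improve on existing unsupervised alignment methods that rely on adversarial training or iterative closest point (ICP), which are computationally expensive and sensitive to initialization.
Provide an initialization based on a convex relaxation to improve convergence and final performance in the non-convex optimization.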

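Proposed Method

Formulate embedding alignment as the minimization of a cost function that combines the estimation of an orthogonal transformation and a permutation matrix, inspired by the Procrustes problem and the Wasserstein distance.
Use a stochastic algorithm to minimize a surrogate loss based on the squared Wasserstein distance between the transformed source embeddings and the target embeddings.
Introduce a convex relaxation of the non-convex problem, derived from graph-matching relaxations (Gold & Rangarajan, 1996), to provide an initialization with better convergence properties.
Use the Sinkhorn algorithm to approximate the Wasserstein distance efficiently on mini-batches, which makes the method scale to large datasets.
Apply the CSLS (cross-domain similarity local scaling) criterion as a refinement step to improve alignment quality, although the method performs well even without refinement.
Adopt a mini-batch stochastic optimization scheme in which the batch size controls the trade-off between computational cost and the accuracy of the Wasserstein distance approximation.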
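
The steps listed above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the hyperparameters (`reg`, `lr`, `batch_size`, `n_steps`) and the identity initialization are all hypothetical, and the embeddings are assumed to be L2-normalized so that the Sinkhorn kernel does not underflow. In the method as summarized, the initial `Q0` would come from the convex relaxation rather than the identity.

```python
import numpy as np

def sinkhorn_plan(cost, reg=0.05, n_iters=200):
    """Entropy-regularized transport plan between two uniformly weighted batches."""
    n, m = cost.shape
    K = np.exp(-cost / reg)              # Gibbs kernel
    a = np.full(n, 1.0 / n)              # uniform source weights
    b = np.full(m, 1.0 / m)              # uniform target weights
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iters):             # alternating marginal scaling
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]

def project_orthogonal(M):
    """Nearest orthogonal matrix to M in Frobenius norm, via SVD."""
    U, _, Vt = np.linalg.svd(M)
    return U @ Vt

def wasserstein_procrustes(X, Y, Q0=None, batch_size=64, n_steps=200,
                           lr=1.0, seed=0):
    """Stochastic alternating scheme: soft matching on a batch, then a
    projected gradient step on the orthogonal map Q."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    Q = np.eye(d) if Q0 is None else Q0.copy()
    for _ in range(n_steps):
        xb = X[rng.choice(len(X), batch_size, replace=False)]
        yb = Y[rng.choice(len(Y), batch_size, replace=False)]
        xq = xb @ Q
        # Pairwise squared distances between mapped sources and targets.
        cost = ((xq[:, None, :] - yb[None, :, :]) ** 2).sum(-1)
        # Rescale the plan so rows sum to about 1 (a soft permutation).
        P = sinkhorn_plan(cost) * batch_size
        # Gradient of ||xb Q - P yb||^2 w.r.t. Q, dropping the term that is
        # constant for orthogonal Q; averaged over the batch.
        grad = -2.0 * xb.T @ P @ yb / batch_size
        Q = project_orthogonal(Q - lr * grad)
    return Q
```

A larger `batch_size` gives a better estimate of the full Wasserstein distance at a higher cost per step, which is the trade-off the last item in the list describes. Because the problem is non-convex, this sketch is not guaranteed to recover the true mapping from an identity start.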

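Experimental Results

Research Questions

RQ1: Does jointly optimizing the orthogonal transformation matrix and the permutation matrix outperform adversarial or ICP-based methods in unsupervised embedding alignment?
RQ2: How effective is the convex relaxation of the non-convex alignment problem as an initialization strategy, in terms of convergence and final performance?
RQ3: Without refinement or multiple random restarts, by how much does the method outperform existing unsupervised methods on word translation benchmarks?
RQ4: How does the batch size in stochastic optimization affect the trade-off between computational cost and alignment accuracy?
RQ5: What is the relationship between graph matching and embedding alignment, and can insights from one field inform the other?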

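Key Findings

The proposed method achieves state-of-the-art performance on unsupervised bilingual lexicon induction, reaching top-1 precision of 80.2% on en-es and 80.3% on es-en, ahead of adversarial methods and ICP-based baselines.
On the en-fr pair, it reaches 79.8% top-1 precision with CSLS, matching or exceeding previous unsupervised methods while being significantly faster and requiring only a single run.
On 6 of the 8 language pairs, the method outperforms GAN-based methods (e.g., Conneau et al., 2017) in the setting without refinement, indicating a higher-quality initial alignment.
With a batch size of 1600, the method converges within 22 minutes on the largest dataset, showing strong scalability; performance improves as the batch size grows, because larger batches approximate the true Wasserstein distance more accurately.
The convex-relaxation initialization yields consistent convergence and outperforms random restarts, which makes the method more robust and efficient than ICP, which requires hundreds of restarts.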
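
For readers unfamiliar with how the precision figures above are scored, the following is a minimal sketch of CSLS retrieval and top-1 precision. It assumes the usual definition of CSLS (twice the cosine similarity minus the mean similarity of each point to its `k` nearest neighbours in the other language); the function names, the default `k=10`, and the single-gold-translation format are illustrative assumptions, not details taken from this review.

```python
import numpy as np

def csls_scores(src, tgt, k=10):
    """CSLS score matrix between mapped source and target embeddings."""
    src = src / np.linalg.norm(src, axis=1, keepdims=True)
    tgt = tgt / np.linalg.norm(tgt, axis=1, keepdims=True)
    sims = src @ tgt.T                                   # cosine similarities
    k_t = min(k, sims.shape[1])
    k_s = min(k, sims.shape[0])
    # Mean similarity of each source word to its k nearest target words.
    r_src = np.sort(sims, axis=1)[:, -k_t:].mean(axis=1)
    # Mean similarity of each target word to its k nearest source words.
    r_tgt = np.sort(sims, axis=0)[-k_s:, :].mean(axis=0)
    # Penalize "hub" words that are close to everything.
    return 2.0 * sims - r_src[:, None] - r_tgt[None, :]

def top1_precision(scores, gold):
    """Fraction of source words whose best-scoring target is the gold one."""
    pred = scores.argmax(axis=1)
    return float((pred == gold).mean())
```

In use, `src` would be the source embeddings after applying the learned orthogonal map, and `gold[i]` the index of the reference translation of source word `i`. Benchmark dictionaries often list several valid translations per word, which this simplified scorer does not handle.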

This review was generated by AI and checked by a human editor.