QUICK REVIEW

[Paper Review] Hash Embeddings for Efficient Word Representations

Dan Tito Svenstrup, Jonas Meinertz Hansen|arXiv (Cornell University)|Sep 12, 2017

Topic Modeling | References: 15 | Cited by: 32

One-Sentence Summary

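This paper proposes hash embeddings, a new and efficient word representation method that combines the strengths of standard word embeddings and feature hashing. By using k hash functions to select k embedding vectors from a shared pool, and learning k trainable weights for each token, hash embeddings provide dynamic vocabulary handling, implicit pruning, and a reduced parameter count. Across several natural language processing tasks they match or outperform standard embeddings while substantially reducing model size.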

ABSTRACT

We present hash embeddings, an efficient method for representing words in a continuous vector form. A hash embedding may be seen as an interpolation between a standard word embedding and a word embedding created using a random hash function (the hashing trick). In hash embeddings, each token is represented by $k$ $d$-dimensional embedding vectors and one $k$-dimensional weight vector. The final $d$-dimensional representation of the token is the product of the two. Rather than fitting the embedding vectors for each token, these are selected by the hashing trick from a shared pool of $B$ embedding vectors. Our experiments show that hash embeddings can easily deal with huge vocabularies consisting of millions of tokens. When using a hash embedding there is no need to create a dictionary before training nor to perform any kind of vocabulary pruning after training. We show that models trained using hash embeddings exhibit at least the same level of performance as models trained using regular embeddings across a wide range of tasks. Furthermore, the number of parameters needed by such an embedding is only a fraction of what is required by a regular embedding. Since standard embeddings and embeddings constructed using the hashing trick are actually just special cases of a hash embedding, hash embeddings can be considered an extension and improvement over the existing regular embedding types.
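
The "product of the two" in the abstract can be read as a vector-matrix product. The following is a sketch in notation assumed here for illustration, not quoted from the abstract: $H_i$ is the $i$-th hash function mapping a token to a pool index, $E$ is the shared pool, $C_w$ stacks the $k$ selected vectors for token $w$, and $p_w$ is its weight vector.

```latex
\hat{e}_w \;=\; p_w^{\top} C_w \;=\; \sum_{i=1}^{k} p_w^{i}\, E_{H_i(w)},
\qquad
C_w = \begin{bmatrix} E_{H_1(w)} \\ \vdots \\ E_{H_k(w)} \end{bmatrix} \in \mathbb{R}^{k \times d},
\quad p_w \in \mathbb{R}^{k},
\quad E \in \mathbb{R}^{B \times d}
```

Under this reading, $\hat{e}_w \in \mathbb{R}^{d}$ is a weighted sum of the $k$ selected pool vectors.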

Research Motivation and Objectives

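Address the excessive parameter counts and training cost that large vocabularies impose on neural NLP models.
Remove the need to build a dictionary before training and to prune the vocabulary after training in models with large or dynamic vocabularies.
Develop a hybrid approach that combines the expressiveness of standard embeddings with the efficiency of feature hashing.
Support online learning and dynamic vocabulary expansion without retraining or constructing a dictionary.
Reduce model size and parameter count while maintaining or improving performance on downstream NLP tasks.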

Proposed Method

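For each token, k hash functions select k d-dimensional embedding vectors from a shared pool of B vectors.
A learnable k-dimensional weight vector combines the k selected vectors in a weighted sum to form the final d-dimensional representation.
The method uses the same hash functions for the component vectors and the importance weights, although using different hash functions for the weights is also explored to lower the risk of collisions.
Because weights are learned only for the relevant hash buckets, the method prunes the vocabulary implicitly and reduces the number of effective parameters.
The final representation is a differentiable, continuous function of the model parameters, so it supports end-to-end gradient-based training.
The method supports both dictionary-free training and standard dictionary-based training, so it applies to online and offline learning settings alike.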
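
The lookup described in this section can be sketched as follows. This is a minimal NumPy illustration of the forward pass only, not the authors' implementation. The class name, the CRC32-based hash family, the `num_weight_rows` parameter, and the separate salt for the weight row are all assumptions made for the example.

```python
import zlib
import numpy as np


class HashEmbedding:
    """Forward pass of a hash embedding (illustrative sketch, not the authors' code)."""

    def __init__(self, num_buckets, dim, num_hashes, num_weight_rows, seed=0):
        rng = np.random.default_rng(seed)
        self.B, self.k, self.K = num_buckets, num_hashes, num_weight_rows
        # Shared pool of B component vectors, each d-dimensional.
        self.pool = rng.normal(0.0, 0.1, size=(num_buckets, dim))
        # Importance weights: one k-dimensional row per weight bucket (assumed layout).
        self.weights = rng.normal(0.0, 0.5, size=(num_weight_rows, num_hashes))

    def _hash(self, token, salt):
        # Hypothetical hash family: CRC32 of the token prefixed with a per-function salt.
        return zlib.crc32(f"{salt}:{token}".encode("utf-8"))

    def __call__(self, token):
        # k hash functions pick k component vectors from the shared pool.
        idx = [self._hash(token, i) % self.B for i in range(self.k)]
        components = self.pool[idx]                         # shape (k, d)
        # A separate salt selects the token's row of importance weights.
        p = self.weights[self._hash(token, "w") % self.K]   # shape (k,)
        # The weighted sum of the k components is the d-dimensional representation.
        return p @ components


emb = HashEmbedding(num_buckets=1000, dim=20, num_hashes=2, num_weight_rows=10000)
vec = emb("embedding")
print(vec.shape)  # (20,)
```

No dictionary is consulted at any point: any string, including one never seen before, hashes to valid rows. In a trainable version, `pool` and `weights` would be parameters updated by gradient descent.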

Experimental Results

Research Questions

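RQ1: Can a hybrid embedding that combines hashing with learnable weights match the performance of standard word embeddings while using fewer parameters?
RQ2: How do hash embeddings compare with standard embeddings and feature hashing across diverse NLP classification tasks?
RQ3: How well do hash embeddings handle large or dynamically growing vocabularies without a predefined dictionary or post-training pruning?
RQ4: Do learnable weights over hash buckets act as a regularizer and improve generalization relative to standard embeddings?
RQ5: Can hash embeddings be applied effectively in online learning settings where the vocabulary is unknown in advance?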

Key Findings

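On seven benchmark datasets, including AG, DBP, and the Yelp and Amazon sentiment classification tasks, hash embeddings match or outperform standard embeddings.
On five of the seven datasets, hash embeddings rank among the top three state-of-the-art models.
Compared with standard embeddings, the method substantially reduces the parameter count while maintaining high accuracy, with the largest savings on large vocabularies.
Performance remains stable and competitive even without a predefined vocabulary dictionary, which supports online learning and dynamic vocabulary handling.
Using different hash functions for the component vectors and the importance weights yields a small but consistent improvement by reducing the information lost to collisions.
The model has a built-in regularizing effect: only the parameters associated with active tokens are learned, which reduces both overfitting and the parameter count from the start of training.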
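
To make the parameter savings concrete, the sketch below compares parameter counts for a standard embedding table and a hash embedding. All sizes are hypothetical values chosen for illustration and are not taken from the paper's experiments. The count assumes one d-dimensional vector per pool bucket and one k-dimensional weight row per weight bucket.

```python
def standard_embedding_params(vocab_size, dim):
    # One d-dimensional vector per vocabulary entry.
    return vocab_size * dim


def hash_embedding_params(num_buckets, dim, num_weight_rows, num_hashes):
    # Shared pool (B x d) plus importance weights (one k-dimensional row per weight bucket).
    return num_buckets * dim + num_weight_rows * num_hashes


# Hypothetical sizes, not figures reported in the paper.
vocab_size, dim = 1_000_000, 20
num_buckets, num_weight_rows, num_hashes = 100_000, 1_000_000, 2

std = standard_embedding_params(vocab_size, dim)
hashed = hash_embedding_params(num_buckets, dim, num_weight_rows, num_hashes)
print(std, hashed, hashed / std)  # 20000000 4000000 0.2
```

With these assumed sizes the hash embedding needs one fifth of the parameters. The saving comes from replacing a full d-dimensional vector per token with only k scalar weights per token plus a much smaller shared pool.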

This review was generated by AI and checked by a human editor.