QUICK REVIEW

[Paper Review] Whitening Sentence Representations for Better Semantics and Faster Retrieval

Jianlin Su, Jiarun Cao | arXiv (Cornell University) | Mar 29, 2021

Topic Modeling · 16 references · 204 citations

TL;DR

The paper shows that whitening sentence embeddings from BERT-style models can isotropize the space, improve semantic similarity performance, and reduce embedding dimensionality for faster retrieval, often surpassing BERT-flow baselines.

ABSTRACT

Pre-trained models such as BERT have achieved great success in many natural language processing tasks. However, how to obtain better sentence representations from these pre-trained models is still worth exploring. Previous work has shown that the anisotropy problem is a critical bottleneck for BERT-based sentence representations, one that hinders the model from fully utilizing the underlying semantic features. Therefore, some attempts to boost the isotropy of the sentence distribution, such as flow-based models, have been applied to sentence representations and achieved some improvement. In this paper, we find that the whitening operation from traditional machine learning can similarly enhance the isotropy of sentence representations and achieve competitive results. Furthermore, the whitening technique is also capable of reducing the dimensionality of the sentence representation. Our experimental results show that it can not only achieve promising performance but also significantly reduce the storage cost and accelerate retrieval.

Motivation & Objective

Investigate the isotropy problem in BERT-based sentence embeddings and its impact on semantic similarity tasks.
Propose a whitening post-processing method that maps sentence embeddings onto a standard orthonormal basis, i.e., gives them zero mean and identity covariance.
Explore dimensionality reduction (k) during whitening to balance performance and storage/speed benefits.
Evaluate the method on multiple semantic textual similarity benchmarks without and with NLI supervision.

Proposed method

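Apply whitening to a set of sentence embeddings: subtract the mean μ, then multiply by a matrix W satisfying W^T Σ W = I, where Σ is the covariance of the embeddings; each vector x becomes (x − μ)W.
Compute W from Σ using SVD: Σ = U Λ U^T and W = U sqrt(Λ^{-1}), which satisfies W^T Σ W = I because U is orthogonal.
Optionally reduce dimensionality by keeping only the first k columns of W, i.e., the directions with the k largest eigenvalues (SVD returns them in descending order). This variant, Whitening-k, behaves like PCA followed by rescaling.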
Evaluate performance using cosine similarity on STS benchmarks with and without NLI supervision.
Compare against BERT-flow and SBERT baselines to assess isotropy improvement and retrieval efficiency.
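
The whitening steps above fit in a few lines of NumPy. This is a minimal sketch, assuming the sentence vectors (for example, pooled BERT outputs) are already stacked row-wise in an (n, d) array; fit_whitening and apply_whitening are hypothetical helper names, not the authors' released code, and the data is synthetic.

```python
import numpy as np


def fit_whitening(embeddings, k=None):
    """Estimate (mu, W) so that (x - mu) @ W has identity covariance.

    embeddings: float array of shape (n, d), one sentence vector per row.
    k: optional output dimension for Whitening-k; keeps the first k columns of W.
    """
    mu = embeddings.mean(axis=0, keepdims=True)      # 1 x d mean vector
    cov = np.cov(embeddings - mu, rowvar=False)      # d x d covariance Sigma
    u, s, _ = np.linalg.svd(cov)                     # Sigma = U diag(s) U^T, s descending
    w = u @ np.diag(1.0 / np.sqrt(s))                # W = U sqrt(Lambda^{-1})
    if k is not None:
        w = w[:, :k]                                 # keep the k largest-variance directions
    return mu, w


def apply_whitening(embeddings, mu, w):
    """Center, then project with the fitted whitening matrix."""
    return (embeddings - mu) @ w


rng = np.random.default_rng(0)
# Synthetic, deliberately anisotropic stand-in for BERT vectors:
# correlated dimensions plus a large shared offset.
x = rng.normal(size=(2000, 64)) @ rng.normal(size=(64, 64)) + 5.0

mu, w = fit_whitening(x)
z = apply_whitening(x, mu, w)
print(np.allclose(np.cov(z, rowvar=False), np.eye(64), atol=1e-6))  # True

mu8, w8 = fit_whitening(x, k=8)
z8 = apply_whitening(x, mu8, w8)
print(z8.shape)  # (2000, 8)
```

Whether Σ is normalized by n or by n − 1 (np.cov uses n − 1) only rescales W uniformly, which leaves cosine similarities unchanged. One natural deployment, assumed here, is to fit mu and W once on a corpus of embeddings and reuse them for every query.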

Experimental results

Research questions

RQ1: Can whitening transform BERT-based sentence embeddings into an isotropic space to improve cosine-based similarity measurements?
RQ2: Does whitening (with or without dimensionality reduction) improve STS task performance compared to flow-based baselines?
RQ3: What is the effect of embedding dimensionality k on performance and retrieval efficiency?
RQ4: Do whitening-based embeddings maintain gains under supervised (NLI) training settings?
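
The STS evaluation behind these questions scores each sentence pair by the cosine similarity of its two embeddings and reports the Spearman rank correlation between those scores and human similarity ratings. Below is a minimal NumPy sketch of that scoring, assuming pair embeddings are already computed (whitened or not); the helper names and the three toy pairs are hypothetical illustrations, not benchmark data, and a library routine such as SciPy's spearmanr could replace the hand-written rank correlation.

```python
import numpy as np


def cosine_scores(a, b):
    """Row-wise cosine similarity between paired embedding matrices a and b (n x d)."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return np.sum(a * b, axis=1)


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    values = np.asarray(values, dtype=float)
    ranks = np.empty(len(values))
    ranks[np.argsort(values, kind="mergesort")] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the rank vectors."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])


# Toy example: three sentence pairs with made-up gold ratings on a 0-5 scale.
emb_a = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
emb_b = np.array([[1.0, 0.1], [1.0, 0.0], [1.0, 0.0]])
gold = np.array([4.8, 2.5, 0.3])
print(spearman(cosine_scores(emb_a, emb_b), gold))  # prints 1.0 (rankings agree)
```

Because Spearman's ρ depends only on ranks, any monotone rescaling of the cosine scores leaves it unchanged; what whitening can change is the ordering of the pairs.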

Key findings

Whitening improves Spearman correlation over BERT-flow on several STS benchmarks, reaching results at or near the state of the art on multiple datasets with 256- or 384-dimensional embeddings.
Dimensionality reduction (Whitening-k) often maintains or improves performance while substantially reducing storage and speeding up retrieval.
Combined with NLI supervision, whitening yields results competitive with or superior to flow-based methods on several datasets.
Performance gains are observed for both BERT-base and BERT-large configurations across various STS tasks.
Whitening is a simpler alternative to flow-based approaches for obtaining isotropic, compact representations.
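
The storage side of the Whitening-k finding is simple arithmetic: a float32 vector costs 4 bytes per dimension, so reducing 768-dimensional BERT-base vectors to k = 256 cuts raw storage by a factor of 3, and the work of a brute-force dot-product search shrinks roughly in proportion. The sketch below assumes float32 storage and exhaustive cosine search over pre-normalized vectors (the retrieval setup is an assumption, not taken from the paper); the corpus is random data used only to illustrate shapes and costs, and the helper names are hypothetical.

```python
import numpy as np

BYTES_PER_FLOAT32 = 4


def storage_bytes(num_vectors, dim):
    """Raw size of a float32 embedding matrix, ignoring any index overhead."""
    return num_vectors * dim * BYTES_PER_FLOAT32


def l2_normalize(x):
    """Scale each row to unit length so a dot product equals cosine similarity."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def top_n_cosine(query, corpus_unit, n=5):
    """Exhaustive search over pre-normalized corpus rows.

    One dot product per row: O(num_vectors * dim) work, linear in the dimension.
    """
    scores = corpus_unit @ l2_normalize(query)
    best = np.argsort(-scores)[:n]
    return best, scores[best]


full = storage_bytes(1_000_000, 768)     # 768-dim BERT-base vectors
reduced = storage_bytes(1_000_000, 256)  # Whitening-256 vectors
print(full, reduced, full / reduced)     # 3072000000 1024000000 3.0

rng = np.random.default_rng(0)
# Random stand-in for 256-dim whitened vectors; normalize once, search many times.
corpus_unit = l2_normalize(rng.normal(size=(10_000, 256)).astype(np.float32))
ids, scores = top_n_cosine(corpus_unit[42], corpus_unit, n=3)
print(ids[0])                            # 42: a vector is its own nearest neighbour
```

Actual speedups depend on hardware and on whether an approximate nearest-neighbour index is used, so the factor of 3 is best read as an upper bound on the arithmetic saving for exhaustive search.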


This review was created by AI and reviewed by human editors.