QUICK REVIEW

[Paper Review] Improved Variational Autoencoders for Text Modeling using Dilated Convolutions

Zichao Yang, Zhiting Hu|arXiv (Cornell University)|2017. 02. 26.

Topic Modeling | References: 23 | Citations: 241

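One-Line Summary

This paper shows that a text VAE with a dilated CNN decoder outperforms standard LSTM language models when the decoder's contextual capacity is carefully controlled, and that it also benefits semi-supervised classification and unsupervised clustering.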

ABSTRACT

Recent work on generative modeling of text has found that variational auto-encoders (VAE) incorporating LSTM decoders perform worse than simpler LSTM language models (Bowman et al., 2015). This negative result is so far poorly understood, but has been attributed to the propensity of LSTM decoders to ignore conditioning information from the encoder. In this paper, we experiment with a new type of decoder for VAE: a dilated CNN. By changing the decoder's dilation architecture, we control the effective context from previously generated words. In experiments, we find that there is a trade off between the contextual capacity of the decoder and the amount of encoding information used. We show that with the right decoder, VAE can outperform LSTM language models. We demonstrate perplexity gains on two datasets, representing the first positive experimental result on the use VAE for generative modeling of text. Further, we conduct an in-depth investigation of the use of VAE (with our new decoding architecture) for semi-supervised and unsupervised labeling tasks, demonstrating gains over several strong baselines.

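Motivation and Objectives

Investigate why text VAEs with LSTM decoders underperform plain LSTM language models, and identify the conditions under which a VAE can outperform a language model.
Propose a dilated CNN decoder whose contextual capacity can be controlled flexibly.
Demonstrate language modeling gains on two datasets with the proposed decoder, and explore semi-supervised and unsupervised text tasks.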

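Proposed Method

Introduces a dilated CNN decoder for the VAE, replacing the LSTM decoder in text modeling.
Systematically varies the decoder's contextual capacity through the dilation pattern and network depth to study how much the model relies on the latent variable.
Uses an LSTM encoder to produce q(z|x) with a Gaussian prior p(z); z is concatenated with the decoder input.
Trains on the variational lower bound and uses KL annealing to guard against posterior collapse.
Explores initializing the encoder with the parameters of a pretrained LSTM language model to improve VAE performance.
Extends the framework to semi-supervised classification and unsupervised clustering over discrete labels using Gumbel-Softmax.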
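
The training objective mentioned above (variational lower bound plus KL annealing) can be sketched in a few lines. This is a minimal illustration and not the authors' code: it assumes a diagonal Gaussian posterior q(z|x) against a standard normal prior p(z), and a linear annealing schedule. The function names and the `warmup_steps` default are hypothetical; this review does not state the schedule the paper actually used.

```python
import math


def gaussian_kl(mu, logvar):
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, logvar))


def kl_weight(step, warmup_steps):
    """Linear KL annealing (assumed schedule): weight rises from 0 to 1 over warmup_steps."""
    return min(1.0, step / warmup_steps)


def annealed_loss(recon_nll, mu, logvar, step, warmup_steps=10000):
    """Negative lower bound with an annealed KL term: reconstruction NLL + weight * KL."""
    return recon_nll + kl_weight(step, warmup_steps) * gaussian_kl(mu, logvar)


# Early in training the KL term is switched off, so the decoder is not
# penalized for using z; by the end of warmup the full bound is optimized.
print(annealed_loss(10.0, [1.0], [0.0], step=0))      # 10.0
print(annealed_loss(10.0, [1.0], [0.0], step=10000))  # 10.5
```

If the KL term stays near zero once the weight reaches 1, the posterior has collapsed onto the prior and the decoder is ignoring z, which is the failure mode the annealing is meant to counter.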

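Experimental Results

Research Questions

RQ1: Does a dilated CNN decoder with controllable contextual capacity make a text VAE outperform LSTM language models?
RQ2: How does decoder capacity affect the model's use of the latent representation (the KL term) and overall perplexity?
RQ3: Is the dilated CNN VAE beneficial for semi-supervised text classification and unsupervised clustering compared with strong baselines?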

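Key Findings

With an appropriate contextual capacity, the dilated CNN decoder lets the VAE outperform LSTM language models on both datasets.
A smaller effective context window makes the decoder rely more on the latent variable, which raises the KL term and improves the latent representation.
Larger decoders reduce reliance on the latent variable and shrink the VAE's gains; very large decoders behave much like a pure LM baseline.
Initializing the VAE encoder with pretrained LSTM language model parameters yields further improvements in NLL and perplexity.
In the semi-supervised setting, certain dilated CNN VAEs (e.g., SCNN-VAE-Semi) achieve higher classification accuracy than the baselines, particularly when labeled data is limited, and encoder initialization improves performance further.
For unsupervised clustering on the Yahoo data, SCNN-VAE with initialization achieves substantial gains over baselines that use a GMM.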
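
To make the "effective context window" in these findings concrete, the sketch below computes how many previous tokens a stack of causal dilated convolutions can see, and checks the formula empirically with an impulse input. It is an illustration under stated assumptions: the kernel size of 3 and the dilation lists are hypothetical examples, not the configurations reported in the paper, and the pure-Python convolution stands in for a real framework layer.

```python
def receptive_field(dilations, kernel_size=3):
    """Number of input positions that can influence one output of the stack."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def causal_dilated_conv1d(x, weights, dilation):
    """y[t] = sum_j weights[j] * x[t - j * dilation], with zeros before the sequence start."""
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(weights):
            idx = t - j * dilation
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out


def last_output_depends_on(dilations, offset, kernel_size=3, length=64):
    """Place an impulse `offset` steps before the end and see if it reaches the last output."""
    x = [0.0] * length
    x[length - 1 - offset] = 1.0
    for d in dilations:
        # All-ones weights: contributions are non-negative, so nothing cancels out.
        x = causal_dilated_conv1d(x, [1.0] * kernel_size, d)
    return x[-1] != 0.0


small = [1, 2, 4]          # hypothetical shallow stack
large = [1, 2, 4, 8, 16]   # hypothetical deeper stack
print(receptive_field(small), receptive_field(large))  # 15 63
print(last_output_depends_on(small, offset=14))        # True: inside the window
print(last_output_depends_on(small, offset=15))        # False: just outside it
```

Adding layers or larger dilations widens the window, which is the knob the findings describe: a narrow window forces the decoder to get long-range information from z, while a wide one lets it behave like an ordinary language model.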


This review was generated by AI and reviewed by a human editor.