QUICK REVIEW

[Paper Review] Professor Forcing: A New Algorithm for Training Recurrent Networks

Alex Lamb, Anirudh Goyal|arXiv (Cornell University)|2016. 10. 27.

Topic Modeling|References: 33|Citations: 330

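One-Line Summary

Professor Forcing introduces an adversarial training framework that matches an RNN's generative (sampling) dynamics to its teacher-forced dynamics, improving long-sequence generation and acting as a regularizer.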

ABSTRACT

The Teacher Forcing algorithm trains recurrent networks by supplying observed sequence values as inputs during training and using the network's own one-step-ahead predictions to do multi-step sampling. We introduce the Professor Forcing algorithm, which uses adversarial domain adaptation to encourage the dynamics of the recurrent network to be the same when training the network and when sampling from the network over multiple time steps. We apply Professor Forcing to language modeling, vocal synthesis on raw waveforms, handwriting generation, and image generation. Empirically we find that Professor Forcing acts as a regularizer, improving test likelihood on character level Penn Treebank and sequential MNIST. We also find that the model qualitatively improves samples, especially when sampling for a large number of time steps. This is supported by human evaluation of sample quality. Trade-offs between Professor Forcing and Scheduled Sampling are discussed. We produce T-SNEs showing that Professor Forcing successfully makes the dynamics of the network during training and sampling more similar.

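Research Motivation and Goals

Improve long-sequence generation, including sequences longer than those seen during training.
Introduce a method that makes an RNN's training-time and sampling-time dynamics indistinguishable.
Show that matching these dynamics acts as a regularizer and improves generalization across a range of tasks.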

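Proposed Method

Proposes Professor Forcing, which pairs a generator RNN with a discriminator in a GAN-like setup; the discriminator tries to tell teacher-forced behavior apart from free-running behavior.
Defines the behavior sequence B(x, y, θg) in both open-loop (teacher-forced) and closed-loop (free-running) modes.
Trains the discriminator to distinguish these behaviors, while jointly training the generator to fit the data (NLL) and to fool the discriminator (C_f, C_t).
Uses a bidirectional RNN discriminator so that entire behavior sequences are evaluated.
The update rule uses NLL + C_f (and optionally C_t) for the generator, and C_d for the discriminator.
Applies the method to character-level language modeling, Sequential MNIST, handwriting generation, and vocal synthesis on raw waveforms.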
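
The loss terms named above (NLL, C_f, C_t, C_d) can be sketched for a single pair of behavior sequences as follows. This is a minimal illustration, not the authors' code: it assumes the discriminator outputs a probability that a behavior sequence came from teacher-forced mode, and that the losses take the standard GAN cross-entropy form. The function names and the `eps` clamp are hypothetical and introduced only for this example.

```python
import math

def _safe_log(p, eps=1e-12):
    # Clamp to avoid log(0); eps is an illustrative choice, not from the paper.
    return math.log(min(max(p, eps), 1.0))

def discriminator_loss(d_teacher, d_free):
    """C_d: push D toward 1 on teacher-forced behavior and toward 0 on free-running behavior.

    d_teacher: D's output on a teacher-forced behavior sequence.
    d_free:    D's output on a free-running behavior sequence.
    """
    return -_safe_log(d_teacher) - _safe_log(1.0 - d_free)

def generator_adversarial_losses(d_teacher, d_free):
    """C_f makes free-running behavior look teacher-forced; C_t (optional) does the reverse."""
    c_f = -_safe_log(d_free)
    c_t = -_safe_log(1.0 - d_teacher)
    return c_f, c_t

def generator_loss(nll, d_teacher, d_free, use_c_t=False):
    """Generator objective: NLL + C_f, plus C_t when use_c_t is set."""
    c_f, c_t = generator_adversarial_losses(d_teacher, d_free)
    return nll + c_f + (c_t if use_c_t else 0.0)

# Example: an undecided discriminator (0.5 on both modes) and an NLL of 1.0.
total = generator_loss(nll=1.0, d_teacher=0.5, d_free=0.5)
```

In a real training loop the two losses would be minimized in alternation, with the generator's parameters held fixed during the discriminator step and vice versa; the scalar inputs here stand in for the outputs of the bidirectional RNN discriminator.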

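Experimental Results

Research Questions

RQ1: Does adversarially matching training-time and sampling-time dynamics improve long-sequence generation?
RQ2: Does Professor Forcing regularize recurrent models and improve test likelihood across domains?
RQ3: How does Professor Forcing affect sample quality and diversity compared with Teacher Forcing?
RQ4: On which tasks does long-term dependency modeling benefit most from dynamics matching?
RQ5: What are the practical considerations (discriminator balance, training time) when training with Professor Forcing?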

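Key Results

Professor Forcing reduces the divergence between training-time and sampling-time hidden-state dynamics, as shown by t-SNE visualizations.
On character-level Penn Treebank, Professor Forcing improves validation bits-per-character (BPC) from 1.50 to 1.48.
Professor Forcing acts as a regularizer, improving test likelihood on Sequential MNIST and on the vocal synthesis task.
In handwriting generation, human evaluators rated Professor Forcing samples higher than Teacher Forcing samples.
On Sequential MNIST, Professor Forcing reaches an MNLL of 79.58 in objective evaluation, competitive with PixelRNN's 79.2.
Professor Forcing requires additional training time because of the extra discriminator steps, but it can speed up convergence and improve sample quality.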
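
To put the Penn Treebank BPC figures above in context, the sketch below converts between nats and bits per character and computes the relative improvement and per-character perplexity. The conversions are standard (1 bit = ln 2 nats; per-character perplexity = 2^BPC); the function and variable names are hypothetical, and only the values 1.50 and 1.48 come from the review.

```python
import math

def nats_to_bpc(nll_nats_per_char):
    """Convert an average per-character NLL in nats to bits-per-character."""
    return nll_nats_per_char / math.log(2)

def per_char_perplexity(bpc):
    """Per-character perplexity implied by a bits-per-character value."""
    return 2.0 ** bpc

baseline_bpc = 1.50  # Teacher Forcing validation BPC reported above
pf_bpc = 1.48        # Professor Forcing validation BPC reported above

# Fractional reduction in BPC relative to the baseline.
relative_gain = (baseline_bpc - pf_bpc) / baseline_bpc

print(f"relative BPC reduction: {relative_gain:.2%}")
print(f"perplexity: {per_char_perplexity(baseline_bpc):.3f} -> {per_char_perplexity(pf_bpc):.3f}")
```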


This review was generated by AI and reviewed by a human editor.