QUICK REVIEW

[Paper Review] Online Learning of Recurrent Neural Architectures by Locally Aligning Distributed Representations.

Alexander G. Ororbia, Ankur Mali | arXiv (Cornell University) | October 17, 2018

Neural Networks and Reservoir Computing | Citations: 1

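One-Line Summary

This paper proposes the Parallel Temporal Neural Coding Network, a biologically inspired recurrent architecture trained with Local Representation Alignment. This local learning rule avoids back-propagation through time (BPTT) and removes the need for both unrolling and the derivatives of activation functions, which is intended to make parallel training feasible. On sequence modeling tasks such as Bouncing MNIST and Penn Treebank, the model outperforms other online alternatives to BPTT and, in some cases, full BPTT itself.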

ABSTRACT

Temporal models based on recurrent neural networks have proven to be quite powerful in a wide variety of applications, including language modeling and speech processing. However, to train these models, one relies on back-propagation through time, which entails unfolding the network over many time steps, making the process of conducting credit assignment considerably more challenging. Furthermore, the nature of back-propagation itself does not permit the use of non-differentiable activation functions and is inherently sequential, making parallelization of the underlying training process very difficult. In this work, we propose the Parallel Temporal Neural Coding Network, a biologically inspired model trained by the local learning algorithm known as Local Representation Alignment, that aims to resolve the difficulties and problems that plague recurrent networks trained by back-propagation through time. Most notably, this architecture requires neither unrolling nor the derivatives of its internal activation functions. We compare our model and learning procedure to other online back-propagation-through-time alternatives (which also tend to be computationally expensive), including real-time recurrent learning, echo state networks, and unbiased online recurrent optimization, and show that it outperforms them on sequence modeling benchmarks such as Bouncing MNIST, a new benchmark we call Bouncing NotMNIST, and Penn Treebank. Notably, our approach can, in some instances, even outperform full back-propagation through time itself as well as variants such as sparse attentive back-tracking. Furthermore, we present promising experimental results that demonstrate our model's ability to conduct zero-shot adaptation.

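Motivation and Objectives

Address the computational cost and inherently sequential nature of back-propagation through time (BPTT) in recurrent neural networks.
Overcome two limitations of BPTT: the need to unroll the network over time and the dependence on differentiable activation functions.
Develop a training method that supports parallelization and enables zero-shot adaptation in recurrent models.
Design a biologically plausible learning rule that avoids global credit assignment while maintaining strong performance on sequence modeling tasks.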

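Proposed Method

Proposes the Parallel Temporal Neural Coding Network, a recurrent architecture designed for local learning rules.
Trains the model with Local Representation Alignment, a local learning algorithm that aligns distributed representations across time steps without back-propagating gradients through time.
Removes the need to unroll the network over time steps, which allows parallel computation during training.
Does not depend on the derivatives of internal activation functions, which permits non-differentiable units.
Performs local credit assignment by aligning hidden-state representations at consecutive time steps using local error signals.
Updates weights with a biologically inspired mechanism based on local correlations between representations rather than on globally back-propagated errors.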
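
To make the idea of a local, derivative-free update more concrete, the following is a minimal sketch in plain Python. It is not the paper's Parallel Temporal Neural Coding Network or its exact Local Representation Alignment procedure. It is a simplified, target-based rule written under several assumptions: a single hidden layer, a sign activation, hidden targets obtained by nudging the current state against the fed-back output error, and purely hypothetical names (`LocalRNN`, `sign_act`, `outer_update`, `lr`, `beta`). It only illustrates that each weight matrix can be updated from the error and activity available at the current time step, with no unrolling and no activation derivative.

```python
import random

def sign_act(v):
    """Non-differentiable activation: +1 where the input is >= 0, else -1."""
    return [1.0 if a >= 0 else -1.0 for a in v]

def matvec(M, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(m * a for m, a in zip(row, v)) for row in M]

def t_matvec(M, v):
    """Product of the transpose of M with v."""
    return [sum(M[i][j] * v[i] for i in range(len(M))) for j in range(len(M[0]))]

def outer_update(M, err, act, lr):
    """In-place local update: M -= lr * outer(err, act)."""
    for i, e in enumerate(err):
        for j, a in enumerate(act):
            M[i][j] -= lr * e * a

def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

class LocalRNN:
    """Hypothetical one-hidden-layer recurrent model trained online (illustrative only)."""

    def __init__(self, n_in, n_hid, n_out, seed=0):
        rng = random.Random(seed)
        self.W = rand_matrix(n_hid, n_in, rng)   # input -> hidden
        self.V = rand_matrix(n_hid, n_hid, rng)  # previous hidden -> hidden
        self.U = rand_matrix(n_out, n_hid, rng)  # hidden -> output
        self.h = [0.0] * n_hid                   # carried state; no unrolling

    def step(self, x, target, lr=0.05, beta=0.1):
        """Process one time step and update all weights from local errors."""
        h_prev = self.h
        pre = [a + b for a, b in zip(matvec(self.W, x), matvec(self.V, h_prev))]
        h = sign_act(pre)
        y = matvec(self.U, h)
        e_y = [yi - ti for yi, ti in zip(y, target)]  # output error

        # Assumption: the hidden target is the current state nudged against
        # the output error fed back through U. No activation derivative is used.
        back = t_matvec(self.U, e_y)
        h_target = [hi - beta * b for hi, b in zip(h, back)]
        e_h = [hi - ti for hi, ti in zip(h, h_target)]  # hidden local error

        # Each matrix sees only its own error and its own input activity.
        outer_update(self.U, e_y, h, lr)
        outer_update(self.W, e_h, x, lr)
        outer_update(self.V, e_h, h_prev, lr)

        self.h = h
        return 0.5 * sum(e * e for e in e_y)

if __name__ == "__main__":
    net = LocalRNN(n_in=2, n_hid=8, n_out=2, seed=0)
    # Toy task: predict the next element of an alternating one-hot sequence.
    seq = [[1.0, 0.0], [0.0, 1.0]] * 50
    for t in range(len(seq) - 1):
        loss = net.step(seq[t], seq[t + 1])
    print("final step loss:", loss)
```

Because the only state carried between calls to `step` is the previous hidden vector, nothing from earlier time steps has to be stored or replayed, which is the property the list above attributes to training without unrolling. The choice of feeding the error back through the transpose of `U` is a convenience of this sketch, not something the source specifies.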

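Experimental Results

Research Questions

RQ1: Can recurrent neural networks be trained effectively without back-propagation through time or gradient computation?
RQ2: Does Local Representation Alignment, a local learning rule, achieve performance competitive with BPTT and its variants?
RQ3: Does the proposed model support zero-shot adaptation on sequential tasks?
RQ4: Can the model be parallelized efficiently while still capturing long-term temporal dependencies and maintaining high performance?
RQ5: How does the model compare with existing methods such as echo state networks, real-time recurrent learning, and unbiased online recurrent optimization?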

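Key Results

The proposed model outperforms existing online alternatives to BPTT (real-time recurrent learning, echo state networks, and unbiased online recurrent optimization) on Bouncing MNIST and Bouncing NotMNIST.
On the Penn Treebank language modeling benchmark, it achieves competitive results and, in some settings, outperforms full back-propagation through time.
The model shows promising zero-shot adaptation, suggesting that it can generalize to sequences it was not trained on.
Because training requires neither unrolling nor activation-function derivatives, it can be parallelized, which should reduce computational cost relative to standard BPTT.
The model maintains high performance even with non-differentiable activation functions, showing that units incompatible with standard back-propagation can be used.
Local Representation Alignment enables effective credit assignment without a global error signal, demonstrating its potential as a biologically plausible training mechanism.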

This review was generated by AI and reviewed by a human editor.