QUICK REVIEW

[Paper Review] Long Range Arena: A Benchmark for Efficient Transformers

Yi Tay, Mostafa Dehghani | arXiv (Cornell University) | 2020-11-08

Advanced Neural Network Applications | References: 47 | Citations: 195

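One-line summary

The paper introduces Long Range Arena (LRA), a unified benchmark for evaluating efficient Transformers on long-context tasks of 1K–16K tokens, and compares ten models across diverse data types and tasks. It analyzes accuracy, speed, and memory to highlight the trade-offs and shows that no single model is best across the board.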

ABSTRACT

Transformers do not scale very well to long sequence lengths largely because of quadratic self-attention complexity. In the recent months, a wide spectrum of efficient, fast Transformers have been proposed to tackle this problem, more often than not claiming superior or comparable model quality to vanilla Transformer models. To this date, there is no well-established consensus on how to evaluate this class of models. Moreover, inconsistent benchmarking on a wide spectrum of tasks and datasets makes it difficult to assess relative model quality amongst many models. This paper proposes a systematic and unified benchmark, LRA, specifically focused on evaluating model quality under long-context scenarios. Our benchmark is a suite of tasks consisting of sequences ranging from $1K$ to $16K$ tokens, encompassing a wide range of data types and modalities such as text, natural, synthetic images, and mathematical expressions requiring similarity, structural, and visual-spatial reasoning. We systematically evaluate ten well-established long-range Transformer models (Reformers, Linformers, Linear Transformers, Sinkhorn Transformers, Performers, Synthesizers, Sparse Transformers, and Longformers) on our newly proposed benchmark suite. LRA paves the way towards better understanding this class of efficient Transformer models, facilitates more research in this direction, and presents new challenging tasks to tackle. Our benchmark code will be released at https://github.com/google-research/long-range-arena.
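
To make the quadratic cost mentioned in the abstract concrete, the following back-of-the-envelope sketch counts the memory taken by one full attention score matrix at sequence lengths the benchmark covers. The function name, the float32 assumption (4 bytes per score), and the single-head, single-example setting are illustrative assumptions, not figures from the paper.

```python
def attention_matrix_bytes(seq_len: int, bytes_per_score: int = 4) -> int:
    """Bytes needed to store one seq_len x seq_len attention score matrix.

    Assumes float32 scores (4 bytes each) for a single head and a single
    example; real models multiply this by heads, layers, and batch size.
    """
    return seq_len * seq_len * bytes_per_score


# Sequence lengths spanning the 1K to 16K range described in the abstract.
for n in (1024, 4096, 16384):
    mib = attention_matrix_bytes(n) / 2**20
    print(f"{n:>6} tokens -> {mib:,.0f} MiB")
```

Under these assumptions the matrix grows from 4 MiB at 1K tokens to 1,024 MiB at 16K tokens: an input 16 times longer costs 256 times more memory.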

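Motivation and goals

Establish a single, general benchmark for long-range Transformer models that spans multiple data modalities.
Evaluate a range of efficient Transformer architectures on long-context challenges.
Provide a comprehensive efficiency analysis (speed and memory) to guide model selection and future research.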

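Proposed method

Design long-context tasks: ListOps, byte-level text classification, byte-level document retrieval, image classification on sequences of pixels, Pathfinder, and Pathfinder-X.
Evaluate ten efficient Transformer models (Local Attention, Sparse Transformers, Longformer, Linformer, Reformer, Sinkhorn Transformers, Synthesizers, BigBird, Linear Transformers, and Performers) alongside a vanilla Transformer baseline on these tasks.
Quantify the required attention span and report per-task and overall performance.
Provide open-source benchmark code based on JAX/Flax to improve reproducibility and make the benchmark easier to extend.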
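
As a rough illustration of how the byte-level text tasks and the image task turn raw inputs into long flat sequences, here is a minimal sketch. The helper names, the padding scheme (ID 0 reserved for padding, byte values shifted up by one), and the 32x32 example grid are assumptions made for illustration; they are not taken from the benchmark code.

```python
from typing import List, Sequence

PAD_ID = 0  # assumption: ID 0 is reserved for padding


def text_to_byte_ids(text: str, max_len: int) -> List[int]:
    """Encode text as UTF-8 bytes, shift by one, then truncate or pad."""
    ids = [b + 1 for b in text.encode("utf-8")][:max_len]
    return ids + [PAD_ID] * (max_len - len(ids))


def image_to_pixel_sequence(image: Sequence[Sequence[int]]) -> List[int]:
    """Flatten a 2D grid of pixel intensities row by row into one sequence."""
    return [pixel for row in image for pixel in row]


# A short string becomes one token per byte, padded to a fixed length.
print(text_to_byte_ids("hi", max_len=4))

# A hypothetical 32x32 grayscale image becomes a sequence of 1,024 positions.
blank_image = [[0] * 32 for _ in range(32)]
print(len(image_to_pixel_sequence(blank_image)))
```

Working at the byte or pixel level is what makes these inputs long: a document of a few thousand characters, or a small image, already yields a sequence in the thousands.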

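Experimental results

Research questions

RQ1: How do different efficient Transformer architectures perform on long-range tasks over text, image, and synthetic data?
RQ2: What are the speed and memory trade-offs among these architectures at long sequence lengths?
RQ3: Is there a single model that consistently excels on every long-range task, or do trade-offs dominate?
RQ4: How does increasing the sequence length (e.g., Pathfinder-X) affect each model's ability to learn?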

Key results

Accuracy (%) by task. FAIL indicates that the model did not learn the task.

Model	ListOps	Text	Retrieval	Image	Pathfinder	Path-X	Avg
Transformer	36.37	64.27	57.46	42.44	71.40	FAIL	54.39
Local Attention	15.82	52.98	53.39	41.46	66.63	FAIL	46.06
Sparse Trans.	17.07	63.58	59.59	44.24	71.71	FAIL	51.24
Longformer	35.63	62.85	56.89	42.22	69.71	FAIL	53.46
Linformer	35.70	53.94	52.27	38.56	76.34	FAIL	51.36
Reformer	37.27	56.10	53.40	38.07	68.50	FAIL	50.67
Sinkhorn Trans.	33.67	61.20	53.83	41.23	67.45	FAIL	51.39
Synthesizer	36.99	61.68	54.67	41.61	69.45	FAIL	52.88
BigBird	36.05	64.02	59.29	40.83	74.87	FAIL	55.01
Linear Trans.	16.13	65.90	53.09	42.34	75.30	FAIL	50.55
Performer	18.01	65.40	53.82	42.77	77.05	FAIL	51.41
Task Avg (Std)	29 (9.7)	61 (4.6)	55 (2.6)	41 (1.8)	72 (3.7)	FAIL	52 (2.4)
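
The last column of the table appears to be the plain mean of the five task scores that have numeric values, with Path-X left out because every entry is FAIL. The sketch below checks that reading against three rows copied from the table; the function name is hypothetical.

```python
# Per-task accuracies copied from the table, in the order
# ListOps, Text, Retrieval, Image, Pathfinder (Path-X is FAIL and excluded),
# paired with the average reported in the table's last column.
ROWS = {
    "Transformer": ([36.37, 64.27, 57.46, 42.44, 71.40], 54.39),
    "BigBird": ([36.05, 64.02, 59.29, 40.83, 74.87], 55.01),
    "Performer": ([18.01, 65.40, 53.82, 42.77, 77.05], 51.41),
}


def mean_score(scores):
    """Unweighted mean of the task accuracies, rounded to two decimals."""
    return round(sum(scores) / len(scores), 2)


for model, (scores, reported) in ROWS.items():
    print(f"{model}: recomputed {mean_score(scores):.2f}, reported {reported:.2f}")
```

For these three rows the recomputed mean matches the reported average to two decimals, which supports reading the column as a five-task mean.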

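All LRA tasks remain challenging for current models, and on several tasks there is a large gap to the best achievable performance.
BigBird achieves the highest overall LRA score (55.01) by performing consistently across tasks, although it does not post the top score on any single task in the table.
Kernel-based variants such as Performer and Linear Transformers offer strong speed/memory trade-offs, but sometimes at a cost in task-specific accuracy (for example, 18.01 and 16.13 on ListOps versus 36.37 for the vanilla Transformer).
Every model fails on the extreme-length Path-X task, so the task remains unsolved and exposes the limits of current architectures on very long sequences.
There is no one-size-fits-all solution; the trade-offs among accuracy, speed, and memory vary by task and model.
Memory footprints differ widely: at 4K tokens Linformer needs close to 1 GB, whereas the vanilla Transformer needs about 9.48 GB, which underscores the efficiency gap.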
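
The claim that no single model dominates can be checked directly against the results table. The sketch below copies the five numeric task columns from the table, finds the top model per task, and compares that with the model that has the highest mean; the function and variable names are hypothetical.

```python
TASKS = ["ListOps", "Text", "Retrieval", "Image", "Pathfinder"]

# Accuracies copied from the results table (Path-X omitted: all FAIL).
SCORES = {
    "Transformer":     [36.37, 64.27, 57.46, 42.44, 71.40],
    "Local Attention": [15.82, 52.98, 53.39, 41.46, 66.63],
    "Sparse Trans.":   [17.07, 63.58, 59.59, 44.24, 71.71],
    "Longformer":      [35.63, 62.85, 56.89, 42.22, 69.71],
    "Linformer":       [35.70, 53.94, 52.27, 38.56, 76.34],
    "Reformer":        [37.27, 56.10, 53.40, 38.07, 68.50],
    "Sinkhorn Trans.": [33.67, 61.20, 53.83, 41.23, 67.45],
    "Synthesizer":     [36.99, 61.68, 54.67, 41.61, 69.45],
    "BigBird":         [36.05, 64.02, 59.29, 40.83, 74.87],
    "Linear Trans.":   [16.13, 65.90, 53.09, 42.34, 75.30],
    "Performer":       [18.01, 65.40, 53.82, 42.77, 77.05],
}


def best_per_task(scores, tasks):
    """Map each task name to the model with the highest accuracy on it."""
    return {
        task: max(scores, key=lambda model: scores[model][i])
        for i, task in enumerate(tasks)
    }


def best_overall(scores):
    """Model with the highest unweighted mean over the five tasks."""
    return max(scores, key=lambda model: sum(scores[model]) / len(scores[model]))


winners = best_per_task(SCORES, TASKS)
for task, model in winners.items():
    print(f"{task}: {model}")
print("Best mean:", best_overall(SCORES))
```

With these numbers the five tasks are led by four different models (Reformer, Linear Transformers, Sparse Transformers twice, and Performer), while BigBird has the best mean without leading any single task.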

This review was generated by AI and reviewed by a human editor.