QUICK REVIEW

[Paper Review] TensorFlow: A system for large-scale machine learning

Martín Abadi, Paul Barham | arXiv (Cornell University) | 2016. 05. 27.

Parallel Computing and Optimization Techniques | References: 56 | Citations: 8,791

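One-line summary

TensorFlow provides a dataflow-based system for large-scale machine learning, with mutable state, distributed execution, and extensibility for both research and production.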

ABSTRACT

TensorFlow is a machine learning system that operates at large scale and in heterogeneous environments. TensorFlow uses dataflow graphs to represent computation, shared state, and the operations that mutate that state. It maps the nodes of a dataflow graph across many machines in a cluster, and within a machine across multiple computational devices, including multicore CPUs, general-purpose GPUs, and custom designed ASICs known as Tensor Processing Units (TPUs). This architecture gives flexibility to the application developer: whereas in previous "parameter server" designs the management of shared state is built into the system, TensorFlow enables developers to experiment with novel optimizations and training algorithms. TensorFlow supports a variety of applications, with particularly strong support for training and inference on deep neural networks. Several Google services use TensorFlow in production, we have released it as an open-source project, and it has become widely used for machine learning research. In this paper, we describe the TensorFlow dataflow model in contrast to existing systems, and demonstrate the compelling performance that TensorFlow achieves for several real-world applications.

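Motivation and goals

Motivates the need for a scalable ML system that can train on large datasets and large models.
Introduces a unified dataflow graph model that captures both computation and mutable state.
Demonstrates distributed execution and device placement across CPUs, GPUs, and TPUs.
Demonstrates extensibility for research, including built-in differentiation, large-model support, and fault tolerance.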

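Proposed method

Defines a single dataflow graph that represents both computation and mutable state.
Uses variables and queues to hold state inside the graph and to coordinate concurrent executions.
Implements distributed execution with per-device subgraphs and Send/Recv communication between devices.
Supports non-strict evaluation through dynamic control flow built on Switch and Merge.
Allows user-level extensions for differentiation, optimization algorithms, and large sparse embeddings.
Provides fault tolerance through in-graph checkpointing and flexible synchronization schemes.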
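
The Switch/Merge control flow and in-graph mutable state listed above can be illustrated with a toy model. This is a minimal pure-Python sketch, not TensorFlow code: `DEAD`, `Variable`, `switch`, `merge`, `live_only`, and `run_step` are hypothetical names invented for illustration, and the real runtime schedules these operations asynchronously across devices rather than calling them in sequence.

```python
# Toy model of Switch/Merge conditional execution with in-graph state.
# All names here are illustrative assumptions, not TensorFlow APIs.

DEAD = object()  # marker carried by the branch that was not taken


class Variable:
    """Mutable state that lives alongside the computation."""

    def __init__(self, value):
        self.value = value


def switch(data, pred):
    """Route data to one of two outputs: (false_output, true_output).

    The output on the untaken side carries the DEAD marker.
    """
    return (DEAD, data) if pred else (data, DEAD)


def merge(*inputs):
    """Forward the single live input and ignore dead ones."""
    live = [x for x in inputs if x is not DEAD]
    if len(live) != 1:
        raise ValueError("merge expects exactly one live input")
    return live[0]


def live_only(fn):
    """Wrap an operation so it is skipped when any input is dead."""

    def wrapped(*args):
        if any(a is DEAD for a in args):
            return DEAD  # propagate deadness instead of computing
        return fn(*args)

    return wrapped


double = live_only(lambda v: v * 2)
negate = live_only(lambda v: -v)


def run_step(counter, x):
    """One step: compute (x * 2 if x > 0 else -x) and count the step."""
    on_false, on_true = switch(x, x > 0)
    result = merge(double(on_true), negate(on_false))
    counter.value += 1  # in-graph state update, similar in spirit to AssignAdd
    return result


if __name__ == "__main__":
    steps = Variable(0)
    print(run_step(steps, 3), run_step(steps, -4), steps.value)
```

Only the operations on the taken branch do real work; the other branch just passes the dead marker along until `merge` drops it, which is the sense in which the evaluation is non-strict.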

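Experimental results

Research questions

RQ1: How can a single dataflow graph model support both training and inference at large scale?
RQ2: Which mechanisms enable efficient distributed training and device placement on heterogeneous architectures?
RQ3: How can mutable state and coordination be exposed so that researchers can build new optimizations and model architectures?
RQ4: Which strategies (e.g., synchronous vs. asynchronous updates, backup workers) improve training throughput and robustness?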

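Key findings

TensorFlow enables data-parallel training on large clusters, as shown in Subsection 6.3, with step times suitable for large-scale training.
The system supports both asynchronous and synchronous replica coordination, and backup workers improve throughput by up to 15%.
Dynamic control flow and in-graph state make it possible to experiment with new models and optimization algorithms without modifying the runtime.
Sparse embeddings and large models are handled by colocating operations with the parameters they use and by sharding those parameters.
Checkpointing and fault tolerance are implemented at the graph level, which allows flexible policy choices and transfer-learning workflows.
The dataflow model provides portability across devices, including CPUs, GPUs, and TPUs, and between production and mobile inference.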
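
The effect of backup workers on a synchronous step can be sketched as follows. This is a simplified pure-Python simulation under stated assumptions, not the paper's implementation: the function name `sync_step_with_backups`, the `(finish_time, gradient)` tuples, and the timings are made up for illustration. The idea is that n workers are launched but the update uses only the first m gradients to arrive, so a single straggler no longer determines the step time.

```python
# Simulated synchronous step with backup workers.
# Names and numbers are illustrative assumptions, not measured results.


def sync_step_with_backups(arrivals, m):
    """Aggregate the first m gradients to arrive out of len(arrivals) workers.

    arrivals: list of (finish_time, gradient) pairs, one per worker.
    Returns (step_time, mean_gradient), where step_time is the moment the
    m-th gradient arrives. Later gradients are discarded.
    """
    if not 1 <= m <= len(arrivals):
        raise ValueError("m must be between 1 and the number of workers")
    ordered = sorted(arrivals, key=lambda pair: pair[0])
    used = ordered[:m]
    step_time = used[-1][0]
    mean_gradient = sum(grad for _, grad in used) / m
    return step_time, mean_gradient


if __name__ == "__main__":
    # Four workers; the third one is a straggler that finishes at t = 5.0.
    arrivals = [(1.0, 2.0), (1.2, 6.0), (5.0, 8.0), (1.1, 4.0)]
    print("wait for all 4:", sync_step_with_backups(arrivals, 4))
    print("first 3 of 4:  ", sync_step_with_backups(arrivals, 3))
```

In this made-up scenario, waiting for all four workers ties the step to the straggler, while treating one worker as a backup lets the step finish as soon as the three fastest gradients are in. The trade-off is that the discarded gradient is wasted work, so the best number of backups depends on how often stragglers occur.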

This review was generated by AI and reviewed by a human editor.