QUICK REVIEW

[Paper Review] Dissecting the NVidia Turing T4 GPU via Microbenchmarking

Zhe Jia, Marco Maggioni | arXiv (Cornell University) | 2019-03-18

Ferroelectric and Negative Capacitance Devices | Citations: 73

One-Line Summary

This technical report uses microbenchmarks to analyze and dissect the NVIDIA Turing T4 GPU, providing an empirical analysis of its architectural behavior.

ABSTRACT

In 2019, the rapid rate at which GPU manufacturers refresh their designs, coupled with their reluctance to disclose microarchitectural details, is still a hurdle for those software designers who want to extract the highest possible performance. Last year, these very reasons motivated us to dissect the Volta GPU architecture using microbenchmarks. The introduction in August 2018 of Turing, NVidia's latest architecture, pressed us to update our study. In this report, we examine Turing and compare it quantitatively against previous NVidia GPU generations. Specifically, we study the T4 GPU: a low-power board aiming at inference applications. We describe its improvements against its inference-oriented predecessor: the P4 GPU based on the Pascal architecture. Both T4 and P4 GPUs achieve significantly higher frequency-per-Watt figures than their full-size counterparts. We study the performance of the T4's TensorCores, finding a much higher throughput on low-precision operands than on the P4 GPU. We reveal that Turing introduces new instructions that express matrix math more succinctly. We map Turing's instruction space, finding the same encoding as Volta, and additional instructions. We reveal that the Turing TU104 chip has the same memory hierarchy depth as the Volta GV100; cache levels sizes on the TU104 are frequently twice as large as those found on the Pascal GP104. We benchmark each constituent of the T4 memory hierarchy and find substantial overall performance improvements over its P4 predecessor. We studied how clock throttling affects compute-intensive workloads that hit power or thermal limits. Many of our findings are novel, published here for the first time. All of them can guide high-performance software developers get closer to the GPU's peak performance.

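Motivation and Objectives

Advance understanding of the Turing T4 architecture through targeted microbenchmarking.
Characterize the T4's performance and behavior using controlled experiments.
Provide a reproducible experimental methodology and analysis that does not depend on vendor confirmation.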

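Proposed Method

Apply microbenchmarking techniques to investigate the Turing T4 GPU.
Describe the experimental setup and measurement procedures to ensure reproducibility.
Analyze the observed behavior to infer architectural and performance characteristics.
Present analysis and results grounded in the authors' experimental observations.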
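
To make "microbenchmarking" concrete, here is a minimal sketch of one widely used pattern, a pointer-chasing (dependent-load) loop. This review does not say which techniques the authors used, so the choice of pattern is an assumption. The names `build_chase`, `chase`, and `time_chase` are hypothetical. The sketch runs on the CPU in Python only to show the structure. A real GPU measurement would run the same loop in a device kernel and read a hardware cycle counter, and interpreter overhead dominates the timing here.

```python
import time


def build_chase(n_elems, stride):
    """Build a chase array: entry i holds the index of the next element to visit."""
    if n_elems <= 0 or stride <= 0:
        raise ValueError("n_elems and stride must be positive")
    return [(i + stride) % n_elems for i in range(n_elems)]


def chase(chain, steps, start=0):
    """Follow the chain for `steps` hops and return the final index.

    Each load depends on the result of the previous one, so the hops
    cannot overlap. That dependency is what exposes per-access latency.
    """
    idx = start
    for _ in range(steps):
        idx = chain[idx]
    return idx


def time_chase(chain, steps):
    """Return (average nanoseconds per hop, final index) for one timed run."""
    t0 = time.perf_counter_ns()
    end = chase(chain, steps)
    t1 = time.perf_counter_ns()
    return (t1 - t0) / steps, end
```

Sweeping the array size and stride while recording the average time per hop is the usual way to look for steps in latency that suggest cache capacities and line sizes. The final index is returned so that the traversal cannot be discarded as dead work.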

Experimental Results

Research Questions

RQ1: The excerpt does not state an explicit research question.

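Key Results

The report presents analysis and results derived from microbenchmark experiments on the Turing T4 GPU. Per the abstract, these include much higher throughput on low-precision operands from the T4's TensorCores than on the P4, an instruction encoding shared with Volta plus additional instructions, and a TU104 memory hierarchy as deep as the Volta GV100's, with cache levels frequently twice as large as those on the Pascal GP104.
The findings reflect empirical observations drawn from the authors' own measurements and interpretation.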

This review was generated by AI and reviewed by a human editor.