QUICK REVIEW

[Paper Review] Understanding and Improving the Latency of DRAM-Based Memory Systems

Kevin K. Chang|arXiv (Cornell University)|Dec 22, 2017

Parallel Computing and Optimization Techniques | Cited by 28

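One-Sentence Summary

This dissertation identifies three key causes of persistently high DRAM latency (inefficient bulk data movement, interference from refresh operations, and inherent latency variation across cells) and addresses them with LISA (Low-Cost Inter-Linked Subarrays), DSARP (access-refresh parallelization), FLY-DRAM (Flexible-LatencY DRAM), and Voltron (a mechanism that exploits the voltage-latency trade-off). Through architectural changes and empirical characterization of commodity DRAM chips, these techniques significantly reduce latency and improve energy efficiency.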

ABSTRACT

Over the past two decades, the storage capacity and access bandwidth of main memory have improved tremendously, by 128x and 20x, respectively. These improvements are mainly due to the continuous technology scaling of DRAM (dynamic random-access memory), which has been used as the physical substrate for main memory. In stark contrast with capacity and bandwidth, DRAM latency has remained almost constant, reducing by only 1.3x in the same time frame. Therefore, long DRAM latency continues to be a critical performance bottleneck in modern systems. Increasing core counts, and the emergence of increasingly more data-intensive and latency-critical applications further stress the importance of providing low-latency memory access. In this dissertation, we identify three main problems that contribute significantly to long latency of DRAM accesses. To address these problems, we present a series of new techniques. Our new techniques significantly improve both system performance and energy efficiency. We also examine the critical relationship between supply voltage and latency in modern DRAM chips and develop new mechanisms that exploit this voltage-latency trade-off to improve energy efficiency. The key conclusion of this dissertation is that augmenting DRAM architecture with simple and low-cost features, and developing a better understanding of manufactured DRAM chips together lead to significant memory latency reduction as well as energy efficiency improvement. We hope and believe that the proposed architectural techniques and the detailed experimental data and observations on real commodity DRAM chips presented in this dissertation will enable development of other new mechanisms to improve the performance, energy efficiency, or reliability of future memory systems.

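Research Motivation and Objectives

Address DRAM latency, a long-standing performance bottleneck: over the past two decades capacity improved by 128x and bandwidth by 20x, while latency improved by only 1.3x.
Identify and eliminate the inefficiency of bulk data movement between DRAM subarrays, which currently requires redundant off-chip transfers.
Mitigate the performance loss caused by DRAM refresh operations, which block memory accesses.
Exploit the inherent latency variation among DRAM cells (caused by manufacturing irregularities) by distinguishing fast cells from slow ones so that accesses can be optimized.
Characterize and exploit the voltage-latency trade-off in DRAM, improving energy efficiency through dynamic voltage adjustment.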

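Proposed Methods

Design LISA, a low-cost inter-subarray interconnect architecture that enables fast on-chip data movement between subarrays and reduces off-chip transfers.
Develop DSARP, a two-part technique that parallelizes memory accesses with refresh operations to reduce refresh-induced latency.
Propose FLY-DRAM, which divides DRAM cells into fast and slow regions and, through hardware and memory controller modifications, accesses the fast regions with lower latency.
Introduce Voltron, a dynamic voltage adjustment mechanism that uses a performance model to set the DRAM supply voltage, lowering it to save energy while preserving reliable operation and bounding the performance loss.
Perform extensive experimental characterization of real commodity DRAM chips, measuring cell-to-cell latency variation and voltage-dependent latency behavior.
Validate and evaluate all proposed mechanisms using an FPGA-based test platform (SoftMC) and custom simulators (Ramulator, NoCulator).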
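
The fast/slow region idea behind FLY-DRAM can be illustrated with a minimal Python sketch. It assumes a profiling step has already labeled each region, and it shows only the lookup a memory controller might perform. The `RegionProfile` structure, the function names, and both timing values are hypothetical and are not taken from the dissertation.

```python
from dataclasses import dataclass

# Hypothetical activation latencies in nanoseconds (illustrative only).
STANDARD_TRCD_NS = 13.125
REDUCED_TRCD_NS = 10.0


@dataclass(frozen=True)
class RegionProfile:
    """Profiling result for one DRAM region (hypothetical structure)."""
    region_id: int
    is_fast: bool  # True if the region operated correctly at reduced latency


def build_latency_table(profiles):
    """Map each profiled region id to the latency the controller may use."""
    return {
        p.region_id: REDUCED_TRCD_NS if p.is_fast else STANDARD_TRCD_NS
        for p in profiles
    }


def activation_latency(table, region_id):
    """Return the latency for a region; unprofiled regions get the
    standard (safe) latency."""
    return table.get(region_id, STANDARD_TRCD_NS)


profiles = [RegionProfile(0, True), RegionProfile(1, False), RegionProfile(2, True)]
table = build_latency_table(profiles)
```

Falling back to the standard latency for unknown regions keeps the sketch conservative: a region is accessed faster only when profiling has explicitly marked it as fast.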

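Experimental Results

Research Questions

RQ1: How can on-chip data movement between DRAM subarrays be optimized to reduce off-chip transfers and latency?
RQ2: To what extent can memory accesses be overlapped with DRAM refresh operations to reduce refresh-induced performance loss?
RQ3: What are the extent and nature of the inherent latency variation among individual DRAM cells within a single chip, and how can it be exploited?
RQ4: How does supply voltage affect DRAM access latency, energy consumption, and reliability, and can this relationship be used to improve energy efficiency?
RQ5: Can architectural techniques that exploit cell-level latency variation and the voltage-latency trade-off be designed and validated on real systems?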
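
For RQ2, a rough sense of the time at stake comes from the ratio of the refresh command duration to the refresh interval. The Python sketch below computes that ratio. The two timing values are illustrative assumptions (typical DDR3-style figures for a high-density chip), not measurements from the dissertation, and the function name is hypothetical.

```python
def refresh_busy_fraction(t_rfc_ns, t_refi_ns):
    """Fraction of time a rank is unavailable because of all-bank refresh.

    t_rfc_ns:  duration of one refresh command, in nanoseconds
    t_refi_ns: interval between refresh commands, in nanoseconds
    """
    if t_refi_ns <= 0 or t_rfc_ns < 0:
        raise ValueError("timings must be positive")
    return t_rfc_ns / t_refi_ns


# Assumed values: 350 ns per refresh command, one command every 7.8 us.
fraction = refresh_busy_fraction(350.0, 7800.0)
```

With these assumed values the rank is blocked for roughly 4.5% of the time, which is the portion a technique that overlaps accesses with refresh could try to recover.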

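Key Findings

LISA enables fast, energy-efficient data movement between subarrays, reduces reliance on off-chip transfers, and supports low-latency mechanisms such as fast copy and reduced bank preparation latency.
DSARP overlaps accesses with refresh operations, substantially reducing refresh-induced latency with minimal hardware changes and achieving performance close to that of an ideal refresh-free system.
FLY-DRAM exploits inherent cell-level latency variation by accessing the faster cell regions at reduced latency, lowering DRAM access latency and improving system performance.
Voltron dynamically adjusts the DRAM supply voltage based on a performance model, exploiting the voltage-latency trade-off to improve energy efficiency while keeping the performance loss bounded.
The study shows that manufacturing irregularities cause significant latency variation among DRAM cells, with some cells inherently faster than others; this challenges the conventional assumption of a uniform worst-case latency.
Experimental characterization shows that access latency depends on the DRAM array supply voltage: a higher voltage allows lower latency, and a chip can still operate reliably at reduced voltage if its latency is increased. This opens a new energy-latency trade-off space for system optimization.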
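
The voltage selection step attributed to Voltron can be sketched in Python as follows. The sketch assumes that the goal is to pick the lowest supply voltage whose predicted slowdown stays under a chosen bound, and that a performance model has already produced a slowdown estimate per candidate voltage. The function name, the voltage levels, and the slowdown numbers are all hypothetical.

```python
def pick_supply_voltage(predicted_slowdown, max_slowdown):
    """Return the lowest candidate voltage whose predicted slowdown is
    within max_slowdown, or the highest candidate if none qualifies.

    predicted_slowdown: dict mapping voltage (V) to the fractional
    performance loss predicted by a performance model.
    """
    if not predicted_slowdown:
        raise ValueError("at least one candidate voltage is required")
    acceptable = [v for v, loss in predicted_slowdown.items() if loss <= max_slowdown]
    if not acceptable:
        return max(predicted_slowdown)  # fall back to the highest voltage
    return min(acceptable)


# Hypothetical model output: lower voltage needs longer latency, so the
# predicted slowdown grows as the voltage drops.
model = {1.35: 0.00, 1.25: 0.02, 1.15: 0.04, 1.05: 0.09}
chosen = pick_supply_voltage(model, max_slowdown=0.05)
```

Here `chosen` is 1.15, the lowest candidate whose predicted loss (0.04) stays under the 0.05 bound. Falling back to the highest voltage when nothing qualifies is a conservative design choice of this sketch, not a documented behavior.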

This review was generated by AI and reviewed by human editors.