QUICK REVIEW

[Paper Review] Tb/s Polar Successive Cancellation Decoder 16nm ASIC Implementation

Altuğ Süral, E. Göksu Sezer|arXiv (Cornell University)|Sep 20, 2020

Error Correcting Code Techniques | Cited by 7

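One-Sentence Summary

This paper presents an optimized polar successive cancellation (OPSC) decoder ASIC design in 16nm FinFET technology that achieves 1.2 Tb/s coded throughput and 0.95 pJ/bit energy efficiency on 0.79 mm² of area. The design minimizes area and power through adaptive LLR quantization and register reduction/balancing (R-RB), reaches its very high throughput through pipelining and unrolling, and is reported as the first polar decoder to exceed 1 Tb/s in post-place-and-route ASIC results.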

ABSTRACT

This work presents an efficient ASIC implementation of a successive cancellation (SC) decoder for polar codes. SC is a low-complexity depth-first search decoding algorithm, favorable for beyond-5G applications that require extremely high throughput and low power. The ASIC implementation of SC in this work exploits many techniques, including pipelining and unrolling, to achieve Tb/s data throughput without compromising power and area metrics. To reduce the complexity of the implementation, an adaptive log-likelihood ratio (LLR) quantization scheme is used. This scheme optimizes the bit precision of the internal LLRs within the range of 1-5 bits by considering the irregular polarization and the entropy of the LLR distribution in the SC decoder. The performance cost of this scheme is less than 0.2 dB when the code block length is 1024 bits and the payload is 854 bits. Furthermore, some computations in SC take large space with a high degree of parallelization while others take longer time steps. To optimize these computations and reduce both memory and latency, a register reduction/balancing (R-RB) method is used. The final decoder architecture is called optimized polar SC (OPSC). The post-placement-routing results in 16nm FinFET ASIC technology show that the OPSC decoder achieves 1.2 Tb/s coded throughput on 0.79 mm² area with 0.95 pJ/bit energy efficiency.
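
To make the abstract's description of SC concrete, the following is a minimal Python sketch of recursive SC decoding. It is a software illustration only, not the paper's hardware architecture. The min-sum form of `f`, the natural (non-bit-reversed) bit ordering, the example frozen set, and the names `f`, `g`, `polar_encode`, and `sc_decode` are all assumptions chosen for illustration.

```python
def f(a, b):
    # Check-node update, min-sum approximation (an assumed, hardware-friendly form).
    sign = -1.0 if (a < 0) != (b < 0) else 1.0
    return sign * min(abs(a), abs(b))


def g(a, b, u):
    # Variable-node update, given the partial-sum bit u from the left branch.
    return b - a if u else b + a


def polar_encode(u):
    # Recursive polar transform in natural order: x = [x_left XOR x_right, x_right].
    n = len(u)
    if n == 1:
        return list(u)
    left = polar_encode(u[: n // 2])
    right = polar_encode(u[n // 2:])
    return [l ^ r for l, r in zip(left, right)] + right


def sc_decode(llr, frozen):
    # Depth-first SC decoding. Returns (u_hat, x_hat): the decided input bits
    # and their re-encoded partial sums, which the parent node needs for g.
    n = len(llr)
    if n == 1:
        bit = 0 if frozen[0] or llr[0] >= 0 else 1
        return [bit], [bit]
    half = n // 2
    a, b = llr[:half], llr[half:]
    u_l, x_l = sc_decode([f(p, q) for p, q in zip(a, b)], frozen[:half])
    u_r, x_r = sc_decode([g(p, q, s) for p, q, s in zip(a, b, x_l)], frozen[half:])
    return u_l + u_r, [l ^ r for l, r in zip(x_l, x_r)] + x_r


# Usage: noiseless round trip for a toy length-8 code (hypothetical frozen set).
frozen = [True, True, True, False, True, False, False, False]
u = [0, 0, 0, 1, 0, 1, 1, 0]            # frozen positions carry 0
x = polar_encode(u)
llr = [4.0 * (1 - 2 * bit) for bit in x]  # BPSK mapping: bit 0 -> +4, bit 1 -> -4
u_hat, x_hat = sc_decode(llr, frozen)
```

The recursion visits the left subtree before the right one, which is the depth-first order the abstract refers to. An unrolled, pipelined design lays this recursion out as a fixed sequence of hardware stages rather than executing it step by step.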

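Research Motivation and Goals

Support Tb/s-throughput polar code decoding for 6G and high-speed data center applications.
Address the challenge of achieving very high throughput together with low power and small area in an ASIC implementation.
Reduce the implementation complexity of successive cancellation (SC) decoding through adaptive quantization and register optimization.
Demonstrate a scalable, energy-efficient polar decoder architecture suited to 16nm FinFET technology.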

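Proposed Method

Uses a pipelined, unrolled architecture to sustain high-throughput operation at a 1.2 GHz clock frequency.
Implements an adaptive log-likelihood ratio (LLR) quantization scheme that adjusts the internal bit precision (1-5 bits) according to the entropy of the LLR distribution and the irregularity of polarization.
Applies register reduction/balancing (R-RB) to even out delays across pipeline stages, raising the clock frequency while reducing area and power.
Integrates hardware shortcuts (R0, R1, SPC, REP) that speed up decoding of simple code segments and cut computation time.
Uses a hybrid decoding strategy: standard SC for complex segments and Wagner/MAP decoders for SPC/REP nodes, improving performance at very small area overhead.
Completes the full physical design flow, including synthesis, placement, clock tree synthesis, and routing optimization, with timing sign-off using the Cadence Innovus toolchain and a TSMC 16nm process library.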
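
Two of the techniques listed above can be sketched in a few lines of Python: reduced-precision LLR storage and the shortcut decoders for REP and SPC nodes. This is a generic illustration under stated assumptions, not the paper's design. The symmetric saturating quantizer, the `step` parameter, and the function names `quantize_llr`, `decode_rep`, and `decode_spc` are hypothetical, and the per-stage bit widths the paper actually selects are not reproduced here.

```python
def quantize_llr(llr, bits, step=1.0):
    # Saturating uniform quantizer onto a symmetric signed grid (assumed form).
    # With bits == 1 only the sign of the LLR survives.
    if bits == 1:
        return -1 if llr < 0 else 1
    q_max = 2 ** (bits - 1) - 1
    q = round(llr / step)
    return max(-q_max, min(q_max, q))


def decode_rep(llrs):
    # Repetition node: every bit is equal, so the MAP decision is the sign
    # of the summed LLRs.
    bit = 0 if sum(llrs) >= 0 else 1
    return [bit] * len(llrs)


def decode_spc(llrs):
    # Single-parity-check node, Wagner decoding: take hard decisions and,
    # if the even-parity constraint fails, flip the least reliable bit.
    bits = [0 if value >= 0 else 1 for value in llrs]
    if sum(bits) % 2 == 1:
        weakest = min(range(len(llrs)), key=lambda i: abs(llrs[i]))
        bits[weakest] ^= 1
    return bits


# Usage with made-up LLR values.
clipped = quantize_llr(7.3, bits=3)          # saturates at the 3-bit maximum
rep_bits = decode_rep([1.0, -0.5, 2.0, -0.2])
spc_bits = decode_spc([2.0, -0.3, 1.5, 0.9])
```

Both shortcut decoders return codeword-domain bits for the whole node in a single step, which is what lets them replace a subtree of the ordinary SC recursion. A Rate-0 node would return all zeros and a Rate-1 node the plain hard decisions.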

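Experimental Results

Research Questions

RQ1: Can a polar SC decoder reach Tb/s throughput in a 16nm ASIC while keeping acceptable power and area efficiency?
RQ2: How effective is adaptive LLR quantization at reducing implementation complexity without significant performance loss?
RQ3: How much does register reduction/balancing (R-RB) improve clock frequency and reduce area in a deeply pipelined decoder?
RQ4: How does the proposed OPSC architecture compare with state-of-the-art polar decoder implementations in performance, area, and energy efficiency?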

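Key Findings

The OPSC decoder achieves 1.2 Tb/s coded throughput on 0.79 mm² in 16nm FinFET technology, with an energy efficiency of 0.95 pJ/bit.
An FPGA prototype achieves 200 Gb/s throughput with a bit error rate of 1.1 × 10⁻¹³ at 8 dB Eb/No, validating performance in the low-error-rate regime.
Adaptive LLR quantization reduces internal precision to 1-5 bits, with a performance loss below 0.2 dB for the (1024, 854) code.
The design is register-dominated (69.8% of total area), but R-RB balances the pipeline stages effectively and supports operation at 1.2 GHz.
Post-place-and-route results show an area efficiency of 1554 Gb/s/mm², 10 times that of the next-best implementation.
Compared with fabricated ASICs, OPSC achieves 16 times higher throughput, 7.2 times lower latency, and 10 times better area efficiency.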
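
The headline figures can be cross-checked with simple arithmetic. The sketch below assumes that the fully unrolled, pipelined decoder emits one 1024-bit codeword per clock cycle at 1.2 GHz. That assumption, the derived information throughput and power figures, and all variable names are illustrative and are not values reported in the source.

```python
N = 1024                  # code block length in bits (from the abstract)
K = 854                   # payload in bits (from the abstract)
F_CLK_HZ = 1.2e9          # clock frequency
AREA_MM2 = 0.79           # post-place-and-route area
ENERGY_J_PER_BIT = 0.95e-12

# Assumption: one codeword leaves the pipeline every clock cycle.
coded_throughput_bps = N * F_CLK_HZ
info_throughput_bps = K * F_CLK_HZ

# Area efficiency in Gb/s per mm^2, and power implied by the energy per coded bit.
area_eff_gbps_per_mm2 = coded_throughput_bps / 1e9 / AREA_MM2
power_w = ENERGY_J_PER_BIT * coded_throughput_bps

print(f"coded throughput : {coded_throughput_bps / 1e12:.3f} Tb/s")
print(f"info throughput  : {info_throughput_bps / 1e12:.3f} Tb/s")
print(f"area efficiency  : {area_eff_gbps_per_mm2:.0f} Gb/s/mm^2")
print(f"implied power    : {power_w:.2f} W")
```

Under that assumption the coded throughput is about 1.23 Tb/s and the area efficiency comes out near 1555 Gb/s/mm², close to the reported 1554 Gb/s/mm². The small gap is presumably rounding in the published clock frequency or area.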

This review was generated by AI and reviewed by human editors.