QUICK REVIEW

[Paper Review] Chip Placement with Deep Reinforcement Learning

Azalia Mirhoseini, Anna Goldie | arXiv (Cornell University) | Apr 22, 2020

VLSI and FPGA Design Techniques | References: 36 | Cited by: 152

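One-Sentence Summary

The paper formulates chip placement as a reinforcement learning problem and trains a domain-adaptive policy that learns from past netlists to quickly generate high-quality placements for unseen blocks, reaching superhuman or comparable results in under 6 hours.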

ABSTRACT

In this work, we present a learning-based approach to chip placement, one of the most complex and time-consuming stages of the chip design process. Unlike prior methods, our approach has the ability to learn from past experience and improve over time. In particular, as we train over a greater number of chip blocks, our method becomes better at rapidly generating optimized placements for previously unseen chip blocks. To achieve these results, we pose placement as a Reinforcement Learning (RL) problem and train an agent to place the nodes of a chip netlist onto a chip canvas. To enable our RL policy to generalize to unseen blocks, we ground representation learning in the supervised task of predicting placement quality. By designing a neural architecture that can accurately predict reward across a wide variety of netlists and their placements, we are able to generate rich feature embeddings of the input netlists. We then use this architecture as the encoder of our policy and value networks to enable transfer learning. Our objective is to minimize PPA (power, performance, and area), and we show that, in under 6 hours, our method can generate placements that are superhuman or comparable on modern accelerator netlists, whereas existing baselines require human experts in the loop and take several weeks.

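Motivation and Objectives

Minimize power, performance, and area (PPA) while satisfying density and routing constraints
Enable transfer learning so that the policy improves as it trains on more chip blocks and generalizes to unseen netlists
Ground representation learning in a supervised reward-prediction task to improve generalization
Reduce reliance on human experts by rapidly producing high-quality placements for large netlists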

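Proposed Method

Formulate chip placement as a Markov decision process in which macros are placed one at a time onto a grid
Train the policy network with Proximal Policy Optimization (PPO) to maximize a reward based on proxy wirelength and congestion, subject to a density constraint
Ground representation learning in a supervised graph neural network that predicts placement reward, so that the resulting encoder can be reused by the policy for transfer learning
Discretize the chip canvas into an m x n grid and enforce a hard density constraint (max_density = 0.6) to prune infeasible placements
Place macros with the RL agent first, then place standard cells with a force-directed method; evaluate placements with a fast, approximate reward
Achieve domain adaptation by pre-training on multiple netlists and fine-tuning on unseen blocks, which yields faster convergence and better results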
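
The sequential placement, density pruning, and proxy reward described above can be illustrated with a small sketch. It is not the authors' implementation. It assumes each macro occupies a single grid cell and contributes a fixed density to it, uses half-perimeter wirelength (HPWL) as the wirelength proxy, and assumes the reward is a negative weighted sum of wirelength and congestion. The names feasible_cells, place_macros, hpwl, proxy_reward, and the congestion_weight value are hypothetical, and the choose callback stands in for the trained policy network.

```python
from typing import Callable, Dict, List, Tuple

Cell = Tuple[int, int]  # (row, col) index of a grid cell


def feasible_cells(density: List[List[float]], macro_density: float,
                   max_density: float = 0.6) -> List[Cell]:
    """Cells where adding the macro keeps density at or below max_density."""
    return [
        (r, c)
        for r, row in enumerate(density)
        for c, d in enumerate(row)
        if d + macro_density <= max_density
    ]


def place_macros(macros: Dict[str, float], rows: int, cols: int,
                 choose: Callable[[str, List[Cell]], Cell],
                 max_density: float = 0.6) -> Dict[str, Cell]:
    """Place macros one at a time; `choose` is a stand-in for the policy."""
    density = [[0.0] * cols for _ in range(rows)]
    positions: Dict[str, Cell] = {}
    for name, macro_density in macros.items():
        # Mask out cells that would violate the hard density constraint.
        cells = feasible_cells(density, macro_density, max_density)
        if not cells:
            raise ValueError(f"no feasible cell for macro {name!r}")
        r, c = choose(name, cells)
        density[r][c] += macro_density
        positions[name] = (r, c)
    return positions


def hpwl(net: List[str], positions: Dict[str, Cell]) -> int:
    """Half-perimeter wirelength: height plus width of the net's bounding box."""
    rs = [positions[n][0] for n in net]
    cs = [positions[n][1] for n in net]
    return (max(rs) - min(rs)) + (max(cs) - min(cs))


def proxy_reward(nets: List[List[str]], positions: Dict[str, Cell],
                 congestion: float = 0.0,
                 congestion_weight: float = 0.01) -> float:
    """Assumed reward form: negative weighted sum of wirelength and congestion."""
    wirelength = sum(hpwl(net, positions) for net in nets)
    return -(wirelength + congestion_weight * congestion)


# Toy example: three macros on a 2 x 2 grid, always taking the first feasible cell.
macros = {"m0": 0.5, "m1": 0.5, "m2": 0.25}
positions = place_macros(macros, 2, 2, lambda name, cells: cells[0])
print(positions)
print(proxy_reward([["m0", "m1"], ["m1", "m2"]], positions))  # -> -3.0
```

In this toy run, the density mask forces the second and third macros away from cells that are already too full, which is how the hard constraint prunes infeasible actions before the policy makes its choice. A real setup would replace the first-feasible-cell rule with samples from the PPO-trained policy and would compute congestion from routing estimates rather than passing it in as a constant.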

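Experimental Results

Research Questions

RQ1: Can the learned policy generalize to unseen chip netlists through domain adaptation?
RQ2: Does pre-training on diverse netlists enable zero-shot or rapidly fine-tuned placement of new blocks?
RQ3: How does the RL-based approach compare with state-of-the-art baselines in PPA, density, and routing congestion?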

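Key Findings

On real accelerator netlists, the method produces placements that are superhuman or comparable to human-expert quality in under 6 hours.
A pre-trained policy can produce zero-shot placements of unseen netlists in under a second, with no fine-tuning.
Fine-tuning a pre-trained policy shortens convergence time and improves the final cost relative to a policy trained from scratch.
Domain adaptation reduces training time by roughly 8x compared with training from scratch.
Pre-trained policies consistently outperform policies trained from scratch across different blocks.
Placements visually match expert intuition: standard cells sit in the center, with macros arranged around them.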

This review was generated by AI and reviewed by human editors.