[Paper Review] Chip Placement with Deep Reinforcement Learning
The paper formulates chip placement as a reinforcement learning problem, using a domain-adaptive policy that learns from past netlists to rapidly generate high-quality placements for unseen blocks, achieving superhuman or comparable results in under 6 hours.
In this work, we present a learning-based approach to chip placement, one of the most complex and time-consuming stages of the chip design process. Unlike prior methods, our approach has the ability to learn from past experience and improve over time. In particular, as we train over a greater number of chip blocks, our method becomes better at rapidly generating optimized placements for previously unseen chip blocks. To achieve these results, we pose placement as a Reinforcement Learning (RL) problem and train an agent to place the nodes of a chip netlist onto a chip canvas. To enable our RL policy to generalize to unseen blocks, we ground representation learning in the supervised task of predicting placement quality. By designing a neural architecture that can accurately predict reward across a wide variety of netlists and their placements, we are able to generate rich feature embeddings of the input netlists. We then use this architecture as the encoder of our policy and value networks to enable transfer learning. Our objective is to minimize PPA (power, performance, and area), and we show that, in under 6 hours, our method can generate placements that are superhuman or comparable on modern accelerator netlists, whereas existing baselines require human experts in the loop and take several weeks.
Motivation & Objective
- Optimize power, performance, and area (PPA) while satisfying density and routing constraints.
- Enable transfer learning so a policy improves with more chip blocks and generalizes to unseen netlists.
- Ground state representation learning via a supervised reward-prediction task to improve generalization.
- Reduce reliance on human experts by achieving high-quality placements quickly for large netlists.
Proposed method
- Formulate chip placement as a Markov Decision Process with macros placed sequentially on a grid.
- Use a policy network trained with Proximal Policy Optimization (PPO) to maximize a reward based on proxy wirelength and congestion under density constraints.
- Ground representation learning in a supervised task: train a graph neural network to predict placement reward, then reuse that network as the encoder of the policy and value networks to enable transfer learning.
- Discretize the chip canvas into an m x n grid and enforce a hard density constraint (max_density = 0.6) to prune infeasible placements.
- Place macros first with the RL agent and finish with a force-directed method for standard cells; evaluate via a fast, approximate reward.
- Employ domain adaptation by pre-training on multiple netlists and fine-tuning to unseen blocks, achieving faster convergence and better results.
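The density pruning and the proxy reward described above can be illustrated with a minimal Python sketch. It is not the paper's implementation. It assumes a simplified model in which each macro adds a fixed density to a single grid cell, uses half-perimeter wirelength (HPWL) as the wirelength proxy, and combines wirelength and congestion as a negative weighted sum. The function names, the weight `lam`, and the scalar `congestion` input are hypothetical; only the 0.6 density threshold comes from the review.

```python
MAX_DENSITY = 0.6  # hard density threshold quoted in the review


def feasible_cells(density, macro_density):
    """Return (row, col) cells where adding the macro stays within MAX_DENSITY.

    `density` is a list of rows holding the current density of each grid cell.
    Assumption: a macro contributes `macro_density` to exactly one cell.
    """
    return [
        (r, c)
        for r, row in enumerate(density)
        for c, d in enumerate(row)
        if d + macro_density <= MAX_DENSITY
    ]


def hpwl(nets, positions):
    """Half-perimeter wirelength: sum of bounding-box half-perimeters over nets."""
    total = 0.0
    for net in nets:
        xs = [positions[node][0] for node in net]
        ys = [positions[node][1] for node in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def proxy_reward(nets, positions, congestion, lam):
    """Negative weighted sum of wirelength and congestion (assumed form)."""
    return -(hpwl(nets, positions) + lam * congestion)


if __name__ == "__main__":
    # A 2 x 2 grid where the top-left cell is already fairly dense.
    density = [[0.5, 0.0], [0.0, 0.0]]
    print(feasible_cells(density, 0.2))  # top-left cell is pruned

    # Three nodes connected by two nets, with hypothetical grid positions.
    nets = [["a", "b"], ["b", "c"]]
    positions = {"a": (0, 0), "b": (2, 1), "c": (2, 3)}
    print(proxy_reward(nets, positions, congestion=10.0, lam=0.5))
```

In an RL loop, a mask like `feasible_cells` would restrict the agent's action space at each step, and a reward like `proxy_reward` would only be computed once all macros and standard cells have been placed.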
Experimental results
Research questions
- RQ1: Can a learned policy generalize to unseen chip netlists through domain adaptation?
- RQ2: Does pre-training on diverse netlists enable zero-shot or rapidly fine-tuned placement for new blocks?
- RQ3: How does the RL-based approach compare to state-of-the-art baselines in terms of PPA, density, and routing congestion?
Key findings
- The method produces placements that are superhuman or comparable to human experts on real accelerator netlists in under 6 hours.
- Zero-shot placements on unseen netlists can be generated in under a second using a pre-trained policy without fine-tuning.
- Fine-tuning a pre-trained policy shortens convergence time and improves final cost relative to policies trained from scratch.
- Domain adaptation reduces training time by about 8-fold compared to training from scratch.
- Pre-trained policies consistently outperform policies trained from scratch across blocks.
- Placements visually align with expert intuition, placing standard cells centrally with macros arranged around them.
This review was created by AI and reviewed by human editors.