[Paper Review] Learning to Paint With Model-based Deep Reinforcement Learning
This work trains a model-based DRL agent that paints target images by sequentially placing hundreds of strokes via a differentiable neural renderer, achieving realistic results on MNIST, SVHN, CelebA, and ImageNet without human stroke data.
We show how to teach machines to paint like human painters, who can use a small number of strokes to create fantastic paintings. By employing a neural renderer in model-based Deep Reinforcement Learning (DRL), our agents learn to determine the position and color of each stroke and make long-term plans to decompose texture-rich images into strokes. Experiments demonstrate that excellent visual effects can be achieved using hundreds of strokes. The training process does not require the experience of human painters or stroke tracking data. The code is available at https://github.com/hzwer/ICCV2019-LearningToPaint.
Motivation & Objective
- Enable an agent to decompose a target image into an ordered sequence of strokes to recreate the image on canvas.
- Develop a differentiable neural renderer to enable end-to-end, model-based DRL training for painting.
- Handle continuous stroke parameters and long-horizon planning to reproduce texture-rich images.
- Demonstrate painting quality across varied real-world datasets without requiring human stroke data.
Proposed method
- Model the painting process as a Markov Decision Process with continuous action space representing stroke parameters.
- Use a model-based DDPG framework where a differentiable neural renderer provides transition dynamics and rewards.
- Define rewards via a WGAN-based discriminator to measure similarity between the painting and target images.
- Employ an Action Bundle strategy to predict multiple strokes per training step and adjust the discount factor accordingly.
- Represent strokes as quadratic Bézier curves with control points, thickness, transparency, and RGB color, rendered by a differentiable neural renderer.
- Use adversarial training (involving the discriminator and the critic) to improve pixel-level realism and overall painting quality.
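The stroke representation above can be made concrete with a small sketch. It evaluates the standard quadratic Bézier formula B(t) = (1-t)^2 P0 + 2(1-t)t P1 + t^2 P2 to sample points along one stroke. The `Stroke` class and its field names are hypothetical, chosen only to mirror the parameter kinds listed above. This hand-written sampler is not the paper's neural renderer and is not differentiable.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class Stroke:
    # Hypothetical field names: the review only lists the kinds of parameters.
    p0: Point                        # start control point
    p1: Point                        # middle control point (bends the curve)
    p2: Point                        # end control point
    thickness: float
    alpha: float                     # transparency in [0, 1]
    rgb: Tuple[float, float, float]


def bezier_point(p0: Point, p1: Point, p2: Point, t: float) -> Point:
    """Evaluate B(t) = (1-t)^2 P0 + 2(1-t)t P1 + t^2 P2 for t in [0, 1]."""
    a = (1.0 - t) ** 2
    b = 2.0 * (1.0 - t) * t
    c = t ** 2
    return (a * p0[0] + b * p1[0] + c * p2[0],
            a * p0[1] + b * p1[1] + c * p2[1])


def sample_stroke(stroke: Stroke, n: int = 5) -> List[Point]:
    """Return n evenly spaced (in t) points along the stroke's centre line."""
    if n < 2:
        raise ValueError("need at least two samples")
    return [bezier_point(stroke.p0, stroke.p1, stroke.p2, i / (n - 1))
            for i in range(n)]


# Usage: an arch from (0, 0) to (1, 0) pulled upward by the middle control point.
stroke = Stroke(p0=(0.0, 0.0), p1=(0.5, 1.0), p2=(1.0, 0.0),
                thickness=0.05, alpha=1.0, rgb=(1.0, 0.0, 0.0))
points = sample_stroke(stroke, n=5)
```

The curve starts at `p0`, ends at `p2`, and generally does not pass through `p1`. A rasterizer would then stamp thickness, transparency, and color along these points.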
Experimental results
Research questions
- RQ1: Can a model-based DRL agent learn to decompose target images into hundreds of strokes to recreate the image on a canvas?
- RQ2: Does using a differentiable neural renderer and model-based planning improve painting quality and convergence speed over model-free approaches?
- RQ3: What is the impact of reward design (WGAN-based vs L2) on the realism and fidelity of generated paintings?
- RQ4: How do stroke count and action-bundle settings affect performance across datasets of increasing complexity?
- RQ5: How well does the method generalize across diverse datasets such as MNIST, SVHN, CelebA, and ImageNet?
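To make the reward-design question concrete, here is a minimal sketch of an L2-style reward. It assumes the reward for a step is the decrease in distance between canvas and target, which is an assumption for illustration and is not spelled out in this review. The function names and the nested-list image format are also hypothetical. A learned WGAN-style score would replace `l2_distance` through the `distance` argument, and that part is not shown.

```python
from typing import Callable, List

Image = List[List[float]]  # hypothetical grayscale image as rows of floats


def l2_distance(a: Image, b: Image) -> float:
    """Mean squared pixel difference between two images of equal shape."""
    total = 0.0
    count = 0
    for row_a, row_b in zip(a, b):
        for x, y in zip(row_a, row_b):
            total += (x - y) ** 2
            count += 1
    return total / count


def step_reward(before: Image, after: Image, target: Image,
                distance: Callable[[Image, Image], float] = l2_distance) -> float:
    """Assumed reward: how much closer the canvas moved to the target this step."""
    return distance(before, target) - distance(after, target)


# Usage: a stroke that fixes one of two wrong pixels on a 2x2 canvas.
target = [[1.0, 0.0], [0.0, 1.0]]
blank = [[0.0, 0.0], [0.0, 0.0]]
after_stroke = [[1.0, 0.0], [0.0, 0.0]]
reward = step_reward(blank, after_stroke, target)
```

Under this assumption, a stroke that moves the canvas away from the target yields a negative reward.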
Key findings
- A model-based DDPG agent achieves substantially better painting fidelity than model-free variants, with about 5x smaller ℓ2 distance than DDPG with PatchQ and 20x smaller than the original DDPG on CelebA tests.
- WGAN-based rewards yield richer textures and can achieve lower ℓ2 loss than purely L2 rewards on testing data.
- Increasing the number of strokes improves painting quality for texture-rich images (e.g., 200 vs 400 vs 1000 strokes).
- An Action Bundle of 5 strokes per step provides a favorable trade-off between learning speed and planning capability.
- The method handles multiple stroke designs (quadratic Bézier curves, straight lines, triangles, circles) and can produce results that visually resemble the targets across datasets, from digits to natural scenes.
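The Action Bundle finding can be illustrated with a little arithmetic. If the agent emits k strokes per decision, one natural reading of "adjust the discount factor accordingly" is that the per-decision discount becomes gamma^k, so that a reward k strokes ahead is weighted as it would be with per-stroke discounting. The sketch below assumes that reading. The value gamma = 0.95 is purely illustrative, and the helper names are hypothetical.

```python
def bundle_discount(gamma: float, k: int) -> float:
    """Per-decision discount when each decision emits k strokes (assumed gamma**k)."""
    if k < 1:
        raise ValueError("bundle size must be at least 1")
    return gamma ** k


def decisions_needed(total_strokes: int, k: int) -> int:
    """Number of agent decisions needed to place total_strokes, rounding up."""
    if k < 1:
        raise ValueError("bundle size must be at least 1")
    return -(-total_strokes // k)  # ceiling division


# Usage: 200 strokes with a bundle of 5 and an illustrative gamma of 0.95.
gamma_per_decision = bundle_discount(0.95, 5)
num_decisions = decisions_needed(200, 5)
```

With these numbers, the agent makes 40 decisions instead of 200, which shortens the planning horizon in decision steps. The cost is that each decision must commit to several strokes at once.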
This review was created by AI and reviewed by human editors.