[Paper Review] Multi-style Generative Network for Real-time Transfer
MSG-Net introduces a CoMatch Layer to learn 2D style representations by matching second-order statistics, enabling real-time multi-style transfer with brush-size control and high image quality.
Despite the rapid progress in style transfer, existing approaches that use feed-forward generative networks for multi-style or arbitrary-style transfer usually compromise on image quality and model flexibility. We find it is fundamentally difficult to achieve comprehensive style modeling using a 1-dimensional style embedding. Motivated by this, we introduce the CoMatch Layer, which learns to match the second-order feature statistics with the target styles. With the CoMatch Layer, we build a Multi-style Generative Network (MSG-Net), which achieves real-time performance. We also employ a specific strategy of upsampled convolution, which avoids the checkerboard artifacts caused by fractionally-strided convolution. Our method achieves superior image quality compared to state-of-the-art approaches. The proposed MSG-Net, as a general approach for real-time style transfer, is compatible with most existing techniques, including content-style interpolation, color preservation, spatial control and brush stroke size control. MSG-Net is the first to achieve real-time brush-size control in a purely feed-forward manner for style transfer. Our implementations and pre-trained models for Torch, PyTorch and MXNet frameworks will be publicly available.
Motivation & Objective
- Address the limited image quality and flexibility of real-time style transfer methods that rely on 1D style embeddings.
- Introduce a CoMatch Layer to match second-order feature statistics (Gram matrices) of target styles.
- Build MSG-Net, a multi-style generator with upsampling strategies to avoid artifacts.
- Enable real-time brush-size control and compatibility with content-style interpolation, color preservation, and spatial control.
- Demonstrate scalability from 100 to 1,000 styles with minimal quality loss.
Proposed method
- Propose CoMatch Layer that learns a 2D style representation by matching Gram matrix statistics of style targets.
- Embed CoMatch Layer into a Multi-style Generative Network (MSG-Net) with an upsampled convolution decoder to avoid checkerboard artifacts.
- Use an upsampling residual block to preserve channel richness and enable deeper networks.
- Train with a perceptual loss that combines content and multi-scale style terms computed from a pre-trained loss network (e.g., VGG).
- Adopt a Siamese encoder to extract style statistics and a transformation network to apply them via CoMatch Layers at multiple scales.
- Allow brush-size control by varying the input style image size and training with multiple style sizes.
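The second-order statistics that the CoMatch Layer targets can be illustrated with a short NumPy sketch. The Gram matrix computation is standard; the `comatch_sketch` function is only one plausible reading of the layer, assuming a learnable C×C weight matrix (here called `weight`) that mixes content features with the style Gram matrix. The function names, the normalization by H·W, and the identity weight used in the example are assumptions for illustration, not details confirmed by this review.

```python
import numpy as np

def gram_matrix(features):
    """Return the (C, C) Gram matrix of a (C, H, W) feature map.

    Normalizing by H*W is an assumption; conventions differ between papers.
    """
    c, h, w = features.shape
    flat = features.reshape(c, h * w)          # (C, HW)
    return flat @ flat.T / (h * w)             # channel-by-channel correlations

def comatch_sketch(content_features, style_gram, weight):
    """Hypothetical CoMatch-style mixing: reshape, multiply by W and G, reshape back.

    `weight` stands in for a learnable (C, C) matrix; in a real network it
    would be trained, not fixed.
    """
    c, h, w = content_features.shape
    flat = content_features.reshape(c, h * w)  # (C, HW)
    mixed = (flat.T @ weight @ style_gram).T   # (C, HW)
    return mixed.reshape(c, h, w)

rng = np.random.default_rng(0)
style_feat = rng.standard_normal((4, 8, 8))    # style features, 8x8 spatial size
content_feat = rng.standard_normal((4, 6, 6))  # content features, 6x6 spatial size

G = gram_matrix(style_feat)
W = np.eye(4)                                  # placeholder for a learned weight
out = comatch_sketch(content_feat, G, W)
```

Because the Gram matrix has shape C×C regardless of the style image's spatial size, the style input can be resized freely while the content path keeps its own resolution. This is consistent with the brush-size control described above, where the style image size is varied at inference time.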
Experimental results
Research questions
- RQ1: Can a 2D style representation via Gram matrix statistics enable high-quality multi-style transfer in a single feed-forward network?
- RQ2: Does the CoMatch Layer improve style fidelity over traditional 1D embeddings across multiple styles?
- RQ3: Can real-time brush-size control be achieved in a purely feed-forward framework without artifacts?
- RQ4: How scalable is MSG-Net when increasing the number of styles from 100 to 1,000?
- RQ5: Is MSG-Net compatible with existing style-transfer techniques like content-style interpolation and spatial control?
Key findings
- MSG-Net achieves superior image fidelity and real-time speeds compared to several prior methods.
- A 2D style representation via the CoMatch Layer effectively matches second-order statistics of target styles.
- Upsampled convolution and upsampling residual blocks reduce checkerboard artifacts while maintaining efficiency.
- MSG-Net-100 (2.3M parameters) runs at over 90 fps for 256×256 inputs and remains compatible with style interpolation, color preservation, and spatial control.
- Increasing to MSG-Net-1K (8.9M parameters) yields no obvious quality degradation and demonstrates scalable multi-style capability.
- The model enables real-time brush-size control by varying the input style image size, a first for purely feed-forward multi-style transfer.
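The checkerboard point in the findings above can be made concrete with a 1D toy example. The sketch below compares a stride-2 transposed (fractionally-strided) convolution with nearest-neighbor upsampling followed by a stride-1 convolution. Treating "upsampled convolution" as upsample-then-convolve is an assumption about the paper's layer, and the helper names are hypothetical.

```python
import numpy as np

def transposed_conv1d(x, kernel, stride):
    """Stride-`stride` transposed convolution: each input stamps a copy of the kernel."""
    out = np.zeros((len(x) - 1) * stride + len(kernel))
    for i, v in enumerate(x):
        out[i * stride : i * stride + len(kernel)] += v * kernel
    return out

def upsample_then_conv1d(x, kernel, factor):
    """Nearest-neighbor upsampling followed by a stride-1 'valid' convolution."""
    up = np.repeat(x, factor)
    return np.convolve(up, kernel, mode="valid")

x = np.ones(4)   # constant input, so any variation in the output is an artifact
k = np.ones(3)   # all-ones kernel of size 3

uneven = transposed_conv1d(x, k, stride=2)
even = upsample_then_conv1d(x, k, factor=2)

print(uneven)  # alternating 2, 1, 2, 1 in the interior
print(even)    # constant 3 everywhere
```

With a kernel size that is not divisible by the stride, the transposed convolution overlaps kernel copies unevenly, so a constant input produces an alternating output. Upsampling first gives every output position the same number of contributing inputs, which is why the pattern disappears.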
This review was created by AI and reviewed by human editors.