QUICK REVIEW

[Paper Review] Multi-style Generative Network for Real-time Transfer

Hang Zhang, Kristin Dana | arXiv (Cornell University) | Mar 20, 2017

Generative Adversarial Networks and Image Synthesis | 43 references | 113 citations

TL;DR

MSG-Net introduces a CoMatch Layer to learn 2D style representations by matching second-order statistics, enabling real-time multi-style transfer with brush-size control and high image quality.

ABSTRACT

Despite the rapid progress in style transfer, existing approaches that use feed-forward generative networks for multi-style or arbitrary-style transfer usually compromise on image quality and model flexibility. We find it fundamentally difficult to achieve comprehensive style modeling with a 1-dimensional style embedding. Motivated by this, we introduce the CoMatch Layer, which learns to match the second-order feature statistics of the target styles. With the CoMatch Layer, we build a Multi-style Generative Network (MSG-Net) that achieves real-time performance. We also employ a specific upsampled-convolution strategy that avoids the checkerboard artifacts caused by fractionally-strided convolution. Our method achieves superior image quality compared to state-of-the-art approaches. As a general approach to real-time style transfer, MSG-Net is compatible with most existing techniques, including content-style interpolation, color preservation, spatial control and brush stroke size control. MSG-Net is the first to achieve real-time brush-size control in a purely feed-forward manner for style transfer. Our implementations and pre-trained models for Torch, PyTorch and MXNet frameworks will be publicly available.
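The checkerboard claim can be illustrated with a toy 1D count. This is a minimal sketch assuming all-ones weights and inputs; the helper names are hypothetical and not from the authors' code. A fractionally-strided (transposed) convolution whose kernel size is not divisible by its stride stamps overlapping windows unevenly, so interior outputs alternate between one and two contributions. Upsampling first and then applying an ordinary stride-1 convolution gives every interior output the same number of taps.

```python
import numpy as np


def transposed_conv_coverage(n_in: int, kernel: int, stride: int) -> np.ndarray:
    """Number of kernel taps that land on each output position of a 1D
    fractionally-strided (transposed) convolution."""
    n_out = (n_in - 1) * stride + kernel
    counts = np.zeros(n_out, dtype=int)
    for i in range(n_in):
        # Input sample i "stamps" a kernel-wide window starting at i * stride.
        counts[i * stride : i * stride + kernel] += 1
    return counts


def upsample_conv_coverage(n_in: int, kernel: int, factor: int) -> np.ndarray:
    """Number of kernel taps per output position for nearest-neighbour
    upsampling followed by a stride-1, zero-padded ('same') convolution."""
    n_up = n_in * factor
    half = kernel // 2
    counts = np.zeros(n_up, dtype=int)
    for j in range(n_up):
        # Taps of the window centred on j that hit real (non-padding) samples.
        counts[j] = min(n_up, j + half + 1) - max(0, j - half)
    return counts


print(transposed_conv_coverage(4, kernel=3, stride=2))  # [1 1 2 1 2 1 2 1 1]
print(upsample_conv_coverage(4, kernel=3, factor=2))    # [2 3 3 3 3 3 3 2]
```

With kernel 4 and stride 2, the transposed convolution's interior counts become uniform, so the artifact depends on the kernel/stride choice; the upsample-then-convolve route does not depend on that divisibility.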

Motivation & Objective

Improve image quality and flexibility in real-time style transfer by moving beyond 1D style embeddings.
Introduce a CoMatch Layer to match second-order feature statistics (Gram matrices) of target styles.
Build MSG-Net, a multi-style generator that uses upsampled convolution to avoid checkerboard artifacts.
Enable real-time brush-size control and compatibility with content-style interpolation, color preservation, and spatial control.
Demonstrate scalability from 100 to 1,000 styles with minimal quality loss.
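To make "second-order statistics" concrete, here is a minimal NumPy sketch of a Gram matrix and a Gram-based style distance of the kind used in style-loss terms. It assumes a single C x H x W feature map (no batch dimension) and a C*H*W normalisation; the function names are illustrative. The result is a C x C matrix, i.e. a 2D representation, whereas a 1D style embedding is a vector. Because the Gram matrix sums over positions, shuffling pixels leaves it unchanged.

```python
import numpy as np


def gram_matrix(features: np.ndarray) -> np.ndarray:
    """Second-order statistics of a C x H x W feature map.

    Phi reshapes the map to C x (H*W); the result Phi @ Phi.T is a C x C
    matrix recording which channels co-activate, regardless of where.
    Dividing by C*H*W is an assumed normalisation, not taken from the paper.
    """
    c, h, w = features.shape
    phi = features.reshape(c, h * w)
    return phi @ phi.T / (c * h * w)


def style_distance(feat_a: np.ndarray, feat_b: np.ndarray) -> float:
    """Squared Frobenius distance between two Gram matrices."""
    diff = gram_matrix(feat_a) - gram_matrix(feat_b)
    return float(np.sum(diff ** 2))


rng = np.random.default_rng(0)
feat = rng.standard_normal((4, 8, 8))        # toy C=4, H=W=8 feature map
perm = rng.permutation(8 * 8)                 # shuffle spatial positions
shuffled = feat.reshape(4, -1)[:, perm].reshape(4, 8, 8)

print(gram_matrix(feat).shape)                # (4, 4): a 2D, C x C statistic
print(np.isclose(style_distance(feat, shuffled), 0.0))  # True: layout is ignored
```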

Proposed method

Propose the CoMatch Layer, which learns a 2D style representation by matching the Gram matrix statistics of target styles.
Embed the CoMatch Layer in a Multi-style Generative Network (MSG-Net) with an upsampled-convolution decoder that avoids checkerboard artifacts.
Use an upsampling residual block to preserve channel richness and enable deeper networks.
Train with a perceptual loss that combines a content term and multi-scale style terms, both computed with a pre-trained loss network (e.g., VGG).
Adopt a Siamese encoder to extract style statistics and a transformation network to apply them via CoMatch Layers at multiple scales.
Allow brush-size control by varying the input style image size and training with multiple style sizes.
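The matching step can be sketched as follows, assuming (as we understand the paper's formulation) that a CoMatch layer computes Y = Φ^{-1}[Φ(F_c)^T W G(F_s)]^T, where Φ reshapes a C x H x W map to C x (H*W), W is a learnable C x C matrix, and G(F_s) is the style Gram matrix. This is a hypothetical single-scale, unbatched, forward-only NumPy version; the class name, the initialisation and the Gram normalisation are assumptions.

```python
import numpy as np


def gram_matrix(features: np.ndarray) -> np.ndarray:
    """C x C Gram matrix of a C x H x W map (C*H*W normalisation assumed)."""
    phi = features.reshape(features.shape[0], -1)
    return phi @ phi.T / phi.size


class CoMatchSketch:
    """Forward pass of a CoMatch-style layer (hypothetical NumPy version).

    Y = Phi^{-1}[ Phi(F_c)^T  W  G_s ]^T, where Phi reshapes C x H x W to
    C x (H*W), W is a learnable C x C matrix and G_s is the style Gram matrix.
    """

    def __init__(self, channels: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Assumed small uniform init; during training W would be learned.
        self.weight = rng.uniform(0.0, 0.02, size=(channels, channels))
        self.gram = np.eye(channels)

    def set_target(self, style_features: np.ndarray) -> None:
        """Cache the style Gram matrix once per style image."""
        self.gram = gram_matrix(style_features)

    def __call__(self, content_features: np.ndarray) -> np.ndarray:
        c, h, w = content_features.shape
        phi = content_features.reshape(c, h * w)          # C x HW
        matched = (phi.T @ self.weight @ self.gram).T     # C x HW
        return matched.reshape(c, h, w)                   # Phi^{-1}


rng = np.random.default_rng(1)
content = rng.standard_normal((8, 16, 16))   # content features, C=8
style = rng.standard_normal((8, 32, 32))     # style features at another size
layer = CoMatchSketch(channels=8)
layer.set_target(style)
out = layer(content)
print(out.shape)  # (8, 16, 16)
```

Because G_s is C x C whatever the style image's resolution, the same layer accepts style inputs of any size. That is what makes it possible to resize the style image at test time, which the review links to brush-size control.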

Experimental results

Research questions

RQ1: Can a 2D style representation via Gram matrix statistics enable high-quality multi-style transfer in a single feed-forward network?
RQ2: Does the CoMatch Layer improve style fidelity over traditional 1D embeddings across multiple styles?
RQ3: Can real-time brush-size control be achieved in a purely feed-forward framework without artifacts?
RQ4: How scalable is MSG-Net when increasing the number of styles from 100 to 1,000?
RQ5: Is MSG-Net compatible with existing style-transfer techniques like content-style interpolation and spatial control?

Key findings

MSG-Net achieves superior image fidelity and real-time speeds compared to several prior methods.
A 2D style representation via the CoMatch Layer effectively matches second-order statistics of target styles.
Upsampled convolution and upsampling residual blocks reduce checkerboard artifacts while maintaining efficiency.
MSG-Net-100 (2.3M parameters) runs at over 90 fps for 256×256 inputs and remains compatible with style interpolation, color preservation, and spatial control.
Increasing to MSG-Net-1K (8.9M parameters) yields no obvious quality degradation and demonstrates scalable multi-style capability.
The model enables real-time brush-size control by varying the input style image size, a first for purely feed-forward multi-style transfer.
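One way to realise the style interpolation mentioned above, offered as an assumption rather than the authors' documented procedure, is to blend two style Gram matrices before the matching step. Assuming the CoMatch step has the form (W G)^T Φ(F_c), its output is linear in G, so blending the targets equals blending the matched features. The decoder that follows is nonlinear, so the final images are not a pixel-wise blend. All names below (including `blend_grams`) are hypothetical.

```python
import numpy as np


def gram_matrix(features: np.ndarray) -> np.ndarray:
    """C x C Gram matrix (C*H*W normalisation assumed)."""
    phi = features.reshape(features.shape[0], -1)
    return phi @ phi.T / phi.size


def comatch(content: np.ndarray, weight: np.ndarray, gram: np.ndarray) -> np.ndarray:
    """Assumed CoMatch-style step: (W G)^T Phi(F_c), reshaped to C x H x W."""
    c = content.shape[0]
    return ((weight @ gram).T @ content.reshape(c, -1)).reshape(content.shape)


def blend_grams(g_a: np.ndarray, g_b: np.ndarray, alpha: float) -> np.ndarray:
    """Hypothetical helper: convex mix of two style targets."""
    return alpha * g_a + (1.0 - alpha) * g_b


rng = np.random.default_rng(2)
C = 6
content = rng.standard_normal((C, 10, 10))
g_a = gram_matrix(rng.standard_normal((C, 12, 12)))  # style A features
g_b = gram_matrix(rng.standard_normal((C, 20, 20)))  # style B features
W = rng.uniform(0.0, 0.02, size=(C, C))
alpha = 0.3

mixed = comatch(content, W, blend_grams(g_a, g_b, alpha))
separate = alpha * comatch(content, W, g_a) + (1 - alpha) * comatch(content, W, g_b)
print(np.allclose(mixed, separate))  # True: the matching step is linear in G
```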


This review was created by AI and reviewed by human editors.