QUICK REVIEW

[Paper Review] Xception: Deep Learning with Depthwise Separable Convolutions

François Chollet|arXiv (Cornell University)|Oct 7, 2016
Domain Adaptation and Few-Shot Learning|References: 16|Cited by: 357
One-Sentence Summary

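Xception replaces Inception modules with depthwise separable convolutions, matching or exceeding Inception V3's accuracy at a similar parameter count, with a notably larger gain on the large-scale JFT dataset.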

ABSTRACT

We present an interpretation of Inception modules in convolutional neural networks as being an intermediate step in-between regular convolution and the depthwise separable convolution operation (a depthwise convolution followed by a pointwise convolution). In this light, a depthwise separable convolution can be understood as an Inception module with a maximally large number of towers. This observation leads us to propose a novel deep convolutional neural network architecture inspired by Inception, where Inception modules have been replaced with depthwise separable convolutions. We show that this architecture, dubbed Xception, slightly outperforms Inception V3 on the ImageNet dataset (which Inception V3 was designed for), and significantly outperforms Inception V3 on a larger image classification dataset comprising 350 million images and 17,000 classes. Since the Xception architecture has the same number of parameters as Inception V3, the performance gains are not due to increased capacity but rather to a more efficient use of model parameters.
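The operation named in the abstract (a depthwise convolution followed by a pointwise convolution) can be sketched in plain Python. This is a minimal illustration, not the paper's implementation: the nested-list `[channel][row][col]` layout, 'valid' padding, stride 1, the absence of biases, and all function names are assumptions made for the example.

```python
def depthwise_conv(x, kernels):
    """Apply one k x k kernel per input channel ('valid' padding, stride 1).

    x is laid out as [channel][row][col]; kernels[c] is the kernel for channel c.
    Channels are never mixed in this step.
    """
    k = len(kernels[0])
    out_h = len(x[0]) - k + 1
    out_w = len(x[0][0]) - k + 1
    out = []
    for channel, kernel in zip(x, kernels):
        plane = [
            [
                sum(channel[i + di][j + dj] * kernel[di][dj]
                    for di in range(k) for dj in range(k))
                for j in range(out_w)
            ]
            for i in range(out_h)
        ]
        out.append(plane)
    return out


def pointwise_conv(x, weights):
    """1x1 convolution: weights[o][c] mixes input channel c into output channel o."""
    height, width = len(x[0]), len(x[0][0])
    return [
        [
            [sum(w_oc * x[c][i][j] for c, w_oc in enumerate(row))
             for j in range(width)]
            for i in range(height)
        ]
        for row in weights
    ]


def depthwise_separable_conv(x, depthwise_kernels, pointwise_weights):
    """Depthwise step first, then pointwise, with no activation in between."""
    return pointwise_conv(depthwise_conv(x, depthwise_kernels), pointwise_weights)
```

The spatial filtering and the cross-channel mixing are handled by two separate sets of weights, which is the factorization the abstract describes.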

Motivation and Objectives

  • Motivate replacing Inception modules with depthwise separable convolutions to improve efficiency.
  • Propose a complete architecture (Xception) built from depthwise separable convolutions with residual connections.
  • Evaluate Xception against Inception V3 on ImageNet and a large-scale JFT dataset.
  • Analyze the impact of residual connections and intermediate activations on performance.
  • Discuss implications for future CNN design leveraging depthwise separable convolutions.

Proposed Method

  • Interpret Inception modules as an intermediate form between regular convolutions and depthwise separable convolutions.
  • Design Xception as a linear stack of depthwise separable convolution layers: 36 convolutional layers organized into 14 modules, with residual connections around all modules except the first and last.
  • Train and evaluate on ImageNet (1000 classes) and a large-scale JFT-based task (17,000 classes) with comparable parameter counts to Inception V3.
  • Compare performance with Inception V3 under the same optimization and regularization settings.
  • Experiment with the presence of residual connections and with/without an intermediate non-linearity between depthwise and pointwise operations.
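The 36-layer, 14-module layout in the list above can be checked with a short tally. The per-module breakdown below (entry, middle, and exit flows and the number of convolutions in each module) is an assumption reconstructed from the paper's architecture figure and is not stated in this summary; only the two totals are. The names `XCEPTION_FLOWS` and `tally` are hypothetical.

```python
# (flow name, convolution layers in each module of that flow)
# ASSUMED breakdown: only the totals (36 layers, 14 modules) appear in the summary.
XCEPTION_FLOWS = [
    ("entry", [2, 2, 2, 2]),   # first module: regular convolutions, no residual
    ("middle", [3] * 8),       # eight identical modules of three separable convolutions
    ("exit", [2, 2]),          # last module has no residual connection
]


def tally(flows):
    """Return (number of modules, number of convolution layers)."""
    modules = sum(len(per_module) for _, per_module in flows)
    layers = sum(sum(per_module) for _, per_module in flows)
    return modules, layers


modules, layers = tally(XCEPTION_FLOWS)
# Residual connections wrap every module except the first and the last.
residual_modules = modules - 2
print(modules, layers, residual_modules)
```

Under these assumptions the tally reproduces the stated totals, with the middle flow contributing two thirds of the convolution layers.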

Experimental Results

Research Questions

  • RQ1: Does replacing Inception modules with depthwise separable convolutions improve classification performance given similar parameter counts?
  • RQ2: How do residual connections affect convergence and final accuracy in Xception?
  • RQ3: Is an intermediate non-linearity between depthwise and pointwise convolutions beneficial in depthwise separable architectures?
  • RQ4: How does Xception perform on ImageNet compared to Inception V3 and on a large-scale JFT-based task?
  • RQ5: What are the practical implications for model size and speed when using depthwise separable convolutions?
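For the model-size question, the standard weight counts of a single layer give a feel for the scale involved. The sketch below ignores biases, and the 3x3 kernel with 728 input and output channels is an illustrative choice rather than a figure taken from this summary; the function names are hypothetical.

```python
def regular_conv_params(k, c_in, c_out):
    """Weights in a regular k x k convolution (biases ignored)."""
    return k * k * c_in * c_out


def separable_conv_params(k, c_in, c_out):
    """Weights in a depthwise (k x k per channel) plus pointwise (1x1) pair."""
    return k * k * c_in + c_in * c_out


k, c_in, c_out = 3, 728, 728  # illustrative layer shape (assumption)
regular = regular_conv_params(k, c_in, c_out)
separable = separable_conv_params(k, c_in, c_out)
ratio = separable / regular  # algebraically equal to 1/c_out + 1/k**2
print(regular, separable, round(ratio, 4))
```

This is a per-layer comparison only. The two full networks are reported as similar in total size, so the saving says how cheaply each layer is built, not that Xception is smaller overall.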

Key Findings

Dataset    Model                          Top-1 accuracy    Top-5 accuracy
ImageNet   VGG-16                         0.715             0.901
ImageNet   ResNet-152                     0.770             0.933
ImageNet   Inception V3                   0.782             0.941
ImageNet   Xception                       0.790             0.945

Dataset    Model                          MAP@100
JFT        Inception V3 - no FC layers    6.36
JFT        Xception - no FC layers        6.70
JFT        Inception V3 with FC layers    6.50
JFT        Xception with FC layers        6.78
  • On ImageNet, Xception is marginally more accurate than Inception V3: 0.790 vs 0.782 Top-1 and 0.945 vs 0.941 Top-5.
  • On JFT (MAP@100), Xception achieves 6.70 without fully-connected layers and 6.78 with FC layers, outperforming Inception V3 variants.
  • Xception has a similar parameter count to Inception V3 (about 22.9M vs 23.6M) but yields better results on JFT and comparable or better results on ImageNet.
  • Residual connections are essential for convergence and performance in Xception.
  • Removing an intermediate non-linearity between depthwise and pointwise convolutions can improve training speed and final accuracy in this architecture.
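The size of the gaps behind these findings can be worked out directly from the reported numbers. The snippet below is a small arithmetic sketch; the `RESULTS` dictionary and `gains` helper are hypothetical names, and the values are the Inception V3 and Xception figures listed in the results above. Note that the ImageNet and JFT rows use different metrics, so the relative gains are only loosely comparable.

```python
# metric -> (Inception V3, Xception), as reported in the results above
RESULTS = {
    "ImageNet top-1": (0.782, 0.790),
    "ImageNet top-5": (0.941, 0.945),
    "JFT MAP@100, no FC": (6.36, 6.70),
    "JFT MAP@100, with FC": (6.50, 6.78),
}


def gains(results):
    """Return metric -> (absolute gain, relative gain in percent)."""
    out = {}
    for name, (inception, xception) in results.items():
        absolute = round(xception - inception, 3)
        relative_pct = round(100 * (xception - inception) / inception, 2)
        out[name] = (absolute, relative_pct)
    return out


for name, (absolute, relative_pct) in gains(RESULTS).items():
    print(f"{name}: +{absolute} ({relative_pct}% relative)")
```

In relative terms the JFT improvements come out several times larger than the ImageNet ones, which is consistent with the summary's claim of a larger gain on the bigger dataset.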


This review was generated by AI and checked by a human editor.