QUICK REVIEW

[Paper Review] Searching Central Difference Convolutional Networks for Face Anti-Spoofing

Zitong Yu, Chenxu Zhao|arXiv (Cornell University)|Mar 9, 2020

Biometric Identification and Security|References: 64|Cited by: 38

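One-Sentence Summary

Introduces Central Difference Convolution (CDC) and the CNNs built on it (CDCN/CDCN++) for frame-level face anti-spoofing, using a NAS-searched backbone and a Multiscale Attention Fusion Module; achieves state-of-the-art intra-dataset and cross-dataset performance on six benchmark datasets.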

ABSTRACT

Face anti-spoofing (FAS) plays a vital role in face recognition systems. Most state-of-the-art FAS methods 1) rely on stacked convolutions and expert-designed network, which is weak in describing detailed fine-grained information and easily being ineffective when the environment varies (e.g., different illumination), and 2) prefer to use long sequence as input to extract dynamic features, making them difficult to deploy into scenarios which need quick response. Here we propose a novel frame level FAS method based on Central Difference Convolution (CDC), which is able to capture intrinsic detailed patterns via aggregating both intensity and gradient information. A network built with CDC, called the Central Difference Convolutional Network (CDCN), is able to provide more robust modeling capacity than its counterpart built with vanilla convolution. Furthermore, over a specifically designed CDC search space, Neural Architecture Search (NAS) is utilized to discover a more powerful network structure (CDCN++), which can be assembled with Multiscale Attention Fusion Module (MAFM) for further boosting performance. Comprehensive experiments are performed on six benchmark datasets to show that 1) the proposed method not only achieves superior performance on intra-dataset testing (especially 0.2% ACER in Protocol-1 of OULU-NPU dataset), 2) it also generalizes well on cross-dataset testing (particularly 6.5% HTER from CASIA-MFSD to Replay-Attack datasets). The codes are available at https://github.com/ZitongYu/CDCN.

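Motivation and Objectives

Advance robust frame-level face anti-spoofing with reduced sensitivity to illumination and environmental changes.
Introduce Central Difference Convolution (CDC), which captures both intensity and gradient information without adding extra parameters.
Develop CDCN and the NAS-searched CDCN++, together with a Multiscale Attention Fusion Module (MAFM), to improve performance.
Demonstrate state-of-the-art performance on intra-dataset and cross-dataset FAS benchmarks across six datasets.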

Proposed Method

Define Central Difference Convolution (CDC) as a weighted blend of vanilla convolution and a central-difference gradient term controlled by theta, enabling richer detail capture without extra parameters.
Replace vanilla convolutions with CDC in a depth-supervised FAS backbone to form CDCN, optimizing depth-map prediction with L_MSE and L_CDL losses.
Propose CDCN++, incorporating NAS-based backbone search over varied multi-level cells (low/mid/high) and a Node Attention mechanism for architecture selection.
Integrate a Multiscale Attention Fusion Module (MAFM) to refine and fuse multi-level CDC features with spatial attention for improved discrimination.
Conduct NAS-based backbone search in a bi-level optimization framework, followed by discrete architecture derivation and performance evaluation.
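
To make the CDC definition above concrete, here is a minimal single-channel sketch in plain Python. It assumes the blend takes the form y = theta * sum(w * (x - x_center)) + (1 - theta) * sum(w * x), which simplifies to the vanilla response minus theta * x_center * sum(w); that exact form is an assumption consistent with the description above, not a copy of the authors' implementation. The function names are hypothetical, and the default theta = 0.7 is the value reported as best in the findings.

```python
def vanilla_conv2d(x, w):
    """Valid (no padding) 2D cross-correlation of image x with a k x k kernel w."""
    k = len(w)
    rows, cols = len(x) - k + 1, len(x[0]) - k + 1
    return [
        [
            sum(w[i][j] * x[r + i][c + j] for i in range(k) for j in range(k))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def cdc_conv2d(x, w, theta=0.7):
    """Central difference convolution on a single-channel image (illustrative).

    Assumed blend:
        y = theta * sum_n w_n * (x_n - x_center) + (1 - theta) * sum_n w_n * x_n
    which simplifies to:
        y = sum_n w_n * x_n - theta * x_center * sum_n w_n
    The kernel w is reused for both terms, so no extra parameters are needed.
    """
    half = len(w) // 2                      # offset of the window centre
    w_sum = sum(sum(row) for row in w)      # sum of all kernel weights
    vanilla = vanilla_conv2d(x, w)
    return [
        [
            vanilla[r][c] - theta * x[r + half][c + half] * w_sum
            for c in range(len(vanilla[0]))
        ]
        for r in range(len(vanilla))
    ]


if __name__ == "__main__":
    flat = [[2.0] * 5 for _ in range(5)]    # constant-intensity patch
    kernel = [[1.0, 2.0, 1.0], [2.0, 3.0, 2.0], [1.0, 2.0, 1.0]]
    print(cdc_conv2d(flat, kernel, theta=0.0))  # same as vanilla convolution
    print(cdc_conv2d(flat, kernel, theta=1.0))  # pure central-difference term
```

With theta = 0 the sketch reduces to vanilla convolution, and with theta = 1 only the central-difference term remains, so a constant-intensity patch produces an all-zero response. Intermediate values trade off intensity and gradient information.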

Experimental Results

Research Questions

RQ1: Can Central Difference Convolution (CDC) improve frame-level face anti-spoofing by capturing fine-grained invariant features under varying conditions?
RQ2: Does NAS-based backbone search combined with CDCN++ and MAFM yield superior FAS performance both within datasets and across datasets?
RQ3: What is the impact of different theta values and CDC variants on FAS performance?
RQ4: How does CDCN/CDCN++ generalize to cross-type and cross-dataset spoofing attacks compared to existing methods?

Key Findings

CDC outperforms vanilla convolutions and other variants; best results achieved with theta = 0.7.
CDC-based networks achieve state-of-the-art intra-dataset performance on OULU-NPU Protocol-1 (ACER of 1.0% for CDCN and 0.2% for CDCN++) and SiW (ACER 0.12% for CDCN++ in Protocols).
CDCN++ with NAS-based backbone and MAFM delivers superior intra-dataset results across all OULU-NPU protocols and strong cross-dataset gains (e.g., 6.5% HTER on CASIA-MFSD to Replay-Attack).
Cross-dataset testing shows that CDCN++ is competitive with or better than the best-known results in the CR and RC protocols (e.g., 6.5% HTER on CR, i.e., CASIA-MFSD to Replay-Attack; 29.8% HTER on RC in the reported setup).
MAFM and varied-cell NAS backbones contribute to improved ACER across protocols and datasets, demonstrating the benefit of learned multi-level, attention-guided fusion.

Results on OULU-NPU Protocol 1:

Prot.  Method      APCER(%)  BPCER(%)  ACER(%)
1      GRADIANT    1.3       12.5      6.9
1      STASN       1.2       2.5       1.9
1      Auxiliary   1.6       1.6       1.6
1      FAS-TD      2.5       0.0       1.3
1      DeepPixBiS  0.8       0.0       0.4
1      CDCN        0.4       1.7       1.0
1      CDCN++      0.4       0.0       0.2
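
The ACER column in the table above can be sanity-checked from the other two columns. The sketch below assumes the commonly used definition ACER = (APCER + BPCER) / 2, which this summary does not state explicitly. The row values are copied from the table, while the function names and the 0.06 tolerance (to absorb rounding of the published one-decimal figures) are illustrative choices.

```python
def acer(apcer, bpcer):
    """Average Classification Error Rate, assumed to be the mean of APCER and BPCER."""
    return (apcer + bpcer) / 2.0


# (method, APCER %, BPCER %, reported ACER %) copied from the table above.
PROTOCOL_1_ROWS = [
    ("GRADIANT", 1.3, 12.5, 6.9),
    ("STASN", 1.2, 2.5, 1.9),
    ("Auxiliary", 1.6, 1.6, 1.6),
    ("FAS-TD", 2.5, 0.0, 1.3),
    ("DeepPixBiS", 0.8, 0.0, 0.4),
    ("CDCN", 0.4, 1.7, 1.0),
    ("CDCN++", 0.4, 0.0, 0.2),
]


def check_rows(rows, tol=0.06):
    """Map each method to True if its reported ACER matches the recomputed one within tol."""
    return {
        name: abs(acer(apcer, bpcer) - reported) <= tol
        for name, apcer, bpcer, reported in rows
    }


def best_method(rows):
    """Return the name of the method with the lowest reported ACER."""
    return min(rows, key=lambda row: row[3])[0]


if __name__ == "__main__":
    print(check_rows(PROTOCOL_1_ROWS))
    print(best_method(PROTOCOL_1_ROWS))
```

Under that assumed definition every row is consistent to within rounding, and CDCN++ has the lowest reported ACER among the listed methods.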

This review was generated by AI and checked by a human editor.