QUICK REVIEW

[Paper Review] Multicolumn Networks for Face Recognition

Weidi Xie, Andrew Zisserman | arXiv (Cornell University) | 2018. 07. 24.

Face recognition and analysis | References: 17 | Citations: 79

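One-Line Summary

This paper proposes Multicolumn Networks, which compute a set-level face descriptor by weighting each image by its visual quality and recalibrating it by content relevance, and which improve on prior methods on the IJB benchmarks.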

ABSTRACT

The objective of this work is set-based face recognition, i.e. to decide if two sets of images of a face are of the same person or not. Conventionally, the set-wise feature descriptor is computed as an average of the descriptors from individual face images within the set. In this paper, we design a neural network architecture that learns to aggregate based on both "visual" quality (resolution, illumination), and "content" quality (relative importance for discriminative classification). To this end, we propose a Multicolumn Network (MN) that takes a set of images (the number in the set can vary) as input, and learns to compute a fix-sized feature descriptor for the entire set. To encourage high-quality representations, each individual input image is first weighted by its "visual" quality, determined by a self-quality assessment module, and followed by a dynamic recalibration based on "content" qualities relative to the other images within the set. Both of these qualities are learnt implicitly during training for set-wise classification. Comparing with the previous state-of-the-art architectures trained with the same dataset (VGGFace2), our Multicolumn Networks show an improvement of between 2-6% on the IARPA IJB face recognition benchmarks, and exceed the state of the art for all methods on these benchmarks.

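Motivation and Objectives

Address set-based face verification by learning a quality-aware aggregation that goes beyond simple average pooling.
Introduce a visual quality control module that down-weights low-quality images.
Introduce a content quality control module that re-weights images according to their relative discriminative importance within the set.
Show that the proposed MN architecture improves IJB-A/B/C verification performance using a backbone trained on VGGFace2.
Show that MN delivers consistent gains while adding minimal parameter overhead to ResNet50.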

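Proposed Method

Embed each image with a shared ResNet50 backbone to obtain per-image descriptors.
Compute a self-aware visual quality weight for each image through a sigmoid-activated FC layer.
Relate each image to the set's mean face to compute a content-aware quality weight, produced by a second sigmoid-activated FC layer.
Combine the visual and content weights and form the set descriptor as a weighted sum of the image descriptors.
Train first with per-image pre-training on VGGFace2, then fine-tune end-to-end with set-based classification.
Evaluate on the IJB-A/B/C benchmarks using cosine similarity between set descriptors.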
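
The aggregation steps above can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the function and parameter names (`aggregate_set`, `w_visual`, `w_content`, and so on) are hypothetical, and two details are assumptions that the summary does not spell out, namely that the "mean face" is the visual-quality-weighted average of the descriptors and that the content FC layer acts on the concatenation of the mean face and each image descriptor. The ResNet50 backbone is omitted; the descriptors are taken as given.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def weighted_mean(vectors, weights):
    # Normalized weighted sum of equal-length vectors.
    total = sum(weights)
    dim = len(vectors[0])
    return [sum(w * v[k] for w, v in zip(weights, vectors)) / total
            for k in range(dim)]

def aggregate_set(descriptors, w_visual, b_visual, w_content, b_content):
    """Sketch of quality-aware set aggregation (names are hypothetical)."""
    # Visual quality: one sigmoid-activated FC unit applied to each descriptor.
    alpha = [sigmoid(dot(w_visual, v) + b_visual) for v in descriptors]
    # Assumption: the "mean face" is the visual-quality-weighted average.
    mean_face = weighted_mean(descriptors, alpha)
    # Assumption: the content FC sees [mean_face ; v_i] (list concatenation).
    beta = [sigmoid(dot(w_content, mean_face + v) + b_content)
            for v in descriptors]
    # Combine both weights and form the set descriptor as a weighted sum.
    weights = [a * b for a, b in zip(alpha, beta)]
    return weighted_mean(descriptors, weights), alpha, beta

def cosine_similarity(a, b):
    # Similarity between two set descriptors, as used for evaluation.
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))
```

With all FC weights and biases set to zero, every image receives the same weight and the sketch reduces to plain average pooling, which is the conventional baseline. If the descriptors were 2048-dimensional (the usual ResNet50 pooled output, assumed here), the two FC units in this sketch would hold 2049 + 4097 = 6146 parameters, on the order of the "about 6K" overhead quoted in the results.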

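Experimental Results

Research Questions

RQ1 Can set-based face descriptors be improved by making each image's contribution depend not only on its absolute quality but also on its content quality relative to the rest of the set?
RQ2 Does integrating both visual and content quality control outperform simple average pooling and existing attention methods on unconstrained face benchmarks?
RQ3 How large is the performance gain on the IJB-A/B/C benchmarks when MN uses visual quality control alone versus visual plus content quality control?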

Key Results

Dataset	Architecture	TAR@FAR=1e-5	TAR@FAR=1e-4	TAR@FAR=1e-3	TAR@FAR=1e-2	TAR@FAR=1e-1
IJB-B	MN-v	0.683	0.818	0.902	0.955	0.984
IJB-B	MN-vc	0.708	0.831	0.909	0.958	0.985
IJB-C	MN-v	0.755	0.852	0.920	0.965	0.988
IJB-C	MN-vc	0.771	0.862	0.927	0.968	0.989
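
Each entry in the table is a true accept rate (TAR) at a fixed false accept rate (FAR). As a rough illustration of how such a number can be derived from similarity scores, the sketch below picks the threshold from the impostor (different-identity) scores and then measures the fraction of genuine (same-identity) pairs above it. The function name `tar_at_far` and the tie-handling convention are assumptions for illustration; the official IJB evaluation protocol may differ in its details.

```python
def tar_at_far(genuine_scores, impostor_scores, far):
    """Fraction of genuine pairs accepted when at most `far` of the
    impostor pairs are accepted (hypothetical helper, strict '>' rule)."""
    impostors = sorted(impostor_scores, reverse=True)
    # Number of impostor pairs we are allowed to accept at this FAR.
    allowed = int(far * len(impostors))
    if allowed >= len(impostors):
        threshold = float("-inf")  # FAR of 1.0: accept everything
    else:
        # Accept only scores strictly above the (allowed+1)-th highest
        # impostor score, so at most `allowed` impostors pass.
        threshold = impostors[allowed]
    accepted = sum(1 for s in genuine_scores if s > threshold)
    return accepted / len(genuine_scores)

# Toy scores, made up purely for illustration.
genuine = [0.9, 0.8, 0.6, 0.3]
impostor = [0.7, 0.5, 0.4, 0.35, 0.3, 0.25, 0.2, 0.15, 0.1, 0.05]
tar = tar_at_far(genuine, impostor, far=0.1)  # threshold 0.5 -> 0.75
```

Very low FARs such as 1e-5 need a large number of impostor pairs for the threshold to be meaningful, since `allowed` rounds down to zero on small score sets.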

MN with visual quality (MN-v) outperforms prior state-of-the-art on IJB benchmarks using the same backbone.
Adding the content quality control (MN-vc) yields further improvements across IJB-B and IJB-C datasets.
Compared to the ResNet50 baseline, MN introduces about 6K extra parameters and achieves 2-6% absolute gains on IJB-B and IJB-C.
On IJB-B, MN-v and MN-vc achieve TARs of 0.683/0.708, 0.818/0.831, 0.902/0.909, 0.955/0.958, 0.984/0.985 at FAR=1e-5…1e-1, respectively.
On IJB-C, MN-v and MN-vc achieve TARs of 0.755/0.771, 0.852/0.862, 0.920/0.927, 0.965/0.968, 0.988/0.989 at the same FARs.
The results show the most pronounced improvements at very low FARs (1e-5 to 1e-3) due to better suppression of aberrant images and emphasis on discriminative views.
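
The contribution of the content module can be read directly off the table by subtracting the MN-v row from the MN-vc row. The short sketch below does that arithmetic; the variable and function names are illustrative only. It covers MN-vc versus MN-v, not the 2-6% gain over the ResNet50 baseline, because the baseline's numbers are not listed in this review.

```python
# TAR values copied from the results table, ordered by FAR = 1e-5 ... 1e-1.
tar_table = {
    ("IJB-B", "MN-v"):  [0.683, 0.818, 0.902, 0.955, 0.984],
    ("IJB-B", "MN-vc"): [0.708, 0.831, 0.909, 0.958, 0.985],
    ("IJB-C", "MN-v"):  [0.755, 0.852, 0.920, 0.965, 0.988],
    ("IJB-C", "MN-vc"): [0.771, 0.862, 0.927, 0.968, 0.989],
}

def content_gain(dataset):
    """Absolute TAR gain of MN-vc over MN-v at each FAR, rounded to 3 places."""
    v = tar_table[(dataset, "MN-v")]
    vc = tar_table[(dataset, "MN-vc")]
    return [round(b - a, 3) for a, b in zip(v, vc)]

gain_b = content_gain("IJB-B")  # [0.025, 0.013, 0.007, 0.003, 0.001]
gain_c = content_gain("IJB-C")  # [0.016, 0.01, 0.007, 0.003, 0.001]
```

On both datasets the gain shrinks monotonically as the FAR is relaxed, from 2.5 and 1.6 points at FAR=1e-5 to 0.1 point at FAR=1e-1, which matches the observation that the benefit is concentrated at low FARs.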

This review was generated by AI and checked by a human editor.