QUICK REVIEW

[Paper Review] High-Resolution Representations for Labeling Pixels and Regions

Ke Sun, Yang Zhao|arXiv (Cornell University)|2019. 04. 09.

Digital Image Processing Techniques | References: 129 | Citations: 661

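One-Line Summary

This paper extends HRNet to HRNetV2, which aggregates the representations from all parallel high-to-low resolution branches. The resulting stronger high-resolution features achieve state-of-the-art results in semantic segmentation and facial landmark detection, and the multi-level representation built from them improves object detection.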

ABSTRACT

High-resolution representation learning plays an essential role in many vision problems, e.g., pose estimation and semantic segmentation. The high-resolution network (HRNet) [SunXLW19], recently developed for human pose estimation, maintains high-resolution representations through the whole process by connecting high-to-low resolution convolutions in parallel and produces strong high-resolution representations by repeatedly conducting fusions across parallel convolutions. In this paper, we conduct a further study on high-resolution representations by introducing a simple yet effective modification and apply it to a wide range of vision tasks. We augment the high-resolution representation by aggregating the (upsampled) representations from all the parallel convolutions rather than only the representation from the high-resolution convolution as done in [SunXLW19]. This simple modification leads to stronger representations, evidenced by superior results. We show top results in semantic segmentation on Cityscapes, LIP, and PASCAL Context, and facial landmark detection on AFLW, COFW, 300W, and WFLW. In addition, we build a multi-level representation from the high-resolution representation and apply it to the Faster R-CNN object detection framework and the extended frameworks. The proposed approach achieves superior results to existing single-model networks on COCO object detection. The code and models have been publicly available at https://github.com/HRNet.

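Motivation and Objectives

Motivate and improve high-resolution representations for pixel and region labeling tasks beyond pose estimation.
Investigate a simple modification of HRNet that exploits the representations from all parallel resolutions.
Demonstrate the method across semantic segmentation, facial landmark detection, and object detection.
Show that multi-level high-resolution features help detect small objects and improve overall performance.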

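Proposed Method

Maintain high-resolution representations through parallel multi-resolution convolutions with repeated multi-scale fusion.
Introduce HRNetV2, which aggregates the upsampled representations from all parallel resolutions rather than from the high-resolution stream alone.
Upsample the features of the low-resolution branches and concatenate them to form a richer high-resolution representation.
For detection, downsample the high-resolution representation to produce multi-level features for a feature pyramid (HRNetV2p).
Instantiate the network as a four-stage backbone with multi-resolution blocks, and mix the features from all resolutions before the task-specific head.
For semantic segmentation and facial landmark heatmaps, apply the head to the high-resolution output; for Faster R-CNN, Mask R-CNN, and Cascade R-CNN, build multi-level features.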
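
The aggregation and pyramid steps listed above can be sketched at the shape level with NumPy. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the branch widths (C, 2C, 4C, 8C with resolution halving per branch) and the 64x64 input size are assumptions chosen for the example, nearest-neighbour upsampling and non-overlapping average pooling stand in for whatever interpolation and pooling the released code uses, and the learned mixing convolutions are omitted.

```python
import numpy as np


def upsample_nearest(x, factor):
    """Upsample a (C, H, W) map by an integer factor (nearest neighbour)."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)


def hrnetv2_aggregate(branches):
    """Upsample every branch to the highest resolution, then concatenate channels.

    `branches` is ordered from highest to lowest resolution.
    """
    target_h = branches[0].shape[1]
    upsampled = [upsample_nearest(b, target_h // b.shape[1]) for b in branches]
    return np.concatenate(upsampled, axis=0)


def avg_pool(x, factor):
    """Average-pool a (C, H, W) map over non-overlapping factor x factor windows."""
    c, h, w = x.shape
    return x.reshape(c, h // factor, factor, w // factor, factor).mean(axis=(2, 4))


def hrnetv2p_pyramid(fused, factors=(1, 2, 4, 8)):
    """Build multi-level features by downsampling the fused high-resolution map."""
    return [avg_pool(fused, f) for f in factors]


# Assumed sizes for illustration only: width C = 18, highest-resolution map 64x64.
C, H, W = 18, 64, 64
rng = np.random.default_rng(0)
branches = [rng.standard_normal((C * 2**i, H // 2**i, W // 2**i)) for i in range(4)]

fused = hrnetv2_aggregate(branches)   # channels: C + 2C + 4C + 8C = 15C
pyramid = hrnetv2p_pyramid(fused)     # four levels at 1x, 1/2, 1/4, 1/8 of fused

print(fused.shape)
print([p.shape for p in pyramid])
```

Under these assumptions the fused map carries 15C channels at the highest resolution, which is what a segmentation or landmark head would consume, while the pyramid levels are the kind of input a Faster R-CNN style detector expects.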

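Experimental Results

Research Questions

RQ1: Can aggregating representations from all high-to-low resolution branches improve the quality of the high-resolution features?
RQ2: Do HRNetV2 representations yield better semantic segmentation and facial landmark detection than the original HRNet?
RQ3: Can multi-level HRNet representations improve object detection frameworks such as Faster R-CNN and its extensions?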

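Key Results

HRNetV2 substantially strengthens the high-resolution representation by exploiting all parallel resolutions.
It achieves state-of-the-art semantic segmentation performance on Cityscapes, PASCAL Context, and LIP with efficient model size and computation.
It obtains the best facial landmark detection results on AFLW, COFW, 300W, and WFLW.
The multi-level HRNet representation (HRNetV2p) improves COCO object detection when integrated into Faster R-CNN, Mask R-CNN, and Cascade R-CNN.
In the Faster R-CNN and Cascade R-CNN settings, it outperforms comparable single-model detectors on COCO test-dev without multi-scale training or testing.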

This review was generated by AI and reviewed by a human editor.