QUICK REVIEW

[Paper Review] Polyp-PVT: Polyp Segmentation with Pyramid Vision Transformers

Bo Dong, Wenhai Wang | arXiv (Cornell University) | 2021. 08. 16.

Advanced Neural Network Applications | References: 120 | Citations: 170

One-Line Summary

Polyp-PVT improves polyp segmentation by pairing a Pyramid Vision Transformer encoder with three auxiliary modules (CFM, CIM, SAM), achieving state-of-the-art or competitive Dice scores on multiple benchmarks.

ABSTRACT

Most polyp segmentation methods use CNNs as their backbone, leading to two key issues when exchanging information between the encoder and decoder: 1) taking into account the differences in contribution between different-level features and 2) designing an effective mechanism for fusing these features. Unlike existing CNN-based methods, we adopt a transformer encoder, which learns more powerful and robust representations. In addition, considering the image acquisition influence and elusive properties of polyps, we introduce three standard modules, including a cascaded fusion module (CFM), a camouflage identification module (CIM), and a similarity aggregation module (SAM). Among these, the CFM is used to collect the semantic and location information of polyps from high-level features; the CIM is applied to capture polyp information disguised in low-level features, and the SAM extends the pixel features of the polyp area with high-level semantic position information to the entire polyp area, thereby effectively fusing cross-level features. The proposed model, named Polyp-PVT, effectively suppresses noises in the features and significantly improves their expressive capabilities. Extensive experiments on five widely adopted datasets show that the proposed model is more robust to various challenging situations (e.g., appearance changes, small objects, rotation) than existing representative methods. The proposed model is available at https://github.com/DengPingFan/Polyp-PVT.

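Research Motivation and Goals

Address the limitations of CNN backbones in cross-level feature fusion for polyp segmentation.
Introduce a transformer-based encoder (PVT) to learn robust multi-scale representations.
Propose three modules (CFM, CIM, SAM) that fuse high-level and low-level features and suppress noise.
Evaluate Polyp-PVT on five challenging datasets and compare it with state-of-the-art methods.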

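Proposed Method

Adopt the Pyramid Vision Transformer (PVTv2) as the encoder to extract multi-scale features X1–X4 from the input image.
Use the Cascaded Fusion Module (CFM) to progressively fuse the high-level features and produce T1.
Apply the Camouflage Identification Module (CIM), which uses channel and spatial attention, to enhance the low-level feature X1 into T2.
Introduce the Similarity Aggregation Module (SAM), which combines non-local and graph convolution operations, to fuse T1 and T2 into the final feature Z.
Predict the segmentation with a 1x1 convolution head, and train with a main loss (IoU + BCE) plus an auxiliary loss on the intermediate output.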
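
The training objective in the last step can be illustrated with a minimal sketch. It assumes plain, unweighted BCE and soft IoU terms computed over flattened per-pixel probabilities; any pixel weighting the authors may apply is not described in this review, so none is modeled. The function names and the `aux_weight` parameter are hypothetical and chosen only for illustration.

```python
import math


def bce_loss(probs, targets, eps=1e-7):
    """Mean binary cross-entropy over flattened pixel probabilities."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(probs)


def soft_iou_loss(probs, targets, smooth=1.0):
    """1 - soft IoU, computed from probabilities rather than hard masks."""
    inter = sum(p * t for p, t in zip(probs, targets))
    union = sum(p + t for p, t in zip(probs, targets)) - inter
    return 1.0 - (inter + smooth) / (union + smooth)


def segmentation_loss(probs, targets):
    """IoU + BCE for a single prediction map (unweighted; an assumption)."""
    return soft_iou_loss(probs, targets) + bce_loss(probs, targets)


def total_loss(main_probs, aux_probs, targets, aux_weight=1.0):
    """Main loss on the final output plus an auxiliary loss on an
    intermediate output. aux_weight is a hypothetical knob."""
    main = segmentation_loss(main_probs, targets)
    aux = segmentation_loss(aux_probs, targets)
    return main + aux_weight * aux


if __name__ == "__main__":
    target = [1, 0, 1, 0]
    final_output = [0.9, 0.2, 0.8, 0.1]   # prediction from the final feature
    intermediate = [0.7, 0.4, 0.6, 0.3]   # prediction from an intermediate feature
    print(total_loss(final_output, intermediate, target))
```

Supervising the intermediate output as well as the final one gives the earlier fusion stage a direct gradient signal, which is the usual motivation for an auxiliary loss of this kind.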

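Experimental Results

Research Questions

RQ1: How does Polyp-PVT perform on standard polyp segmentation benchmarks compared with CNN-based backbones?
RQ2: How do CFM, CIM, and SAM contribute to overall performance and to robustness under difficult conditions (noise, camouflage, cross-domain data)?
RQ3: How does the transformer-based encoder cope with appearance changes, small polyps, and rotation in endoscopic images?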

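Key Results

Polyp-PVT achieves strong performance on Kvasir-SEG (mean Dice, mDic, of 0.917) and ClinicDB (mDic 0.937).
On ColonDB, Polyp-PVT reaches mDic 0.808, ahead of SANet by a comfortable margin (as reported).
On ETIS, Polyp-PVT reaches mDic 0.787, outperforming SANet by a noticeable margin.
On Endoscene, Polyp-PVT reaches mDic 0.900 and mIoU 0.833, indicating robust performance under difficult conditions.
Overall, Polyp-PVT is robust to appearance changes, small objects, and rotation, and outperforms representative baselines such as SANet and PraNet.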
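
For readers unfamiliar with the two metrics quoted above, the following sketch shows one common way to compute per-image Dice and IoU and average them over a dataset. It assumes a single fixed threshold of 0.5 on the predicted probabilities; the evaluation protocol behind the reported numbers is not described in this review and may differ (for example, by averaging over several thresholds). The function names are hypothetical.

```python
def dice_and_iou(pred, target, threshold=0.5, eps=1e-8):
    """Dice and IoU for one image, given flattened probabilities and a 0/1 mask."""
    pred_bin = [1 if p >= threshold else 0 for p in pred]
    inter = sum(p * t for p, t in zip(pred_bin, target))
    pred_sum = sum(pred_bin)
    target_sum = sum(target)
    union = pred_sum + target_sum - inter
    # eps keeps both scores defined (and equal to 1) when both masks are empty
    dice = (2 * inter + eps) / (pred_sum + target_sum + eps)
    iou = (inter + eps) / (union + eps)
    return dice, iou


def mean_scores(preds, targets):
    """Average per-image Dice and IoU over a dataset (mDic, mIoU)."""
    scores = [dice_and_iou(p, t) for p, t in zip(preds, targets)]
    n = len(scores)
    m_dice = sum(d for d, _ in scores) / n
    m_iou = sum(i for _, i in scores) / n
    return m_dice, m_iou


if __name__ == "__main__":
    preds = [[0.9, 0.8, 0.1, 0.2], [0.9, 0.1, 0.7, 0.3]]
    masks = [[1, 0, 0, 0], [1, 0, 1, 0]]
    print(mean_scores(preds, masks))
```

Dice is never lower than IoU for the same prediction, which is consistent with the Endoscene figures above (mDic 0.900 versus mIoU 0.833).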

This review was generated by AI and reviewed by a human editor.