QUICK REVIEW

[Paper Review] Towards Seamless Adaptation of Pre-trained Models for Visual Place Recognition

Feng Lu, Lijun Zhang | arXiv (Cornell University) | February 22, 2024

Robotics and Sensor-Based Localization · Citations: 10

One-Line Summary

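This paper proposes SelaVPR, which enables fast and accurate two-stage VPR with minimal fine-tuning through hybrid global-local adaptation of a pre-trained foundation model, using lightweight adapters and a mutual nearest neighbor local feature loss.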

ABSTRACT

Recent studies show that vision models pre-trained in generic visual learning tasks with large-scale data can provide useful feature representations for a wide range of visual perception problems. However, few attempts have been made to exploit pre-trained foundation models in visual place recognition (VPR). Due to the inherent difference in training objectives and data between the tasks of model pre-training and VPR, how to bridge the gap and fully unleash the capability of pre-trained models for VPR is still a key issue to address. To this end, we propose a novel method to realize seamless adaptation of pre-trained models for VPR. Specifically, to obtain both global and local features that focus on salient landmarks for discriminating places, we design a hybrid adaptation method to achieve both global and local adaptation efficiently, in which only lightweight adapters are tuned without adjusting the pre-trained model. Besides, to guide effective adaptation, we propose a mutual nearest neighbor local feature loss, which ensures proper dense local features are produced for local matching and avoids time-consuming spatial verification in re-ranking. Experimental results show that our method outperforms the state-of-the-art methods with less training data and training time, and uses about only 3% retrieval runtime of the two-stage VPR methods with RANSAC-based spatial verification. It ranks 1st on the MSLS challenge leaderboard (at the time of submission). The code is released at https://github.com/Lu-Feng/SelaVPR.

Motivation and Goals

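Bridge the gap between model pre-training and the visual place recognition (VPR) task by adapting foundation models without full fine-tuning.
Produce both global and local features, supporting fast candidate retrieval and accurate re-ranking.
Eliminate expensive spatial verification from re-ranking by matching dense local features directly.
Demonstrate data-efficient training and real-time retrieval across major VPR benchmarks.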

Proposed Method

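Global adaptation: add a serial adapter after the multi-head attention (MHA) layer and a parallel adapter alongside the MLP in each transformer block, steering the features toward discriminative landmarks.
Local adaptation: place up-convolution layers after the backbone to produce dense local feature maps for re-ranking.
Apply GeM pooling to the feature map to obtain the global feature used for candidate retrieval.
Compute local matches between query and candidate local features via mutual nearest neighbors, and use the number of matches as the re-ranking score (no spatial verification).
Joint training loss: L = Lg + λ·Ll, combining the global triplet loss Lg with the local feature loss Ll weighted by λ, so that the local features are optimized for re-ranking.
The base architecture uses a frozen DINOv2 ViT-L/14 backbone with trainable lightweight adapters (parameter-efficient transfer learning).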
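The adapter layout described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' code: the bottleneck structure (down-projection, GELU, up-projection), the zero initialization, the scale s = 0.2, the affine-free layer norm, and the random linear maps standing in for the frozen DINOv2 attention and MLP are all assumptions made for the example. Only the adapter weights would be trained; the frozen sublayers are untouched.

```python
import numpy as np


def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))


def layer_norm(x, eps=1e-6):
    # LayerNorm without learnable affine parameters, for brevity (assumption)
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


class Adapter:
    """Bottleneck adapter: down-project, GELU, up-project (only these weights are trained)."""

    def __init__(self, dim, bottleneck, rng):
        self.w_down = rng.normal(0.0, 0.02, size=(dim, bottleneck))
        self.w_up = np.zeros((bottleneck, dim))  # zero init: the adapter starts as a no-op

    def __call__(self, x):
        return gelu(x @ self.w_down) @ self.w_up


def adapted_block(x, mha, mlp, serial, parallel, s=0.2):
    """Transformer block with a serial adapter after MHA and a parallel adapter on the MLP.

    x: (tokens, dim). mha and mlp are the frozen pre-trained sublayers.
    s is a hypothetical scale for the parallel branch.
    """
    h = mha(layer_norm(x))
    h = h + serial(h)                    # serial adapter with its own skip connection
    x = x + h                            # block residual around attention
    z = layer_norm(x)
    return x + mlp(z) + s * parallel(z)  # frozen MLP plus scaled parallel adapter


# Toy usage: random linear maps stand in for the frozen backbone sublayers.
rng = np.random.default_rng(0)
dim, tokens = 16, 5
w_attn = rng.normal(size=(dim, dim)) * 0.1
w_mlp = rng.normal(size=(dim, dim)) * 0.1


def frozen_mha(t):
    return t @ w_attn


def frozen_mlp(t):
    return t @ w_mlp


x = rng.normal(size=(tokens, dim))
serial, parallel = Adapter(dim, 4, rng), Adapter(dim, 4, rng)
out = adapted_block(x, frozen_mha, frozen_mlp, serial, parallel)
```

Because the up-projections start at zero, the adapted block initially reproduces the frozen block exactly, so training starts from the pre-trained model's behavior and only gradually departs from it.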
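The first stage (GeM pooling into a global descriptor, then nearest-neighbor candidate retrieval) can be illustrated with the sketch below. It assumes an (H, W, C) feature map, p = 3, and cosine similarity on L2-normalized descriptors; these are common GeM defaults, not values confirmed by the review, and the random feature maps are placeholders for real backbone outputs.

```python
import numpy as np


def gem_pool(feature_map, p=3.0, eps=1e-6):
    """Generalized-mean (GeM) pooling over spatial positions.

    feature_map: (H, W, C) local features. Returns an L2-normalized (C,) global descriptor.
    p = 1 gives average pooling; large p approaches max pooling.
    """
    x = np.clip(feature_map, eps, None)  # GeM expects positive activations
    pooled = np.mean(x ** p, axis=(0, 1)) ** (1.0 / p)
    return pooled / np.linalg.norm(pooled)


def retrieve_top_k(query_desc, db_descs, k):
    """Indices of the k database descriptors with the highest cosine similarity to the query."""
    sims = db_descs @ query_desc  # descriptors are unit-norm, so dot product = cosine
    return np.argsort(-sims)[:k]


# Toy usage with random placeholder feature maps.
rng = np.random.default_rng(0)
query = gem_pool(rng.random((16, 16, 8)))
database = np.stack([gem_pool(rng.random((16, 16, 8))) for _ in range(10)])
candidates = retrieve_top_k(query, database, k=3)  # shortlist handed to re-ranking
```

The single learnable-in-principle exponent p lets GeM interpolate between average and max pooling, which is why it is a popular choice for aggregating landmark-focused features.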
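The second stage, re-ranking by counting mutual nearest neighbor matches without RANSAC, might look like the following sketch. It assumes the dense local features have already been flattened to (N, D) arrays and L2-normalized; the function names are hypothetical, and any similarity threshold or score normalization the paper may apply is omitted.

```python
import numpy as np


def mutual_nn_matches(q_feats, c_feats):
    """Mutual nearest neighbor matches between two sets of L2-normalized local features.

    q_feats: (N, D), c_feats: (M, D). Returns (i, j) pairs where j is i's nearest
    neighbor in c_feats and i is j's nearest neighbor in q_feats.
    """
    sim = q_feats @ c_feats.T    # (N, M) cosine similarities
    nn_q2c = sim.argmax(axis=1)  # best candidate feature for each query feature
    nn_c2q = sim.argmax(axis=0)  # best query feature for each candidate feature
    return [(i, int(j)) for i, j in enumerate(nn_q2c) if nn_c2q[j] == i]


def rerank(q_feats, candidates):
    """Order candidate local-feature sets by their number of mutual NN matches (descending)."""
    scores = [len(mutual_nn_matches(q_feats, c)) for c in candidates]
    order = sorted(range(len(candidates)), key=lambda idx: -scores[idx])
    return order, scores


# Toy usage: candidate 0 matches every query feature, candidate 1 only one of them.
query_local = np.eye(3)
cand_a = np.eye(3)
cand_b = np.tile([1.0, 0.0, 0.0], (3, 1))
order, scores = rerank(query_local, [cand_a, cand_b])
```

Compared with RANSAC-based verification, this score needs only one similarity matrix and two argmax passes per candidate, which is consistent with the large runtime savings reported for re-ranking.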
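The joint objective L = Lg + λ·Ll can be sketched as below. The global term is a standard triplet margin loss on Euclidean distances; the margin 0.1 and λ = 1.0 are placeholder assumptions, not values from the paper. The mutual nearest neighbor local feature loss Ll is not reproduced here and is treated as a precomputed number.

```python
import numpy as np


def triplet_loss(q, pos, neg, margin=0.1):
    """Triplet margin loss on global descriptors using Euclidean distance."""
    d_pos = np.linalg.norm(q - pos)
    d_neg = np.linalg.norm(q - neg)
    return max(0.0, float(d_pos - d_neg + margin))


def joint_loss(l_global, l_local, lam=1.0):
    """Total objective L = Lg + lam * Ll (lam = 1.0 is a placeholder, not the paper's value)."""
    return l_global + lam * l_local


q = np.array([1.0, 0.0])
easy = triplet_loss(q, pos=np.array([1.0, 0.0]), neg=np.array([0.0, 1.0]))  # constraint met: 0.0
hard = triplet_loss(q, pos=np.array([0.0, 1.0]), neg=np.array([1.0, 0.0]))  # violated: sqrt(2) + 0.1
```

The triplet term only penalizes cases where the positive is not closer than the negative by at least the margin; λ then trades this global objective off against the local matching objective.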

Experimental Results

Research Questions

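RQ1: Can lightweight adapters seamlessly adapt a pre-trained foundation model to VPR without full fine-tuning?
RQ2: Does hybrid global-local adaptation improve both global retrieval and local re-ranking in VPR?
RQ3: Does the mutual nearest neighbor local feature loss produce effective dense local features suitable for re-ranking without RANSAC?
RQ4: How does SelaVPR compare with state-of-the-art VPR methods on standard benchmarks in terms of accuracy and runtime?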

Key Findings

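SelaVPR achieved state-of-the-art results on several VPR benchmarks and ranked 1st on the MSLS challenge leaderboard at the time of submission.
SelaVPR (global) outperforms many single-stage methods in global retrieval and achieves strong R@5 and R@10 across datasets.
Full SelaVPR (global + local adaptation) delivers substantial further gains, with notable R@1 improvements on Tokyo24/7 and Pitts30k after re-ranking.
Local adaptation brings a large R@1 gain on Tokyo24/7, highlighting the benefit of dense local features in challenging conditions.
SelaVPR enables re-ranking without spatial verification, needing only about 3% of the retrieval time of RANSAC-based two-stage methods and less than 4% of their total runtime on Pitts30k-test.
Ablation studies show that both global and local adaptation are necessary, and suggest that parameter-efficient tuning improves performance while preserving transferability.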

This review was generated by AI and reviewed by a human editor.