QUICK REVIEW

[Paper Review] Space-Time Correspondence as a Contrastive Random Walk

Allan Jabri, Andrew Owens | arXiv (Cornell University) | June 25, 2020

Human Pose and Action Recognition · References: 113 · Citations: 116

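One-Line Summary

This paper introduces a self-supervised approach that learns visual space-time correspondence by performing a contrastive random walk on a space-time graph derived from video. Learning is guided by palindrome-based cycle-consistency and strengthened by edge dropout and test-time adaptation.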

ABSTRACT

This paper proposes a simple self-supervised approach for learning a representation for visual correspondence from raw video. We cast correspondence as prediction of links in a space-time graph constructed from video. In this graph, the nodes are patches sampled from each frame, and nodes adjacent in time can share a directed edge. We learn a representation in which pairwise similarity defines transition probability of a random walk, so that long-range correspondence is computed as a walk along the graph. We optimize the representation to place high probability along paths of similarity. Targets for learning are formed without supervision, by cycle-consistency: the objective is to maximize the likelihood of returning to the initial node when walking along a graph constructed from a palindrome of frames. Thus, a single path-level constraint implicitly supervises chains of intermediate comparisons. When used as a similarity metric without adaptation, the learned representation outperforms the self-supervised state-of-the-art on label propagation tasks involving objects, semantic parts, and pose. Moreover, we demonstrate that a technique we call edge dropout, as well as self-supervised adaptation at test-time, further improve transfer for object-centric correspondence.
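
The abstract's central mechanism, in which pairwise similarity becomes random-walk transition probabilities and long-range correspondence is computed as a walk across frames, can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the cosine similarity, the row-wise softmax, the temperature of 0.07, and the helper names (`transition_matrix`, `walk`) are assumptions made for the example, and the toy 2-D vectors stand in for patch embeddings produced by a learned encoder.

```python
import math


def l2_normalize(v):
    # Scale a vector to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def transition_matrix(src, dst, temperature=0.07):
    """Row-stochastic matrix: entry [i][j] is P(step from src node i to dst node j).

    src and dst are lists of patch embeddings for two adjacent frames (toy vectors
    standing in for the learned embedding phi). Each row is a softmax over the
    cosine similarities divided by the temperature (an assumed value here).
    """
    src = [l2_normalize(v) for v in src]
    dst = [l2_normalize(v) for v in dst]
    rows = []
    for q in src:
        logits = [sum(a * b for a, b in zip(q, k)) / temperature for k in dst]
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(z - m) for z in logits]
        total = sum(exps)
        rows.append([e / total for e in exps])
    return rows


def matmul(a, b):
    # Plain matrix product; multiplying transition matrices chains one-step walks.
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def walk(frames, temperature=0.07):
    # Long-range correspondence: product of one-step transitions over consecutive frames.
    result = transition_matrix(frames[0], frames[1], temperature)
    for t in range(1, len(frames) - 1):
        result = matmul(result, transition_matrix(frames[t], frames[t + 1], temperature))
    return result


# Three frames with two patches each; each patch drifts slightly but stays recognizable.
frames = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.9, 0.1], [0.1, 0.9]],
    [[1.0, 0.0], [0.0, 1.0]],
]
A = walk(frames)
print([[round(p, 3) for p in row] for row in A])
```

Because every one-step matrix is row-stochastic, their product is row-stochastic too, so each row of `A` is a probability distribution over the patches of the last frame.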

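Motivation and Goals

Learn a representation that captures visual correspondence across space and time from unlabeled video.
Formulate correspondence as finding paths on a space-time graph of video patches.
Use cycle-consistency on palindrome sequences to provide supervision without labels.
Improve robustness and transfer through edge dropout and test-time adaptation.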

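Proposed Method

Build a directed space-time graph whose nodes are patches from video frames and whose edges connect patches in neighboring frames, weighted by learned similarity.
Learn a patch embedding phi such that pairwise similarities define the stochastic transition matrix of a random walk.
Use palindrome sequences to enforce cycle-consistency between walking forward and walking backward in time; the palindrome supplies training targets for free, without labels.
Formulate learning as maximizing the probability that a walk along this path returns to its starting node. This is a contrastive objective: the starting node is the positive target and all other nodes act as negatives.
Apply edge dropout to the transition matrix so that the walker must rely on alternative paths, which improves the grouping of regions that share a common fate (that move together).
Optionally, perform self-supervised adaptation at test time by fine-tuning the embedding on unlabeled video before label propagation.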
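
The training objective described in the method list (palindrome walk, return-to-start target, and edge dropout) can be sketched as follows. This is a hypothetical, framework-free illustration rather than the paper's code: edge dropout is assumed here to remove an edge by setting its logit to negative infinity before the row softmax, and names such as `cycle_loss` and `stochastic_matrix` are invented for the example. The paper may also combine walks of several lengths; this sketch uses a single palindrome.

```python
import math
import random


def normalize(v):
    # Unit-length vector so that dot products are cosine similarities.
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]


def stochastic_matrix(src, dst, temperature=0.07, edge_drop=0.0, rng=None):
    """Row-softmax of cosine similarities from src patches to dst patches.

    Edge dropout (assumed form): each edge's logit is set to -inf with probability
    edge_drop, so that row's probability mass is renormalized over the kept edges.
    """
    if rng is None:
        rng = random.Random(0)
    src = [normalize(v) for v in src]
    dst = [normalize(v) for v in dst]
    rows = []
    for q in src:
        logits = [sum(a * b for a, b in zip(q, k)) / temperature for k in dst]
        if edge_drop > 0:
            kept = [-math.inf if rng.random() < edge_drop else z for z in logits]
            if any(z != -math.inf for z in kept):  # never drop every edge in a row
                logits = kept
        m = max(logits)
        exps = [math.exp(z - m) for z in logits]  # math.exp(-inf) == 0.0
        total = sum(exps)
        rows.append([e / total for e in exps])
    return rows


def matmul(a, b):
    # Chaining row-stochastic matrices composes consecutive walk steps.
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def cycle_loss(frames, temperature=0.07, edge_drop=0.0, seed=0):
    """Mean cross-entropy of returning to the start node after a palindrome walk.

    frames = [f_0, ..., f_k]; the walk visits f_0 -> ... -> f_k -> ... -> f_0,
    so the target for start node i is node i itself (an identity target).
    """
    rng = random.Random(seed)
    palindrome = frames + frames[-2::-1]
    walk = None
    for src, dst in zip(palindrome, palindrome[1:]):
        step = stochastic_matrix(src, dst, temperature, edge_drop, rng)
        walk = step if walk is None else matmul(walk, step)
    n = len(walk)
    # Contrastive reading: node i is the positive, every other node a negative.
    return -sum(math.log(max(walk[i][i], 1e-12)) for i in range(n)) / n


distinct = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.9, 0.1], [0.1, 0.9]],
    [[1.0, 0.0], [0.0, 1.0]],
]
ambiguous = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 1.0], [1.0, 1.0]],  # both middle patches look identical
    [[1.0, 0.0], [0.0, 1.0]],
]
print(cycle_loss(distinct), cycle_loss(ambiguous), cycle_loss(distinct, edge_drop=0.3))
```

Distinctive embeddings give a near-zero loss, while embeddings that make the middle frame ambiguous force the walk to split its probability mass and raise the loss to about log 2. In actual training, this loss would be minimized with respect to the encoder phi by gradient descent, and the optional test-time adaptation step would reuse the same objective on unlabeled video; neither the encoder nor the gradient step is shown here.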

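Experimental Results

Research Questions

RQ1 Can a self-supervised representation learn robust visual correspondence from raw video data?
RQ2 Can cycle-consistency on palindrome sequences provide supervision without ground-truth labels?
RQ3 Does introducing edge dropout improve object-centric correspondence and segmentation tasks?
RQ4 Does self-supervised adaptation at test time further improve transfer to downstream label propagation tasks?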

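Key Findings

When used as a similarity metric for label propagation, the learned representation outperforms state-of-the-art self-supervised methods on tasks involving objects, pose keypoints, and semantic parts, without any task-specific adaptation.
Increasing the walk length during training improves downstream performance, suggesting that longer-range context is beneficial.
Edge dropout forces the model to rely on multiple plausible paths, which increases robustness and improves object-centric correspondence.
Self-supervised adaptation at test time gives further gains in object propagation quality, particularly in segmentation recall.
The approach scales to longer walks and admits simple extensions without requiring complex supervision.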
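
Label propagation with the learned similarity, as referenced in the first finding, is often implemented as a k-nearest-neighbor vote: each target patch takes a similarity-weighted average of the labels of its most similar source patches. The sketch below assumes that form, with a single source frame, `k=2`, and a softmax temperature of 0.07; these values and the function name `propagate_labels` are illustrative assumptions, and the paper's evaluation protocol (for example, how many context frames and what spatial neighborhood are used) may differ.

```python
import math


def normalize(v):
    # Unit-length vector so that dot products are cosine similarities.
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]


def propagate_labels(src_feats, src_labels, tgt_feats, k=2, temperature=0.07):
    """Label distribution for each target patch from its k most similar source patches.

    src_labels holds one probability vector per source patch (e.g. one-hot object IDs).
    Neighbor weights are a softmax over the top-k cosine similarities (assumed form).
    """
    src = [normalize(v) for v in src_feats]
    n_classes = len(src_labels[0])
    out = []
    for q in (normalize(v) for v in tgt_feats):
        sims = [sum(a * b for a, b in zip(q, s)) for s in src]
        top = sorted(range(len(src)), key=lambda j: sims[j], reverse=True)[:k]
        best = sims[top[0]]  # largest similarity, subtracted for numerical stability
        weights = [math.exp((sims[j] - best) / temperature) for j in top]
        total = sum(weights)
        out.append([sum(w * src_labels[j][c] for w, j in zip(weights, top)) / total
                    for c in range(n_classes)])
    return out


# Source frame: two patches on object 0 and one patch on object 1 (one-hot labels).
src_feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
src_labels = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
tgt_feats = [[0.95, 0.05], [0.05, 0.95]]
print(propagate_labels(src_feats, src_labels, tgt_feats))
```

Restricting the vote to the top-k neighbors keeps dissimilar patches from diluting the prediction; the representation itself is used unchanged, which matches the "no task-specific adaptation" setting described in the findings.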

This review was generated by AI and reviewed by a human editor.