QUICK REVIEW

[Paper Review] SDF-SRN: Learning Signed Distance 3D Object Reconstruction from Static Images

Chen-Hsuan Lin, Chaoyang Wang | arXiv (Cornell University) | October 20, 2020

Advanced Vision and Imaging | Citations: 50

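One-Line Summary

SDF-SRN uses differentiable rendering to learn a dense 3D signed distance function representation from single-view images and 2D silhouettes, enabling single-view training without explicit multi-view matching and outperforming state-of-the-art methods on ShapeNet and PASCAL3D+.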

ABSTRACT

Dense 3D object reconstruction from a single image has recently witnessed remarkable advances, but supervising neural networks with ground-truth 3D shapes is impractical due to the laborious process of creating paired image-shape datasets. Recent efforts have turned to learning 3D reconstruction without 3D supervision from RGB images with annotated 2D silhouettes, dramatically reducing the cost and effort of annotation. These techniques, however, remain impractical as they still require multi-view annotations of the same object instance during training. As a result, most experimental efforts to date have been limited to synthetic datasets. In this paper, we address this issue and propose SDF-SRN, an approach that requires only a single view of objects at training time, offering greater utility for real-world scenarios. SDF-SRN learns implicit 3D shape representations to handle arbitrary shape topologies that may exist in the datasets. To this end, we derive a novel differentiable rendering formulation for learning signed distance functions (SDF) from 2D silhouettes. Our method outperforms the state of the art under challenging single-view supervision settings on both synthetic and real-world datasets.

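Motivation and Objectives

Enable practical 3D reconstruction from collections of single-view images that have no ground-truth 3D shapes.
Learn continuous implicit 3D representations, in the form of signed distance functions (SDF), from 2D silhouettes.
Develop a differentiable rendering framework, based on Scene Representation Networks, for optimizing 3D surfaces against RGB images.
Enable category-specific 3D reconstruction of real-world objects from single-view data.
Demonstrate better reconstruction quality than 3D-unsupervised baselines on ShapeNet and PASCAL3D+.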

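Proposed Method

Represents a 3D shape as a continuous implicit function f: R^3 -> R whose zero level set defines the surface.
Uses the 2D silhouette distance transform to derive a lower bound on the 3D SDF from back-projected cones and circles, which provides a supervision signal at every pixel of the image.
Trains the implicit SDF f, parameterized by theta, with a loss L_SDF that enforces the lower bound b(z;u) derived from the 2D distance transform (Eq. 4).
Uses a differentiable rendering process based on Scene Representation Networks to fit the surface to the RGB image, combining a bilevel optimization that enforces agreement between the implicit surface and the ray-marched depth (Eq. 7) with an RGB reconstruction term (Eq. 8).
Adopts an image-conditioned hypernetwork that predicts the parameters (theta, phi, psi) of f, g, and h from the RGB image I, and includes Eikonal regularization that encourages f to have unit-norm gradients (Eq. 10).
Trains end to end with a weighted sum of L_SDF, L_RGB, L_ray, and L_eik (Eq. 11).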
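
To make the lower-bound and Eikonal terms above more concrete, here is a minimal Python sketch. It is not the paper's formulation: it assumes an orthographic camera, under which a 3D point that projects to pixel u is at least as far from the shape as u is from the silhouette in 2D, so the distance-transform value itself serves as the bound. The paper's b(z;u) is derived for back-projected cones and will differ. The hinge form of the penalty, the analytic sphere standing in for the learned f, and every function name here are illustrative assumptions.

```python
import math


def sphere_sdf(p, radius=1.0):
    """Analytic SDF of an origin-centered sphere (stand-in for the learned f)."""
    return math.sqrt(sum(c * c for c in p)) - radius


def disk_distance_2d(u, radius=1.0):
    """2D distance from pixel u to a disk-shaped silhouette; zero inside it."""
    return max(0.0, math.hypot(u[0], u[1]) - radius)


def lower_bound_loss(sdf, u, depths, bound):
    """Mean hinge penalty for SDF values below `bound` along the ray of pixel u.

    Assumes an orthographic camera looking down the z-axis, so the ray of
    pixel u = (x, y) is the set of points (x, y, z). This hinge is an
    illustrative choice, not the paper's Eq. 4.
    """
    penalties = [max(0.0, bound - sdf((u[0], u[1], z))) for z in depths]
    return sum(penalties) / len(penalties)


def eikonal_penalty(sdf, p, eps=1e-4):
    """(||grad f(p)|| - 1)^2, with the gradient from central finite differences."""
    grad = []
    for i in range(3):
        hi, lo = list(p), list(p)
        hi[i] += eps
        lo[i] -= eps
        grad.append((sdf(hi) - sdf(lo)) / (2.0 * eps))
    norm = math.sqrt(sum(g * g for g in grad))
    return (norm - 1.0) ** 2


if __name__ == "__main__":
    u = (2.0, 0.0)                       # pixel outside the silhouette
    bound = disk_distance_2d(u)          # 2D distance-transform value at u
    depths = [-1.0, 0.0, 1.0]            # sample depths along the ray
    print(lower_bound_loss(sphere_sdf, u, depths, bound))
    print(eikonal_penalty(sphere_sdf, (0.3, 0.4, 1.2)))
```

A true SDF of the silhouetted shape never violates the bound, so the first term is zero for the unit sphere and positive for an SDF whose shape bulges outside the silhouette. The Eikonal term is near zero for any valid distance function and grows when the function is rescaled, which is why it acts as a regularizer on f.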

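Experimental Results

Research Questions

RQ1: Can a dense 3D signed distance function be learned from single-view images and 2D silhouettes without explicit multi-view supervision?
RQ2: How can the 2D silhouette distance transform be used to provide a rich geometric supervision signal for learning 3D surfaces?
RQ3: Does differentiable rendering anchored to an SDF representation improve 3D reconstruction quality on real images compared with occupancy-based or mesh-based priors?
RQ4: What category-specific benefits does single-view training bring to implicit 3D shape learning on synthetic and natural image datasets?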

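Key Results

On ShapeNet, SDF-SRN outperforms SoftRas and DVR under single-view supervision on the airplane, car, and chair categories (accuracy and coverage metrics in Table 1).
On ShapeNet, SDF-SRN also recovers 3D shapes better than DVR supervised with depth derived from visual hulls (Table 1).
On PASCAL3D+, SDF-SRN shows quantitative gains over CMR and DVR under single-view supervision on the airplane, car, and chair categories (Table 4).
Ablations show that removing RGB rendering, importance weighting, or positional encoding degrades performance, and that the full SDF-SRN with test-time optimization yields the best results (Table 3).
SDF-SRN recovers 3D topology robustly from single-view data and also works well on real-world images (PASCAL3D+), suggesting that it could be applied to large-scale real-world datasets.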
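
For readers unfamiliar with the accuracy and coverage metrics mentioned above, the sketch below shows one common way such numbers are computed. It assumes they are the two directions of a Chamfer-style distance between point sets sampled from the predicted and ground-truth surfaces: accuracy from prediction to ground truth, coverage from ground truth to prediction. This review does not spell out the exact definition or scaling used in the paper, so treat the function names and the convention as assumptions.

```python
import math


def mean_nearest_distance(source, target):
    """Average distance from each source point to its nearest target point."""
    total = 0.0
    for p in source:
        total += min(math.dist(p, q) for q in target)
    return total / len(source)


def accuracy_and_coverage(predicted, ground_truth):
    """Two one-directional Chamfer-style distances (lower is better).

    accuracy: how close predicted points lie to the ground-truth surface.
    coverage: how well the prediction covers the ground-truth surface.
    """
    accuracy = mean_nearest_distance(predicted, ground_truth)
    coverage = mean_nearest_distance(ground_truth, predicted)
    return accuracy, coverage


if __name__ == "__main__":
    # A prediction that reconstructs only part of the shape.
    predicted = [(0.0, 0.0, 0.0)]
    ground_truth = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    print(accuracy_and_coverage(predicted, ground_truth))
```

Reporting both directions matters because a partial reconstruction can score well on accuracy while missing large parts of the object, which only the coverage term penalizes.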

This review was generated by AI and reviewed by a human editor.