QUICK REVIEW

[Paper Review] Factorizable Net: An Efficient Subgraph-based Framework for Scene Graph Generation

Yikang Li, Wanli Ouyang|arXiv (Cornell University)|2018. 06. 29.

Multimodal Machine Learning Applications | Citations: 46

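One-line summary

Introduces Factorizable Net (F-Net), which factorizes the scene graph into subgraphs to shrink the intermediate representation and adds spatially aware modules, enabling faster and more accurate scene graph generation.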

ABSTRACT

Generating scene graph to describe all the relations inside an image gains increasing interests these years. However, most of the previous methods use complicated structures with slow inference speed or rely on the external data, which limits the usage of the model in real-life scenarios. To improve the efficiency of scene graph generation, we propose a subgraph-based connection graph to concisely represent the scene graph during the inference. A bottom-up clustering method is first used to factorize the entire scene graph into subgraphs, where each subgraph contains several objects and a subset of their relationships. By replacing the numerous relationship representations of the scene graph with fewer subgraph and object features, the computation in the intermediate stage is significantly reduced. In addition, spatial information is maintained by the subgraph features, which is leveraged by our proposed Spatial-weighted Message Passing (SMP) structure and Spatial-sensitive Relation Inference (SRI) module to facilitate the relationship recognition. On the recent Visual Relationship Detection and Visual Genome datasets, our method outperforms the state-of-the-art method in both accuracy and speed.

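Motivation and objectives

“Motivates efficient scene graph generation that moves past the inefficiency of quadratically growing relationship representations.”
“Proposes subgraph-based factorization so that phrase representations are shared and computation is reduced.”
“Preserves spatial information through 2-D subgraph feature maps and designs the SMP and SRI modules.”
“Demonstrates speed and accuracy gains on the VRD and Visual Genome datasets.”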

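Proposed method

Builds a fully connected object relationship graph and clusters similar relationship regions into subgraphs.
Represents each subgraph as a shared 2-D feature map so that spatial structure is preserved.
Applies Spatial-weighted Message Passing (SMP), which refines object and subgraph features through attention-based aggregation.
Uses Spatial-sensitive Relation Inference (SRI) to fuse subject, object, and subgraph features and predict predicates with a bottleneck-style convolution.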
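The clustering step can be illustrated with a small sketch. It assumes, purely for illustration, that each candidate relationship is represented by the union box of its two objects and that regions are grouped greedily by an IoU threshold; this review does not state the paper's exact clustering rule or threshold, and the names `union_box`, `iou`, and `cluster_relations` are hypothetical. The point is only that N objects produce N(N-1) ordered pairs, while the number of subgraphs that need their own feature can be much smaller.

```python
from itertools import permutations


def union_box(a, b):
    """Smallest (x1, y1, x2, y2) box that covers both input boxes."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def cluster_relations(boxes, iou_threshold=0.5):
    """Greedy grouping (an assumption, not the paper's stated rule).

    Every ordered object pair (i, j) is a candidate relationship. A pair joins
    the first subgraph whose region overlaps its union box by at least
    `iou_threshold`; otherwise it starts a new subgraph. A subgraph keeps the
    region of its first member.
    """
    subgraphs = []
    for i, j in permutations(range(len(boxes)), 2):
        region = union_box(boxes[i], boxes[j])
        for sg in subgraphs:
            if iou(region, sg["region"]) >= iou_threshold:
                sg["pairs"].append((i, j))
                break
        else:
            subgraphs.append({"region": region, "pairs": [(i, j)]})
    return subgraphs


# Toy input: two overlapping objects and one far-away object.
demo_boxes = [(0, 0, 10, 10), (5, 5, 15, 15), (100, 100, 110, 110)]
demo_subgraphs = cluster_relations(demo_boxes)
n_pairs = sum(len(sg["pairs"]) for sg in demo_subgraphs)
print(f"{n_pairs} candidate relationships share {len(demo_subgraphs)} subgraph regions")
```

Because every pair inside a subgraph reuses one region feature, the cost of the intermediate stage scales with the number of subgraphs rather than with the number of pairs.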

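Experimental results

Research questions

RQ1: Can a subgraph-based representation reduce the computational cost of scene graph generation without losing accuracy?
RQ2: Does retaining spatial information in the subgraph feature maps improve predicate recognition?
RQ3: Do spatially aware message passing and relation inference improve performance over state-of-the-art methods?
RQ4: How does the model perform in speed and accuracy on the standard datasets (VRD and Visual Genome)?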

Key results

Dataset	Model	PhrDet Rec@50	PhrDet Rec@100	SGGen Rec@50	SGGen Rec@100	Speed (s/img)
VRD	LP	16.17	17.03	13.86	14.70	1.18 ∗
VRD	ViP-CNN	22.78	27.91	17.32	20.01	0.78
VRD	DR-Net	19.93	23.45	17.73	20.88	2.83
VRD	ILC	16.89	20.70	15.08	18.37	2.70 ∗∗
VRD	Ours-Full: 1-SMP	25.90	30.52	18.16	21.04	0.45
VRD	Ours-Full: 2-SMP	26.03	30.77	18.32	21.20	0.55
VG-MSDN	ISGG [58]	15.87	19.45	8.23	10.88	1.64
VG-MSDN	MSDN [35]	19.95	24.93	10.72	14.22	3.56
VG-MSDN	Ours-Full: 2-SMP	22.84	28.57	13.06	16.47	0.55
VG-DR-Net	DR-Net [6]	23.95	27.57	20.79	23.76	2.83
VG-DR-Net	Ours-Full: 2-SMP	26.91	32.63	19.88	23.95	0.55
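The speed column is easier to compare as a ratio. The sketch below divides each baseline's seconds per image by the 0.55 s/img reported for the full 2-SMP model; the numbers are copied from the table, while the dictionary layout and the `speedups` helper are only illustrative. The LP and ILC timings carry footnote marks in the table whose text is not reproduced in this review, so those two ratios should be read as rough.

```python
# Seconds per image, copied from the results table.
BASELINE_SPEED = {
    "VRD": {"LP": 1.18, "ViP-CNN": 0.78, "DR-Net": 2.83, "ILC": 2.70},
    "VG-MSDN": {"ISGG": 1.64, "MSDN": 3.56},
    "VG-DR-Net": {"DR-Net": 2.83},
}
OURS_2SMP = 0.55  # full model with two SMP modules


def speedups(baselines, ours):
    """Return how many times faster `ours` is than each baseline."""
    return {name: round(seconds / ours, 2) for name, seconds in baselines.items()}


for dataset, baselines in BASELINE_SPEED.items():
    for name, ratio in speedups(baselines, OURS_2SMP).items():
        print(f"{dataset:10s} {name:8s} {ratio:.2f}x")
```

On these figures the 2-SMP model is roughly 1.4 times faster than ViP-CNN and more than 5 times faster than DR-Net.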

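Strengths

Outperforms state-of-the-art methods in speed and in nearly every reported accuracy metric on the VRD and Visual Genome benchmarks; the one exception in the table is SGGen Rec@50 on VG-DR-Net, where DR-Net scores 20.79 against 19.88.
Subgraph-based clustering sharply reduces the number of intermediate phrase representations, which speeds up inference.
2-D subgraph feature maps preserve spatial information, which improves predicate recognition.
Spatial-weighted Message Passing and Spatial-sensitive Relation Inference yield measurable gains in SGGen recall and phrase detection.
Adding SMP modules trades a little speed for accuracy: on VRD, going from 1-SMP to 2-SMP raises PhrDet Rec@50 from 25.90 to 26.03 while inference time rises from 0.45 to 0.55 s/img.
The full model with 2-SMP performs strongly against the baselines.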
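To make the idea of spatially weighted aggregation concrete, here is a minimal sketch of attention pooling over a 2-D feature map. It assumes an object is a plain feature vector, a subgraph is an H x W grid of vectors with the same number of channels, and the attention score at each location is a simple dot product followed by a softmax. The learned projections and gating that a real SMP module would use are omitted, and `spatial_attention_pool` is a hypothetical name, so this should not be read as the paper's exact formulation.

```python
import math


def spatial_attention_pool(obj_feat, sub_map):
    """Pool an H x W x C feature map into one C-dim vector.

    Each location is weighted by the softmax of its dot product with the
    object feature, so locations that match the object contribute more.
    Returns (pooled_vector, weight_map).
    """
    scores = [[sum(o * f for o, f in zip(obj_feat, cell)) for cell in row]
              for row in sub_map]
    peak = max(max(row) for row in scores)  # subtract for numerical stability
    exps = [[math.exp(s - peak) for s in row] for row in scores]
    total = sum(sum(row) for row in exps)
    weights = [[e / total for e in row] for row in exps]

    pooled = [0.0] * len(obj_feat)
    for weight_row, feat_row in zip(weights, sub_map):
        for w, cell in zip(weight_row, feat_row):
            for c, value in enumerate(cell):
                pooled[c] += w * value
    return pooled, weights


# Toy 2 x 2 map with 2 channels; only the bottom-left cell matches the object.
obj = [1.0, 0.0]
sub = [[[0.0, 1.0], [0.0, 1.0]],
       [[5.0, 0.0], [0.0, 1.0]]]
message, attention = spatial_attention_pool(obj, sub)
print(message)
print(attention)
```

The weight map is what a flat, pooled subgraph vector would lose: it records where in the region the evidence for a given object sits.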

This review was generated by AI and reviewed by a human editor.