QUICK REVIEW

[Paper Review] Outfit Generation and Style Extraction via Bidirectional LSTM and Autoencoder

Takuma Nakamura, Ryosuke Goto | arXiv (Cornell University) | June 29, 2018

Topic: Generative Adversarial Networks and Image Synthesis | References: 3 | Citations: 40

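One-Line Summary

This paper combines a BiLSTM-based outfit sequence model with an unsupervised style-extraction autoencoder to learn both fashion compatibility and interpretable outfit styles, enabling style-controlled outfit generation.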

ABSTRACT

When creating an outfit, style is a criterion in selecting each fashion item. This means that style can be regarded as a feature of the overall outfit. However, in various previous studies on outfit generation, there have been few methods focusing on global information obtained from an outfit. To address this deficiency, we have incorporated an unsupervised style extraction module into a model to learn outfits. Using the style information of an outfit as a whole, the proposed model succeeded in generating outfits more flexibly without requiring additional information. Moreover, the style information extracted by the proposed model is easy to interpret. The proposed model was evaluated on two human-generated outfit datasets. In a fashion item prediction task (missing prediction task), the proposed model outperformed a baseline method. In a style extraction task, the proposed model extracted some easily distinguishable styles. In an outfit generation task, the proposed model generated an outfit while controlling its styles. This capability allows us to generate fashionable outfits according to various preferences.

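Motivation and Objectives

Argues that overall outfit style needs to be modeled alongside item compatibility.
Proposes an end-to-end architecture that learns outfit sequences and global style jointly.
Enables unsupervised style extraction that yields interpretable style vectors.
Demonstrates outfit generation conditioned on a target style and evaluates it on real-world datasets.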

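Proposed Method

Represents an outfit as a sequence of item features extracted by a CNN.
Uses a BiLSTM (forward and backward passes) to learn compatibility among the items in an outfit.
Uses Visual-Semantic Embedding (VSE) to align image features with text attributes, when such attributes are available.
Introduces a style embedding (SE) module that encodes outfit style as a softmax-normalized style vector, i.e., a mixture of basis styles.
Trains with an objective combining E_f + E_b + E_e + E_s + E_r, which enables unsupervised style learning and end-to-end optimization.
Enables style-controlled outfit generation through beam search that optimizes a combination of sequence likelihood and a style-similarity term.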
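The style embedding step can be illustrated with a small NumPy sketch. This is one plausible reading of the summary, not the authors' implementation: the mean pooling of item features, the single linear layer (`W`, `b`), the `basis` matrix, and all dimensions are assumptions introduced here for illustration. It shows how an outfit could be mapped to softmax mixture weights over basis styles and then reconstructed from those weights, as in an autoencoder.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def style_mixture(item_features, W, b):
    """Map an outfit (n_items x d) to mixture weights over k basis styles.

    Assumption: the outfit is summarized by mean-pooling its item features,
    followed by one linear layer and a softmax.
    """
    outfit_vec = item_features.mean(axis=0)
    return softmax(W @ outfit_vec + b)

def reconstruct(weights, basis):
    """Rebuild an outfit-level style feature as a weighted sum of basis styles."""
    return weights @ basis  # (k,) @ (k, d) -> (d,)

# Toy data: random values stand in for CNN item features and learned parameters.
rng = np.random.default_rng(0)
d, k = 8, 3                        # hypothetical feature size and number of basis styles
outfit = rng.normal(size=(4, d))   # an outfit with 4 items
W = rng.normal(size=(k, d))
b = np.zeros(k)
basis = rng.normal(size=(k, d))

p = style_mixture(outfit, W, b)    # style vector: non-negative, sums to 1
r = reconstruct(p, basis)          # reconstructed style feature
```

Because `p` is a probability vector, each entry can be read as the share of one basis style in the outfit, which is what makes the representation easy to inspect.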

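Experimental Results

Research Questions

RQ1: Can a BiLSTM-based model capture overall outfit compatibility beyond local item pairs?
RQ2: Can the unsupervised style extraction module produce interpretable style vectors shared across an entire outfit?
RQ3: Does introducing the style embedding module improve missing-item prediction and enable style-controlled outfit generation?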

Key Results

Dataset	Method	gamma	Accuracy
Polyvore	Bi-LSTM + VSE (Han et al., 2017)	-	0.726
Polyvore	Bi-LSTM + SE (this paper)	0.0	0.729
Polyvore	Bi-LSTM + SE (this paper)	0.2	0.727
Polyvore	Bi-LSTM + SE (this paper)	0.5	0.723
Polyvore	Bi-LSTM + VSE + SE (this paper)	0.0	0.728
Polyvore	Bi-LSTM + VSE + SE (this paper)	0.2	0.732
Polyvore	Bi-LSTM + VSE + SE (this paper)	0.5	0.732
IQON	Bi-LSTM	-	0.703
IQON	Bi-LSTM + SE (this paper)	-	0.715
IQON	Bi-LSTM + SE (this paper)	0.2	0.713
IQON	Bi-LSTM + SE (this paper)	0.5	0.711

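On Polyvore, Bi-LSTM + SE reaches a missing-item prediction accuracy of 0.729 at gamma = 0.0 versus 0.726 for the Bi-LSTM + VSE baseline, although it falls to 0.723 at gamma = 0.5; on IQON it exceeds the Bi-LSTM baseline at every reported setting (0.711 to 0.715 versus 0.703).
The style vectors extracted by the SE module are interpretable, and new outfit styles can be formed as linear combinations of them.
Outfits generated for a target style reflect the intended style characteristics, demonstrating that style-controllable generation is possible.
Bi-LSTM + SE performs strongly even without VSE, indicating that style-aware sequence modeling is effective without labeled attributes.
A multi-element style basis can represent complex outfits as mixtures of basis styles.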
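Style-controlled generation can be sketched as a beam search whose ranking adds a style-similarity term to the sequence log-likelihood. The code below is a toy illustration under stated assumptions, not the paper's procedure: `next_log_probs` is a hypothetical stand-in for the BiLSTM's next-item distribution, the outfit style is taken as the mean of per-item style vectors, similarity is cosine, and `style_weight` is an illustrative parameter that is not claimed to be the gamma reported in the table.

```python
import math
import numpy as np

def cosine(a, b):
    """Cosine similarity with a small epsilon to avoid division by zero."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def beam_search(start, next_log_probs, item_styles, target_style,
                length, beam_width=3, style_weight=1.0):
    """Grow outfits item by item, ranking by log-likelihood + weighted style similarity."""
    beams = [([start], 0.0)]  # (item sequence, cumulative log-likelihood)

    def score(candidate):
        seq, log_lik = candidate
        # Assumption: outfit style = mean of the style vectors of its items.
        style = np.mean([item_styles[i] for i in seq], axis=0)
        return log_lik + style_weight * cosine(style, target_style)

    for _ in range(length - 1):
        candidates = []
        for seq, log_lik in beams:
            for item, lp in next_log_probs(seq).items():
                if item not in seq:  # do not repeat an item within an outfit
                    candidates.append((seq + [item], log_lik + lp))
        if not candidates:
            break
        candidates.sort(key=score, reverse=True)
        beams = candidates[:beam_width]
    return beams[0][0]

# Toy catalogue: items 0 and 1 are "casual", items 2 and 3 are "formal".
item_styles = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])

def uniform_next(seq):
    """Hypothetical sequence model: every item is equally likely next."""
    return {i: math.log(0.25) for i in range(4)}

formal_outfit = beam_search(0, uniform_next, item_styles,
                            np.array([0.0, 1.0]), length=3, style_weight=5.0)
casual_outfit = beam_search(0, uniform_next, item_styles,
                            np.array([1.0, 0.0]), length=3, style_weight=5.0)
```

With a uniform sequence model, the style term alone decides the ranking, so the same starting item is completed with formal items for a formal target and with the remaining casual item for a casual target. A target style could also be a weighted blend of basis styles, in line with the linear-combination finding above.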

This review was generated by AI and reviewed by a human editor.