QUICK REVIEW

[Paper Review] Query, Decompose, Compress: Structured Query Expansion for Efficient Multi-Hop Retrieval

JungMin Yun, YoungBin Kim|arXiv (Cornell University)|2026. 01. 14.

Information Retrieval and Search Behavior · Citations: 0

One-Line Summary

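DeCoR introduces structured query expansion for multi-hop retrieval. It decomposes a complex query into sub-queries and compresses the evidence found in retrieved documents, which improves retrieval performance even with a smaller LLM.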

ABSTRACT

Large Language Models (LLMs) have been increasingly employed for query expansion. However, their generative nature often undermines performance on complex multi-hop retrieval tasks by introducing irrelevant or noisy information. To address this challenge, we propose DeCoR (Decompose and Compress for Retrieval), a framework grounded in structured information refinement. Rather than generating additional content, DeCoR strategically restructures the query's underlying reasoning process and distills supporting evidence from retrieved documents. It consists of two core components tailored to the challenges of multi-hop retrieval: (1) Query Decomposition, which decomposes a complex query into explicit reasoning steps, and (2) Query-aware Document Compression, which synthesizes dispersed evidence from candidate documents into a concise summary relevant to the query. This structured design ensures that the final query representation remains both robust and comprehensive. Experimental results demonstrate that, despite utilizing a relatively small LLM, DeCoR outperforms strong baselines that rely on larger models. This finding underscores that, in complex retrieval scenarios, sophisticatedly leveraging the reasoning and summarization capabilities of LLMs offers a more efficient and effective solution than relying solely on their generative capability.

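Motivation and Goals

Argues for more robust query expansion, because generative expansion tends to introduce noise in multi-hop retrieval.
Proposes DeCoR, which refines existing information instead of generating new content.
Shows that a small LLM combined with structured refinement outperforms larger baselines on multi-hop IR.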

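Proposed Method

Introduces two core components: Query Decomposition and Query-aware Document Compression.
For efficiency, retrieval is run separately for each sub-query using BM25.
The candidate documents are concatenated and compressed by an LLM that applies global importance, cross-document evidence integration, and semantic deduplication.
The expanded query is the average of the embedding of the original query and the embeddings of the (sub-query + compressed documents) pairs.
Documents are ranked by the cosine similarity between the expanded query embedding and each document embedding.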
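The retrieval and scoring steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code: BM25 uses the common Okapi form with k1 = 1.5 and b = 0.75 and whitespace tokenization (the review gives no parameters), the LLM compression step is not modeled, each (sub-query + compressed documents) pair is assumed to be embedded as one text, and small hand-made vectors stand in for the output of a dense encoder such as e5-base-v2. The function names `bm25_scores`, `retrieve_per_subquery`, `expand_query`, and `rank_by_cosine` are hypothetical.

```python
import math
from collections import Counter

import numpy as np


def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Okapi BM25 score of one tokenized query against every tokenized document."""
    n_docs = len(docs_tokens)
    avgdl = sum(len(d) for d in docs_tokens) / n_docs
    # Document frequency: number of documents containing each term
    df = Counter(term for d in docs_tokens for term in set(d))
    scores = []
    for d in docs_tokens:
        tf = Counter(d)
        length_norm = k1 * (1 - b + b * len(d) / avgdl)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


def retrieve_per_subquery(subqueries, docs, k=2):
    """Run BM25 once per sub-query and keep the top-k document indices for each."""
    docs_tokens = [doc.lower().split() for doc in docs]
    top_k = []
    for sq in subqueries:
        scores = bm25_scores(sq.lower().split(), docs_tokens)
        # sorted() is stable even with reverse=True, so ties keep corpus order
        ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
        top_k.append(ranked[:k])
    return top_k


def expand_query(query_vec, pair_vecs):
    """Plain mean of the original query embedding and the embeddings of each
    (sub-query + compressed evidence) text (assumed, hypothetical inputs)."""
    return np.mean(np.vstack([query_vec, *pair_vecs]), axis=0)


def rank_by_cosine(query_vec, doc_matrix):
    """Document indices sorted by cosine similarity to query_vec, highest first."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_matrix / np.linalg.norm(doc_matrix, axis=1, keepdims=True)
    sims = d @ q
    return np.argsort(-sims), sims


# Toy demo of per-sub-query candidate retrieval (LLM compression not modeled)
docs = [
    "Paris is the capital of France",
    "France borders Spain",
    "The Eiffel Tower is in Paris",
    "Bananas are yellow",
]
subqueries = ["capital of France", "Eiffel Tower location"]
candidates = retrieve_per_subquery(subqueries, docs, k=2)
print("candidates per sub-query:", candidates)

# Hypothetical 3-d vectors standing in for a dense encoder's output
query_vec = np.array([1.0, 0.0, 0.0])
pair_vecs = [np.array([0.0, 1.0, 0.0]), np.array([0.0, 0.0, 1.0])]
expanded = expand_query(query_vec, pair_vecs)
doc_matrix = np.array([[1.0, 0.0, 0.0], [1.0, 1.0, 1.0], [0.0, 1.0, 2.0]])
order, sims = rank_by_cosine(expanded, doc_matrix)
print("expanded query:", expanded)
print("ranking:", order.tolist())
```

Averaging keeps the original query as one vote among the sub-query pairs, so the expanded vector drifts toward the decomposed evidence without discarding the original intent; whether the paper weights or normalizes these terms is not stated in the review.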

Experimental Results

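Research Questions

RQ1. Can structured information refinement through decomposition and compression outperform generative query expansion in multi-hop retrieval?
RQ2. Can a smaller LLM equipped with decomposition and compression gain efficiency without sacrificing accuracy?
RQ3. How does removing individual DeCoR components affect retrieval performance?
RQ4. How do different embedding strategies affect final retrieval quality?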

Key Results

Method	Hits@10	Hits@4	MAP@10	MARR@10
Contriever (Baseline)	62.75	48.43	17.98	40.57
Contriever + DeCoR	64.48	50.91	20.07	44.60
e5-base-v2 (Baseline)	69.05	53.61	19.60	44.55
e5-base-v2 + DeCoR	72.42	59.42	22.66	51.95
bge-large-en-v1.5 (Baseline)	68.96	54.63	19.97	45.20
bge-large-en-v1.5 + DeCoR	72.06	58.23	22.70	51.39
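The absolute gains implied by the table can be checked with a few lines of Python. The scores are copied from the table as-is; the only computation is the difference between each DeCoR row and its own baseline row.

```python
# Scores copied from the table, in column order
METRICS = ("Hits@10", "Hits@4", "MAP@10", "MARR@10")
TABLE = {
    # retriever: (baseline scores, + DeCoR scores)
    "Contriever": ((62.75, 48.43, 17.98, 40.57), (64.48, 50.91, 20.07, 44.60)),
    "e5-base-v2": ((69.05, 53.61, 19.60, 44.55), (72.42, 59.42, 22.66, 51.95)),
    "bge-large-en-v1.5": ((68.96, 54.63, 19.97, 45.20), (72.06, 58.23, 22.70, 51.39)),
}

# Absolute gain in points for every (retriever, metric) pair
gains = {
    retriever: {m: round(after - before, 2) for m, before, after in zip(METRICS, base, decor)}
    for retriever, (base, decor) in TABLE.items()
}

for retriever, per_metric in gains.items():
    print(retriever, per_metric)

# True if every cell improves over its own baseline
print("all gains positive:", all(g > 0 for row in gains.values() for g in row.values()))
```

MARR@10 shows the largest gains (4.03 to 7.40 points across the three retrievers), while Hits@10 gains are smaller (1.73 to 3.37 points). The table lists only the no-expansion baselines, so this check does not cover the comparison with other expansion methods.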

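Across all three base retrievers, DeCoR consistently improves Hits@10, Hits@4, MAP@10, and MARR@10 over the baselines and other expansion methods.
With e5-base-v2, DeCoR achieves Hits@10 = 72.42, Hits@4 = 59.42, MAP@10 = 22.66, and MARR@10 = 51.95.
Removing individual components degrades performance, and query decomposition contributes notably to diversity and comprehensiveness.
Concatenating then compressing, followed by embedding averaging, outperforms alternatives such as per-document compression and simple concatenation.
DeCoR with a relatively small model (Qwen2.5-7B) can outperform larger generation-based baselines (e.g., GPT-3.5) on multi-hop IR.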


This review was generated by AI and reviewed by a human editor.