QUICK REVIEW

[Paper Review] HiSAC: Hierarchical Sparse Activation Compression for Ultra-long Sequence Modeling in Recommenders

Kun Yuan, Junyu Bi|arXiv (Cornell University)|2026. 02. 24.

Recommender Systems and Techniques | Citations: 0

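One-Line Summary

HiSAC introduces hierarchical sparse activation and Soft-Routing Attention to compress ultra-long user behavior sequences into personalized interest-agents, and achieves a 1.65% CTR uplift in its Taobao deployment.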

ABSTRACT

Modern recommender systems leverage ultra-long user behavior sequences to capture dynamic preferences, but end-to-end modeling is infeasible in production due to latency and memory constraints. While summarizing history via interest centers offers a practical alternative, existing methods struggle to (1) identify user-specific centers at appropriate granularity and (2) accurately assign behaviors, leading to quantization errors and loss of long-tail preferences. To alleviate these issues, we propose Hierarchical Sparse Activation Compression (HiSAC), an efficient framework for personalized sequence modeling. HiSAC encodes interactions into multi-level semantic IDs and constructs a global hierarchical codebook. A hierarchical voting mechanism sparsely activates personalized interest-agents as fine-grained preference centers. Guided by these agents, Soft-Routing Attention aggregates historical signals in semantic space, weighting by similarity to minimize quantization error and retain long-tail behaviors. Deployed on Taobao's "Guess What You Like" homepage, HiSAC achieves significant compression and cost reduction, with online A/B tests showing a consistent 1.65% CTR uplift -- demonstrating its scalability and real-world effectiveness.

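Research Motivation and Objectives

Enable efficient modeling of ultra-long user behavior sequences in production environments.
Develop personalized, multi-level representations of user interests to reduce quantization error.
Introduce a hierarchical voting mechanism that sparsely activates user-specific interest-agents.
Propose Soft-Routing Attention to compress history while preserving long-tail preferences.
Demonstrate an industrial deployment with reduced latency and a CTR uplift.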

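Proposed Method

Tokenize the behavior history into multi-level semantic IDs using a multimodal encoder and a Residual Quantized VAE (RQ-VAE).
Build a global multi-level semantic tree and apply hierarchical voting to sparsely activate user-specific interest-agents.
Use Soft-Routing Attention to aggregate historical signals by unbiased semantic similarity, combining semantic prototypes with ranking embeddings.
Decouple semantic embeddings (frozen, used for routing) from learnable ranking embeddings (used for aggregation).
Deploy offline interest-agent construction and request-level compression to reduce online latency and compute cost.
Achieve end-to-end gains, including a caching strategy and a 40%+ reduction in end-to-end latency.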
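
The hierarchical voting step in the method above can be pictured with a small sketch. This review does not give the paper's actual voting rule, so everything here is an assumption made for illustration: the function name `activate_agents`, the `min_share` and `min_votes` thresholds, and the top-down descent rule are all hypothetical. The idea shown is only that each behavior votes for every prefix of its multi-level semantic ID, and a branch is refined into finer agents only where the user's votes are concentrated.

```python
from collections import Counter


def activate_agents(semantic_ids, min_share=0.2, min_votes=2):
    """Pick a sparse set of code prefixes ("interest-agents") for one user.

    semantic_ids: one multi-level code per behavior, e.g. (1, 2, 3) means
                  coarse code 1, then 2, then fine code 3.
    min_share, min_votes: hypothetical thresholds (not from the paper). A child
                  prefix is "strong" if it has at least min_votes votes and at
                  least min_share of its parent's votes.
    """
    # Each behavior casts one vote for every prefix of its semantic ID.
    votes = Counter()
    for sid in semantic_ids:
        for depth in range(1, len(sid) + 1):
            votes[sid[:depth]] += 1

    # Index the voted prefixes by their parent prefix.
    children = {}
    for prefix in votes:
        children.setdefault(prefix[:-1], []).append(prefix)

    agents = []

    def descend(prefix, parent_votes):
        threshold = max(min_votes, min_share * parent_votes)
        strong = [c for c in children.get(prefix, []) if votes[c] >= threshold]
        # A non-root node with no strong child becomes an agent at this depth.
        if not strong and prefix:
            agents.append(prefix)
        for child in strong:
            descend(child, votes[child])

    descend((), len(semantic_ids))
    return sorted(agents)


# Toy history: two dense fine-grained interests, one diffuse coarse interest,
# and a single long-tail behavior.
history = (
    [(1, 2, 3)] * 6
    + [(1, 2, 4)] * 5
    + [(4, 0, 0), (4, 1, 1), (4, 2, 2), (4, 3, 3), (4, 4, 4)]
    + [(9, 9, 9)]
)
print(activate_agents(history))  # [(1, 2, 3), (1, 2, 4), (4,)]
```

In this toy run the 17 behaviors collapse to three agents at different depths: two fine-grained ones under branch 1 and one coarse agent for branch 4. The lone `(9, 9, 9)` behavior activates no agent of its own, which is the kind of long-tail signal a similarity-weighted aggregation step would still need to account for.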

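Experimental Results

Research Questions

RQ1: How can user-specific interest centers be identified accurately when the appropriate granularity varies across users?
RQ2: How can behaviors from a long history be assigned to interest centers while preserving long-tail signals and minimizing quantization error?
RQ3: Can hierarchical sparse activation and Soft-Routing Attention deliver scalable, industrial-grade compression for ultra-long sequences without degrading recommendation quality?
RQ4: What are the deployment implications and latency/cost benefits of HiSAC in a real-world, large-scale recommender system?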

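Key Results

HiSAC achieves substantial compression and cost reduction in industrial deployment.
Online A/B tests show a consistent 1.65% CTR uplift over the strongest existing compression method.
Hierarchical voting cuts the number of agents by roughly two-thirds with only a negligible loss in predictive performance.
Soft-Routing Attention helps preserve long-tail user interests and reduce quantization error.
Decoupling semantic embeddings from ranking embeddings prevents codebook bias and preserves interest diversity.
Offline interest-agent construction and request-level caching substantially reduce end-to-end latency without harming accuracy.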
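
To make the soft-routing and decoupled-embedding results above more concrete, here is a minimal sketch under stated assumptions. The review does not give the paper's equations, so the softmax over cosine similarity, the `temperature` parameter, and the function names `cosine` and `soft_route` are illustrative choices, not the authors' formulation. The sketch shows only the general pattern: frozen semantic embeddings decide the routing weights, while separate ranking embeddings are what actually gets aggregated.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def soft_route(agent_protos, sem_embs, rank_embs, temperature=0.1):
    """Aggregate behaviors into one vector per agent (illustrative only).

    agent_protos: semantic prototype per agent (assumed frozen).
    sem_embs:     frozen semantic embedding per behavior, used only for routing.
    rank_embs:    ranking embedding per behavior, used only for aggregation.
    temperature:  hypothetical softmax temperature, not taken from the paper.
    Returns a list of (weights, aggregated_vector) pairs, one per agent.
    """
    dim = len(rank_embs[0])
    results = []
    for proto in agent_protos:
        # Routing scores come from semantic similarity only.
        scores = [cosine(proto, s) / temperature for s in sem_embs]
        top = max(scores)
        exps = [math.exp(x - top) for x in scores]  # numerically stable softmax
        total = sum(exps)
        weights = [e / total for e in exps]
        # Aggregation is a weighted sum of the ranking embeddings.
        agg = [sum(w * r[d] for w, r in zip(weights, rank_embs)) for d in range(dim)]
        results.append((weights, agg))
    return results


# Toy data: two agents and three behaviors; the third behavior sits between them.
agent_protos = [[1.0, 0.0], [0.0, 1.0]]
sem_embs = [[0.9, 0.1], [0.1, 0.9], [0.6, 0.5]]
rank_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

for weights, agg in soft_route(agent_protos, sem_embs, rank_embs):
    print([round(w, 4) for w in weights])
```

With a hard nearest-center assignment, the in-between third behavior would contribute to exactly one agent. Here it receives a non-zero weight under both agents, which is one way to read the claim that soft routing retains long-tail behaviors and reduces quantization error.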


This review was generated by AI and reviewed by a human editor.