QUICK REVIEW

[Paper Review] Accelerating Diffusion via Hybrid Data-Pipeline Parallelism Based on Conditional Guidance Scheduling

Euisoo Jung, Byunghyun Kim|arXiv (Cornell University)|2026. 02. 25.

Generative Adversarial Networks and Image Synthesis | Citations: 0

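One-Line Summary

This paper proposes hybrid data-pipeline parallelism for diffusion inference, combining condition-based data partitioning with adaptive parallelism switching guided by the denoising discrepancy, and shows that it substantially reduces latency while preserving image quality across U-Net and DiT architectures.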

ABSTRACT

Diffusion models have achieved remarkable progress in high-fidelity image, video, and audio generation, yet inference remains computationally expensive. Nevertheless, current diffusion acceleration methods based on distributed parallelism suffer from noticeable generation artifacts and fail to achieve substantial acceleration proportional to the number of GPUs. Therefore, we propose a hybrid parallelism framework that combines a novel data parallel strategy, condition-based partitioning, with an optimal pipeline scheduling method, adaptive parallelism switching, to reduce generation latency and achieve high generation quality in conditional diffusion models. The key ideas are to (i) leverage the conditional and unconditional denoising paths as a new data-partitioning perspective and (ii) adaptively enable optimal pipeline parallelism according to the denoising discrepancy between these two paths. Our framework achieves $2.31\times$ and $2.07\times$ latency reductions on SDXL and SD3, respectively, using two NVIDIA RTX 3090 GPUs, while preserving image quality. This result confirms the generality of our approach across U-Net-based diffusion models and DiT-based flow-matching architectures. Our approach also outperforms existing methods in acceleration under high-resolution synthesis settings. Code is available at https://github.com/kaist-dmlab/Hybridiff.

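Motivation and Objectives

Aims to accelerate diffusion inference without retraining and without degrading quality.
Addresses the limitations of patch-based data parallelism and static pipeline parallelism.
Proposes a dual-path data-partitioning scheme that exploits conditional guidance together with an adaptive scheduling mechanism.
Demonstrates robustness across architectures (U-Net and DiT) and in high-resolution settings.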

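Proposed Method

Introduces condition-based partitioning, which processes the conditional and unconditional diffusion paths across multiple GPUs to improve global consistency.
Defines a denoising discrepancy metric, rel-MAE_t(ε_c, ε_u), to quantify the difference between the conditional and unconditional predictions.
Splits inference into three phases (Warm-Up, Parallelism, Fully-Connecting) and switches between serial and parallel execution according to the discrepancy.
Automatically determines the switching points τ1 and τ2 during denoising from the computed discrepancy and safety upper bounds, enabling adaptive hybrid parallelism.
Provides a theoretical interpretation of the discrepancy via score decomposition, relating the strength of the conditioning information to the unconditional data prior.
Shows scalability to more GPUs through batch-level or layer-level extensions.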
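The ideas in the list above can be made concrete with a small toy sketch. It is not the authors' implementation: the review does not give the exact definition of rel-MAE or the rule for choosing τ1 and τ2, so `rel_mae` uses an assumed normalization (mean absolute difference divided by the mean magnitude of the unconditional prediction), and `first_step_below` is a hypothetical threshold-with-cap rule, including the assumption that a switch triggers when the discrepancy falls below a threshold. Only `cfg_combine` follows a standard formula (classifier-free guidance), and `phase_of` simply mirrors the three phases named above. Predictions are plain lists of floats so the example has no dependencies.

```python
from typing import List, Sequence


def cfg_combine(eps_c: Sequence[float], eps_u: Sequence[float], w: float) -> List[float]:
    """Standard classifier-free guidance: eps_u + w * (eps_c - eps_u).

    In condition-based partitioning, eps_c and eps_u would come from
    different GPUs; here they are just two lists.
    """
    return [u + w * (c - u) for c, u in zip(eps_c, eps_u)]


def rel_mae(eps_c: Sequence[float], eps_u: Sequence[float], tiny: float = 1e-8) -> float:
    """ASSUMED form of the discrepancy: mean|eps_c - eps_u| / mean|eps_u|."""
    n = len(eps_c)
    mae = sum(abs(c - u) for c, u in zip(eps_c, eps_u)) / n
    scale = sum(abs(u) for u in eps_u) / n
    return mae / (scale + tiny)


def first_step_below(discrepancies: Sequence[float], threshold: float, cap: int) -> int:
    """HYPOTHETICAL switch rule: first step whose discrepancy is below
    `threshold`, but never later than the safety upper bound `cap`."""
    for t, d in enumerate(discrepancies):
        if t >= cap:
            break
        if d < threshold:
            return t
    return cap


def phase_of(step: int, tau1: int, tau2: int) -> str:
    """Map a denoising step to one of the three phases named in the review."""
    if step < tau1:
        return "warm-up"
    if step < tau2:
        return "parallelism"
    return "fully-connecting"


if __name__ == "__main__":
    # Made-up per-step discrepancies, for illustration only.
    trace = [0.9, 0.5, 0.2, 0.1, 0.1, 0.1]
    tau1 = first_step_below(trace, threshold=0.3, cap=4)
    tau2 = 5  # chosen by hand for this demo
    for step in range(len(trace)):
        print(step, phase_of(step, tau1, tau2))
```

The cap in `first_step_below` plays the role of the safety upper bound: even if the discrepancy never crosses the threshold, the schedule still leaves the first phase by a fixed step.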

Figure 2: Comparison of parallel strategies for diffusion inference. (a) Patch-based data parallel frameworks suffer from bottlenecks caused by all-gather operations and artifacts at patch boundaries, leading to limited acceleration and quality degradation. (b) Pipeline parallel frameworks incur ex

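Experimental Results

Research Questions

RQ1: Can condition-based partitioning in diffusion inference reduce boundary artifacts while maintaining global image consistency?
RQ2: Does adaptive switching guided by the denoising discrepancy improve speedup without degrading generation quality?
RQ3: How general is the approach across different diffusion backbones (U-Net, DiT) and high-resolution synthesis?
RQ4: In practice, what is the speed-accuracy trade-off when varying the parallelism interval k (or τ1, τ2)?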

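Key Results

Achieves latency reductions of 2.31× on SDXL and 2.07× on SD3 with two NVIDIA RTX 3090 GPUs while preserving image fidelity.
Outperforms existing distributed inference methods on the speed-accuracy trade-off and substantially reduces communication cost.
Demonstrates robustness across architectures (U-Net, DiT) and on high-resolution synthesis tasks.
Ablation studies show that the hybrid framework (condition-based partitioning plus adaptive switching) outperforms condition-based partitioning alone.
High-resolution SDXL experiments on H200 GPUs show consistent speedups from 1024×1024 up to 2560×2560.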
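To put the reported factors in perspective, the short calculation below converts a speedup factor into the remaining share of baseline latency and the percentage saved. It assumes that each factor is the ratio of baseline latency to accelerated latency; the review does not state which baseline the paper uses, and the function names are illustrative.

```python
def latency_fraction(speedup: float) -> float:
    """Share of the baseline latency that remains, assuming
    speedup = baseline_latency / accelerated_latency."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 / speedup


def percent_latency_saved(speedup: float) -> float:
    """Percentage of baseline latency removed under the same assumption."""
    return 100.0 * (1.0 - latency_fraction(speedup))


if __name__ == "__main__":
    # Factors as reported in the abstract for two GPUs.
    for model, factor in {"SDXL": 2.31, "SD3": 2.07}.items():
        print(
            f"{model}: {latency_fraction(factor):.3f} of baseline latency, "
            f"{percent_latency_saved(factor):.1f}% saved"
        )
```

Under that reading, a 2.31× factor leaves roughly 43% of the baseline latency and a 2.07× factor roughly 48%.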

Figure 3: Overview of the proposed diffusion inference hybrid parallel framework. Our method adaptively switches parallelism modes at $\tau_{1}$ and $\tau_{2}$, optimizing the trade-off between computational efficiency and consistency of conditional guidance, and demonstrates superior inference ac


This review was generated by AI and reviewed by a human editor.