QUICK REVIEW

[Paper Review] Rethinking Inter-Process Communication with Memory Operation Offloading

Misun Park, Richi Dubey|arXiv (Cornell University)|2026. 01. 09.

Parallel Computing and Optimization Techniques | Citations: 0

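One-Line Summary

Rocket is an IPC runtime that integrates hardware- and software-based memory offloading into shared-memory IPC, reducing instruction counts and improving throughput and latency for data-intensive intra-node workloads.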

ABSTRACT

As multimodal and AI-driven services exchange hundreds of megabytes per request, existing IPC runtimes spend a growing share of CPU cycles on memory copies. Although both hardware and software mechanisms are exploring memory offloading, current IPC stacks lack a unified runtime model to coordinate them effectively. This paper presents a unified IPC runtime suite that integrates both hardware- and software-based memory offloading into shared-memory communication. The system characterizes the interaction between offload strategies and IPC execution, including synchronization, cache visibility, and concurrency, and introduces multiple IPC modes that balance throughput, latency, and CPU efficiency. Through asynchronous pipelining, selective cache injection, and hybrid coordination, the system turns offloading from a device-specific feature into a general system capability. Evaluations on real-world workloads show instruction count reductions of up to 22%, throughput improvements of up to 2.1x, and latency reductions of up to 72%, demonstrating that coordinated IPC offloading can deliver tangible end-to-end efficiency gains in modern data-intensive systems.

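Research Motivation and Goals

Argues that growing data movement in multimodal and AI workloads calls for IPC designed with memory offloading in mind.
Investigates how hardware memory offloading (e.g., Intel DSA) interacts with the IPC runtime and cache behavior.
Designs a software runtime (Rocket) that coordinates offload strategies with IPC execution to improve efficiency.
Evaluates Rocket on real-world workloads to quantify end-to-end gains for data-intensive pipelines.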

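Proposed Method

Characterizes the system-level bottlenecks (caching, synchronization, page faults) of hardware-assisted memory offloading in IPC.
Designs Rocket with a shared-memory IPC protocol, asynchronous batching, and CPU-DSA overlap.
Provides configurable execution modes (sync, async, pipelined) and cache injection options.
Uses a hybrid polling strategy (UMWAIT + size-aware delay) to balance latency against CPU overhead.
Reuses persistent shared-memory regions to avoid page faults and enable DSA transfers.
Evaluates Rocket on Intel DSA-capable hardware across representative workloads, with offload decisions made through a high-level API.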
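
The size-aware delay in the hybrid polling strategy above can be illustrated with a minimal Python sketch. This is not Rocket's implementation. The bandwidth figure, the 0.8 delay fraction, and the names initial_delay_s, wait_for_completion, and offloaded_copy are all assumptions made for illustration. A short sleep stands in for UMWAIT, and a plain thread stands in for the DSA engine. The idea shown is only this: wait through most of the estimated transfer time without polling, then poll at a short interval until the copy completes.

```python
import threading
import time

# Assumed figure for illustration only; not taken from the paper.
ASSUMED_BANDWIDTH_BYTES_PER_S = 10 * 1024**3


def initial_delay_s(nbytes, bandwidth=ASSUMED_BANDWIDTH_BYTES_PER_S, fraction=0.8):
    """Size-aware delay: a fraction of the estimated transfer time."""
    return fraction * nbytes / bandwidth


def wait_for_completion(is_done, nbytes, poll_interval_s=1e-5, timeout_s=1.0):
    """Sleep through the estimated transfer time, then poll until done.

    Returns the number of polls performed. The short sleep inside the loop is
    a portable stand-in for a low-power wait instruction such as UMWAIT.
    """
    deadline = time.monotonic() + timeout_s
    time.sleep(initial_delay_s(nbytes))  # no polling while the copy is likely in flight
    polls = 0
    while True:
        polls += 1
        if is_done():
            return polls
        if time.monotonic() >= deadline:
            raise TimeoutError("offloaded copy did not complete in time")
        time.sleep(poll_interval_s)


def offloaded_copy(src, dst, done):
    """Hypothetical stand-in for a hardware-offloaded copy (runs on a thread)."""
    dst[: len(src)] = src
    done.set()


if __name__ == "__main__":
    payload = bytes(1024 * 1024)
    target = bytearray(len(payload))
    finished = threading.Event()
    worker = threading.Thread(target=offloaded_copy, args=(payload, target, finished))
    worker.start()
    print("polls:", wait_for_completion(finished.is_set, len(payload)))
    worker.join()
```

Larger transfers get a longer initial delay and so spend fewer cycles polling, while small transfers start polling almost immediately, which is the latency versus CPU-overhead trade-off the summary describes.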

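Experimental Results

Research Questions

RQ1: How do offload strategies interact with IPC execution in shared-memory pipelines?
RQ2: Which key bottlenecks (caching, synchronization, page faults) determine offload efficiency in IPC?
RQ3: Can a configurable IPC runtime coordinate hardware offloading to achieve lower latency and higher throughput without excessive CPU usage?
RQ4: Which practical design choices (modes, cache injection, batching) yield end-to-end gains for data-intensive IPC workloads?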

Key Results

Rocket reduces instruction counts by up to 22%.
Rocket increases throughput by up to 2.1x over the CPU baseline.
Rocket reduces latency by up to 72% on data-intensive IPC workloads.

This review was generated by AI and reviewed by a human editor.