QUICK REVIEW

[Paper Review] Rethinking Inter-Process Communication with Memory Operation Offloading

Misun Park, Richi Dubey|arXiv (Cornell University)|Jan 9, 2026

Parallel Computing and Optimization Techniques | Citations: 0

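One-line summary

Rocket integrates hardware- and software-based memory offloading into shared-memory IPC, reducing instruction counts and improving throughput and latency for data-heavy intra-node workloads.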

ABSTRACT

As multimodal and AI-driven services exchange hundreds of megabytes per request, existing IPC runtimes spend a growing share of CPU cycles on memory copies. Although both hardware and software mechanisms are exploring memory offloading, current IPC stacks lack a unified runtime model to coordinate them effectively. This paper presents a unified IPC runtime suite that integrates both hardware- and software-based memory offloading into shared-memory communication. The system characterizes the interaction between offload strategies and IPC execution, including synchronization, cache visibility, and concurrency, and introduces multiple IPC modes that balance throughput, latency, and CPU efficiency. Through asynchronous pipelining, selective cache injection, and hybrid coordination, the system turns offloading from a device-specific feature into a general system capability. Evaluations on real-world workloads show instruction count reductions of up to 22%, throughput improvements of up to 2.1x, and latency reductions of up to 72%, demonstrating that coordinated IPC offloading can deliver tangible end-to-end efficiency gains in modern data-intensive systems.

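Motivation and objectives

Motivate the need for memory-offload-aware IPC as data movement grows in multimodal and AI workloads.
Investigate how hardware memory offloading (e.g., Intel DSA) interacts with IPC runtimes and cache behavior.
Design a software runtime (Rocket) that coordinates offload strategies with IPC execution to improve efficiency.
Evaluate Rocket on real workloads to quantify end-to-end gains in data-intensive pipelines.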

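Proposed method

Characterize the system-level bottlenecks (caching, synchronization, page faults) of hardware-assisted memory offloading in IPC.
Design Rocket with a shared-memory IPC protocol, asynchronous batching, and CPU-DSA overlap.
Provide synchronous, asynchronous, and pipelined execution modes, plus cache injection options.
Balance latency and CPU overhead with a hybrid polling strategy that combines waiting with size-aware delay avoidance.
Reuse persistent shared-memory regions to avoid page faults and enable DSA transfers.
Evaluate Rocket on Intel DSA-capable hardware with representative workloads, using a high-level API for offload decisions.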
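The hybrid polling item in the list above can be illustrated with a small sketch. This is not Rocket's implementation: the review does not describe the actual policy, so the code assumes one plausible reading, in which small transfers are busy-polled for low latency while large transfers spin briefly and then sleep between checks to save CPU. The function name `hybrid_wait` and all three constants are hypothetical, and an in-process callback stands in for a DSA completion record.

```python
import threading
import time

# Hypothetical tuning knobs; the review gives no concrete values.
SPIN_BUDGET_S = 50e-6              # busy-poll window before backing off
SLEEP_STEP_S = 200e-6              # pause between checks after the spin window
SMALL_TRANSFER_BYTES = 64 * 1024   # at or below this size, only spin


def hybrid_wait(is_done, size_bytes, timeout_s=1.0):
    """Poll is_done() until it returns True or timeout_s elapses.

    Returns True on completion and False on timeout.
    """
    start = time.monotonic()
    deadline = start + timeout_s
    spin_until = start + SPIN_BUDGET_S
    # Assumption: small transfers finish quickly, so sleeping would only add latency.
    spin_only = size_bytes <= SMALL_TRANSFER_BYTES
    while True:
        if is_done():
            return True
        now = time.monotonic()
        if now >= deadline:
            return False
        if not spin_only and now >= spin_until:
            # Large transfer still pending: yield the CPU instead of spinning.
            time.sleep(min(SLEEP_STEP_S, deadline - now))


def demo():
    """Simulate an offloaded 1 MiB copy that completes after about 10 ms."""
    done = threading.Event()
    threading.Timer(0.01, done.set).start()
    return hybrid_wait(done.is_set, size_bytes=1 << 20)
```

Calling `demo()` waits for the simulated completion and returns `True`. In a real runtime the `is_done` callback would check a completion flag in shared memory, and the thresholds would need to be tuned against measured transfer times.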

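Experimental results

Research questions

RQ1: How do offload strategies interact with IPC execution in shared-memory pipelines?
RQ2: What are the main bottlenecks (caching, synchronization, page faults) that determine the effectiveness of offloading in IPC?
RQ3: Can a configurable IPC runtime integrate hardware offloading to reduce latency and raise throughput without excessive CPU usage?
RQ4: Which practical design choices (modes, cache injection, batching) produce end-to-end gains for data-intensive IPC workloads?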

Key findings

Rocket reduces instruction counts by up to 22%.
Rocket improves throughput by up to 2.1x over a CPU baseline.
Rocket reduces latency by up to 72% for data-intensive IPC workloads.
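To compare the three figures on a common footing, the percentages can be restated as ratios against the baseline. This is plain arithmetic on the reported numbers, not an additional result. Each figure is an "up to" peak, so the three need not come from the same workload. The symbols $I$, $T$, and $L$ (instruction count, throughput, latency) are notation introduced here for illustration.

```latex
\begin{align*}
\frac{I_{\text{Rocket}}}{I_{\text{base}}} &\ge 1 - 0.22 = 0.78 \\
\frac{T_{\text{Rocket}}}{T_{\text{base}}} &\le 2.1 \\
\frac{L_{\text{Rocket}}}{L_{\text{base}}} &\ge 1 - 0.72 = 0.28
\quad\Longrightarrow\quad
\frac{L_{\text{base}}}{L_{\text{Rocket}}} \le \frac{1}{0.28} \approx 3.6
\end{align*}
```

In the best reported case, then, a request completes in roughly 28% of the baseline time, which is a larger relative change than the 2.1x peak in throughput.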

This review was generated by AI and checked by a human editor.