QUICK REVIEW

[Paper Review] CacheMind: From Miss Rates to Why -- Natural-Language, Trace-Grounded Reasoning for Cache Replacement

Kaushal Mhapsekar, Azam Ghanbari | arXiv (Cornell University) | Feb 12, 2026

Parallel Computing and Optimization Techniques | Citations: 0

One-line summary

CacheMind is a conversational, retrieval-augmented system that grounds cache replacement analysis in per-event trace slices and natural-language queries. It is validated with a new benchmark, CacheMindBench.

ABSTRACT

Cache replacement remains a challenging problem in CPU microarchitecture, often addressed using hand-crafted heuristics, limiting cache performance. Cache data analysis requires parsing millions of trace entries with manual filtering, making the process slow and non-interactive. To address this, we introduce CacheMind, a conversational tool that uses Retrieval-Augmented Generation (RAG) and Large Language Models (LLMs) to enable semantic reasoning over cache traces. Architects can now ask natural language questions like, "Why is the memory access associated with PC X causing more evictions?", and receive trace-grounded, human-readable answers linked to program semantics for the first time. To evaluate CacheMind, we present CacheMindBench, the first verified benchmark suite for LLM-based reasoning for the cache replacement problem. Using the SIEVE retriever, CacheMind achieves 66.67% on 75 unseen trace-grounded questions and 84.80% on 25 unseen policy-specific reasoning tasks; with RANGER, it achieves 89.33% and 64.80% on the same evaluations. Additionally, with RANGER, CacheMind achieves 100% accuracy on 4 out of 6 categories in the trace-grounded tier of CacheMindBench. Compared to LlamaIndex (10% retrieval success), SIEVE achieves 60% and RANGER achieves 90%, demonstrating that existing Retrieval-Augmented Generation (RAGs) are insufficient for precise, trace-grounded microarchitectural reasoning. We provided four concrete actionable insights derived using CacheMind, wherein bypassing use case improved cache hit rate by 7.66% and speedup by 2.04%, software fix use case gives speedup of 76%, and Mockingjay replacement policy use case gives speedup of 0.7%; showing the utility of CacheMind on non-trivial queries that require a natural-language interface.

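Motivation and objectives

Motivate interactive, explainable cache replacement analysis that goes beyond fixed miss-rate metrics.
Enable semantic, per-PC, and per-address questions over millions of trace events.
Provide a verified benchmark suite for evaluating LLM-based reasoning in a microarchitectural context.
Demonstrate retrieval-augmented reasoning that produces trace-grounded explanations of policy-workload interactions.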

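Proposed method

Introduce CacheMind, which consists of dual retrievers (SIEVE and RANGER) and a generator LLM that produces trace-grounded explanations.
SIEVE performs symbolic and semantic filtering to extract task-specific trace slices from ChampSim traces.
RANGER translates natural-language queries into retrieval code that is executed against an external trace database.
Retrieval-Augmented Generation (RAG) grounds the LLM's output in the retrieved trace evidence.
Develop CacheMindBench, a 100-question benchmark (75 trace-grounded questions and 25 policy-specific reasoning tasks) covering factual, comparative, arithmetic, and semantic reasoning over traces.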
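The retrieve-then-ground flow described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `TraceEvent` record, its fields, and the `filter_slice`, `summarize`, and `build_prompt` helpers are all hypothetical names introduced for illustration, the record layout is not ChampSim's actual trace format, and no LLM is called. The sketch only shows the general idea of narrowing a trace to a task-specific slice and packaging that slice as evidence for a generator.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class TraceEvent:
    """Hypothetical per-access record; not ChampSim's actual trace format."""
    pc: int
    address: int
    policy: str
    hit: bool
    evicted_address: Optional[int] = None  # set when the access evicted a line


def filter_slice(events: Iterable[TraceEvent], *, pc: Optional[int] = None,
                 address: Optional[int] = None,
                 policy: Optional[str] = None) -> List[TraceEvent]:
    """Keep only the events that match every filter that was supplied."""
    return [
        e for e in events
        if (pc is None or e.pc == pc)
        and (address is None or e.address == address)
        and (policy is None or e.policy == policy)
    ]


def summarize(slice_: List[TraceEvent]) -> dict:
    """Count accesses, misses, and evictions in a slice."""
    return {
        "accesses": len(slice_),
        "misses": sum(1 for e in slice_ if not e.hit),
        "evictions": sum(1 for e in slice_ if e.evicted_address is not None),
    }


def build_prompt(question: str, slice_: List[TraceEvent]) -> str:
    """Assemble the question plus retrieved evidence as text for a generator."""
    lines = [
        f"pc={e.pc:#x} addr={e.address:#x} policy={e.policy} "
        f"{'hit' if e.hit else 'miss'}"
        for e in slice_
    ]
    return (f"Question: {question}\n"
            f"Evidence summary: {summarize(slice_)}\n"
            "Evidence events:\n" + "\n".join(lines))


# Made-up toy trace, for illustration only.
trace = [
    TraceEvent(0x400A10, 0x1000, "LRU", hit=False, evicted_address=0x8000),
    TraceEvent(0x400A10, 0x2000, "LRU", hit=False, evicted_address=0x1000),
    TraceEvent(0x400B20, 0x1000, "LRU", hit=False, evicted_address=0x2000),
    TraceEvent(0x400B20, 0x1000, "LRU", hit=True),
]
evidence = filter_slice(trace, pc=0x400A10, policy="LRU")
prompt = build_prompt("Why is PC 0x400a10 causing more evictions?", evidence)
print(prompt)
```

Keeping the filter purely symbolic makes the evidence passed to the generator easy to check by hand, which is the property a trace-grounded answer depends on.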

Figure 1. The method filters raw traces to a task-specific slice and returns the most informative evidence for the user's query. Old ChampSim could tell you that an access missed; CacheMind shows which PC missed on which data, under which policy, and why, for every event, acting as a microarchitectural microscope.

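Experimental results

Research questions

RQ1: Can a conversational, trace-grounded system answer per-event and per-PC cache questions with verifiable evidence?
RQ2: How do symbolic-semantic retrieval and LLM-based retrieval compare in accuracy and flexibility for cache analysis?
RQ3: How does grounding LLM reasoning in trace data affect accuracy and reliability?
RQ4: What actionable insights can trace-grounded reasoning provide for bypass estimation, software fixes, and policy design?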

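Key findings

With the SIEVE retriever, CacheMind achieves 66.67% accuracy on 75 unseen trace-grounded questions and 84.80% on 25 unseen policy-specific reasoning tasks.
With RANGER, it achieves 89.33% and 64.80% on the same evaluations, and reaches 100% accuracy on 4 of the 6 categories in the trace-grounded tier.
In the retrieval comparison, LlamaIndex reaches 10% retrieval success, whereas SIEVE reaches 60% and RANGER reaches 90%, nine times the LlamaIndex rate.
CacheMind yields actionable insights in the reported use cases: a bypassing use case improves cache hit rate by 7.66% and gives a 2.04% speedup, a software fix gives a 76% speedup, and the Mockingjay RDP use case gives a 0.7% speedup.
The results indicate that trace-grounded analysis with reasoning capabilities offers more than conventional fixed-metric reporting for cache policy evaluation.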
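The arithmetic behind the retrieval comparison can be checked with a short Python sketch. The percentages are the ones quoted in the abstract; the helper names are hypothetical. The question-weighted combination of the two tiers is an assumption made here purely for illustration (weighting each tier by its question count); the source does not report a combined score, so that value should not be read as a result from the paper.

```python
# Percentages quoted in the abstract.
retrieval_success = {"LlamaIndex": 10.0, "SIEVE": 60.0, "RANGER": 90.0}
accuracy = {
    "SIEVE": {"trace_grounded": 66.67, "policy_reasoning": 84.80},
    "RANGER": {"trace_grounded": 89.33, "policy_reasoning": 64.80},
}
# Number of questions in each tier, as stated in the abstract.
tier_sizes = {"trace_grounded": 75, "policy_reasoning": 25}


def ratio_to_baseline(name: str, baseline: str = "LlamaIndex") -> float:
    """Retrieval success of `name` as a multiple of the baseline's."""
    return retrieval_success[name] / retrieval_success[baseline]


def question_weighted_accuracy(retriever: str) -> float:
    """Assumption of this sketch: weight each tier by its question count."""
    total = sum(tier_sizes.values())
    weighted = sum(accuracy[retriever][tier] * n for tier, n in tier_sizes.items())
    return weighted / total


for name in ("SIEVE", "RANGER"):
    print(f"{name}: {ratio_to_baseline(name):.1f}x LlamaIndex retrieval success, "
          f"question-weighted accuracy {question_weighted_accuracy(name):.4f}%")
```

The ratios make the trade-off visible: RANGER leads on retrieval success and on the trace-grounded tier, while SIEVE leads on the policy-specific reasoning tier.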


This review was generated by AI and checked by a human editor.