QUICK REVIEW

[Paper Review] Can We Trust LLMs on Memristors? Diving into Reasoning Ability under Non-Ideality

Taiqiang Wu, Yuxin Cheng|arXiv (Cornell University)|Mar 14, 2026

Advanced Memory and Neural Computing | Cited by 0

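One-Sentence Summary

The paper analyzes how the non-idealities of analog memristors affect LLM reasoning and evaluates training-free mitigation strategies, recommending shallow-layer redundancy and mode-specific strategies (for example, using thinking mode only at low noise levels).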

ABSTRACT

Memristor-based analog compute-in-memory (CIM) architectures provide a promising substrate for the efficient deployment of Large Language Models (LLMs), owing to superior energy efficiency and computational density. However, these architectures suffer from precision issues caused by intrinsic non-idealities of memristors. In this paper, we first conduct a comprehensive investigation into the impact of such typical non-idealities on LLM reasoning. Empirical results indicate that reasoning capability decreases significantly but varies for distinct benchmarks. Subsequently, we systematically appraise three training-free strategies, including thinking mode, in-context learning, and module redundancy. We thus summarize valuable guidelines, i.e., shallow layer redundancy is particularly effective for improving robustness, thinking mode performs better under low noise levels but degrades at higher noise, and in-context learning reduces output length with a slight performance trade-off. Our findings offer new insights into LLM reasoning under non-ideality and practical strategies to improve robustness.

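Motivation and Objectives

Assess how typical memristor non-idealities affect LLM reasoning accuracy on three benchmarks (IFEval, GPQA-Diamond, MATH-500).
Systematically evaluate how robust training-free mitigation strategies (thinking mode, in-context learning, module redundancy) are under non-ideality.
Provide practical guidelines for deploying LLMs on memristor-based CIM hardware.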

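Proposed Method

Simulate memristor non-idealities at LLM inference time by applying block-wise Gaussian noise and stuck-at faults to the weight matrices.
Evaluate reasoning on three benchmarks (IFEval, GPQA-Diamond, MATH-500) at varying noise levels (σ ∈ {0.005, 0.01, 0.015, 0.02}).
Test training-free strategies: thinking mode, in-context learning (ICL), and module redundancy (replicated modules/layers).
Analyze the energy/area impact under different hardware scenarios using a CIM energy model and the OpenCompass framework.
Run ablation studies on the stuck-at fault rate (p) and on which model layers/modules should be replicated.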
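The noise-injection step above can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions, not the authors' simulator: the function name `perturb_weights`, the tile size `block`, the choice to scale the noise standard deviation by each tile's largest absolute weight, and the stuck-at values (zero or the tile maximum) are all hypothetical choices that the summary does not specify.

```python
import random

def perturb_weights(weights, sigma, p_stuck=0.0, block=2, seed=0):
    """Return a perturbed copy of a 2-D weight matrix (list of lists).

    Assumptions (not taken from the paper): within each block x block tile,
    the Gaussian noise std is sigma * max|w| of that tile, and a stuck cell
    is frozen at 0.0 or at the tile maximum with the weight's original sign.
    """
    rng = random.Random(seed)
    rows, cols = len(weights), len(weights[0])
    out = [row[:] for row in weights]  # leave the input untouched
    for r0 in range(0, rows, block):
        for c0 in range(0, cols, block):
            cells = [(r, c)
                     for r in range(r0, min(r0 + block, rows))
                     for c in range(c0, min(c0 + block, cols))]
            w_max = max(abs(weights[r][c]) for r, c in cells)
            for r, c in cells:
                if rng.random() < p_stuck:
                    # Stuck-at fault: the cell ignores its programmed value.
                    extreme = w_max if weights[r][c] >= 0 else -w_max
                    out[r][c] = rng.choice([0.0, extreme])
                else:
                    # Analog variation: zero-mean Gaussian noise per cell.
                    out[r][c] = weights[r][c] + rng.gauss(0.0, sigma * w_max)
    return out

if __name__ == "__main__":
    w = [[0.5, -0.2, 0.1], [0.3, 0.4, -0.6], [0.0, 0.2, 0.1]]
    for row in perturb_weights(w, sigma=0.02, p_stuck=0.05):
        print([round(v, 4) for v in row])
```

Sweeping `sigma` over the four levels listed above and `p_stuck` over a few fault rates would mirror the structure of the evaluation and ablation, although the paper's actual noise model may differ in its details.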

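Experimental Results

Research Questions

RQ1: How do memristor non-idealities affect LLM reasoning performance across different benchmarks?
RQ2: Can training-free strategies mitigate the degradation caused by non-idealities, and under what conditions are they effective?
RQ3: What practical guidelines on noise tolerance, energy consumption, and architectural redundancy apply when deploying LLMs on memristor-based CIM?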

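Key Findings

Reasoning performance declines as memristor noise increases, and results become more unstable at high noise levels.
Mathematical reasoning (MATH-500) is especially fragile under weight perturbation: at high noise, accuracy drops sharply and outputs grow longer.
Very small noise sometimes improves certain metrics (for example, IFEval Pass@8 at σ = 0.005), suggesting a possible regularization effect at very low noise levels.
Thinking mode is robust at low to moderate noise (σ ≤ 0.01) but collapses at high noise (σ > 0.015) because of mode collapse and verbose, unstructured outputs.
In-context learning yields shorter outputs but generally trails the vanilla baseline at every noise level; at very high noise it offers an energy benefit only because its outputs are shorter.
Module redundancy generally improves robustness; shallow-layer redundancy (the first few layers) is particularly effective, whereas deep-layer redundancy can incur energy/area penalties and reduce performance.
A practical shallow-redundancy strategy, "Shallow (4 ×)", which replicates the first quarter of the layers four times, strikes a good balance between performance and energy, even for larger LLMs (Qwen3 1.7B, Llama 3.2 1B).
For Qwen3 1.7B and Llama 3.2 1B, Shallow (4 ×) substantially recovers MATH-500 and IFEval performance while sharply reducing energy consumption (for example, at σ = 0.02 the energy for Qwen3 1.7B falls to 0.3 J, versus 7.2 J for the baseline).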
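One intuition for why replicating a layer can help is that averaging several independently perturbed copies reduces the effective noise. The sketch below illustrates that intuition only. It assumes the replicas' outputs are averaged and that their noise is independent, neither of which the summary confirms; the helper names (`noisy_dot`, `redundant_dot`, `rms_error`) and the toy weights are hypothetical.

```python
import random
import statistics

def noisy_dot(weights, x, sigma, rng):
    """Dot product in which every weight receives fresh Gaussian noise."""
    return sum((w + rng.gauss(0.0, sigma)) * xi for w, xi in zip(weights, x))

def redundant_dot(weights, x, sigma, copies, rng):
    """Average the outputs of `copies` independently perturbed replicas.

    Assumption: redundancy is realised by output averaging; the paper's
    actual mechanism may differ.
    """
    total = sum(noisy_dot(weights, x, sigma, rng) for _ in range(copies))
    return total / copies

def rms_error(copies, sigma=0.02, trials=2000, seed=0):
    """Root-mean-square deviation from the noise-free dot product."""
    rng = random.Random(seed)
    weights = [0.5, -0.3, 0.8, 0.1]   # toy values, for illustration only
    x = [1.0, 2.0, -1.0, 0.5]
    exact = sum(w * xi for w, xi in zip(weights, x))
    sq_errs = [(redundant_dot(weights, x, sigma, copies, rng) - exact) ** 2
               for _ in range(trials)]
    return statistics.fmean(sq_errs) ** 0.5

if __name__ == "__main__":
    for k in (1, 4):
        print(f"copies={k}: RMS error = {rms_error(k):.4f}")
```

For independent zero-mean noise, averaging four replicas is expected to cut the RMS error by a factor of about 1/√4 = 0.5. This toy model says nothing about why shallow layers benefit more than deep ones, which is an empirical finding of the paper.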


This review was generated by AI and reviewed by human editors.