QUICK REVIEW

[Paper Review] How Vulnerable Are Edge LLMs?

Ao Ding, Hongzong Li | arXiv (Cornell University) | Mar 25, 2026

Adversarial Robustness in Machine Learning | Cited by 0

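One-Sentence Summary

This paper shows that quantization does not prevent query-based knowledge extraction from edge-deployed LLMs, and it proposes CLIQ, a clustered instruction querying framework that improves extraction efficiency under limited query budgets, as demonstrated on quantized Qwen models.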

ABSTRACT

Large language models (LLMs) are increasingly deployed on edge devices under strict computation and quantization constraints, yet their security implications remain unclear. We study query-based knowledge extraction from quantized edge-deployed LLMs under realistic query budgets and show that, although quantization introduces noise, it does not remove the underlying semantic knowledge, allowing substantial behavioral recovery through carefully designed queries. To systematically analyze this risk, we propose CLIQ (Clustered Instruction Querying), a structured query construction framework that improves semantic coverage while reducing redundancy. Experiments on quantized Qwen models (INT8/INT4) demonstrate that CLIQ consistently outperforms original queries across BERTScore, BLEU, and ROUGE, enabling more efficient extraction under limited budgets. These results indicate that quantization alone does not provide effective protection against query-based extraction, highlighting a previously underexplored security risk in edge-deployed LLMs.

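Motivation and Objectives

Assess whether quantized, edge-deployed LLMs leak behavioral knowledge under realistic query budgets.
Develop a structured query framework that maximizes semantic coverage and minimizes redundancy.
Demonstrate that Clustered Instruction Querying (CLIQ) extracts model behavior efficiently with a limited number of queries.
Evaluate extraction efficiency across quantization levels and model sizes.
Offer insights into the security implications of on-device (edge) LLM deployment and possible defenses.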

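Proposed Method

Propose CLIQ (Clustered Instruction Querying), which organizes candidate instruction queries into semantic clusters.
Cluster the queries using sentence embeddings and MiniBatchKMeans to form semantic regions.
Generate cluster-aware representative queries by giving a strong LLM cluster-conditioned prompts.
Train a student model on the resulting query-response pairs to quantify information leakage and how closely the model's behavior is reproduced.
Compare CLIQ against the original queries under a fixed query budget (e.g., 1,000 queries), using INT8/INT4-quantized teacher and student models.
Assess extraction quality with BERTScore, BLEU, and ROUGE.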
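
The clustering and prompting steps in the list above can be sketched roughly as follows. The source names sentence embeddings and MiniBatchKMeans but gives no implementation details, so everything here is an illustrative assumption: `toy_embed` is a hashed bag-of-words stand-in for a real sentence-embedding model, `mini_batch_kmeans` is a small standard-library reimplementation of the mini-batch k-means update (not scikit-learn's `MiniBatchKMeans`), and the template in `cluster_prompt` is hypothetical.

```python
import math
import random
from collections import defaultdict


def toy_embed(text, dim=16):
    """Stand-in for a sentence-embedding model (assumption): hashed bag-of-words, L2-normalised."""
    vec = [0.0] * dim
    for tok in text.lower().split():
        vec[sum(ord(c) for c in tok) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(centers, x):
    """Index of the center closest to x."""
    return min(range(len(centers)), key=lambda j: sq_dist(centers[j], x))


def mini_batch_kmeans(points, k, batch_size=8, n_iter=50, seed=0):
    """Mini-batch k-means: each center moves toward its assigned points with step 1/count."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    counts = [0] * k
    for _ in range(n_iter):
        batch = rng.sample(points, min(batch_size, len(points)))
        assigned = [nearest(centers, x) for x in batch]  # assign the whole batch first
        for x, j in zip(batch, assigned):
            counts[j] += 1
            lr = 1.0 / counts[j]
            centers[j] = [(1 - lr) * c + lr * xi for c, xi in zip(centers[j], x)]
    return centers


def cluster_queries(queries, k, seed=0):
    """Group candidate queries into at most k semantic clusters."""
    points = [toy_embed(q) for q in queries]
    centers = mini_batch_kmeans(points, k, seed=seed)
    clusters = defaultdict(list)
    for q, p in zip(queries, points):
        clusters[nearest(centers, p)].append(q)
    return dict(clusters)


def cluster_prompt(members, n_new=3):
    """Hypothetical cluster-conditioned prompt for a strong LLM (template is an assumption)."""
    examples = "\n".join(f"- {m}" for m in members)
    return (
        "These instructions cover one topic:\n"
        f"{examples}\n"
        f"Write {n_new} new, non-redundant instructions on the same topic."
    )
```

In a real pipeline one would likely swap `toy_embed` for an actual sentence encoder and `mini_batch_kmeans` for a library implementation. The point of the sketch is only the shape of the flow: embed, cluster, then prompt once per cluster so that a fixed query budget is spread across semantic regions instead of being spent on near-duplicates.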

Figure 1: Overview of the proposed framework for query-based knowledge extraction from edge-deployed quantized LLMs. Previous approaches (blue) rely on unstructured queries, which often lead to redundant probing and noisy responses, resulting in low-fidelity reconstruction of model behavior. CLIQ (r

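Experimental Results

Research Questions

RQ1: Do quantized edge LLMs retain semantic knowledge that can be extracted through limited, query-based interaction?
RQ2: Does structured query construction improve extraction efficiency over naive querying under edge deployment constraints?
RQ3: How do different quantization levels (INT8 vs. INT4) affect the ability to learn an edge model's behavior through queries?
RQ4: How do cluster-aware queries affect reconstruction quality and sample efficiency?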

Key Findings

Method	BERT-F1	BLEU	ROUGE-Lsum
Original Queries	77.97	1.05	13.37
CLIQ (Ours)	84.35	2.77	17.50
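
To make the size of the gap concrete, the absolute and relative gains can be computed directly from the two rows above. This is plain arithmetic on the reported numbers. The dictionary layout and the `gains` helper are illustrative, and the last column (RLsum in the source) is read here as ROUGE-Lsum.

```python
# Scores copied from the table above; keys and helper names are illustrative.
results = {
    "Original Queries": {"BERT-F1": 77.97, "BLEU": 1.05, "ROUGE-Lsum": 13.37},
    "CLIQ (Ours)": {"BERT-F1": 84.35, "BLEU": 2.77, "ROUGE-Lsum": 17.50},
}


def gains(baseline, improved):
    """Return {metric: (absolute gain in points, relative gain in percent)}."""
    out = {}
    for metric, base in baseline.items():
        delta = improved[metric] - base
        out[metric] = (round(delta, 2), round(100 * delta / base, 1))
    return out


for metric, (absolute, relative) in gains(
    results["Original Queries"], results["CLIQ (Ours)"]
).items():
    print(f"{metric}: +{absolute} points ({relative}% relative)")
```

This gives +6.38 points for BERT-F1 (about 8.2% relative), +1.72 for BLEU (about 163.8% relative), and +4.13 for ROUGE-Lsum (about 30.9% relative). The large relative BLEU gain starts from a very low baseline, so it should be read alongside the absolute numbers.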

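Under the same query budget, CLIQ consistently outperforms the original query strategy on BERT-F1, BLEU, and ROUGE.
A 1.7B INT8-quantized student model distilled with CLIQ matches or exceeds the performance of the larger teacher, indicating efficient knowledge transfer through structured queries.
Quantization degrades teacher performance only marginally, and structured queries remain effective at extracting behavior.
Under a fixed budget (e.g., 500 queries), CLIQ scores higher than Original Queries on BERT-F1, BLEU, and ROUGE-L, improves faster, and saturates earlier.
With CLIQ, extraction efficiency rises rapidly as the number of queries grows from 100 to 300, with diminishing returns beyond that point, indicating high sample efficiency.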
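
For readers unfamiliar with the ROUGE-L scores quoted above, the metric is built on the longest common subsequence (LCS) between a reference and a candidate text. The sketch below is a minimal F1 variant that assumes lowercased whitespace tokenization. It is not the evaluation code used in the paper, and library implementations may tokenize, stem, or weight precision and recall differently.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """ROUGE-L as the F1 of LCS-based precision and recall (simplified sketch)."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)
```

In an extraction setting, `reference` would be the teacher model's response and `candidate` the student's response to the same query, so a higher score means the student reproduces more of the teacher's wording in the same order.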

Figure 2: Threat framework for query-based knowledge extraction from quantized edge-deployed LLMs. Traditional extraction settings (top) assume full-precision teacher models in high-performance server environments, where abundant compute allows large-scale query probing. In contrast, edge-deployed L


This review was generated by AI and reviewed by human editors.