QUICK REVIEW

[Paper Review] Detecting Cryptographically Relevant Software Packages with Collaborative LLMs

Eduard Hirsch, Kristina Raab|arXiv (Cornell University)|Mar 7, 2026

Topic: Advanced Malware Detection Techniques | Cited by: 0

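One-Sentence Summary

This paper proposes an offline, collaborative LLM framework that uses majority voting across multiple locally hosted models to heuristically identify cryptographically relevant software packages, and evaluates it on roughly 65,000 Fedora packages.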

ABSTRACT

IT systems are facing an increasing number of security threats, including advanced persistent attacks and future quantum-computing vulnerabilities. The move towards crypto-agility and post-quantum cryptography (PQC) requires a reliable inventory of cryptographic assets across heterogeneous IT environments. Due to the sheer number of packages, it is infeasible to manually detect cryptographically relevant software. Further, static code analysis pipelines often fail to address the diversity of modern ecosystems. Our research explores the use of large language models (LLMs) as heuristic tools for cryptographic asset discovery. We propose a collaborative framework that employs multiple LLMs to assess software relevance and aggregates their outputs through majority voting. To preserve data privacy, the approach operates on-premises without reliance on external servers. Using over 65,000 Fedora Linux packages, we evaluate the reliability of this method through statistical analysis, inter-model agreement, and manual validation. Preliminary results suggest that LLM ensembles can serve as an efficient first-pass filter for identifying cryptographic software, resulting in reduced manual workload and assisting PQC transition. The study also compares on-premises and online LLM configurations, highlighting key advantages, limitations, and future directions for automated cryptographic asset discovery.

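Research Motivation and Objectives

Determine how large language models can heuristically detect cryptographic functionality in software packages.
Assess whether aggregating multiple locally hosted LLMs improves detection quality.
Demonstrate an offline workflow for cryptographic asset discovery in enterprise environments.
Provide guidelines and open-source artifacts for reproducible cryptographic asset discovery.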

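Proposed Method

Collect a baseline package list from the package manager (Fedora), including each package's name, description, and first-level dependencies.
Prompt multiple local LLMs, using a carefully designed prompt that requests JSON output, to assess the cryptographic relevance of each package.
Aggregate the LLM outputs through majority voting to produce a final cryptographic-relevance verdict.
Validate the model selection and the majority-vote results using labeled samples and cross-validation.
Compare offline LLM configurations with online ones, analyzing response quality and dependencies.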
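
The aggregation step above can be illustrated with a minimal sketch. It is not the authors' implementation: the JSON field name `crypto_relevant`, the helper names, and the handling of unparseable replies (skipped, with ties left undecided) are all assumptions made for illustration.

```python
import json
from collections import Counter
from typing import List, Optional


def parse_verdict(raw: str) -> Optional[bool]:
    """Extract a boolean verdict from one model's JSON reply.

    Returns None when the reply is not valid JSON or lacks a boolean
    'crypto_relevant' field (a hypothetical field name).
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    verdict = data.get("crypto_relevant")
    return verdict if isinstance(verdict, bool) else None


def majority_vote(raw_replies: List[str]) -> Optional[bool]:
    """Aggregate per-model replies into one verdict by majority vote.

    Assumption: invalid replies are ignored, and a tie (or no valid
    reply at all) yields None so the package can be reviewed manually.
    """
    verdicts = [parse_verdict(r) for r in raw_replies]
    counts = Counter(v for v in verdicts if v is not None)
    yes, no = counts[True], counts[False]
    if yes == no:
        return None
    return yes > no


if __name__ == "__main__":
    replies = [
        '{"crypto_relevant": true}',
        '{"crypto_relevant": true}',
        '{"crypto_relevant": false}',
        "I think this package is relevant.",  # not JSON, counted as invalid
        '{"crypto_relevant": true}',
    ]
    print(majority_vote(replies))
```

Returning None for ties keeps the ensemble in the role of a first-pass filter: undecided packages fall through to manual review instead of being forced into a class.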

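Experimental Results

Research Questions

RQ1: How can LLMs be used to heuristically identify software packages that implement or depend on cryptographic functionality?
RQ2: Does aggregating across multiple LLMs improve the quality of cryptographic-relevance judgments?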

Key Findings

model	size	valid	invalid	error-rate
phi	2.1 GB	65,222	72	0.11%
deepseek	5.2 GB	65,199	95	0.15%
llama	4.4 GB	65,094	200	0.31%
mistral	3.9 GB	64,974	320	0.49%
gpt4all	6.9 GB	64,157	1,137	1.74%
agg	—	63,529	1,765	2.70%
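
In every row of the table, valid and invalid responses sum to 65,294 packages, and the error-rate column appears to be the invalid share of that total. The following sketch recomputes the column from the counts above; the interpretation of the column and the names `rows` and `error_rate` are assumptions, not definitions taken from the paper.

```python
# (valid, invalid) response counts copied from the table above.
rows = {
    "phi": (65222, 72),
    "deepseek": (65199, 95),
    "llama": (65094, 200),
    "mistral": (64974, 320),
    "gpt4all": (64157, 1137),
    "agg": (63529, 1765),
}


def error_rate(valid: int, invalid: int) -> float:
    """Share of invalid responses, in percent of all responses."""
    return 100.0 * invalid / (valid + invalid)


if __name__ == "__main__":
    for name, (valid, invalid) in rows.items():
        total = valid + invalid
        print(f"{name:9s} total={total:,} error-rate={error_rate(valid, invalid):.2f}%")
```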

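LLM ensembles can serve as an efficient first-pass filter for identifying cryptographic software assets.
Majority voting across five local models is robust for classifying software packages.
Manual validation of 390 packages supported iterative refinement and model selection.
The study provides open-source code and data to support reproducibility and further research.
The offline (on-premises) LLM setup demonstrates practical feasibility for discovering PQC-relevant assets.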


This review was generated by AI and reviewed by a human editor.