QUICK REVIEW

[Paper Digest] Automated Dynamic Analysis of Ransomware: Benefits, Limitations and use for Detection

Daniele Sgandurra, Luis Muñoz-González|arXiv (Cornell University)|Sep 10, 2016

Advanced Malware Detection Techniques | References: 14 | Cited by: 143

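One-Sentence Summary

EldeRan combines dynamic sandbox analysis, mutual-information feature selection, and regularized logistic regression to detect ransomware with high accuracy and to identify new variants without requiring an entire ransomware family to be available beforehand.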

ABSTRACT

Recent statistics show that in 2015 more than 140 millions new malware samples have been found. Among these, a large portion is due to ransomware, the class of malware whose specific goal is to render the victim's system unusable, in particular by encrypting important files, and then ask the user to pay a ransom to revert the damage. Several ransomware include sophisticated packing techniques, and are hence difficult to statically analyse. We present EldeRan, a machine learning approach for dynamically analysing and classifying ransomware. EldeRan monitors a set of actions performed by applications in their first phases of installation checking for characteristics signs of ransomware. Our tests over a dataset of 582 ransomware belonging to 11 families, and with 942 goodware applications, show that EldeRan achieves an area under the ROC curve of 0.995. Furthermore, EldeRan works without requiring that an entire ransomware family is available beforehand. These results suggest that dynamic analysis can support ransomware detection, since ransomware samples exhibit a set of characteristic features at run-time that are common across families, and that helps the early detection of new variants. We also outline some limitations of dynamic analysis for ransomware and propose possible solutions.

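Motivation and Objectives

Assess whether ransomware can be identified early by analysing dynamic behavioural features.
Determine which dynamic features are most informative for ransomware detection.
Compare the performance of regularized logistic regression, SVM, and Naive Bayes on this task.
Evaluate EldeRan's ability to detect new ransomware families and compare it with VirusTotal.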

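Proposed Method

Samples are dynamically analysed in a sandbox (Cuckoo Sandbox) to collect features: Windows API calls, registry operations, file system operations, operations by file extension, directory operations, dropped files, and strings.
Mutual information (MI) is used for feature selection, picking the most discriminative features from a large feature set.
Classification uses regularized logistic regression (L2 regularization), trained with batch gradient descent on a cross-entropy loss.
The classifier is trained offline on the sandbox dataset, and the trained model is then used for online, real-time detection on users' personal computers.
Dataset: 582 ransomware samples across 11 families and 942 goodware samples; each sample is analysed for 30 seconds in a Windows XP SP2 sandbox; MI reduces the feature set to the top 400 features.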
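
The selection and classification steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: it assumes binary features (1 if a behaviour was observed, 0 otherwise), and the function names, the toy data, and the hyperparameters `lam`, `lr`, and `epochs` are all hypothetical choices made for the example.

```python
import math

def mutual_information(feature, labels):
    """Mutual information (in bits) between a binary feature and binary labels."""
    n = len(labels)
    mi = 0.0
    for x in (0, 1):
        for y in (0, 1):
            joint = sum(1 for f, l in zip(feature, labels) if f == x and l == y) / n
            px = sum(1 for f in feature if f == x) / n
            py = sum(1 for l in labels if l == y) / n
            if joint > 0:  # empty cells contribute nothing to the sum
                mi += joint * math.log2(joint / (px * py))
    return mi

def select_top_k(X, y, k):
    """Return the indices of the k columns of X with the highest MI against y."""
    n_features = len(X[0])
    scores = [mutual_information([row[j] for row in X], y) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:k]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, b, row):
    """Estimated probability that a sample is ransomware (label 1)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b)

def train_logreg(X, y, lam=0.01, lr=0.5, epochs=500):
    """L2-regularized logistic regression, batch gradient descent on mean cross-entropy."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, label in zip(X, y):  # one full pass over the data per update
            err = predict_proba(w, b, row) - label
            for j in range(d):
                grad_w[j] += err * row[j]
            grad_b += err
        # The L2 penalty adds lam * w to the weight gradient; the bias is not penalized.
        w = [wj - lr * (gw / n + lam * wj) for wj, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

# Hypothetical toy data: rows are samples, columns are binary behaviour flags.
X = [[1, 0, 1], [1, 1, 0], [1, 0, 0], [1, 1, 1],
     [0, 1, 0], [0, 0, 1], [0, 1, 1], [0, 0, 0]]
y = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = ransomware, 0 = goodware

top = select_top_k(X, y, k=1)  # the digest reports k = 400 on the real feature set
X_sel = [[row[j] for j in top] for row in X]
w, b = train_logreg(X_sel, y)
```

In the toy data only the first column carries information about the label, so it is the one that survives selection; on real sandbox traces the ranking would be computed over the full feature set before training.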

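Experimental Results

Research Questions

RQ1: Can ransomware be detected accurately using the limited dynamic features collected during the early installation phase?
RQ2: How does regularized logistic regression compare with SVM and Naive Bayes on this ransomware detection task?
RQ3: Can the approach detect new, previously unseen ransomware families without requiring an entire family to be available beforehand?
RQ4: How does EldeRan perform relative to VirusTotal labels when detecting ransomware?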
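
One plausible way to probe RQ3 is to hold out every sample of one ransomware family during training and test only on that family. The sketch below assumes that protocol; the digest does not spell out the exact evaluation procedure, and the tuple layout, the function name, and the family labels `famA` and `famB` are hypothetical.

```python
def leave_one_family_out(samples):
    """Yield (held_out_family, train, test), holding out one ransomware family per fold.

    samples: list of (features, label, family) tuples; family is None for goodware.
    """
    families = sorted({fam for _, _, fam in samples if fam is not None})
    for held_out in families:
        # Goodware and all other families stay in the training set.
        train = [s for s in samples if s[2] != held_out]
        # The test set is the unseen family only, so accuracy on it is a detection rate.
        test = [s for s in samples if s[2] == held_out]
        yield held_out, train, test

# Hypothetical toy samples: two placeholder families plus goodware.
samples = [
    ([1, 0], 1, "famA"), ([1, 1], 1, "famA"),
    ([1, 0], 1, "famB"),
    ([0, 1], 0, None), ([0, 0], 0, None),
]
folds = list(leave_one_family_out(samples))
```

Each fold then trains a fresh classifier on `train` and reports the fraction of `test` samples it flags.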

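Key Findings

AUC of 0.995 on the ransomware and goodware dataset.
EldeRan's average error rate is 2.4%, versus 5.6% for VirusTotal.
Detection rate on known samples: 96.3%.
Average detection rate on new, previously unseen ransomware families: 93.3%.
Regularized logistic regression slightly outperforms SVM and outperforms Naive Bayes; selecting the top 400 features by MI balances performance against model simplicity.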
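
For readers unfamiliar with the metrics above, the following sketch shows how an area under the ROC curve and a detection rate can be computed from classifier scores. It uses the standard rank interpretation of AUC (the probability that a random positive scores higher than a random negative); the scores, labels, and the 0.5 threshold are hypothetical toy values, not data from the paper.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def detection_rate(scores, labels, threshold=0.5):
    """Fraction of ransomware samples (label 1) whose score reaches the threshold."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    return sum(1 for s in pos if s >= threshold) / len(pos)

# Hypothetical classifier outputs for three ransomware and three goodware samples.
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.1]
labels = [1, 1, 1, 0, 0, 0]

auc = roc_auc(scores, labels)          # 8 of 9 pairs ranked correctly
rate = detection_rate(scores, labels)  # 2 of 3 ransomware samples flagged
```

AUC is threshold-free, while the detection rate depends on the chosen cut-off, which is why the two figures are reported separately.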

This digest was generated by AI and reviewed by a human editor.