QUICK REVIEW

[Paper Review] Detecting Phishing Sites Using ChatGPT

Takashi Koide, Naoki Fukushi|arXiv (Cornell University)|Jun 9, 2023

Spam and Phishing Detection | Cited by 22

One-Sentence Summary

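This paper introduces ChatPhishDetector, a system that uses large language models (LLMs) to detect phishing sites by crawling websites, generating prompts, and analyzing text and images; with GPT-4V it achieves near-perfect precision and recall on a multilingual dataset.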

ABSTRACT

The emergence of Large Language Models (LLMs), including ChatGPT, is having a significant impact on a wide range of fields. While LLMs have been extensively researched for tasks such as code generation and text synthesis, their application in detecting malicious web content, particularly phishing sites, has been largely unexplored. To combat the rising tide of cyber attacks due to the misuse of LLMs, it is important to automate detection by leveraging the advanced capabilities of LLMs. In this paper, we propose a novel system called ChatPhishDetector that utilizes LLMs to detect phishing sites. Our system involves leveraging a web crawler to gather information from websites, generating prompts for LLMs based on the crawled data, and then retrieving the detection results from the responses generated by the LLMs. The system enables us to detect multilingual phishing sites with high accuracy by identifying impersonated brands and social engineering techniques in the context of the entire website, without the need to train machine learning models. To evaluate the performance of our system, we conducted experiments on our own dataset and compared it with baseline systems and several LLMs. The experimental results using GPT-4V demonstrated outstanding performance, with a precision of 98.7% and a recall of 99.6%, outperforming the detection results of other LLMs and existing systems. These findings highlight the potential of LLMs for protecting users from online fraudulent activities and have important implications for enhancing cybersecurity measures.

Motivation and Objectives

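The need for automated phishing detection that leverages LLMs without large amounts of labeled training data.
Multilingual phishing detection that analyzes entire website content (text and visuals) rather than only logos or URLs.
Showing that, with prompting and prompt design, LLMs can identify social engineering and brand impersonation across jurisdictions.
Comparing against baseline systems to establish the effectiveness of ChatPhishDetector in real-world scenarios.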

Proposed Method

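Crawl each website to collect the final URL, the browser-rendered HTML, and a screenshot; extract text via OCR when necessary.
Use prompt engineering with Chain-of-Thought prompting to guide LLMs through four subtasks: identifying social engineering (SE) techniques, identifying the brand, concluding phishing or non-phishing, and producing JSON output.
Simplify the HTML and OCR text to fit within token limits while preserving the key signals for phishing detection.
Run in two modes, Normal (text input) and Vision (text plus image input), to take full advantage of multimodal LLMs.
Evaluate on a dataset of 1,000 phishing sites and 1,000 non-phishing sites across multiple LLMs (GPT-4, GPT-4V, GPT-3.5, Gemini Pro, Llama 2) and baseline systems (dnstwist, Phishpedia).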
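The prompt-generation and response-parsing steps above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the authors' implementation: the character budgets, the set of tags kept during HTML simplification, the prompt wording, and the JSON keys (`phishing`, `brand`, `phishing_score` on a 1 to 10 scale) are all assumptions made for the example. The call to the LLM itself is left out.

```python
import json
import re
from html.parser import HTMLParser

# Hypothetical budgets; this review does not state the real token limits.
MAX_HTML_CHARS = 4000
MAX_OCR_CHARS = 1000


class _Simplifier(HTMLParser):
    """Keep visible text and a few tags assumed to matter for phishing."""

    KEEP = {"title", "form", "input", "a"}
    SKIP = {"script", "style"}
    ATTRS = {"href", "action", "type", "name"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.KEEP:
            kept = "".join(f' {k}="{v}"' for k, v in attrs if k in self.ATTRS and v)
            self.parts.append(f"<{tag}{kept}>")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.parts.append(text)


def simplify_html(html, limit=MAX_HTML_CHARS):
    """Drop scripts and styles, keep text and selected tags, then truncate."""
    parser = _Simplifier()
    parser.feed(html)
    return "\n".join(parser.parts)[:limit]


def build_prompt(url, html, ocr_text=""):
    """Assemble a step-by-step prompt covering the four subtasks."""
    return "\n".join([
        "You are a web security analyst. Think step by step.",
        "1. List any social engineering techniques used on the page.",
        "2. Identify the brand the page presents itself as, if any.",
        "3. Decide whether the site is phishing or non-phishing, "
        "considering whether the domain matches that brand.",
        '4. Answer in JSON with keys "phishing" (boolean), '
        '"brand" (string or null), and "phishing_score" (integer, 1 to 10).',
        f"URL: {url}",
        f"HTML:\n{simplify_html(html)}",
        f"OCR text:\n{ocr_text[:MAX_OCR_CHARS]}",
    ])


def parse_verdict(response_text):
    """Extract the JSON object from a free-form LLM response."""
    match = re.search(r"\{.*\}", response_text, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in response")
    return json.loads(match.group(0))
```

In Vision mode the screenshot would be attached alongside this text prompt; how that is done depends on the model API and is not shown here.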

Experimental Results

Research Questions

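RQ1: Can LLMs detect phishing sites in multilingual settings by analyzing entire website content, including text and visuals?
RQ2: How do different LLMs and input modes (text vs. vision) compare in precision, recall, and robustness?
RQ3: Which signals (brand impersonation, social engineering cues, domain-brand consistency) drive accurate phishing classification?
RQ4: How cost-effective and scalable is prompt-driven phishing detection for real-world deployment?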

Key Findings

System	Mode	Model	Precision	Recall	Accuracy	F-measure	URL	HTML	Image	Phishing	Non-phishing
ChatPhishDetector	Vision	GPT-4V	98.7%	99.6%	99.2%	99.2%	✓	✓	✓	✓	✓
Gemini Pro Vision	Vision	Gemini Pro Vision	78.9%	99.1%	89.1%	87.9%	✓	✓	✓	✓	✓
GPT-4	Normal	GPT-4	98.3%	98.4%	98.4%	98.4%	✓	✓	✓	✓	✓
GPT-3.5	Normal	GPT-3.5	98.3%	86.7%	92.6%	92.1%	✓	✓	✓	✓	✓
Llama-2-70B	Normal	Llama-2-70B	78.4%	66.4%	74.1%	71.9%	✓	✓	✓	✓	✓
Gemini Pro	Normal	Gemini Pro	90.5%	95.6%	93.2%	93.0%	✓	✓	✓	✓	✓
Simple GPT-4	Normal	Simple GPT-4	98.4%	75.5%	87.2%	85.5%	✓	✓	✓	✓	✓
GPT-3.5	Normal	GPT-3.5	98.6%	77.5%	88.2%	86.8%	✓	✓	✓	✓	✓
dnstwist	-	-	-	-	31.3%	-	-	-	-	-	-
Phishpedia	-	-	-	-	26.0%	-	-	-	-	-	-
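For readers who want to check how the four metric columns relate, the sketch below computes them from confusion-matrix counts. The counts in the usage note are hypothetical values chosen to fit a 1,000 phishing / 1,000 non-phishing split; they are not taken from the paper.

```python
def detection_metrics(tp, fp, fn, tn):
    """Compute precision, recall, accuracy, and F-measure.

    tp: phishing sites flagged as phishing
    fp: non-phishing sites flagged as phishing
    fn: phishing sites missed
    tn: non-phishing sites correctly passed
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f_measure = 2 * precision * recall / (precision + recall)
    return {
        "precision": precision,
        "recall": recall,
        "accuracy": accuracy,
        "f_measure": f_measure,
    }
```

For example, `detection_metrics(tp=996, fp=13, fn=4, tn=987)` describes a detector that misses 4 of 1,000 phishing sites and wrongly flags 13 of 1,000 legitimate ones, which gives a recall of 0.996 and a precision of about 0.987.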

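GPT-4V (Vision mode) performed best on the dataset, with 98.7% precision and 99.6% recall.
GPT-4 (Normal mode) also performed strongly, with 98.3% precision and 98.4% recall; GPT-3.5 had markedly lower phishing recall (86.7%).
In the GPT-4 and GPT-4V runs, the system correctly identified all 172 highly non-compliant phishing sites among the challenging cases (100% detection).
Compared with the baselines (dnstwist at 31.3% and Phishpedia at 26.0%, both listed under Accuracy in the table), ChatPhishDetector with GPT-4V or GPT-4 delivers higher accuracy and broader coverage (multilingual and non-identifier-based phishing).
Phishing classification can be tuned via a threshold on phishing_score, and ROC performance is strong (AUC as high as 0.998 for GPT-4V).
Cost and latency are feasible for deployment (GPT-4V: about $0.179 per site and about 25 seconds per inference).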
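The threshold tuning and the cost figures above can be made concrete with a short sketch. It assumes that `phishing_score` is a numeric value where higher means more likely phishing (the exact scale is not stated in this review), and it uses a plain trapezoidal AUC rather than whatever tooling the authors used. The function names are hypothetical.

```python
def roc_points(scores, labels):
    """Sweep every distinct score as a threshold; label 1 means phishing."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / neg, tp / pos))  # (false positive rate, recall)
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


def estimate_cost(n_sites, usd_per_site=0.179, seconds_per_site=25.0):
    """Project total spend and sequential runtime from the per-site figures."""
    return {
        "usd": n_sites * usd_per_site,
        "hours_sequential": n_sites * seconds_per_site / 3600,
    }
```

At the reported GPT-4V figures, scanning 2,000 sites (the size of the evaluation dataset) would cost roughly 2,000 × $0.179 ≈ $358 and take about 50,000 seconds, or close to 14 hours, if the requests ran one after another. Raising the threshold trades recall for precision, so the operating point can be chosen to match how costly a false alarm is in a given deployment.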

This review was generated by AI and reviewed by a human editor.