[Paper Review] HoneyGPT: Breaking the Trilemma in Terminal Honeypots with Large Language Model

Ziyang Wang, Jianzhou You | arXiv (Cornell University) | Jun 4, 2024
Topic Modeling | Cited by 5
One-Sentence Summary

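HoneyGPT is a ChatGPT-based terminal honeypot that combines a prompt framework with chain-of-thought reasoning to balance the trilemma of flexibility, interaction depth, and deception better than traditional terminal honeypots. It is validated through a baseline evaluation and a four-week field evaluation.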

ABSTRACT

Honeypots, as a strategic cyber-deception mechanism designed to emulate authentic interactions and bait unauthorized entities, often struggle with balancing flexibility, interaction depth, and deception. They typically fail to adapt to evolving attacker tactics, with limited engagement and information gathering. Fortunately, the emergent capabilities of large language models and innovative prompt-based engineering offer a transformative shift in honeypot technologies. This paper introduces HoneyGPT, a pioneering shell honeypot architecture based on ChatGPT, characterized by its cost-effectiveness and proactive engagement. In particular, we propose a structured prompt engineering framework that incorporates chain-of-thought tactics to improve long-term memory and robust security analytics, enhancing deception and engagement. Our evaluation of HoneyGPT comprises a baseline comparison based on a collected dataset and a three-month field evaluation. The baseline comparison demonstrates HoneyGPT's remarkable ability to strike a balance among flexibility, interaction depth, and deceptive capability. The field evaluation further validates HoneyGPT's superior performance in engaging attackers more deeply and capturing a wider array of novel attack vectors.

Motivation and Goals

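  • Motivate the development of more dynamic, intelligent terminal honeypots that can adapt to attacker tactics.
  • Propose a general keyword specification for honeypot prompts, and a chain-of-thought strategy that improves reasoning during interaction.
  • Design and implement HoneyGPT, with a prompt manager that maintains long-term interaction memory and supports robust security analytics.
  • Evaluate HoneyGPT against traditional honeypots using open-source data and a real-world field deployment, assessing deception, interaction depth, and flexibility.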

Proposed Method

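  • Build HoneyGPT by replacing the traditional request-response interaction with a ChatGPT-based question-answer loop.
  • Introduce a three-component framework: a terminal protocol proxy, a prompt manager, and ChatGPT. Cowrie's protocol layer handles SSH/Telnet, while ChatGPT is prompted to generate the responses.
  • Implement a prompt manager that builds each prompt from the honeypot principles, the honeypot settings, the attacker's query, a system state register, and the interaction history, and applies a pruning strategy so the prompt fits within the context length.
  • Incorporate a chain-of-thought strategy that analyzes how each attacker command affects the operating system and outputs a system state change (C_i) together with its impact factor (F_i) for use in later reasoning.
  • Apply a prompt pruning mechanism that scores each past interaction using its Impact Factor (F_i) and a Weaken Factor (w), deleting the least important history entries while preserving relevant context.
  • Configure static system principles (P) and honeypot settings (S) to keep operation stable and realistic.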
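The prompt-manager bullets above can be illustrated with a minimal Python sketch. This summary does not give HoneyGPT's prompt template or its exact scoring formula, so the following are assumptions: an entry's importance is F_i multiplied by w raised to its age (the number of newer interactions), the context budget is counted in characters instead of tokens, and the state register simply stores the C_i strings. The names PromptManager, Interaction, record, score, render, and build are hypothetical and not taken from the paper's code.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Interaction:
    """One attacker/honeypot exchange kept in the interaction history."""
    command: str   # attacker query
    response: str  # shell output produced by the LLM
    impact: float  # impact factor F_i reported by the chain-of-thought step


@dataclass
class PromptManager:
    """Hypothetical prompt manager: P + S + state register + history + query."""
    principles: str              # static system principles P
    settings: str                # static honeypot settings S
    weaken: float = 0.9          # weaken factor w, assumed to lie in (0, 1)
    max_chars: int = 4000        # assumed budget in characters (stand-in for tokens)
    state_register: list[str] = field(default_factory=list)  # accumulated C_i
    history: list[Interaction] = field(default_factory=list)

    def record(self, item: Interaction, state_change: str | None = None) -> None:
        """Store an exchange; keep any reported state change C_i in the register."""
        self.history.append(item)
        if state_change:
            self.state_register.append(state_change)

    def score(self, idx: int) -> float:
        """Assumed importance: F_i discounted by w once per newer interaction."""
        age = len(self.history) - 1 - idx
        return self.history[idx].impact * self.weaken ** age

    def render(self, query: str) -> str:
        """Concatenate P, S, the state register, the history, and the new query."""
        parts = [f"Principles: {self.principles}", f"Settings: {self.settings}", "System state:"]
        parts += [f"- {change}" for change in self.state_register]
        parts.append("History:")
        parts += [f"$ {h.command}\n{h.response}" for h in self.history]
        parts.append(f"Attacker query: $ {query}")
        return "\n".join(parts)

    def build(self, query: str) -> str:
        """Delete the lowest-scoring history entries until the prompt fits the budget."""
        prompt = self.render(query)
        while len(prompt) > self.max_chars and self.history:
            worst = min(range(len(self.history)), key=self.score)
            del self.history[worst]
            prompt = self.render(query)
        return prompt
```

In this sketch the state register is never pruned, so a recorded C_i survives even after the history entry that produced it is deleted, which is one way to keep long-term memory within a bounded context. If P, S, and the register alone exceed the budget, build returns the prompt over budget rather than discarding them.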

Experimental Results

Research Questions

  • RQ1: Can HoneyGPT overcome the trilemma of flexibility, interaction depth, and deception in terminal honeypots using LLM-based prompting?
  • RQ2: Does a Chain of Thought strategy and dynamic prompt management enhance long-term interaction memory and attacker engagement compared to traditional honeypots?
  • RQ3: How does HoneyGPT perform in deception, interaction level, and flexibility against baseline open-source honeypots under replayed and live traffic?
  • RQ4: What attack vectors and interaction lengths are captured by HoneyGPT in a four-week Internet deployment compared with Cowrie?
  • RQ5: What are the practical limits and configurations required for deploying LLM-driven honeypots in real networks?

Key Findings

  • Baseline evaluation shows HoneyGPT balances flexibility, interaction depth, and deception better than traditional honeypots using the same attack dataset.
  • Field deployment over four weeks indicates HoneyGPT entices attackers into longer, more complex interactions and captures a broader range of attack vectors than Cowrie.
  • HoneyGPT’s prompting framework with memory pruning effectively manages context length while maintaining useful historical context for reasoning.
  • The use of Chain of Thought enables HoneyGPT to handle extended, multi-command attack sequences such as write-elevate-execute more effectively than non-CoT approaches.
  • HoneyGPT demonstrates higher engagement and richer interaction trajectories in real-world tests compared to emulated or real-system honeypots.

This review was generated by AI and reviewed by human editors.