QUICK REVIEW

[Paper Review] Automatically Generate Steganographic Text Based on Markov Model and Huffman Coding

Zhongliang Yang, Shuyu Jin | arXiv (Cornell University) | Nov 12, 2018

Advanced Steganography and Watermarking Techniques | References: 17 | Cited by: 29

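One-Sentence Summary

This paper proposes an automated method for generating steganographic text with a Markov chain model and Huffman coding, embedding secret data in fluent, human-like text. By learning from a large corpus of human-written text, the model generates statistically natural carriers with strong concealment and increased payload capacity, and the paper reports that it outperforms earlier methods in security and efficiency.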

ABSTRACT

Steganography, as one of the three basic information security systems, has long played an important role in safeguarding the privacy and confidentiality of data in cyberspace. The text is the most widely used information carrier in people's daily life, using text as a carrier for information hiding has broad research prospects. However, due to the high coding degree and less information redundancy in the text, it has been an extremely challenging problem to hide information in it for a long time. In this paper, we propose a steganography method which can automatically generate steganographic text based on the Markov chain model and Huffman coding. It can automatically generate fluent text carrier in terms of secret information which need to be embedded. The proposed model can learn from a large number of samples written by people and obtain a good estimate of the statistical language model. We evaluated the proposed model from several perspectives. Experimental results show that the performance of the proposed model is superior to all the previous related methods in terms of information imperceptibility and information hidden capacity.

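Research Motivation and Goals

Address the difficulty of hiding secret data in text without detection, which stems from the low redundancy and high coding density of text.
Develop an automated system that generates fluent, human-like text carriers according to the length of the secret message.
Improve steganographic performance over existing methods, in both information hiding capacity and imperceptibility.
Use statistical language modeling and efficient coding techniques to achieve robust, scalable steganographic text generation.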

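Proposed Method

The method uses a higher-order Markov chain model, trained on a large corpus of human-written text, to estimate the probability distribution over word sequences.
Huffman coding compresses the secret message into a binary stream, minimizing payload size and improving embedding efficiency.
The system generates text by dynamically sampling from the Markov model while ensuring that the message can be encoded seamlessly with the compressed bitstream.
Huffman-coded secret data is integrated into the generated text by selecting words whose probabilities match the bit patterns of the message.
Context-based probabilistic word selection preserves syntactic and semantic coherence, and with it linguistic fluency.
The method is fully automated: carrier creation requires no manual input, and the output adapts to the length of the secret message.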
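
The word-selection step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a first-order (bigram) chain instead of a higher-order one for brevity, and it assumes one plausible reading of the selection step, in which a Huffman code is built over the candidate next words of the current state and the secret bits pick the word whose code they spell. The function names (train_bigram, huffman_codes, embed, extract), the zero-padding of the final bits, and the toy corpus are all hypothetical.

```python
import heapq
from collections import Counter, defaultdict
from itertools import count


def train_bigram(tokens):
    """Count next-word frequencies per word (a first-order Markov chain)."""
    model = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        model[prev][nxt] += 1
    return model


def huffman_codes(freqs):
    """Return {word: bitstring} for a {word: count} mapping, deterministically."""
    if len(freqs) == 1:
        return {next(iter(freqs)): ""}  # a forced word carries zero bits
    tie = count()  # unique tie-breaker so the heap never compares dicts
    heap = [(f, next(tie), {w: ""}) for w, f in sorted(freqs.items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        f0, _, c0 = heapq.heappop(heap)
        f1, _, c1 = heapq.heappop(heap)
        merged = {w: "0" + c for w, c in c0.items()}
        merged.update({w: "1" + c for w, c in c1.items()})
        heapq.heappush(heap, (f0 + f1, next(tie), merged))
    return heap[0][2]


def embed(model, start, bits, max_words=10_000):
    """Generate words whose Huffman codes, concatenated, spell out `bits`."""
    words, prev, pos = [start], start, 0
    while pos < len(bits):
        if len(words) > max_words or not model[prev]:
            raise ValueError("cannot continue generation from this state")
        codes = huffman_codes(model[prev])
        longest = max(len(c) for c in codes.values())
        # Assumption: zero-pad the tail so the last word can always be chosen.
        window = bits[pos:pos + longest].ljust(longest, "0")
        # Huffman codes are prefix-free and complete: exactly one code matches.
        word = next(w for w, c in codes.items() if window.startswith(c))
        words.append(word)
        pos += len(codes[word])
        prev = word
    return words


def extract(model, words):
    """Rebuild the bitstream by re-deriving each word's Huffman code."""
    return "".join(
        huffman_codes(model[prev])[word] for prev, word in zip(words, words[1:])
    )


# Toy corpus, purely illustrative; a real model needs a large corpus.
corpus = (
    "the cat sat on the mat and the dog sat on the rug and "
    "the cat ran to the dog and the dog ran to the mat"
).split()
model = train_bigram(corpus)
secret = "1011001110"
stego_words = embed(model, "the", secret)
recovered = extract(model, stego_words)
print(" ".join(stego_words))
print(recovered[:len(secret)] == secret)
```

Sender and receiver must share the same model and the same deterministic tie-breaking, otherwise the Huffman codes differ and extraction fails. Because the last word may consume padding bits, the receiver in this sketch also needs to know the message length in order to truncate the recovered stream.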

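Experimental Results

Research Questions

RQ1: Can a Markov model trained on a natural language corpus generate fluent, human-like text suitable for steganographic embedding?
RQ2: How effectively does Huffman coding reduce payload size while preserving embedding capacity and imperceptibility?
RQ3: How large is the method's advantage over existing text steganography techniques in information hiding capacity and statistical undetectability?
RQ4: Can the system generate text carriers automatically, without human intervention, and tailor them to the length of the secret message?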

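Key Findings

The proposed method outperforms earlier text steganography techniques in information hiding capacity.
Statistical and perceptual evaluations confirm that the generated steganographic text is highly fluent and natural.
Integrating Huffman coding significantly reduces payload size, improving embedding efficiency and lowering detectability.
The model shows strong imperceptibility: in the experiments, the generated text deviates very little from the statistical distribution of natural language.
The system fully automates the generation of steganographic text carriers, removing the need for manual selection or editing.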
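
This review does not state which metrics back the capacity and imperceptibility findings. As a hedged illustration only, two common proxies are the embedding rate in bits per word and the perplexity of the generated text under a language model. The sketch below assumes an add-alpha smoothed bigram model; the function names and the smoothing choice are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def bits_per_word(num_bits, words):
    """Embedding rate: secret bits carried per generated word."""
    return num_bits / len(words)


def bigram_perplexity(train_tokens, test_words, alpha=1.0):
    """Perplexity of test_words (at least two words) under a smoothed bigram model."""
    vocab = set(train_tokens) | set(test_words)
    pair_counts = Counter(zip(train_tokens, train_tokens[1:]))
    prev_counts = Counter(train_tokens[:-1])
    log_prob, n = 0.0, 0
    for prev, word in zip(test_words, test_words[1:]):
        # Add-alpha smoothing keeps unseen bigrams at a nonzero probability.
        p = (pair_counts[(prev, word)] + alpha) / (
            prev_counts[prev] + alpha * len(vocab)
        )
        log_prob += math.log2(p)
        n += 1
    return 2 ** (-log_prob / n)


train = "a b a b a b".split()
print(bits_per_word(10, ["w"] * 5))
print(bigram_perplexity(train, ["a", "b", "a", "b"]))
print(bigram_perplexity(train, ["a", "a", "b", "b"]))
```

Lower perplexity means the text looks more like the training corpus to the model, so a sequence that follows the corpus's word order scores lower than a scrambled one. A higher embedding rate usually pushes perplexity up, which is the capacity versus imperceptibility trade-off that such evaluations try to quantify.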

This review was generated by AI and reviewed by human editors.