QUICK REVIEW

[Paper Review] PlotMachines: Outline-Conditioned Generation with Dynamic Plot State Tracking

Hannah Rashkin, Aslı Çelikyılmaz|arXiv (Cornell University)|Apr 30, 2020

Topic Modeling | References: 37 | Cited by: 44

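One-Sentence Summary

PlotMachines introduces outline-conditioned story generation with a dynamic plot-state memory and discourse structure. Across multiple datasets, it shows better coherence and closer adherence to the outline than strong baselines such as GPT-2 and Grover.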

ABSTRACT

We propose the task of outline-conditioned story generation: given an outline as a set of phrases that describe key characters and events to appear in a story, the task is to generate a coherent narrative that is consistent with the provided outline. This task is challenging as the input only provides a rough sketch of the plot, and thus, models need to generate a story by interweaving the key points provided in the outline. This requires the model to keep track of the dynamic states of the latent plot, conditioning on the input outline while generating the full story. We present PlotMachines, a neural narrative model that learns to transform an outline into a coherent story by tracking the dynamic plot states. In addition, we enrich PlotMachines with high-level discourse structure so that the model can learn different writing styles corresponding to different parts of the narrative. Comprehensive experiments over three fiction and non-fiction datasets demonstrate that large-scale language models, such as GPT-2 and Grover, despite their impressive generation performance, are not sufficient in generating coherent narratives for the given outline, and dynamic plot state tracking is important for composing narratives with tighter, more consistent plots.

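Motivation and Objectives

Define the task of outline-conditioned story generation and explain why it requires dynamic plot state tracking.
Develop PlotMachines, a memory-augmented Transformer that generates multi-paragraph stories from an outline.
Incorporate high-level discourse structure to learn the writing styles of different narrative parts (introduction, body, conclusion).
Create and release three datasets that pair multi-paragraph narratives with automatically constructed outlines.
Show the limitations of large pretrained models and the advantage of dynamic plot state tracking over the baselines.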

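Proposed Method

Represent the outline as a token sequence with a dedicated delimiter between outline points and an end marker, and condition the generation of every paragraph on this outline.
Maintain a memory matrix M with two components, K (outline point tracking) and D (latent document state), to track the plot state across paragraphs.
After each paragraph, update the memory through a gate, using the representation of the previous paragraph h^{i-1} to refine M^{i}.
Modify the Transformer block by adding a memory attention channel that attends over the memory alongside standard self-attention; the two outputs are averaged.
Add discourse tags (introduction, body, conclusion) to the input to learn the stylistic differences between narrative parts.
Train end to end with a cross-entropy loss to predict each paragraph. During training, the gold previous paragraph is used to update the memory; decoding follows a preset five-paragraph structure.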
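
The gated memory update and the averaged attention channels described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the summary does not give the exact update equations, so the gate below uses a common tanh-candidate/sigmoid-gate form, and the weight matrices `W1` to `W4`, the helper names, and the toy dimensions are all hypothetical. Keys double as values in the attention helper to keep the example short.

```python
import math
import random

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gated_memory_update(memory, h_prev, W1, W2, W3, W4):
    """Refine every memory slot using the previous paragraph vector h_prev.

    Assumed form (not confirmed by the summary):
      candidate = tanh(W1 m + W2 h), gate = sigmoid(W3 m + W4 h),
      new_m = (1 - gate) * m + gate * candidate
    """
    h_cand = matvec(W2, h_prev)
    h_gate = matvec(W4, h_prev)
    new_memory = []
    for slot in memory:
        candidate = [math.tanh(a + b) for a, b in zip(matvec(W1, slot), h_cand)]
        gate = [sigmoid(a + b) for a, b in zip(matvec(W3, slot), h_gate)]
        new_memory.append(
            [(1 - g) * m + g * c for g, m, c in zip(gate, slot, candidate)]
        )
    return new_memory

def attend(query, keys):
    """Single-head scaled dot-product attention; keys are reused as values."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    top = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    dim = len(keys[0])
    return [sum(w * key[d] for w, key in zip(weights, keys)) for d in range(dim)]

def memory_augmented_attention(query, context, memory):
    """Average the self-attention output with the memory-attention output."""
    self_out = attend(query, context)
    mem_out = attend(query, memory)
    return [(a + b) / 2 for a, b in zip(self_out, mem_out)]

# Toy usage: 3 memory slots (K and D stacked), hidden size 4.
rng = random.Random(0)
dim = 4

def rand_matrix():
    return [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(dim)]

W1, W2, W3, W4 = (rand_matrix() for _ in range(4))
memory = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(3)]
h_prev = [rng.uniform(-1, 1) for _ in range(dim)]      # previous paragraph vector
context = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(5)]

updated = gated_memory_update(memory, h_prev, W1, W2, W3, W4)
output = memory_augmented_attention(context[-1], context, updated)
```

Because the gate lies strictly between 0 and 1, each updated slot is an interpolation between its old value and the candidate, which lets the model keep plot information that the previous paragraph did not touch.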

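Experimental Results

Research Questions

RQ1: Can outline-conditioned generation models produce coherent long-form stories that follow the given outline?
RQ2: Does dynamic plot state tracking improve coherence and outline adherence compared with memory-free baselines?
RQ3: How does discourse-level structure affect learning the writing styles of different narrative parts?
RQ4: Are memory-equipped models more effective at outline-conditioned generation than large pretrained models such as GPT, GPT-2, and Grover?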

Key Findings

Model	Wikiplots Avg. Length	Wikiplots B-2	Wikiplots B-3	Wikiplots B-4	Wikiplots B-5	WritingPrompts Avg. Length	WritingPrompts B-2	WritingPrompts B-3	WritingPrompts B-4	WritingPrompts B-5	NYTimes Avg. Length	NYTimes B-2	NYTimes B-3	NYTimes B-4	NYTimes B-5
Gold Test	330	.74	.50	.29	.15	661	.82	.61	.40	.25	315	.73	.50	.32	.21
P&W-Static	352	.93	.85	.75	.64	675	.97	.94	.89	.85	352	.93	.85	.74	.63
Fusion	191	.84	.71	.58	.48	197	.93	.85	.75	.65	171	.89	.80	.70	.60
Grover	835	.72	.49	.48	.37	997	.88	.72	.52	.34	719	.79	.57	.38	.25
GPT	909	.77	.47	.25	.11	799	.73	.40	.19	.08	739	.68	.36	.27	.08
GPT-2	910	.60	.26	.10	.03	799	.74	.41	.19	.08	756	.69	.36	.17	.08
PlotMachines (GPT)	682	.77	.58	.40	.27	850	.89	.81	.72	.63	537	.85	.69	.53	.40
PlotMachines (GPT-2)	553	.56	.19	.07	.02	799	.83	.56	.30	.14	455	.79	.57	.37	.23
PM-NoMem (GPT-2)	--	--	--	--	--	--	--	--	--	--	--	--	--	--	--
PM-NoMem-NoDisc (GPT-2)	--	--	--	--	--	--	--	--	--	--	--	--	--	--	--
base (GPT-2)	--	--	--	--	--	--	--	--	--	--	--	--	--	--	--

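On the Wikiplots, WritingPrompts, and NYTimes datasets, PlotMachines achieves ROUGE scores comparable to or higher than the baselines.
The GPT-2-based PlotMachines scores higher than Grover on ROUGE for several metrics across the three datasets.
The memory and discourse components both help: ablations that remove the memory or the discourse tags reduce performance.
In human evaluation, PlotMachines outperforms GPT and Fusion on outline utilization, narrative flow, and overall ranking.
PlotMachines shows better diversity (lower self-BLEU) than the baselines while maintaining coherence and outline adherence.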
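
To make the diversity metric concrete, here is a rough sketch of a self-BLEU style score: each text is scored against all the others as references, and the scores are averaged, so lower values indicate less repetition. This is a simplified variant written for illustration. It omits the brevity penalty and smoothing, uses whitespace tokens, and may differ from the evaluation setup used in the paper; the function names are hypothetical.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def clipped_precision(hypothesis, references, n):
    """Modified n-gram precision: matches are clipped by the max reference count."""
    hyp_counts = ngrams(hypothesis, n)
    if not hyp_counts:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    matched = sum(min(count, max_ref[gram]) for gram, count in hyp_counts.items())
    return matched / sum(hyp_counts.values())

def self_bleu(texts, max_n):
    """Average simplified BLEU of each text against the remaining texts.

    Assumes at least two tokenized texts; no brevity penalty, no smoothing.
    """
    scores = []
    for i, hyp in enumerate(texts):
        refs = texts[:i] + texts[i + 1:]
        precisions = [clipped_precision(hyp, refs, n) for n in range(1, max_n + 1)]
        if min(precisions) == 0.0:
            scores.append(0.0)  # geometric mean is zero if any order has no match
        else:
            scores.append(math.exp(sum(math.log(p) for p in precisions) / max_n))
    return sum(scores) / len(scores)

# Toy usage with whitespace tokenization.
repetitive = ["the cat sat".split()] * 3
varied = ["a b c".split(), "d e f".split(), "g h i".split()]
repetitive_score = self_bleu(repetitive, 2)
varied_score = self_bleu(varied, 2)
```

In this toy setup the repeated texts score at the top of the range and the fully disjoint texts at the bottom, which is the direction of the B-n columns in the table: lower means more diverse output.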


This review was generated by AI and reviewed by human editors.