QUICK REVIEW

[Paper Review] ChatGPT and Other Large Language Models as Evolutionary Engines for Online Interactive Collaborative Game Design

Pier Luca Lanzi, Daniele Loiacono|arXiv (Cornell University)|Feb 9, 2023

Artificial Intelligence in Games|References: 47|Cited by: 11

ONE-SENTENCE SUMMARY

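The paper proposes an online framework for collaborative game design. Game concepts are written as free text, evolved through interactive evolution driven by large language models (LLMs), and evaluated by human participants. The framework was evaluated on three tasks, with about 80 designers giving feedback through Telegram.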

ABSTRACT

Large language models (LLMs) have taken the scientific world by storm, changing the landscape of natural language processing and human-computer interaction. These powerful tools can answer complex questions and, surprisingly, perform challenging creative tasks (e.g., generate code and applications to solve problems, write stories, pieces of music, etc.). In this paper, we present a collaborative game design framework that combines interactive evolution and large language models to simulate the typical human design process. We use the former to exploit users' feedback for selecting the most promising ideas and large language models for a very complex creative task - the recombination and variation of ideas. In our framework, the process starts with a brief and a set of candidate designs, either generated using a language model or proposed by the users. Next, users collaborate on the design process by providing feedback to an interactive genetic algorithm that selects, recombines, and mutates the most promising designs. We evaluated our framework on three game design tasks with human designers who collaborated remotely.
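The abstract hands the recombination and variation of ideas to a language model. Below is a minimal sketch of how the three LLM-backed operators (random initialization, crossover, mutation) could be phrased as prompts. It assumes a generic text-completion callable `complete(prompt) -> str` that wraps ChatGPT or GPT-3. The prompt wording and every function name here are hypothetical illustrations, not the prompts used in the paper.

```python
from typing import Callable

# Hypothetical signature for any LLM text-completion call (for example, a
# wrapper around ChatGPT or GPT-3). The framework's real client is not shown.
Complete = Callable[[str], str]


def init_prompt(brief: str) -> str:
    """Hypothetical prompt asking the LLM for a new design that fits the brief."""
    return (
        f"Design brief: {brief}\n"
        "Write one original game concept that satisfies the brief."
    )


def crossover_prompt(brief: str, parent_a: str, parent_b: str) -> str:
    """Hypothetical prompt asking the LLM to recombine two parent concepts."""
    return (
        f"Design brief: {brief}\n"
        f"Concept A: {parent_a}\n"
        f"Concept B: {parent_b}\n"
        "Write one new game concept that combines the best ideas of A and B."
    )


def mutation_prompt(brief: str, parent: str) -> str:
    """Hypothetical prompt asking the LLM for a variation of one concept."""
    return (
        f"Design brief: {brief}\n"
        f"Concept: {parent}\n"
        "Write a variation of this concept that changes one of its key ideas."
    )


def random_init(complete: Complete, brief: str) -> str:
    # Initialization operator: one fresh concept generated from the brief alone.
    return complete(init_prompt(brief)).strip()


def crossover(complete: Complete, brief: str, a: str, b: str) -> str:
    # Crossover operator: the LLM merges two free-text parents into one child.
    return complete(crossover_prompt(brief, a, b)).strip()


def mutate(complete: Complete, brief: str, parent: str) -> str:
    # Mutation operator: the LLM rewrites one parent with a local change.
    return complete(mutation_prompt(brief, parent)).strip()
```

Passing the model in as a plain function keeps the operators testable with a stub. The same code can also run against either ChatGPT or GPT-3 without changes.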

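MOTIVATION AND GOALS

Show that LLMs can implement evolutionary operators (random initialization, crossover, mutation) for free-text game ideas.
Enable real-time, online collaboration between humans and LLMs to evolve design concepts.
Assess the feasibility and user experience of collecting evaluations through a Telegram-based interface.
Evaluate the framework across several design tasks and participant groups to test its generality and practicality.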

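PROPOSED METHOD

Represent game concepts as free text, managed in an online population backed by a database.
Run an interactive evolutionary algorithm with tournament selection, crossover, and mutation, where each operator is implemented by prompting an LLM.
Post candidate designs to users through Telegram or a web interface and collect three-valued qualitative feedback (positive/neutral/negative).
Use LLMs (ChatGPT or DaVinci GPT-3) to implement the genetic operators on textual ideas.
Run several fixed-duration experiments that simulate structured design sessions and a Global Game Jam scenario.
Analyze evaluations, idea length, and emergent mechanics to assess creativity and coherence.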
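The selection loop above can be sketched as follows. The sketch makes several simplifying assumptions that are not taken from the paper. The three feedback values map to +1, 0, and -1, and a concept's fitness is the mean of its votes. The tournament size `k`, the crossover probability, and the full generational replacement are illustrative choices. The LLM-backed operators are passed in as plain callables, with the hypothetical names `crossover` and `mutate`.

```python
import random
from typing import Callable, Dict, List

# Hypothetical numeric mapping of the three feedback values.
FEEDBACK_SCORE = {"positive": 1, "neutral": 0, "negative": -1}


def fitness(votes: List[str]) -> float:
    """Mean vote score of a concept; unrated concepts count as neutral (0.0)."""
    if not votes:
        return 0.0
    return sum(FEEDBACK_SCORE[v] for v in votes) / len(votes)


def tournament_select(population: List[str], scores: Dict[str, float],
                      k: int, rng: random.Random) -> str:
    """Sample k distinct concepts and return the one with the highest fitness."""
    contestants = rng.sample(population, k)
    return max(contestants, key=lambda c: scores[c])


def next_generation(population: List[str], votes: Dict[str, List[str]],
                    crossover: Callable[[str, str], str],
                    mutate: Callable[[str], str],
                    p_crossover: float = 0.5, k: int = 2,
                    seed: int = 0) -> List[str]:
    """Build a full replacement generation (a simplifying assumption)."""
    rng = random.Random(seed)
    scores = {c: fitness(votes.get(c, [])) for c in population}
    offspring: List[str] = []
    for _ in range(len(population)):
        if rng.random() < p_crossover:
            # Two tournament winners are recombined by the LLM crossover operator.
            a = tournament_select(population, scores, k, rng)
            b = tournament_select(population, scores, k, rng)
            offspring.append(crossover(a, b))
        else:
            # One tournament winner is varied by the LLM mutation operator.
            offspring.append(mutate(tournament_select(population, scores, k, rng)))
    return offspring
```

Tournament selection only compares fitness values within small samples. For that reason it tolerates noisy, sparse human ratings better than selection schemes that need a precise global ranking of every concept.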

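EXPERIMENTAL RESULTS

RESEARCH QUESTIONS

RQ1: Can LLMs effectively implement genetic operators (random initialization, crossover, mutation) on free-text ideas in an interactive evolutionary setting?
RQ2: Does an online feedback loop based on Telegram support collaborative game design and produce novel emergent mechanics?
RQ3: In real-world settings, what are the qualitative strengths and limitations (e.g., coherence, novelty) of LLM-driven design iteration?
RQ4: How does the framework perform across different design tasks (board games, video games) and event formats (workshops vs. the Global Game Jam)?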

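KEY FINDINGS

The framework supported about 80 designers across three design tasks. Two tasks ran as four-day workflows, while the Global Game Jam session was shorter.
The board game task produced longer concepts that shifted toward ecosystem-maintenance mechanics over the iterations, suggesting emergent novelty.
The video game task produced novel mechanics such as color manipulation and light/reflection interactions, showing emergent experimentation.
Participants saw emergent novelty and diverse narratives as positives. They saw coherence problems and redundancy, which were attributed to the small sample size, as negatives.
Across the two main experiments, board game concepts received 799 evaluations in total and video game concepts received 1,025. The Global Game Jam session was shorter and collected fewer evaluations.
The study observed no performance difference between ChatGPT and DaVinci GPT-3, so both are viable LLM choices for implementing the operators.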


This summary was generated by AI and reviewed by human editors.