QUICK REVIEW

[Paper Review] SWAN: A Generic Framework for Auditing Textual Conversational Systems

Tetsuya Sakai|arXiv (Cornell University)|May 15, 2023

Hate Speech and Cyberbullying Detection | Cited by 9

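One-Sentence Summary

SWAN is a generic auditing framework that computes a Schematised Weighted Average Nugget (SWAN) score over the nugget sequences extracted from conversation sessions, using a schema of criteria and position-based nugget weighting.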

ABSTRACT

We present a simple and generic framework for auditing a given textual conversational system, given some samples of its conversation sessions as its input. The framework computes a SWAN (Schematised Weighted Average Nugget) score based on nugget sequences extracted from the conversation sessions. Following the approaches of S-measure and U-measure, SWAN utilises nugget positions within the conversations to weight the nuggets based on a user model. We also present a schema of twenty (+1) criteria that may be worth incorporating in the SWAN framework. In our future work, we plan to devise conversation sampling methods that are suitable for the various criteria, construct seed user turns for comparing multiple systems, and validate specific instances of SWAN for the purpose of preventing negative impacts of conversational systems on users and society. This paper was written while preparing for the ICTIR 2023 keynote (to be given on July 23, 2023).

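Research Motivation and Objectives

Enable rapid, high-recall auditing of conversational systems powered by large language models, so that potential harms are identified while the systems' benefits are acknowledged.
Propose a generic, transparent evaluation framework that does not require access to the system's internal state.
Introduce a nugget-based scoring mechanism that weights each nugget by its position within the conversation session.
Provide a schema of 20 (+1) criteria to guide multi-faceted evaluation.
Outline future work on conversation sampling, seed user turns, and validating SWAN for the prevention of negative societal impacts.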

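Proposed Method

Define a nugget as an atomic unit that is either a factual statement (Type F) or a dialogue act (Type O).
Extract nuggets from the sampled conversation sessions using an automatic nugget extractor.
Score each nugget against the criteria schema; scoring can be done at the nugget level or at the turn level.
For each criterion c, compute WAN^c from the position-aware nugget weights NW^c and the nugget scores S^c.
Combine the per-criterion WAN scores with the criterion weights {CW^c} to form the SWAN score: SWAN = (sum_c CW^c * WAN^c(U^c)) / (sum_c CW^c).
Discuss potential extensions, such as group fairness (distributional similarity) and a stochastic SWAN variant for non-deterministic paths.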
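
The scoring steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's reference implementation: it assumes WAN^c is the NW^c-weighted mean of the nugget scores S^c (as the name "Weighted Average Nugget" suggests), and it stands in for the user model with a linear position decay in the spirit of S-measure and U-measure. The names `Nugget`, `linear_decay_weight`, `wan`, `swan`, and the `patience` parameter are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass


@dataclass
class Nugget:
    position: int  # assumed: offset of the nugget within the conversation session
    score: float   # S^c: the nugget's score for one criterion, assumed to lie in [0, 1]


def linear_decay_weight(position: int, patience: int) -> float:
    """Assumed stand-in for NW^c: linear decay max(0, 1 - position / patience)."""
    return max(0.0, 1.0 - position / patience)


def wan(nuggets: list[Nugget], patience: int) -> float:
    """Assumed WAN^c: mean of nugget scores, weighted by position-based weights."""
    weights = [linear_decay_weight(n.position, patience) for n in nuggets]
    total = sum(weights)
    if total == 0:
        return 0.0  # no nugget carries any weight under this user model
    return sum(w * n.score for w, n in zip(weights, nuggets)) / total


def swan(wan_by_criterion: dict[str, float], criterion_weights: dict[str, float]) -> float:
    """SWAN = (sum_c CW^c * WAN^c) / (sum_c CW^c), as given in the method summary."""
    total = sum(criterion_weights[c] for c in wan_by_criterion)
    if total == 0:
        raise ValueError("criterion weights must not sum to zero")
    weighted = sum(criterion_weights[c] * wan_by_criterion[c] for c in wan_by_criterion)
    return weighted / total


# Toy data: two criteria, two nuggets each, hypothetical patience of 1000 positions.
correctness = wan([Nugget(0, 1.0), Nugget(500, 0.0)], patience=1000)
harmlessness = wan([Nugget(250, 1.0), Nugget(750, 1.0)], patience=1000)
overall = swan(
    {"correctness": correctness, "harmlessness": harmlessness},
    {"correctness": 2.0, "harmlessness": 1.0},
)
print(f"correctness WAN={correctness:.3f}, harmlessness WAN={harmlessness:.3f}, SWAN={overall:.3f}")
```

In the toy data the incorrect nugget appears late in the session, so it is discounted relative to the early correct one. Both the decay function and the criterion weights are choices an auditor would have to make and justify for each SWAN instance.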

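Results

Research Questions

RQ1: How can a textual conversational system be audited without access to its internal state?
RQ2: Can a nugget-based, position-weighted scoring framework reliably summarise system behaviour across multiple criteria?
RQ3: How should an extensible schema of criteria cover the safety, usefulness, and fairness of conversational systems?
RQ4: How can sampling and seed user turns support the comparison of different systems under the SWAN framework?
RQ5: What future directions are needed to validate SWAN for preventing negative societal impacts?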

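Key Findings

SWAN provides a formal score that aggregates each nugget's per-criterion score, weighted by the nugget's position in the conversation.
A schema of twenty (+1) criteria guides multi-faceted evaluation, covering aspects such as correctness, harmlessness, and fairness.
Nuggets are classified as Type F (factual) or Type O (dialogue act) and can be evaluated at the nugget level or the turn level.
The framework allows individual nugget and turn scores to be visualised, so that problems can be located precisely within a conversation.
SWAN can be extended to stochastic paths in order to handle non-deterministic user simulation.
The author outlines a plan for combining seed user turns with multi-system comparison, noting that comparability across systems is not exact.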
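
The point about locating problems from individual nugget and turn scores can be illustrated with a small sketch. It is only an assumption-laden example: it takes per-nugget scores for a single criterion, averages them per turn without position weighting, and flags turns below a cut-off. The helper names `turn_scores` and `flag_turns`, the 0.5 threshold, and the toy data are all hypothetical.

```python
from collections import defaultdict


def turn_scores(nugget_scores):
    """Average nugget scores per turn; input is (turn_index, score) pairs for one criterion."""
    by_turn = defaultdict(list)
    for turn, score in nugget_scores:
        by_turn[turn].append(score)
    return {turn: sum(scores) / len(scores) for turn, scores in sorted(by_turn.items())}


def flag_turns(scores_by_turn, threshold=0.5):
    """Return the turns whose average score falls below the (assumed) threshold."""
    return [turn for turn, score in scores_by_turn.items() if score < threshold]


# Toy harmlessness scores: turn 2 contains the problematic nuggets.
harmlessness = [(1, 1.0), (1, 1.0), (2, 0.0), (2, 0.5), (3, 1.0)]
per_turn = turn_scores(harmlessness)
for turn, score in per_turn.items():
    # A crude text bar per turn, standing in for a real visualisation.
    print(f"turn {turn}: {'#' * round(score * 10):<10} {score:.2f}")
flagged = flag_turns(per_turn)
print("turns to inspect:", flagged)
```

A per-turn view like this complements the single SWAN number: the aggregate says how the session scored overall, while the breakdown shows where in the conversation the score was lost.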


This review was generated by AI and reviewed by human editors.