QUICK REVIEW

[Paper Review] Local and non-local dependency learning and emergence of rule-like representations in speech data by Deep Convolutional Generative Adversarial Networks

Gašper Beguš|arXiv (Cornell University)|Sep 26, 2020

Phonetics and Phonology Research|References: 60|Cited by: 13

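One-sentence summary

This paper shows that deep convolutional generative adversarial networks (GANs) can learn both local and non-local phonological processes from speech data, and that rule-like phonological generalizations arise from interactions among latent variables. The key finding is that non-local processes such as vowel harmony are learned probabilistically and less reliably than local processes, which is consistent with human learning biases and with the typological preference for locality.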

ABSTRACT

This paper argues that training GANs on local and non-local dependencies in speech data offers insights into how deep neural networks discretize continuous data and how symbolic-like rule-based morphophonological processes emerge in a deep convolutional architecture. Acquisition of speech has recently been modeled as a dependency between latent space and data generated by GANs in Beguš (2020b; arXiv:2006.03965), who models learning of a simple local allophonic distribution. We extend this approach to test learning of local and non-local phonological processes that include approximations of morphological processes. We further parallel outputs of the model to results of a behavioral experiment where human subjects are trained on the data used for training the GAN network. Four main conclusions emerge: (i) the networks provide useful information for computational models of speech acquisition even if trained on a comparatively small dataset of an artificial grammar learning experiment; (ii) local processes are easier to learn than non-local processes, which matches both behavioral data in human subjects and typology in the world's languages. This paper also proposes (iii) how we can actively observe the network's progress in learning and explore the effect of training steps on learning representations by keeping latent space constant across different training steps. Finally, this paper shows that (iv) the network learns to encode the presence of a prefix with a single latent variable; by interpolating this variable, we can actively observe the operation of a non-local phonological process. The proposed technique for retrieving learning representations has general implications for our understanding of how GANs discretize continuous speech data and suggests that rule-like generalizations in the training data are represented as an interaction between variables in the network's latent space.

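Research motivation and objectives

The paper investigates how deep neural networks learn phonological dependencies from raw speech data.
It aims to model how rule-like, symbolic representations emerge in connectionist architectures such as GANs.
It compares the computational model's performance with human behavioral data from an artificial grammar learning experiment.
It explores interpretability techniques that track learning progress by manipulating the latent space during training.
It tests whether deep networks can learn non-local processes such as vowel harmony, and how they compare with local processes in accuracy and learning biases.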

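Proposed method

The study uses a deep convolutional GAN trained on synthetic speech data with controlled phonological patterns, including local allophonic alternations and non-local vowel harmony.
Latent-space variables are identified and manipulated to probe their role in generating specific phonological features and processes.
Linear interpolation of a specific latent variable (e.g., z17) reveals gradual changes in acoustic properties such as frication noise or vowel backness.
The model is trained on a small artificial dataset that mirrors an artificial grammar learning experiment, which allows direct comparison with human behavioral data.
Training progress is analyzed at several training steps to avoid ceiling effects and to observe how representations emerge over time.
Statistical analyses compare error rates for local and non-local processes and evaluate harmonic versus disharmonic outputs in the vowel harmony task.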
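
The interpolation step described above can be sketched as follows. This is only an illustration of the general idea (vary one latent dimension, hold all others fixed), not the paper's code. The latent size of 100, the uniform sampling range, the interpolation range, and the reading of "z17" as array index 17 are all assumptions made for this example; the trained generator itself is not shown.

```python
import numpy as np


def interpolate_latent(z, index, values):
    """Return one copy of z per entry in values, with only z[index] replaced."""
    z = np.asarray(z, dtype=float)
    batch = np.tile(z, (len(values), 1))  # shape: (len(values), z.size)
    batch[:, index] = values              # every other dimension stays fixed
    return batch


# All constants below are illustrative assumptions, not values from the paper.
LATENT_DIM = 100   # assumed latent size
Z_INDEX = 17       # stand-in for "z17"; 0- vs 1-based indexing is a guess

rng = np.random.default_rng(0)
z = rng.uniform(-1.0, 1.0, size=LATENT_DIM)  # one fixed latent vector
values = np.linspace(-2.0, 2.0, num=9)       # assumed interpolation range

batch = interpolate_latent(z, Z_INDEX, values)
print(batch.shape)  # one latent vector per interpolation step
```

Each row of `batch` would then be passed to a trained generator so that the outputs can be compared acoustically. Feeding the same `batch` to generator checkpoints saved at different training steps is one way to hold the latent space constant while observing how the outputs change during training.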

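Experimental results

Research questions

RQ1: Can a deep convolutional GAN learn local phonological processes (such as devoicing and aspiration) from raw speech data?
RQ2: Can the GAN also learn non-local morphophonological processes (such as vowel harmony), and if so, how reliably?
RQ3: How do the learning dynamics of local and non-local processes differ in error rate and convergence?
RQ4: To what extent do the model's representations and behavior match the performance of human subjects in the artificial grammar learning experiment?
RQ5: How can latent-space variables be used to actively observe and interpret the emergence of rule-like generalizations in the network?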

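Key findings

The generator network learns the local allophonic process with high accuracy: the devoicing error rate is only 1.8%.
Non-local vowel harmony is learned probabilistically: 23.2% of outputs violate the harmony rule, indicating lower reliability than for the local process.
The distribution of harmonic and disharmonic outputs is probabilistic rather than categorical, and disharmonic outputs are more frequent in transitions from front to back vowels.
The model's performance on the non-local process closely matches the human behavioral data, with the computational model and human subjects showing similar error rates.
Latent-space manipulation shows that a single variable (e.g., z17) encodes the presence of the prefix; interpolating it allows the non-local morphophonological process to be observed directly.
The study shows that rule-like generalizations arise from interactions among latent variables, suggesting that symbolic-like computation can emerge from the distributed, continuous representations of a deep network.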
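
For readers who want to compare two error rates of this kind on their own annotated outputs, a generic two-proportion z-test is one simple option. This review does not say which statistical test the paper used, so the sketch below is a stand-in rather than the author's analysis. The function name is hypothetical and the counts in the usage lines are made up; they are not taken from the paper.

```python
import math


def two_proportion_z_test(errors_a, n_a, errors_b, n_b):
    """Two-sided z-test for a difference between two error proportions."""
    p_a = errors_a / n_a
    p_b = errors_b / n_b
    pooled = (errors_a + errors_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    if se == 0.0:
        return 0.0, 1.0  # no variation at all: nothing to distinguish
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail
    return z, p_value


# Toy counts for illustration only (NOT the paper's data):
# 5 errors in 100 "local" outputs vs. 20 errors in 100 "non-local" outputs.
z_stat, p_val = two_proportion_z_test(5, 100, 20, 100)
print(f"z = {z_stat:.3f}, p = {p_val:.4f}")
```

The normal approximation is only reasonable when both samples contain enough errors and non-errors; with very small counts an exact test would be a safer choice.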


This review was generated by AI and reviewed by a human editor.