QUICK REVIEW

[Paper Digest] Generalized Teacher Forcing for Learning Chaotic Dynamics

Florian Heß, Zahra Monfared|arXiv (Cornell University)|Jun 7, 2023

Neural Networks and Applications | Cited by 7

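One-Sentence Summary

The paper proposes generalized teacher forcing (GTF), which keeps gradients bounded when RNNs are trained on chaotic dynamics, enabling accurate low-dimensional reconstructions with shallow PLRNNs that outperform existing methods on real-world data.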

ABSTRACT

Chaotic dynamical systems (DS) are ubiquitous in nature and society. Often we are interested in reconstructing such systems from observed time series for prediction or mechanistic insight, where by reconstruction we mean learning geometrical and invariant temporal properties of the system in question (like attractors). However, training reconstruction algorithms like recurrent neural networks (RNNs) on such systems by gradient-descent based techniques faces severe challenges. This is mainly due to exploding gradients caused by the exponential divergence of trajectories in chaotic systems. Moreover, for (scientific) interpretability we wish to have as low dimensional reconstructions as possible, preferably in a model which is mathematically tractable. Here we report that a surprisingly simple modification of teacher forcing leads to provably strictly all-time bounded gradients in training on chaotic systems, and, when paired with a simple architectural rearrangement of a tractable RNN design, piecewise-linear RNNs (PLRNNs), allows for faithful reconstruction in spaces of at most the dimensionality of the observed system. We show on several DS that with these amendments we can reconstruct DS better than current SOTA algorithms, in much lower dimensions. Performance differences were particularly compelling on real world data with which most other methods severely struggled. This work thus led to a simple yet powerful DS reconstruction algorithm which is highly interpretable at the same time.

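Motivation and Objectives

Motivate the reconstruction of chaotic dynamical systems from time series while preserving interpretability.
Resolve the exploding-gradient problem in RNN training on chaotic systems without requiring knowledge of the Lyapunov exponents.
Propose GTF and a shallow PLRNN architecture that together enable faithful low-dimensional reconstruction.
Demonstrate performance superior to SOTA methods on simulated and real-world datasets.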

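Proposed Method

Introduce generalized teacher forcing (GTF): z_t = F_θ(z̃_{t-1}), where the forced state z̃_t = (1-α) z_t + α z̄_t interpolates between the model-generated state z_t and the data-inferred state z̄_t with 0 ≤ α ≤ 1, so that the Jacobian product stays bounded.
Derive conditions under which the Jacobian product ∂z_t/∂z_r remains bounded under chaotic dynamics, including the optimal α* = 1 - 1/σ̃_max.
Adopt a shallow PLRNN (shPLRNN) architecture built on a ReLU network with a single hidden layer; it can be rewritten as a dendPLRNN and remains mathematically tractable.
Train with backpropagation through time (BPTT) combined with GTF; in the adaptive variant (aGTF), α is set by an adaptive scheme during training, without requiring full knowledge of σ̃_max.
Estimate α with an adaptive strategy that uses Jacobian information at the data-inferred states, and gradually anneal α during training to maintain stability.
Evaluate with fixed GTF and adaptive GTF (aGTF), and compare against sparse TF, LSTM-TBPTT, RC, SINDy, Neural ODEs, and LEM.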
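The forcing rule above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes the forced state is the convex combination z̃_t = (1-α) z_t + α z̄_t, and the particular shPLRNN parameterization (diagonal linear term `A`, weights `W1`, `W2`, biases `h1`, `h2`) as well as the helper names `shplrnn_step`, `gtf_rollout`, and `make_params` are assumptions made for the example.

```python
import numpy as np

def shplrnn_step(z, A, W1, W2, h1, h2):
    """One step of a shallow PLRNN: a linear term plus a ReLU network
    with a single hidden layer (parameterization assumed for illustration)."""
    return A * z + W1 @ np.maximum(W2 @ z + h2, 0.0) + h1

def gtf_rollout(z_bar, alpha, params):
    """Roll the model forward under generalized teacher forcing.

    z_bar : (T, M) array of data-inferred states
    alpha : forcing strength in [0, 1]; 0 = free-running, 1 = full forcing
    """
    T, M = z_bar.shape
    z_pred = np.empty((T, M))
    z_pred[0] = z_bar[0]
    z_tilde = z_bar[0]  # the first forced state is taken from the data
    for t in range(1, T):
        z = shplrnn_step(z_tilde, *params)           # z_t = F(z~_{t-1})
        z_pred[t] = z
        z_tilde = (1.0 - alpha) * z + alpha * z_bar[t]  # forced state z~_t
    return z_pred

def make_params(M, L, seed=0):
    """Random parameters for an M-dimensional model with L hidden units."""
    rng = np.random.default_rng(seed)
    A = rng.uniform(0.5, 0.9, size=M)         # diagonal of the linear term
    W1 = rng.normal(scale=0.3, size=(M, L))
    W2 = rng.normal(scale=0.3, size=(L, M))
    h1 = np.zeros(M)
    h2 = rng.normal(scale=0.1, size=L)
    return A, W1, W2, h1, h2

params = make_params(M=3, L=10)
z_bar = np.random.default_rng(1).normal(size=(6, 3))  # stand-in for real data
print(gtf_rollout(z_bar, alpha=0.3, params=params).shape)
```

With `alpha=1.0` every prediction is a one-step forecast from the data-inferred state, and with `alpha=0.0` the model runs freely from its own outputs; intermediate values trade off between the two.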

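Experimental Results

Research Questions

RQ1: Can GTF bound the loss gradients when training RNNs on chaotic time series, so that training over arbitrarily long sequences becomes possible?
RQ2: When trained with GTF, can a shallow PLRNN faithfully reconstruct chaotic dynamics in a low-dimensional latent space (at most the dimensionality of the observed system)?
RQ3: How does GTF with a shallow PLRNN compare with current DS reconstruction methods in geometric and temporal fidelity, on simulated and real-world data?
RQ4: Without prior knowledge of the system's Lyapunov exponents, how can α be chosen and adapted to achieve stable training?
RQ5: Are the resulting models interpretable, and can they be used to analyze invariant properties of the reconstructed dynamics (attractors, fixed points, cycles)?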

Main Findings

Dataset	Method	D_stsp	D_H	PE(20)	Dimension	|θ|
ECG (5d)	shPLRNN + GTF	4.3 ± 0.6	0.34 ± 0.02	(2.4 ± 0.1)·10^{-3}	5	2785
ECG (5d)	shPLRNN + aGTF	4.5 ± 0.4	0.34 ± 0.02	(2.4 ± 0.2)·10^{-3}	5	2785
ECG (5d)	shPLRNN + STF	7.1 ± 1.8	0.38 ± 0.03	(5 ± 2)·10^{-3}	5	2785
ECG (5d)	dendPLRNN + id-TF	5.8 ± 0.6	0.37 ± 0.06	(4.0 ± 0.4)·10^{-3}	35	3245
ECG (5d)	RC	5.3 ± 1.7	0.39 ± 0.05	(4 ± 1)·10^{-3}	1000	5000
ECG (5d)	LSTM-TBPTT	15.2 ± 0.5	0.73 ± 0.02	(2.5 ± 0.5)·10^{-2}	70	5920
ECG (5d)	SINDy	diverging	diverging	diverging	5	3960
ECG (5d)	N-ODE	12.2 ± 0.7	0.70 ± 0.03	(4.1 ± 0.1)·10^{-1}	5	4955

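GTF yields strictly bounded loss gradients when training on chaotic dynamics, which makes stable optimization more attainable.
The shallow PLRNN can reconstruct chaotic dynamics in a space of at most the dimensionality of the observed system while remaining interpretable and tractable.
On real-world data (ECG, EEG), the shPLRNN with GTF clearly outperforms several SOTA methods in geometric and temporal fidelity.
Compared with LSTM-TBPTT, RC, SINDy, Neural ODEs, and LEM, the shPLRNN with GTF achieves lower D_stsp and D_H and competitive prediction error, typically with far fewer latent dimensions (on ECG, 5 versus 70 for LSTM-TBPTT and 1000 for RC).
Adaptive GTF (aGTF) performs robustly without an exact estimate of σ̃_max, and the annealing strategy markedly improves training stability.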
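To see why forcing bounds the gradients, note that under z̃_t = (1-α) z_t + α z̄_t each Jacobian in the backpropagated product picks up a factor (1-α). The sketch below illustrates this numerically. It rests on several assumptions: the Jacobians are synthetic (scaled orthogonal matrices chosen so that the unforced product grows exponentially, as it would on a chaotic trajectory), σ̃_max is taken to be the largest spectral norm among them, and the names `jacobian_product_norm` and `synthetic_jacobians` are hypothetical helpers rather than anything from the paper's code.

```python
import numpy as np

def jacobian_product_norm(jacobians, alpha):
    """Spectral norm of prod_k (1 - alpha) * J_k, the factor that scales
    gradients propagated back through the forced trajectory."""
    P = np.eye(jacobians[0].shape[0])
    for J in jacobians:
        P = ((1.0 - alpha) * J) @ P
    return np.linalg.norm(P, 2)

def synthetic_jacobians(T=50, M=3, seed=0):
    """Expanding toy Jacobians: a random orthogonal matrix times a gain > 1.
    Purely illustrative; not derived from a trained model."""
    rng = np.random.default_rng(seed)
    jacobians = []
    for _step in range(T):
        Q, _R = np.linalg.qr(rng.normal(size=(M, M)))  # Q is orthogonal
        jacobians.append(rng.uniform(1.2, 1.6) * Q)
    return jacobians

jacobians = synthetic_jacobians()
# Assumption: sigma_max is the largest Jacobian spectral norm on the trajectory.
sigma_max = max(np.linalg.norm(J, 2) for J in jacobians)
alpha_star = 1.0 - 1.0 / sigma_max

free_norm = jacobian_product_norm(jacobians, alpha=0.0)
forced_norm = jacobian_product_norm(jacobians, alpha=alpha_star)
print("alpha*:", alpha_star)
print("unforced product norm:", free_norm)
print("forced product norm:", forced_norm)
```

With α = α* every factor (1-α) J_k has norm at most 1, so the product cannot explode. In this toy setting the forced product can also become quite small, which hints at why an adaptive or annealed α may be preferable to a single worst-case value.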


This digest was generated by AI and reviewed by a human editor.