QUICK REVIEW

[Paper Review] Hierarchical State Space Models for Continuous Sequence-to-Sequence Modeling

Raunaq Bhirangi, Chenyu Wang | arXiv (Cornell University) | Feb 15, 2024

Advanced Database Systems and Queries | Cited by 5

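One-Sentence Summary

HiSS builds a state space model with a two-level temporal hierarchy to improve continuous sequence-to-sequence prediction from real-world sensor data, outperforming LSTMs and Transformers and scaling well to smaller datasets.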

ABSTRACT

Reasoning from sequences of raw sensory data is a ubiquitous problem across fields ranging from medical devices to robotics. These problems often involve using long sequences of raw sensor data (e.g. magnetometers, piezoresistors) to predict sequences of desirable physical quantities (e.g. force, inertial measurements). While classical approaches are powerful for locally-linear prediction problems, they often fall short when using real-world sensors. These sensors are typically non-linear, are affected by extraneous variables (e.g. vibration), and exhibit data-dependent drift. For many problems, the prediction task is exacerbated by small labeled datasets since obtaining ground-truth labels requires expensive equipment. In this work, we present Hierarchical State-Space Models (HiSS), a conceptually simple, new technique for continuous sequential prediction. HiSS stacks structured state-space models on top of each other to create a temporal hierarchy. Across six real-world sensor datasets, from tactile-based state prediction to accelerometer-based inertial measurement, HiSS outperforms state-of-the-art sequence models such as causal Transformers, LSTMs, S4, and Mamba by at least 23% on MSE. Our experiments further indicate that HiSS demonstrates efficient scaling to smaller datasets and is compatible with existing data-filtering techniques. Code, datasets and videos can be found on https://hiss-csp.github.io.

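Motivation and Objectives

Address the challenges of continuous sequence-to-sequence prediction (CSP) from noisy, high-frequency sensor data.
Provide a representative benchmark for CSP tasks and evaluate modern sequence models on it.
Propose a hierarchical architecture (HiSS) that exploits temporal structure to improve prediction performance.
Demonstrate data efficiency on CSP tasks and a reduced need for preprocessing.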

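Proposed Method

Introduces CSP-Bench, a six-dataset benchmark for continuous sequence prediction covering tactile and inertial measurement unit (IMU) sensors.
Uses deep state space models (SSMs) such as S4 and Mamba as the flat baseline models.
Proposes HiSS, which applies a low-level SSM to chunks of the input sequence and then uses a high-level SSM to map the resulting chunk features to the output.
Trains end to end with a mean squared error (MSE) loss, using standardized sampling rates (inputs at 50 Hz, outputs at 5 Hz) and optionally adding first-order differences as input features.
Shows that hierarchical modeling yields significant improvements over flat SSMs and other baselines, and analyzes data efficiency and compatibility with preprocessing.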
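
The two-level structure described above can be illustrated with a small sketch. This is not the authors' implementation: `ToySSM` is a hypothetical scalar linear recurrence standing in for an S4 or Mamba layer, and the chunk size of 10 (the ratio of the 50 Hz input rate to the 5 Hz output rate), the use of the last low-level output as the chunk feature, and all parameter values are assumptions made for illustration only.

```python
import math

def append_differences(seq):
    """Append first-order differences to each timestep (zeros for the first step)."""
    out, prev = [], seq[0]
    for x in seq:
        out.append(list(x) + [xi - pi for xi, pi in zip(x, prev)])
        prev = x
    return out

class ToySSM:
    """Hypothetical stand-in for an SSM layer: h_t = a*h_{t-1} + b.u_t, y_t = c*h_t."""
    def __init__(self, a, b, c):
        self.a, self.b, self.c = a, b, c

    def run(self, inputs):
        h, ys = 0.0, []  # state starts at zero for every call
        for u in inputs:
            h = self.a * h + sum(bi * ui for bi, ui in zip(self.b, u))
            ys.append(self.c * h)
        return ys

def chunk(seq, size):
    """Split a sequence into non-overlapping chunks, dropping any remainder."""
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, size)]

def hiss_forward(seq, low, high, chunk_size):
    """Low-level model summarizes each chunk; high-level model runs over the summaries."""
    # Assumption: the last low-level output of each chunk is its feature.
    feats = [[low.run(c)[-1]] for c in chunk(seq, chunk_size)]
    return high.run(feats)

def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

# Two seconds of a synthetic 50 Hz signal with one feature per timestep.
signal = [[math.sin(0.1 * t)] for t in range(100)]
inputs = append_differences(signal)            # now two features per timestep
low = ToySSM(a=0.9, b=[0.5, 0.5], c=1.0)
high = ToySSM(a=0.5, b=[1.0], c=1.0)
preds = hiss_forward(inputs, low, high, chunk_size=10)  # one prediction per chunk
loss = mse(preds, [0.0] * len(preds))          # placeholder targets
```

With 100 input steps and a chunk size of 10, the sketch emits 10 predictions, matching a 5 Hz output over two seconds. In a real model both levels would be learned layers trained jointly on the MSE loss rather than fixed recurrences.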

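Experimental Results

Research Questions

RQ1: How do SSMs compare with LSTMs and Transformers on CSP-Bench tasks?
RQ2: Does HiSS deliver additional gains over flat models by exploiting temporal hierarchy?
RQ3: Is HiSS compatible with standard sensor preprocessing such as filtering?
RQ4: How does HiSS perform in low-data regimes and with high-dimensional inputs?

Key Findings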

Model type	Architecture	MW (cm/s)	IS (cm/s)	JC (cm/s)	R (m/s)	V (m/s)	TC (m/s)
Flat	Transformer	2.3750	0.4600	1.0200	-	0.0432	-
Flat	LSTM	1.1685	0.3099	1.0740	0.0444	0.0353	0.1767
Flat	S4	1.3190	0.2617	0.9804	0.0382	0.0341	0.3483
Flat	Mamba	0.8830	0.1757	1.0640	0.0401	0.0319	0.3645
Hierarchical	Transformer | LSTM	0.9958	0.2527	0.9350	0.0421	0.0377	0.3197
Hierarchical	S4 | LSTM	0.6205	0.1574	0.8980	0.0363	0.0374	0.3583
Hierarchical	Mamba | LSTM	1.0268	0.2022	0.9060	0.0472	0.0372	0.4560
Hierarchical	S4 | S4	0.6590	0.1526	0.9080	0.0481	0.0322	0.3505
Hierarchical	Mamba | S4	0.7915	0.1925	1.0610	0.0442	0.0286	0.3638
Hierarchical	S4 | Mamba	0.6255	0.1551	0.9060	0.0265	0.0303	0.3438
Hierarchical	Mamba | Mamba	0.7248	0.1678	0.9050	0.0325	0.0251	0.3762

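SSMs (Mamba, S4) outperform LSTMs and Transformers on CSP-Bench, with median MSE improvements of 10–14% across all tasks.
HiSS improves further, with a median MSE improvement of about 23% over the best flat model across all tasks.
HiSS models that use S4 as the low-level component show strong gains in capturing low-level temporal structure.
Simple downsampling does not match the gains of HiSS, indicating that HiSS extracts more information than downsampling alone.
HiSS is data efficient and performs well even on smaller training subsets.
TotalCapture is a failure case for both SSMs and HiSS, possibly because of its high input/output dimensionality and the noisiness of human data.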
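
To make the table easier to read against these findings, the sketch below computes, for each task column, the relative MSE reduction of the best hierarchical row over the best flat row. The numbers are copied from the table above; the helper names (`best`, `relative_improvement`) and the choice to compare per-column minima are assumptions for illustration. The sketch reports per-task values only and does not attempt to reproduce the aggregate median quoted above, whose exact computation is not given in this summary.

```python
TASKS = ["MW", "IS", "JC", "R", "V", "TC"]

# MSE values from the table; None marks entries reported as "-".
FLAT = {
    "Transformer": [2.3750, 0.4600, 1.0200, None, 0.0432, None],
    "LSTM":        [1.1685, 0.3099, 1.0740, 0.0444, 0.0353, 0.1767],
    "S4":          [1.3190, 0.2617, 0.9804, 0.0382, 0.0341, 0.3483],
    "Mamba":       [0.8830, 0.1757, 1.0640, 0.0401, 0.0319, 0.3645],
}
HIERARCHICAL = {
    "Transformer | LSTM": [0.9958, 0.2527, 0.9350, 0.0421, 0.0377, 0.3197],
    "S4 | LSTM":          [0.6205, 0.1574, 0.8980, 0.0363, 0.0374, 0.3583],
    "Mamba | LSTM":       [1.0268, 0.2022, 0.9060, 0.0472, 0.0372, 0.4560],
    "S4 | S4":            [0.6590, 0.1526, 0.9080, 0.0481, 0.0322, 0.3505],
    "Mamba | S4":         [0.7915, 0.1925, 1.0610, 0.0442, 0.0286, 0.3638],
    "S4 | Mamba":         [0.6255, 0.1551, 0.9060, 0.0265, 0.0303, 0.3438],
    "Mamba | Mamba":      [0.7248, 0.1678, 0.9050, 0.0325, 0.0251, 0.3762],
}

def best(rows, col):
    """Lowest reported MSE in one task column, skipping missing entries."""
    return min(v[col] for v in rows.values() if v[col] is not None)

def relative_improvement(baseline, candidate):
    """Percentage MSE reduction of candidate relative to baseline."""
    return 100.0 * (baseline - candidate) / baseline

improvements = {
    task: relative_improvement(best(FLAT, i), best(HIERARCHICAL, i))
    for i, task in enumerate(TASKS)
}

for task, pct in improvements.items():
    print(f"{task}: {pct:+.1f}%")
```

Under this comparison the TC column comes out negative, because the flat LSTM row has the lowest MSE there, which is consistent with TotalCapture being described as a failure case.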


This review was generated by AI and reviewed by a human editor.