QUICK REVIEW

[Paper Review] Markerless tracking of user-defined features with deep learning

Alexander Mathis, Pranav Mamidanna|arXiv (Cornell University)|Apr 9, 2018

Topic: Face and Expression Recognition | References: 3 | Cited by: 31

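One-sentence summary

This paper presents DeepLabCut, a deep-learning method for markerless tracking that uses transfer learning to track user-defined body parts in videos of animal behavior from very little labeled data. With only about 200 training frames it reaches human-level accuracy (RMSE ≈ 0.5–1.0 px), enabling precise, automated pose estimation across species and behaviors, such as mouse reaching, Drosophila egg-laying, and odor trail-tracking.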

ABSTRACT

Quantifying behavior is crucial for many applications in neuroscience. Videography provides easy methods for the observation and recording of animal behavior in diverse settings, yet extracting particular aspects of a behavior for further analysis can be highly time consuming. In motor control studies, humans or other animals are often marked with reflective markers to assist with computer-based tracking, yet markers are intrusive (especially for smaller animals), and the number and location of the markers must be determined a priori. Here, we present a highly efficient method for markerless tracking based on transfer learning with deep neural networks that achieves excellent results with minimal training data. We demonstrate the versatility of this framework by tracking various body parts in a broad collection of experimental settings: mice odor trail-tracking, egg-laying behavior in drosophila, and mouse hand articulation in a skilled forelimb task. For example, during the skilled reaching behavior, individual joints can be automatically tracked (and a confidence score is reported). Remarkably, even when a small number of frames are labeled ($\approx 200$), the algorithm achieves excellent tracking performance on test frames that is comparable to human accuracy.

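Research motivation and goals

Develop a flexible, non-invasive method for tracking user-defined body parts in videos of animal behavior without reflective markers.
Overcome the limitations of marker-based systems, which are intrusive and require the tracked features to be chosen in advance.
Achieve high-accuracy pose estimation by training deep neural networks on small datasets through transfer learning.
Provide a generalizable, open-source toolbox for automated behavioral quantification in neuroscience.
Match human labeling performance using only about 200 labeled frames.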

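Proposed method

The method applies transfer learning to a pretrained deep neural network (DeeperCut), fine-tuning it on user-labeled images of specific body parts.
Each body part has its own output layer, which predicts the probability that the part is present at each pixel and so produces a score map used for localization.
The feature-extraction weights and the output-layer weights are adjusted jointly by combining an L2 loss with a spatially constrained regression loss.
The network is trained end to end on a small number of human-labeled frames, with data augmentation by image rescaling (50–150% range).
After training, the model takes the peak of each score map as the predicted body-part location and refines it using the learned correspondence to the ground truth.
The framework supports multi-animal tracking by extracting local maxima from each body part's score map.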
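
The score-map readout described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `peak_location` and `local_maxima`, the 0.5 threshold, and the toy score map are assumptions made for illustration, and the learned location refinement step is left out. A single global peak gives one location per body part, while thresholded local maxima give one candidate per animal.

```python
def peak_location(score_map):
    """Return (row, col, score) of the highest-scoring pixel."""
    best = (0, 0, score_map[0][0])
    for r, row in enumerate(score_map):
        for c, value in enumerate(row):
            if value > best[2]:
                best = (r, c, value)
    return best


def local_maxima(score_map, threshold=0.5):
    """Return every (row, col, score) pixel that exceeds `threshold` and is
    strictly greater than all of its (up to 8) neighbors.

    The threshold value is a hypothetical choice, not taken from the paper.
    """
    n_rows, n_cols = len(score_map), len(score_map[0])
    peaks = []
    for r in range(n_rows):
        for c in range(n_cols):
            value = score_map[r][c]
            if value <= threshold:
                continue
            # Collect the neighbors that fall inside the map boundaries.
            neighbors = [
                score_map[rr][cc]
                for rr in range(max(0, r - 1), min(n_rows, r + 2))
                for cc in range(max(0, c - 1), min(n_cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(value > n for n in neighbors):
                peaks.append((r, c, value))
    return peaks


# Toy 5x6 score map for one body part with two animals in view (made-up values).
score_map = [
    [0.0, 0.1, 0.0, 0.0, 0.0, 0.0],
    [0.1, 0.9, 0.2, 0.0, 0.1, 0.0],
    [0.0, 0.2, 0.1, 0.1, 0.7, 0.1],
    [0.0, 0.0, 0.0, 0.1, 0.2, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
]

single = peak_location(score_map)            # single-animal readout
multi = local_maxima(score_map, threshold=0.5)  # one candidate per animal
print(single, multi)
```

In this sketch the score at the peak doubles as a confidence value for the prediction, so low-scoring detections can be discarded by raising the threshold.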

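Experimental results

Research questions

RQ1 Can a deep-learning approach based on transfer learning achieve high-accuracy markerless pose estimation from only a small number of labeled frames?
RQ2 Does the method generalize across animal species and behaviors without markers placed in advance?
RQ3 With very little training data, how does model performance compare with human labeling accuracy?
RQ4 Can the framework automatically detect and track multiple body parts, including fine joints, during complex behaviors?
RQ5 How do hyperparameters such as the spatial radius epsilon and the scale factor affect model performance?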

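Key findings

With only about 200 labeled training frames, the model reaches human-level accuracy on test frames (RMSE ≈ 0.5–1.0 px), comparable to human labeling.
The method tracked individual joints in the mouse reaching task and reported a confidence score for each prediction.
Performance remained robust across experimental settings, including mouse odor trail-tracking, Drosophila egg-laying, and the skilled forelimb task.
Cross-validation showed that varying epsilon over a wide range did not improve performance, but that performance dropped sharply when epsilon was too small.
By detecting local maxima in the score maps, the framework achieves accurate pose estimation in multi-animal scenes.
A t-SNE visualization of cropped hand images revealed clear, interpretable pose clusters that correspond to configurations of the labeled body parts.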
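
The pixel-level RMSE quoted in the findings can be made concrete with a small sketch. It assumes RMSE is the root mean square of the Euclidean distances between predicted and human-labeled positions; the paper's exact definition may differ, and the function name `rmse_pixels` and the coordinates below are hypothetical.

```python
import math


def rmse_pixels(predicted, labeled):
    """Root mean square of Euclidean distances (in pixels) between
    predicted and labeled (x, y) positions, paired by index."""
    if not predicted or len(predicted) != len(labeled):
        raise ValueError("need two non-empty point lists of equal length")
    squared_distances = [
        (px - lx) ** 2 + (py - ly) ** 2
        for (px, py), (lx, ly) in zip(predicted, labeled)
    ]
    return math.sqrt(sum(squared_distances) / len(squared_distances))


# Made-up coordinates: one prediction is 5 px off, the other is exact.
predicted = [(10.0, 10.0), (20.0, 20.0)]
labeled = [(13.0, 14.0), (20.0, 20.0)]
error = rmse_pixels(predicted, labeled)
print(round(error, 3))
```

Applying the same function to two sets of labels from human annotators would give the human-variability baseline against which the model's test error is compared.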

This review was generated by AI and reviewed by human editors.