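[Paper Walkthrough] THETA: Triangulated Hand-State Estimation for Teleoperation and Automation in Robotic Hand Control
THETA uses three synchronized cameras and a DeepLabV3/MobileNetV2 pipeline to estimate finger joint angles from multi-view RGB images, enabling real-time, low-cost teleoperation of a robotic hand (DexHand).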
The teleoperation of robotic hands is limited by the high costs of depth cameras and sensor gloves, commonly used to estimate hand relative joint positions (XYZ). We present a novel, cost-effective approach using three webcams for triangulation-based tracking to approximate relative joint angles (theta) of human fingers. We also introduce a modified DexHand, a low-cost robotic hand from TheRobotStudio, to demonstrate THETA's real-time application. Data collection involved 40 distinct hand gestures using three 640x480p webcams arranged at 120-degree intervals, generating over 48,000 RGB images. Joint angles were manually determined by measuring midpoints of the MCP, PIP, and DIP finger joints. Captured RGB frames were processed using a DeepLabV3 segmentation model with a ResNet-50 backbone for multi-scale hand segmentation. The segmented images were then HSV-filtered and fed into THETA's architecture, consisting of a MobileNetV2-based CNN classifier optimized for hierarchical spatial feature extraction and a 9-channel input tensor encoding multi-perspective hand representations. The classification model maps segmented hand views into discrete joint angles, achieving 97.18% accuracy, 98.72% recall, F1 Score of 0.9274, and a precision of 0.8906. In real-time inference, THETA captures simultaneous frames, segments hand regions, filters them, and compiles a 9-channel tensor for classification. Joint-angle predictions are relayed via serial to an Arduino, enabling the DexHand to replicate hand movements. Future research will increase dataset diversity, integrate wrist tracking, and apply computer vision techniques such as OpenAI-Vision. THETA potentially ensures cost-effective, user-friendly teleoperation for medical, linguistic, and manufacturing applications.
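The real-time steps named in the abstract (filter each segmented view, then compile a 9-channel tensor) can be sketched in plain Python. This is a minimal illustration, not the authors' code: the DeepLabV3 segmentation step is skipped, images are tiny nested lists rather than arrays, and the HSV window (`H_RANGE`, `S_MIN`, `V_MIN`) and helper names are hypothetical, since the thresholds are not given here.

```python
import colorsys

# Hypothetical HSV window for skin-like pixels (assumption, not from the paper).
H_RANGE = (0.0, 0.14)  # hue as a fraction of the colour wheel
S_MIN = 0.15           # minimum saturation
V_MIN = 0.20           # minimum value (brightness)


def hsv_keep(pixel):
    """Return True if an (R, G, B) pixel with 0-255 values lies inside the HSV window."""
    r, g, b = (channel / 255.0 for channel in pixel)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    return H_RANGE[0] <= h <= H_RANGE[1] and s >= S_MIN and v >= V_MIN


def hsv_filter(image):
    """Black out every pixel outside the window; image is a list of rows of RGB tuples."""
    return [[px if hsv_keep(px) else (0, 0, 0) for px in row] for row in image]


def stack_views(view_a, view_b, view_c):
    """Concatenate three same-sized RGB views pixel by pixel into 9 channels."""
    stacked = []
    for row_a, row_b, row_c in zip(view_a, view_b, view_c):
        stacked.append([pa + pb + pc for pa, pb, pc in zip(row_a, row_b, row_c)])
    return stacked


# Three toy 1x2 "camera views" standing in for the 640x480 frames.
skin = (224, 172, 105)
blue = (20, 40, 200)
views = [
    [[skin, blue]],
    [[blue, skin]],
    [[skin, skin]],
]

tensor = stack_views(*(hsv_filter(view) for view in views))
print(len(tensor[0][0]))  # number of channels per pixel
```

In a real pipeline the same stacking would normally be done on arrays (for example by concatenating along the channel axis), but the per-pixel layout is the same: channels 0-2 come from the first camera, 3-5 from the second, and 6-8 from the third.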
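Research Motivation and Objectives
- Address the high cost and limited accessibility of the depth cameras and sensor gloves needed to track finger joints for robotic-hand teleoperation.
- Develop a low-cost, real-time hand-state estimation pipeline using three webcams and triangulation.
- Achieve end-to-end teleoperation by using the predicted joint angles to control the DexHand robotic hand.
Proposed Method
- Capture synchronized multi-view RGB data from three 640x480 cameras placed at 120-degree intervals.
- Annotate ground-truth MCP, PIP, and DIP angles for the index, middle, ring, and little fingers across 40 gestures, producing gesture_angles.csv.
- Segment the hand region with DeepLabV3 on a ResNet-50 backbone, then apply HSV filtering to isolate the hand.
- Pack the segmented views into a 9-channel multi-view input and feed it to a MobileNetV2-based classifier that predicts 15 joint-angle bins across 10 angle classes.
- Calibrate outputs with a temperature-scaled softmax and use focal loss to handle class imbalance; train with Adam and transfer learning.
- Send the predicted joint angles over a serial link to an Arduino, which drives the DexHand in real time.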
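The calibration and loss terms in the method list can be made concrete with a small sketch. It uses the standard definitions of a temperature-scaled softmax and of focal loss, FL(p_t) = -alpha * (1 - p_t)^gamma * log(p_t). The temperature, `gamma`, and `alpha` values and the example logits below are assumptions chosen for illustration; the values used in the paper are not stated here.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Softmax of logits / temperature; T > 1 flattens the distribution, T < 1 sharpens it."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(probs, target, gamma=2.0, alpha=1.0):
    """Focal loss for one sample: -alpha * (1 - p_t)^gamma * log(p_t)."""
    p_t = probs[target]
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


# Made-up logits for one joint over 10 angle classes.
logits = [2.0, 0.5, -1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

raw = softmax_with_temperature(logits, temperature=1.0)
calibrated = softmax_with_temperature(logits, temperature=2.0)  # softer, less confident

cross_entropy = focal_loss(raw, target=0, gamma=0.0)  # gamma = 0 reduces to cross-entropy
focal = focal_loss(raw, target=0, gamma=2.0)          # easy examples are down-weighted

print(max(raw), max(calibrated))
print(cross_entropy, focal)
```

The `(1 - p_t)^gamma` factor shrinks the loss on samples the model already classifies confidently, so training focuses on rarer or harder classes, which is why focal loss is a common choice under class imbalance.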

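Experimental Results
Research Questions
- RQ1: Can a low-cost multi-view vision system estimate finger joint angles in real time with high accuracy?
- RQ2: Which architecture best maps segmented multi-view hand images to discrete joint-angle bins while keeping teleoperation efficient?
- RQ3: How does the proposed THETA pipeline perform on unseen gestures under varying lighting conditions, in terms of accuracy, precision, recall, and F1 score?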
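The four metrics named in RQ3 can be computed from true and predicted class labels as sketched below. The sketch assumes macro averaging (per-class scores averaged with equal weight); this walkthrough does not say which averaging the paper used, and the labels are toy data. One consequence of per-class averaging is that the averaged F1 need not equal the harmonic mean of the averaged precision and recall. For the values reported in the abstract (precision 0.8906, recall 0.9872) that harmonic mean is about 0.9364, while the reported F1 is 0.9274, which would be consistent with per-class averaging.

```python
def safe_div(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def harmonic_mean(p, r):
    """F1-style harmonic mean of a precision and a recall value."""
    return safe_div(2 * p * r, p + r)


def macro_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1 for multi-class labels."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = safe_div(tp, tp + fp)
        recall = safe_div(tp, tp + fn)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(harmonic_mean(precision, recall))
    n = len(labels)
    return {
        "accuracy": sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy labels for three angle classes (illustrative only).
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 2, 0]

metrics = macro_metrics(y_true, y_pred)
print(metrics)
print(harmonic_mean(metrics["precision"], metrics["recall"]))  # differs from metrics["f1"]
print(harmonic_mean(0.8906, 0.9872))  # harmonic mean of the reported precision and recall
```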
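Key Findings
- The model reaches 97.18% test accuracy on unseen data.
- Precision is 0.8906, recall is 0.9872, and the F1 score is 0.9274.
- Training accuracy reaches 97.50% and validation accuracy 97.03%, with the loss converging to 0.0001.
- The pipeline supports real-time joint-angle inference and low-latency actuation of the DexHand through an Arduino.
- The DexHand is a low-cost robotic hand (about 250 USD) that reproduces finger motions in real time from THETA's predictions.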
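The hand-off from classifier output to the Arduino could look roughly like the sketch below. Everything protocol-related is hypothetical: the 10-degree bin width, the bin-centre mapping, and the `<a0,a1,...>` ASCII packet format are assumptions made for illustration, because the wire format used by THETA is not described here. The port is any object with a `write` method, so an in-memory buffer stands in for the serial device.

```python
import io

BIN_WIDTH_DEG = 10  # hypothetical width of each angle class, in degrees


def bins_to_angles(bins, bin_width=BIN_WIDTH_DEG):
    """Map each predicted class index to the centre angle of its bin, in degrees."""
    return [b * bin_width + bin_width // 2 for b in bins]


def encode_packet(angles):
    """Encode angles as one ASCII line, e.g. b'<5,35,95>\\n' (hypothetical format)."""
    if any(not 0 <= a <= 180 for a in angles):
        raise ValueError("angle outside the 0-180 degree range")
    body = ",".join(str(int(a)) for a in angles)
    return ("<" + body + ">\n").encode("ascii")


def send_angles(port, bins):
    """Convert predicted bins to angles and write one packet to a file-like port."""
    packet = encode_packet(bins_to_angles(bins))
    port.write(packet)
    return packet


# An in-memory buffer stands in for the serial port in this sketch.
fake_port = io.BytesIO()
sent = send_angles(fake_port, [0, 3, 9])
print(sent)
```

With real hardware, `fake_port` would be replaced by an open serial connection, for example a pySerial `serial.Serial` object created with the device path and baud rate that match the Arduino sketch; both of those settings depend on the setup and are not specified here.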

This walkthrough was generated by AI and reviewed by human editors.