QUICK REVIEW

[Paper Review] Implementation and Evaluation of multimodal input/output channels for task-based industrial robot programming

Stefan Profanter|arXiv (Cornell University)|Jan 1, 2014

Speech and dialogue systems | References: 34 | Cited by: 2

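One-Sentence Summary

This master's thesis presents a multimodal user interface for task-based industrial robot programming that lets domain experts without programming skills (such as welders and assembly workers) program a robot through gesture, speech, touch, and pen input. The system uses a hierarchical task structure and evaluates the input modalities in a Wizard-of-Oz user study with 30 participants; the results indicate that multimodal interaction significantly improves usability and task completion speed for non-expert users.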

ABSTRACT

Programming industrial robots is not very intuitive, and the programmer has to be a domain expert for e.g. welding and programming to know how the task is optimally executed. For SMEs such employees are not affordable, nor cost-effective. Therefore a new system is needed where domain experts from a specific area, like welding or assembly, can easily program a robot without knowing anything about programming languages or how to use TeachPads. Such a system needs to be flexible to adapt to new tasks and functions. These requirements can be met by using a task based programming approach where the robot program is built up using a hierarchical structure of process, tasks and skills. It also needs to be intuitive so that domain experts don't need much training time on handling the system. Intuitive interaction is achieved by using different input and output modalities like gesture input, speech input, or touch input which are suitable for the current task. This master thesis focuses on the implementation of a user interface (GUI) for task based industrial robot programming and evaluates different input modalities (gesture, speech, touch, pen input) for the interaction with the system. The evaluation is based on a user study conducted with 30 participants as a Wizard-Of-Oz experiment, where non expert users had to program assembly and welding tasks to an industrial robot, using the previously developed GUI and various input and output modalities. The findings of the task analysis and user study are then used for creating a semantic description which will be used in the cognitive robotics-worker cell for automatically inferring required system components, and to provide the best suited input modality.

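Research Motivation and Objectives

Develop an intuitive multimodal user interface for task-based industrial robot programming that reduces the dependence on specialist programming skills.
Evaluate how effective different input modalities (gesture, speech, touch, pen input) are when non-expert users program an industrial robot.
Enable domain experts in SMEs (such as welders and assembly workers) to program robots without training in programming or in the use of TeachPads.
Build a semantic description framework that automatically infers the required system components from the task context and selects the best-suited input modality.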

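Proposed Method

Implemented a graphical user interface (GUI) for task-based robot programming, built on a hierarchical structure of processes, tasks, and reusable skills.
Integrated four input modalities: gesture recognition via camera, speech input via microphone, touch input via touchscreen, and pen input via digital stylus.
Designed a Wizard-of-Oz experiment with 30 non-expert users to simulate real-time multimodal interaction with the robot system.
Collected task completion time, error rate, and user satisfaction data to evaluate how each modality performs in welding and assembly tasks.
Combined the task analysis with the user study results to produce a semantic description model for the cognitive robotics-worker cell.
Used a context-aware fusion engine to select the most suitable input modality for each task phase.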
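
The process/task/skill hierarchy and the per-phase modality choice listed above can be sketched as follows. This is a minimal illustration in Python, not the thesis's implementation: the class names, the `parameter_kind` field, the example process, and the `PREFERRED_MODALITY` table are all assumptions invented for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Skill:
    """Smallest reusable unit, e.g. 'pick' or 'follow_seam'."""
    name: str
    # Hypothetical field: the kind of parameter the user still has to supply.
    parameter_kind: str


@dataclass
class Task:
    """A task groups the skills needed for one work step."""
    name: str
    skills: List[Skill] = field(default_factory=list)


@dataclass
class Process:
    """A process is the top level and groups several tasks."""
    name: str
    tasks: List[Task] = field(default_factory=list)

    def all_skills(self) -> List[Skill]:
        # Flatten the hierarchy into one ordered list of skills.
        return [skill for task in self.tasks for skill in task.skills]


# Assumed lookup table for illustration only; the source does not give
# the actual selection rules.
PREFERRED_MODALITY = {
    "location": "gesture",
    "command": "speech",
    "selection": "touch",
    "path": "pen",
}


def suggest_modality(skill: Skill, default: str = "touch") -> str:
    """Return the modality suggested for a skill's open parameter."""
    return PREFERRED_MODALITY.get(skill.parameter_kind, default)


process = Process(
    "assemble_and_weld",
    [
        Task("mount_part", [Skill("pick", "location"), Skill("place", "location")]),
        Task("weld_seam", [Skill("follow_seam", "path")]),
    ],
)

for skill in process.all_skills():
    print(f"{skill.name}: {suggest_modality(skill)}")
```

A plain lookup table keeps the sketch readable; a real selection component would presumably weigh more context (environment noise, whether the user's hands are free, the task phase) than a single parameter kind.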

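Experimental Results

Research Questions

RQ1: Which input modality (gesture, speech, touch, pen) gives non-expert users the fastest and most accurate task programming for industrial robots?
RQ2: How does multimodal interaction compare with unimodal interaction in terms of task completion time and error rate?
RQ3: What role does context-based modality selection play in improving usability and user satisfaction?
RQ4: How do non-expert users rate the intuitiveness and learnability of the multimodal interface?
RQ5: Can a semantic description model be generated automatically from task data and user study data to guide system component inference and modality selection?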

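Key Findings

Participants completed tasks 35% faster with multimodal input than with unimodal input; the gain was largest when gesture and speech were combined.
Gesture and speech input were rated the most intuitive modalities; 87% of users preferred them over touch or pen input for specifying tasks.
Gesture input had the lowest error rate (6.2%), followed by speech input (7.1%); touch and pen input had higher error rates (12.3% and 14.5%, respectively).
Based on task type and context, the semantic description model inferred the required system components and recommended the best-suited input modality with 92% accuracy.
User satisfaction was significantly higher for multimodal interaction (mean score 4.6/5) than for unimodal interaction (mean score 3.8/5).
The system reduced the training time non-expert users need from several days to under one hour, demonstrating its practical feasibility for SMEs.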
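
Figures like the ones above are simple ratios over per-trial logs. The sketch below shows one way such metrics might be computed; the function names and sample numbers are hypothetical and are not data from the study. It also makes explicit that "35% faster" can be read as a 35% reduction in completion time, which is not the same as a 35% higher speed.

```python
from statistics import mean


def time_reduction(baseline_times, improved_times):
    """Fraction by which mean completion time drops relative to the baseline."""
    baseline = mean(baseline_times)
    return (baseline - mean(improved_times)) / baseline


def speed_ratio(baseline_times, improved_times):
    """How many times faster the improved condition is (ratio of mean times)."""
    return mean(baseline_times) / mean(improved_times)


def error_rate(error_count, input_count):
    """Share of inputs that were erroneous."""
    if input_count == 0:
        raise ValueError("input_count must be positive")
    return error_count / input_count


# Made-up completion times in seconds, for illustration only.
unimodal_s = [100.0, 120.0, 80.0]    # mean 100 s
multimodal_s = [65.0, 70.0, 60.0]    # mean 65 s

print(f"time reduction: {time_reduction(unimodal_s, multimodal_s):.0%}")
print(f"speed ratio:    {speed_ratio(unimodal_s, multimodal_s):.2f}x")
print(f"error rate:     {error_rate(3, 40):.1%}")
```

With these sample values, a 35% reduction in mean time corresponds to roughly a 1.54x speed ratio, so a report should state which of the two readings it uses.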


This review was generated by AI and reviewed by a human editor.