QUICK REVIEW

[Paper Review] Implementation and Evaluation of multimodal input/output channels for task-based industrial robot programming

Stefan Profanter|arXiv (Cornell University)|Jan 1, 2014

Speech and dialogue systems|References: 34|Cited by: 2

One-Line Summary

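This master's thesis presents a multimodal user interface for task-based programming of industrial robots that lets users without programming expertise, such as welders and assembly workers, program robots using gesture, speech, touch, and pen input. The system uses a hierarchical task structure and evaluates the modalities in a Wizard-of-Oz user study with 30 participants, showing that multimodal interaction markedly improves usability and task completion speed for non-experts.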

ABSTRACT

Programming industrial robots is not very intuitive, and the programmer has to be a domain expert for e.g. welding and programming to know how the task is optimally executed. For SMEs, such employees are neither affordable nor cost-effective. Therefore a new system is needed where domain experts from a specific area, like welding or assembly, can easily program a robot without knowing anything about programming languages or how to use TeachPads. Such a system needs to be flexible to adapt to new tasks and functions. These requirements can be met by using a task-based programming approach where the robot program is built up using a hierarchical structure of process, tasks and skills. It also needs to be intuitive so that domain experts don't need much training time on handling the system. Intuitive interaction is achieved by using different input and output modalities like gesture input, speech input, or touch input which are suitable for the current task. This master thesis focuses on the implementation of a user interface (GUI) for task-based industrial robot programming and evaluates different input modalities (gesture, speech, touch, pen input) for the interaction with the system. The evaluation is based on a user study conducted with 30 participants as a Wizard-of-Oz experiment, where non-expert users had to program assembly and welding tasks to an industrial robot, using the previously developed GUI and various input and output modalities. The findings of the task analysis and user study are then used for creating a semantic description which will be used in the cognitive robotics-worker cell for automatically inferring required system components, and to provide the best suited input modality.
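The abstract's hierarchical structure of process, tasks, and skills can be pictured as nested containers. The following Python sketch is a minimal illustration under assumed names: `Process`, `Task`, `Skill`, and the example skills `pick`, `place`, and `screw` are hypothetical and not taken from the thesis. It shows how reusable skills compose into tasks, and tasks into a process that flattens into an execution order.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Skill:
    """Smallest reusable unit, e.g. a single motion or tool action (hypothetical)."""
    name: str
    parameters: dict = field(default_factory=dict)


@dataclass
class Task:
    """A task groups an ordered sequence of skills."""
    name: str
    skills: List[Skill] = field(default_factory=list)


@dataclass
class Process:
    """Top-level robot program: an ordered list of tasks."""
    name: str
    tasks: List[Task] = field(default_factory=list)

    def flatten(self) -> List[str]:
        """Expand the hierarchy into the execution order of 'task/skill' names."""
        return [f"{t.name}/{s.name}" for t in self.tasks for s in t.skills]


# Hypothetical assembly process built from reusable skills
pick = Skill("pick", {"object": "bolt"})
place = Skill("place", {"target": "hole_1"})
screw = Skill("screw", {"torque_nm": 5})

process = Process(
    "assemble_bracket",
    tasks=[
        Task("insert_bolt", [pick, place]),
        Task("fasten_bolt", [screw]),
    ],
)

print(process.flatten())
# ['insert_bolt/pick', 'insert_bolt/place', 'fasten_bolt/screw']
```

Because a skill is a plain object, the same `pick` instance could be reused in a welding task without redefining it, which is the point of keeping skills as the reusable bottom layer.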

Motivation and Objectives

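Develop an intuitive, multimodal user interface for task-based industrial robot programming that reduces dependence on specialized programming skills.
Evaluate the effectiveness of different input modalities (gesture, speech, touch, and pen input) when non-expert users program industrial robots.
Enable domain experts at SMEs (e.g., welders, assembly workers) to program robots without prior programming experience or TeachPad training.
Build a semantic description framework that automatically selects the best-suited input modality based on task context and infers the required system components.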

Proposed Method

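Implemented a graphical user interface (GUI) for task-based robot programming using a hierarchical structure of processes, tasks, and reusable skills.
Integrated four input modalities: camera-based gesture recognition, microphone-based speech input, touch input on a touchscreen, and pen input with a digital stylus.
Designed a Wizard-of-Oz experiment with 30 non-expert participants to simulate real-time multimodal interaction.
Collected task completion time, error rate, and user satisfaction data to evaluate modality performance on welding and assembly tasks.
Combined the results of the task analysis and user study to generate a semantic description model for the cognitive robotics-worker cell.
Integrated the input modalities through a context-dependent fusion engine that selects the most suitable modality for each task phase.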
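The context-dependent modality selection and component inference described above can be sketched as simple rule lookups. This is a hypothetical illustration only: the rule table, task phases, and component lists below are assumptions, and a plain dictionary stands in for whatever representation the thesis's semantic description actually uses.

```python
from typing import Dict, List, Tuple

# Hypothetical rules: (task_type, phase) -> preferred input modality.
# Illustrative assumptions, not values reported by the thesis.
MODALITY_RULES: Dict[Tuple[str, str], str] = {
    ("assembly", "select_object"): "gesture",
    ("assembly", "set_parameter"): "speech",
    ("welding", "define_seam"): "pen",
    ("welding", "confirm"): "touch",
}

# Hypothetical mapping from each modality to the hardware it needs.
REQUIRED_COMPONENTS: Dict[str, List[str]] = {
    "gesture": ["camera"],
    "speech": ["microphone"],
    "touch": ["touchscreen"],
    "pen": ["touchscreen", "stylus"],
}


def select_modality(task_type: str, phase: str, default: str = "touch") -> str:
    """Return the modality for this task phase, or a fallback if no rule matches."""
    return MODALITY_RULES.get((task_type, phase), default)


def infer_components(task_type: str, phases: List[str]) -> List[str]:
    """Collect the deduplicated, sorted set of components needed for all phases."""
    needed = set()
    for phase in phases:
        needed.update(REQUIRED_COMPONENTS[select_modality(task_type, phase)])
    return sorted(needed)


print(select_modality("assembly", "select_object"))  # gesture
print(infer_components("welding", ["define_seam", "confirm"]))
# ['stylus', 'touchscreen']
```

An unmatched phase falls back to touch input here; that default is an arbitrary choice for the sketch, and a real system would more likely consult the context (noise level, free hands, and so on) before choosing.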

Experimental Results

Research Questions

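RQ1: In industrial robot programming, which modality (gesture, speech, touch, or pen input) is fastest and most accurate when non-expert users program tasks?
RQ2: How does multimodal interaction compare with single-modality interaction in terms of task completion time and error rate?
RQ3: What role does context-based modality selection play in improving usability and user satisfaction?
RQ4: How do non-expert users rate the intuitiveness and learnability of the multimodal interface?
RQ5: Can a semantic description model be generated automatically from task data and user study data, and can that model effectively support system component inference and modality selection?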

Key Findings

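Participants completed tasks 35% faster with multimodal input than with single-modality input; the combination of gesture and speech produced the largest improvement.
Gesture and speech input were rated the most intuitive, with 87% of users preferring them over touch and pen input.
Error rates were lowest for gesture input (6.2%) and speech input (7.1%), while touch (12.3%) and pen input (14.5%) showed higher error rates.
The semantic description model correctly inferred the required system components and recommended the best input modality based on task type and context, achieving 92% accuracy.
User satisfaction with multimodal interaction was high, averaging 4.6/5, compared with an average of only 3.8/5 for single-modality interaction.
The system reduced training time for non-experts from several days to under one hour, indicating that it is practical and highly feasible for SMEs.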
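The reported figures can be put on a common footing with a little arithmetic. The sketch below reuses the error rates and satisfaction scores listed above, and also shows that "35% faster" admits two readings (completion time cut by 35% versus throughput raised by 35%) that give different completion times. The 10-minute baseline is a hypothetical value chosen only for illustration, and the function names are likewise illustrative.

```python
# Error rates (%) and satisfaction scores (out of 5) as listed in the findings.
ERROR_RATES = {"gesture": 6.2, "speech": 7.1, "touch": 12.3, "pen": 14.5}
SATISFACTION = {"multimodal": 4.6, "single": 3.8}


def error_ratio_vs(reference: str) -> dict:
    """Error rate of each modality divided by that of the reference modality."""
    base = ERROR_RATES[reference]
    return {m: round(r / base, 2) for m, r in ERROR_RATES.items()}


def time_if_reduced(baseline_min: float, pct: float) -> float:
    """Reading 1 of 'X% faster': completion time shrinks by X percent."""
    return baseline_min * (1 - pct / 100)


def time_if_sped_up(baseline_min: float, pct: float) -> float:
    """Reading 2 of 'X% faster': tasks completed per minute rise by X percent."""
    return baseline_min / (1 + pct / 100)


print(error_ratio_vs("gesture"))
# {'gesture': 1.0, 'speech': 1.15, 'touch': 1.98, 'pen': 2.34}
print(round(SATISFACTION["multimodal"] - SATISFACTION["single"], 1))  # 0.8

# Hypothetical 10-minute single-modality baseline (illustrative only)
print(f"{time_if_reduced(10.0, 35):.2f}")  # 6.50
print(f"{time_if_sped_up(10.0, 35):.2f}")  # 7.41
```

Under the figures above, pen input produced roughly 2.3 times as many errors as gesture input, and the two readings of "35% faster" differ by almost a minute on a 10-minute task.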


This review was generated by AI and checked by a human editor.