QUICK REVIEW

[Paper Review] The Feeling of Success: Does Touch Sensing Help Predict Grasp Outcomes?

Roberto Calandra, Andrew Owens | arXiv (Cornell University) | 2017-10-16

Robot Manipulation and Learning | References: 34 | Citations: 109

One-Line Summary

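An end-to-end visuo-tactile deep neural network predicts grasp outcomes, and integrating tactile sensing with vision substantially improves both grasp outcome prediction and real-world grasping performance.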

ABSTRACT

A successful grasp requires careful balancing of the contact forces. Deducing whether a particular grasp will be successful from indirect measurements, such as vision, is therefore quite challenging, and direct sensing of contacts through touch sensing provides an appealing avenue toward more successful and consistent robotic grasping. However, in order to fully evaluate the value of touch sensing for grasp outcome prediction, we must understand how touch sensing can influence outcome prediction accuracy when combined with other modalities. Doing so using conventional model-based techniques is exceptionally difficult. In this work, we investigate the question of whether touch sensing aids in predicting grasp outcomes within a multimodal sensing framework that combines vision and touch. To that end, we collected more than 9,000 grasping trials using a two-finger gripper equipped with GelSight high-resolution tactile sensors on each finger, and evaluated visuo-tactile deep neural network models to directly predict grasp outcomes from either modality individually, and from both modalities together. Our experimental results indicate that incorporating tactile readings substantially improves grasping performance.

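Motivation and Objectives

Motivate multimodal perception for robotic grasping by combining vision and touch.
Evaluate whether touch sensing improves grasp outcome prediction over vision alone.
Develop an end-to-end neural network that processes visual and tactile inputs to predict grasp success.
Quantitatively compare unimodal and multimodal models on outcome prediction and real-world grasping performance.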

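Proposed Method

Collect more than 9,000 grasping trials with a two-finger gripper fitted with a GelSight sensor on each finger.
Train end-to-end CNN models to predict grasp success from RGB and GelSight images.
Fuse the visual and tactile features late in the network and feed them to a fully connected classifier.
Vision uses two time points, before and during the grasp; the tactile input is the temporal difference of the GelSight images (I_Tb - I_Ta).
Pre-train the visual and tactile CNNs on ImageNet and fine-tune them during training.
Evaluate the models with a cross-object split and compare unimodal against multimodal performance.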
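The late-fusion idea in the list above can be sketched in a few lines. This is a minimal illustration, not the paper's network: `extract_features` is a hypothetical placeholder (mean and standard deviation of pixel values) standing in for a pre-trained CNN, the images are tiny nested lists, and the weights are arbitrary. The names `frame_a` and `frame_b` simply mirror the I_Ta and I_Tb notation.

```python
import math


def tactile_difference(frame_a, frame_b):
    """Pixel-wise difference frame_b - frame_a, mirroring I_Tb - I_Ta."""
    return [
        [b - a for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(frame_a, frame_b)
    ]


def extract_features(image):
    """Hypothetical stand-in for a CNN: returns [mean, std] of the pixels."""
    values = [v for row in image for v in row]
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return [mean, math.sqrt(variance)]


def late_fusion_score(vision_frames, tactile_diffs, weights, bias):
    """Concatenate per-modality features, then apply a logistic classifier."""
    features = []
    for image in vision_frames:      # e.g. RGB frames before and during grasp
        features.extend(extract_features(image))
    for image in tactile_diffs:      # e.g. one difference image per finger
        features.extend(extract_features(image))
    if len(weights) != len(features):
        raise ValueError("need exactly one weight per fused feature")
    logit = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-logit))   # predicted success probability


# Toy usage with 2x2 "images" and arbitrary weights.
rgb_before = [[0.2, 0.4], [0.6, 0.8]]
rgb_during = [[0.3, 0.5], [0.5, 0.7]]
gel_a = [[0.0, 0.0], [0.0, 0.0]]
gel_b = [[1.0, 1.0], [1.0, 1.0]]
diff = tactile_difference(gel_a, gel_b)
probability = late_fusion_score(
    [rgb_before, rgb_during], [diff], weights=[0.1] * 6, bias=0.0
)
```

Each modality is reduced to a feature vector before anything is combined, which is what makes the fusion "late". In a real model the placeholder extractor would be replaced by the fine-tuned CNN branches and the single logistic unit by fully connected layers.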

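Experimental Results

Research Questions

RQ1: Does touch sensing improve grasp outcome prediction over vision alone?
RQ2: Does the multimodal visuo-tactile model outperform unimodal models at outcome prediction?
RQ3: How well does the visuo-tactile model perform at real-world grasp selection on unseen objects?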
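For RQ3, one plausible way to turn an outcome predictor into a grasp selector is to score several candidate grasps and execute the one with the highest predicted success probability. The sketch below assumes that approach; the candidate format and the `toy_predictor` function are hypothetical and are not taken from the paper.

```python
def select_grasp(candidates, predict_success):
    """Return the candidate with the highest predicted success probability.

    `predict_success` is any callable mapping a candidate to a value in [0, 1].
    """
    if not candidates:
        raise ValueError("no candidate grasps to choose from")
    return max(candidates, key=predict_success)


def toy_predictor(candidate):
    """Hypothetical predictor: favors grasps centered on the object."""
    return 1.0 - min(1.0, abs(candidate["offset_cm"]) / 5.0)


# Three hypothetical candidates described by their offset from the object center.
candidates = [
    {"name": "left", "offset_cm": -3.0},
    {"name": "center", "offset_cm": 0.5},
    {"name": "right", "offset_cm": 4.0},
]
best = select_grasp(candidates, toy_predictor)
```

With the toy predictor, the selector returns the candidate closest to the object center. A learned visuo-tactile model would take the place of `toy_predictor`.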

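Key Results

Tactile models outperform the vision model at grasp outcome prediction.
The multimodal visuo-tactile model achieved the highest test accuracy, 77.8±0.3%.
Vision-only, depth, and single-sensor tactile models show lower accuracy (e.g., vision only 68.8±1.0%, depth 73.2±0.7%).
Using both GelSight sensors gives 75.6±0.8% accuracy, while GelSight L and R alone differ slightly at 75.3±1.4% and 73.8±1.7%, respectively.
The hand-designed indentation feature reaches 72.7±0.8%: the end-to-end models offer an advantage, but hand-crafted features remain competitive on small datasets.
In real-world grasping, the visuo-tactile model achieves a success rate about 14 percentage points higher than vision alone on unseen objects (94% vs. 80%).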
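To make the comparisons easier to check, the short script below recomputes the gaps between the reported mean accuracies. The numbers are copied from the list above; the dictionary keys are illustrative labels, and the ± spreads are ignored, so the gaps are differences of means only.

```python
# Mean test accuracies (%) as reported in the results above.
reported_accuracy = {
    "vision": 68.8,
    "indentation": 72.7,
    "depth": 73.2,
    "gelsight_r": 73.8,
    "gelsight_l": 75.3,
    "gelsight_both": 75.6,
    "vision_tactile": 77.8,
}


def gap_in_points(model_a, model_b):
    """Accuracy of model_a minus model_b, in percentage points."""
    return round(reported_accuracy[model_a] - reported_accuracy[model_b], 1)


# Models ordered from most to least accurate.
ranking = sorted(reported_accuracy, key=reported_accuracy.get, reverse=True)

multimodal_vs_vision = gap_in_points("vision_tactile", "vision")
multimodal_vs_tactile = gap_in_points("vision_tactile", "gelsight_both")
tactile_vs_vision = gap_in_points("gelsight_both", "vision")

# Real-world grasp success rates (%) on unseen objects.
real_world_gap = 94 - 80
```

On these figures, most of the gain over vision alone comes from touch itself, and adding vision to both GelSight sensors contributes a smaller further improvement.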


This review was generated by AI and checked by a human editor.