QUICK REVIEW

[Paper Review] A Low-Cost Vision-Based Tactile Gripper with Pretraining Learning for Contact-Rich Manipulation

Yaohua Liu, Binkai Ou | arXiv (Cornell University) | 2026-01-31

Advanced Sensor and Energy Harvesting Materials | Citations: 0

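One-Line Summary

This paper presents LVTG, a low-cost visuo-tactile gripper with a modular skin, and introduces CLIP-inspired cross-modal pretraining together with an ACT-based policy to improve contact-rich manipulation. It reports better grasp stability, durability, and learning efficiency than vision-based baselines.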

ABSTRACT

Robotic manipulation in contact-rich environments remains challenging, particularly when relying on conventional tactile sensors that suffer from limited sensing range, reliability, and cost-effectiveness. In this work, we present LVTG, a low-cost visuo-tactile gripper designed for stable, robust, and efficient physical interaction. Unlike existing visuo-tactile sensors, LVTG enables more effective and stable grasping of larger and heavier everyday objects, thanks to its enhanced tactile sensing area and greater opening angle. Its surface skin is made of highly wear-resistant material, significantly improving durability and extending operational lifespan. The integration of vision and tactile feedback allows LVTG to provide rich, high-fidelity sensory data, facilitating reliable perception during complex manipulation tasks. Furthermore, LVTG features a modular design that supports rapid maintenance and replacement. To effectively fuse vision and touch, we adopt a CLIP-inspired contrastive learning objective to align tactile embeddings with their corresponding visual observations, enabling a shared cross-modal representation space for visuo-tactile perception. This alignment improves the performance of an Action Chunking Transformer (ACT) policy in contact-rich manipulation, leading to more efficient data collection and more effective policy learning. Compared to the original ACT method, the proposed LVTG with pretraining achieves significantly higher success rates in manipulation tasks.

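Motivation and Objectives

Develop a low-cost visuo-tactile gripper with an expanded sensing area and modular, replaceable parts for robust manipulation in harsh environments.
Fuse vision and touch by aligning tactile embeddings with visual embeddings through a CLIP-inspired contrastive objective.
Use pretrained tactile representations with an ACT policy to improve data efficiency and policy learning in contact-rich tasks.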

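Proposed Method

LVTG design: a two-finger parallel-jaw gripper with modular, replaceable tactile skins, at a cost of roughly $12 per finger.
A robust optical tactile skin with a single, wear-resistant surface, made by molding translucent silicone directly onto primed acrylic.
A three-pass tactile image processing pipeline: fisheye distortion correction, ROI extraction, and illumination/contrast enhancement.
CLIP-inspired contrastive learning that aligns tactile embeddings with visual observations, using a shared backbone and a memory-bank negative sampling strategy.
Pretraining of the tactile encoder on 5000 visuo-tactile trajectories, followed by policy learning with an Action Chunking Transformer (ACT) that takes fused visuo-tactile features as input.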
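
The contrastive alignment step can be illustrated with a minimal sketch. This is not the authors' implementation: the review only names a CLIP-inspired objective with memory-bank negatives, so the function name `info_nce`, the temperature of 0.07, the one-directional (tactile-to-vision) form, and the treatment of the memory bank as a plain list of stored visual embeddings are all assumptions made for illustration.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def info_nce(tactile, visual, memory_bank=(), temperature=0.07):
    """Mean tactile-to-vision InfoNCE loss over a batch (illustrative only).

    tactile[i] and visual[i] form a matched pair. The other visual
    embeddings in the batch and every entry of memory_bank (assumed here
    to hold visual embeddings from earlier batches) act as negatives.
    """
    t = [l2_normalize(v) for v in tactile]
    candidates = [l2_normalize(v) for v in list(visual) + list(memory_bank)]
    total = 0.0
    for i, ti in enumerate(t):
        logits = [dot(ti, c) / temperature for c in candidates]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_denom - logits[i]  # cross-entropy with the positive at index i
    return total / len(t)


# Toy 2-D embeddings: matched pairs should give a lower loss than swapped pairs.
tactile = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]
swapped = [[0.1, 0.9], [0.9, 0.1]]
bank = [[-1.0, 0.0]]
print(info_nce(tactile, aligned, bank), info_nce(tactile, swapped, bank))
```

CLIP itself averages this loss over both directions (image-to-text and text-to-image); a symmetric variant here would add the vision-to-tactile term in the same way. In practice the embeddings would come from the tactile and visual encoders rather than hand-written lists.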

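Experimental Results

Research Questions

RQ1: Does LVTG improve grasp stability and reliability compared with existing visuo-tactile sensors?
RQ2: Is LVTG durable and easy to replace over long-term use?
RQ3: Does tactile feedback improve policy learning in contact-rich manipulation, and how does cross-modal learning affect performance?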

Key Results

Vision-based tactile sensor	Wine bottle grasping	Plate grasping	USB insertion and removal	Average score
GelSlim	85	81	76	81
DIGIT	80	73	75	76
LVTG	92	89	73	85
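
As a quick consistency check, the average column can be recomputed from the three per-task scores. The sketch below assumes the average is the unweighted mean of the three tasks rounded to the nearest integer, which the review does not state explicitly.

```python
# Per-task scores copied from the table: wine bottle, plate, USB.
scores = {
    "GelSlim": [85, 81, 76],
    "DIGIT": [80, 73, 75],
    "LVTG": [92, 89, 73],
}

# Unweighted mean over the three tasks, rounded to the nearest integer.
averages = {name: round(sum(vals) / len(vals)) for name, vals in scores.items()}
print(averages)  # {'GelSlim': 81, 'DIGIT': 76, 'LVTG': 85}
```

Under that assumption the recomputed values match the table, and LVTG's lead in the average comes from the two grasping tasks, not from the USB task.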

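LVTG achieves higher grasp success rates on objects that demand a large contact area and load capacity: 92% on wine bottle grasping and 89% on plate grasping, with an average score of 85 across the three tasks. On USB insertion and removal, the table lists 73 for LVTG, slightly below GelSlim (76) and DIGIT (75).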
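In durability tests, LVTG lasts more than twice as long as 9DTact, and its modular design allows the skin to be replaced in under 30 seconds.
In the policy experiments, ACT with tactile input (+pretraining) achieved a higher average success rate (55-63%) than the vision-based baseline, and pretraining improved the results further.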
LVTG's larger sensing area (80 x 30 mm, 2400 mm^2) and one-piece skin improve stability and durability compared with single-type or fragile gel designs.

This review was generated by AI and reviewed by a human editor.