QUICK REVIEW

[Paper Review] A Low-Cost Vision-Based Tactile Gripper with Pretraining Learning for Contact-Rich Manipulation

Yaohua Liu, Binkai Ou|arXiv (Cornell University)|Jan 31, 2026

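Advanced Sensor and Energy Harvesting Materials|Cited by 0

One-Line Summary

This paper proposes LVTG, a low-cost visuo-tactile gripper that combines a modular skin, CLIP-inspired cross-modal pretraining, and an ACT-based policy to improve stability and learning efficiency in contact-rich manipulation. It reports better grasp stability and durability than existing visuo-tactile sensors, and higher policy success rates than a vision-only ACT baseline.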

ABSTRACT

Robotic manipulation in contact-rich environments remains challenging, particularly when relying on conventional tactile sensors that suffer from limited sensing range, reliability, and cost-effectiveness. In this work, we present LVTG, a low-cost visuo-tactile gripper designed for stable, robust, and efficient physical interaction. Unlike existing visuo-tactile sensors, LVTG enables more effective and stable grasping of larger and heavier everyday objects, thanks to its enhanced tactile sensing area and greater opening angle. Its surface skin is made of highly wear-resistant material, significantly improving durability and extending operational lifespan. The integration of vision and tactile feedback allows LVTG to provide rich, high-fidelity sensory data, facilitating reliable perception during complex manipulation tasks. Furthermore, LVTG features a modular design that supports rapid maintenance and replacement. To effectively fuse vision and touch, we adopt a CLIP-inspired contrastive learning objective to align tactile embeddings with their corresponding visual observations, enabling a shared cross-modal representation space for visuo-tactile perception. This alignment improves the performance of an Action Chunking Transformer (ACT) policy in contact-rich manipulation, leading to more efficient data collection and more effective policy learning. Compared to the original ACT method, the proposed LVTG with pretraining achieves significantly higher success rates in manipulation tasks.

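Motivation and Objectives

Develop a low-cost visuo-tactile gripper with an enlarged sensing area and modular, replaceable parts for robust manipulation.
Fuse vision and touch with a CLIP-inspired contrastive objective that aligns tactile embeddings with visual embeddings.
Use the pretrained tactile representation with an ACT policy to improve data efficiency and policy learning on contact-rich tasks.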

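Proposed Method

LVTG design: a two-finger parallel-jaw configuration with modular, replaceable tactile skins, at a cost of about $12 per finger.
Robust optical tactile skin: translucent silicone molded directly onto prepared acrylic, giving a single wear-resistant surface.
Three-pass tactile signal processing pipeline: fisheye distortion correction, ROI extraction, and illumination/contrast enhancement.
CLIP-inspired contrastive learning that aligns tactile embeddings with visual observations, using a shared backbone and a memory-bank negative sampling strategy.
Pretraining of the tactile encoder on 5000 visuo-tactile trajectories, followed by policy learning with an Action Chunking Transformer (ACT) that ingests fused visuo-tactile features.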
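
The contrastive alignment step above can be sketched as a symmetric InfoNCE loss in the style of CLIP. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the function name `clip_style_loss`, the temperature of 0.07, the embedding sizes, and the choice to use the memory bank only as extra negatives in the tactile-to-visual direction are all hypothetical and chosen for illustration.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def log_softmax(logits):
    """Numerically stable row-wise log-softmax."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def clip_style_loss(tactile, visual, bank=None, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    tactile, visual: (B, D) arrays; row i of each comes from the same time step.
    bank: optional (K, D) array of stored visual embeddings, used here as
          extra negatives for the tactile-to-visual direction (an assumption).
    """
    t = l2_normalize(tactile)
    v = l2_normalize(visual)
    idx = np.arange(t.shape[0])

    # Tactile -> visual: positives sit on the diagonal of the (B, B) block.
    logits_tv = t @ v.T / temperature
    if bank is not None:
        # Memory-bank entries are appended as additional negative columns.
        bank_logits = t @ l2_normalize(bank).T / temperature
        logits_tv = np.concatenate([logits_tv, bank_logits], axis=1)
    loss_tv = -log_softmax(logits_tv)[idx, idx].mean()

    # Visual -> tactile: in-batch negatives only.
    logits_vt = v @ t.T / temperature
    loss_vt = -log_softmax(logits_vt)[idx, idx].mean()

    return 0.5 * (loss_tv + loss_vt)


# Synthetic check: well-aligned pairs should give a lower loss than mismatched ones.
rng = np.random.default_rng(0)
visual = rng.normal(size=(8, 32))
aligned = visual + 0.01 * rng.normal(size=(8, 32))  # tactile close to its visual pair
shuffled = visual[::-1]                             # every pair is mismatched
bank = rng.normal(size=(16, 32))                    # stand-in for stored negatives

loss_aligned = clip_style_loss(aligned, visual, bank)
loss_mismatched = clip_style_loss(shuffled, visual, bank)
print(loss_aligned < loss_mismatched)
```

In a real training setup the embeddings would come from the tactile and visual encoders, and the loss would be minimized with an autodiff framework; the random arrays here only show that the loss rewards matching pairs.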

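Experimental Results

Research Questions

RQ1: Does LVTG improve grasp stability and reliability compared with existing visuo-tactile sensors?
RQ2: Is LVTG durable under long-term use and easy to replace?
RQ3: Does tactile feedback improve policy learning for contact-rich manipulation, and how does cross-modal learning affect performance?

Key Findings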

Vision-based Tactile Sensor	Grasping Wine Bottle	Grasping Plate	USB Insertion and Removal	Average Scores
GelSlim	85	81	76	81
DIGIT	80	73	75	76
LVTG	92	89	73	85
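
The Average Scores column appears to be the unweighted mean of the three task columns, rounded to the nearest integer. That reading is an inference from the numbers, not something the review states. A short check, with the values copied from the table:

```python
# Per-task scores copied from the table (columns in table order).
tasks = ["Grasping Wine Bottle", "Grasping Plate", "USB Insertion and Removal"]
scores = {
    "GelSlim": [85, 81, 76],
    "DIGIT": [80, 73, 75],
    "LVTG": [92, 89, 73],
}

# Unweighted mean over the three tasks, rounded to the nearest integer.
averages = {name: round(sum(vals) / len(vals)) for name, vals in scores.items()}

# Sensor with the highest score on each task.
best_per_task = {
    task: max(scores, key=lambda name: scores[name][i])
    for i, task in enumerate(tasks)
}

print(averages)
print(best_per_task)
```

The computed averages match the table (81, 76, 85). LVTG leads on the two grasping tasks, while GelSlim has the highest score on USB insertion and removal.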

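LVTG achieved the highest success rates on objects that need a large contact area: 92% for grasping the wine bottle and 89% for grasping the plate. On USB insertion and removal it reached 73%, slightly below GelSlim (76%) and DIGIT (75%), for an overall average of 85%.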
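In durability tests, LVTG lasted more than twice as long as 9Dtact, and the modular design allows a skin to be replaced in under 30 seconds.
In policy experiments, ACT with tactile input (+Pretraining) achieved higher average success rates across tasks than the vision-only baseline (55-63% versus 29-31%), with pretraining providing a further gain.
LVTG's larger sensing area (80 x 30 mm, 2400 mm^2) and single skin improve stability and durability compared with monolithic designs or fragile gel-based designs.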

This review was generated by AI and checked by a human editor.