PolyTouch: Tactile, Acoustic, and Peripheral Vision at the Fingertip

Close the last-inch gap in dexterous manipulation with a tactile-diffusion policy.

Jialiang (Alan) Zhao, Naveen Kuppuswamy, Siyuan Feng, Benjamin Burchfiel, and Edward H. Adelson

MIT CSAIL and Toyota Research Institute

Nominated for the Best Paper Award at ICRA 2025

Crushing the Last-Inch problem with a multi-modal finger and transformer

Paired with a diffusion policy, PolyTouch significantly improves performance on numerous manipulation tasks that are not specially tailored for tactile sensing.

The uncut, unedited video on the right shows a continuous evaluation of a tactile-diffusion policy trained with and without PolyTouch. Both policies were trained on exactly the same data with the same architecture; the only difference is that the no-tactile policy's tactile observations were masked with zeros.
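
As a minimal sketch of that ablation, assuming observations arrive as a dictionary of NumPy arrays, the helper below zeroes out the tactile entries while preserving their shapes and dtypes, so both policies can share one network unchanged. The key names (`tactile_image`, `audio`) are hypothetical placeholders, and which of PolyTouch's streams were counted as "tactile" is an assumption, not something stated on this page.

```python
import numpy as np

# Hypothetical observation keys treated as "tactile"; the real key names and the
# exact set of masked streams are assumptions, not taken from the PolyTouch code.
TACTILE_KEYS = ("tactile_image", "audio")


def mask_tactile(obs: dict, tactile_keys=TACTILE_KEYS) -> dict:
    """Return a shallow copy of `obs` with every tactile entry replaced by zeros.

    Shapes and dtypes are preserved, so the no-tactile policy can use exactly the
    same network architecture as the tactile one.
    """
    masked = dict(obs)  # the caller's dict and arrays are left untouched
    for key in tactile_keys:
        if key in masked:
            masked[key] = np.zeros_like(masked[key])
    return masked
```

Because the zeros keep the original shapes, the network's input layers stay identical; only the information content differs between the two runs.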

3 modalities in 1 cable

With one camera and one contact microphone, PolyTouch captures:

Tactile sensing: sees detailed surface texture, just like GelSight, DIGIT, DenseTact, etc.

Peripheral vision: sees exactly what is grasped, along with its surroundings

Acoustic sensing: hears contacts being made and broken

A single repurposed HDMI cable carries both power and data. On the other end, all data is processed by a Raspberry Pi, which provides an easy-to-use interface for real-time streaming or saving.
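
Since the camera and the contact microphone run at very different rates, their streams must be aligned before they can be fed to a policy. One common approach, sketched here under assumptions, is to attach to each camera frame the short window of audio that immediately preceded it. The 30 fps camera rate, 16 kHz audio rate, 0.1 s window, shared clock starting at t = 0, and the function names are all hypothetical placeholders, not PolyTouch's published specifications or its Raspberry Pi interface.

```python
import numpy as np

# Hypothetical rates, chosen only for illustration.
CAMERA_FPS = 30
AUDIO_RATE_HZ = 16_000


def audio_window_for_frame(audio, frame_time_s, window_s=0.1, rate_hz=AUDIO_RATE_HZ):
    """Audio samples in [frame_time_s - window_s, frame_time_s), left-padded with zeros.

    Assumes audio[0] was recorded at t = 0 on the same clock as the camera.
    The window end is clipped to the samples actually recorded.
    """
    n = int(round(window_s * rate_hz))  # samples per window
    end = min(int(round(frame_time_s * rate_hz)), len(audio))
    start = max(end - n, 0)
    chunk = audio[start:end]
    if len(chunk) < n:  # too early in the recording for a full window
        pad = np.zeros(n - len(chunk), dtype=audio.dtype)
        chunk = np.concatenate([pad, chunk])
    return chunk


def pair_streams(frames, audio, fps=CAMERA_FPS):
    """Pair camera frame i (captured at i / fps seconds) with its preceding audio window."""
    return [(frame, audio_window_for_frame(audio, i / fps)) for i, frame in enumerate(frames)]
```

Left-padding keeps every audio window the same length, which lets the windows be stacked into a fixed-size batch alongside the camera frames.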

At least 20x longer lifespan

GelSight Inc. sensors last only 1-3 hrs in our durability test, which emulates a continuous tool-use environment. PolyTouch lasts 35 hrs without significant wear and tear.

This increase in lifespan comes from a novel yet very accessible elastomer (3M VHB tape) and protective layer (3M Nexcare), which together eliminate delamination and reduce wear and tear.

Easy to make. No molding required.

- All but one material (aluminum flake, which serves as reflective paint) can be ordered from Amazon.com for next-day delivery or bought at your local Walmart.

- Using 3M VHB tape as the elastomer eliminates the traditional silicone molding process, which requires specialized equipment and experience.

The video below shows how PolyTouch's replaceable gel is made.

Abstract

Achieving robust dexterous manipulation in unstructured domestic environments remains a significant challenge in robotics. Even with state-of-the-art robot learning methods, haptic-oblivious control strategies (i.e., those relying only on external vision and/or proprioception) often fall short due to occlusions, visual complexity, and the need for precise contact interaction control. To address these limitations, we introduce PolyTouch, a novel robot finger that integrates camera-based tactile sensing, acoustic sensing, and peripheral visual sensing into a single compact and durable design. PolyTouch provides high-resolution tactile feedback across multiple temporal scales, which is essential for efficiently learning complex manipulation tasks. Experiments demonstrate at least a 20-fold increase in lifespan over commercial tactile sensors, with a design that is both easy to manufacture and scalable. We then use this multi-modal tactile feedback, along with visuo-proprioceptive observations, to synthesize a tactile-diffusion policy from human demonstrations; the resulting contact-aware control policy significantly outperforms haptic-oblivious policies on multiple contact-aware manipulation tasks. This paper highlights how effectively integrating multi-modal contact sensing can hasten the development of effective contact-aware manipulation policies, paving the way for more reliable and versatile domestic robots.
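
The abstract describes synthesizing a tactile-diffusion policy from human demonstrations. This page does not give the policy architecture, so what follows is only a generic sketch of DDPM-style action sampling, the technique diffusion policies build on. Per-modality features are flattened and concatenated into one conditioning vector, and an action sequence is iteratively denoised from Gaussian noise. The linear beta schedule, the 16-step horizon, the 7-D action, and the function names are all assumptions, and `noise_pred_fn` stands in for a trained network.

```python
import numpy as np


def fuse_observations(*features):
    """Flatten and concatenate per-modality feature arrays into one conditioning vector."""
    return np.concatenate([np.ravel(f) for f in features])


def make_schedule(num_steps=50, beta_start=1e-4, beta_end=0.02):
    """Linear DDPM noise schedule (an assumed choice)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars


def sample_actions(noise_pred_fn, cond, horizon=16, action_dim=7, num_steps=50, rng=None):
    """DDPM ancestral sampling of a (horizon, action_dim) action sequence.

    noise_pred_fn(x_t, t, cond) stands in for a trained network that predicts the
    noise present at step t, conditioned on the fused observation vector `cond`.
    """
    rng = np.random.default_rng() if rng is None else rng
    betas, alphas, alpha_bars = make_schedule(num_steps)
    x = rng.standard_normal((horizon, action_dim))  # start from pure noise x_T
    for t in reversed(range(num_steps)):
        eps = noise_pred_fn(x, t, cond)
        # Mean of p(x_{t-1} | x_t): (x_t - beta_t / sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_t)
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            # sigma_t^2 = beta_t, one of the two variance choices in the DDPM paper
            x = mean + np.sqrt(betas[t]) * rng.standard_normal(x.shape)
        else:
            x = mean  # no noise is added on the final step
    return x
```

With a dummy predictor such as `lambda x, t, c: np.zeros_like(x)` the loop runs end to end. Replacing it with a trained network is what turns the loop into a policy, and diffusion-policy implementations typically execute only the first few actions of each sampled sequence before re-planning.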

More Details

For more details about PolyTouch, see our paper or the desktop version of this website.


    @misc{zhao2025polytouchrobustmultimodaltactile,
        title={PolyTouch: A Robust Multi-Modal Tactile Sensor for Contact-rich Manipulation Using Tactile-Diffusion Policies},
        author={Jialiang Zhao and Naveen Kuppuswamy and Siyuan Feng and Benjamin Burchfiel and Edward Adelson},
        year={2025},
        eprint={2504.19341},
        archivePrefix={arXiv},
        primaryClass={cs.RO},
        url={https://arxiv.org/abs/2504.19341},
    }