Wearable AI System Enables Silent Speech Communication in Noisy Environments

Silent Speech Interface Breakthrough: Wearable System Decodes Throat Muscle Movements with High Accuracy

Researchers led by Sung-Min Park have unveiled a soft, wearable silent speech interface designed to overcome communication barriers in high-noise environments. The system uses a Computer Vision-Based Optical Strain (CVOS) sensor, integrated into a neck choker, to detect the throat muscle deformations associated with speech. Processed by an AI pipeline, the device achieved 85.8% recognition accuracy on the NATO phonetic alphabet in validation tests and performed reliably even amid significant acoustic interference.

The system offers a robust solution for critical communication in settings where conventional methods fail due to extreme noise.

System Overview

Conventional communication methods frequently fail in high-noise environments such as industrial sites, military operations, and emergency scenarios, primarily because of overwhelming acoustic interference. Existing silent speech interfaces (SSIs) that rely on electroencephalography (EEG), surface electromyography (sEMG), or single-axis strain sensors face inherent challenges, including invasiveness, poor reusability, and limited capture of the intricate muscle movements involved in speech. To address these limitations, the research team developed a soft, wearable system that combines multiaxial strain mapping with AI processing.

CVOS Sensor Technology

At the heart of the system lies a Computer Vision-Based Optical Strain (CVOS) sensor integrated into a neck choker. The sensor consists of a soft silicone (Ecoflex) substrate patterned with high-contrast black micromarkers on a white background, paired with a compact camera, a microscope lens, and an LED light source. This design enables sensitive detection of both the magnitude and direction of throat muscle deformations, which appear as the multiaxial strain patterns fundamental to speech.
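The paper does not publish its marker-tracking code, but the underlying idea, recovering a 2D strain tensor from the displacement of tracked micromarkers, can be sketched with a least-squares affine fit. Everything below (function name, point layout, the small-strain approximation) is an illustrative assumption, not the authors' implementation:

```python
import numpy as np

def estimate_strain(ref_pts, cur_pts):
    """Estimate a 2D small-strain tensor from tracked marker positions.

    Fits an affine map cur = ref @ F.T + t by least squares, then
    returns the symmetric small-strain tensor eps = (F + F.T)/2 - I,
    which carries both magnitude and direction of the deformation.
    """
    ref = np.asarray(ref_pts, dtype=float)
    cur = np.asarray(cur_pts, dtype=float)
    # Augment reference coordinates with a constant column for translation.
    A = np.hstack([ref, np.ones((len(ref), 1))])
    # Solve A @ M = cur in the least-squares sense; M is 3x2.
    M, *_ = np.linalg.lstsq(A, cur, rcond=None)
    F = M[:2, :].T                      # deformation gradient (2x2)
    eps = 0.5 * (F + F.T) - np.eye(2)   # small-strain tensor
    return eps

# Example: a 1% stretch along x with no deformation along y.
ref = np.array([[0, 0], [1, 0], [0, 1], [1, 1]], float)
cur = ref * np.array([1.01, 1.0])
eps = estimate_strain(ref, cur)  # eps[0, 0] ≈ 0.01, off-diagonals ≈ 0
```

In a real pipeline the marker centroids would come from per-frame blob detection on the camera image; here they are given directly to keep the sketch self-contained.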

Key characteristics of the CVOS sensor include:

  • Captures 2D strain maps that preserve directional information for distinguishing complex speech patterns, a significant advantage over single-axis sensors.
  • Exhibits a gauge factor of 3,625, low hysteresis (<0.65%), and high linearity (>0.99), allowing it to detect strains as small as 0.02%.
  • Resists environmental noise and degradation, maintaining consistent performance across devices (mean absolute percentage error of 2.8%) and enduring over 10,000 loading-unloading cycles.
  • Remains unaffected by acoustic noise up to 90 dB, a level comparable to a construction site.
  • Achieves a signal-to-noise ratio of 34 dB, exceeding that of commercial sEMG systems.
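For readers unfamiliar with the figures of merit above, the two headline numbers follow conventional definitions: SNR in decibels is the log-ratio of signal to noise amplitude, and gauge factor is relative signal change per unit strain. The functions below encode those standard definitions; the example values are illustrative, not taken from the paper's raw data:

```python
import numpy as np

def snr_db(signal, noise):
    """Signal-to-noise ratio in dB from RMS amplitudes (20*log10 convention)."""
    rms = lambda x: np.sqrt(np.mean(np.square(x)))
    return 20.0 * np.log10(rms(signal) / rms(noise))

def gauge_factor(delta_signal, baseline, strain):
    """Gauge factor: relative signal change (dS/S0) per unit strain."""
    return (delta_signal / baseline) / strain

# A signal whose RMS is 50x the noise floor corresponds to ~34 dB,
# matching the order of the reported sensor SNR.
print(snr_db(np.full(100, 50.0), np.full(100, 1.0)))   # ≈ 33.98 dB

# A hypothetical 72.5% relative signal change at 0.02% strain
# would yield the reported gauge factor of 3,625.
print(gauge_factor(0.725, 1.0, 0.0002))                # ≈ 3625.0
```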

AI Processing

The sensor's rich data is channeled through an advanced AI-driven pipeline specifically engineered for silent speech decoding:

  • It automatically compensates for the initial residual stress introduced when the device is attached, eliminating baseline drift and ensuring accurate readings.
  • It combines convolutional neural networks (CNNs) for spatial feature extraction with transformers for temporal pattern analysis, capturing both localized muscle deformations and global speech dynamics.
  • Knowledge distillation reduced the model size from 12.4 MB to 3.6 MB, enabling real-time inference (0.003 seconds per sample) on compact edge devices such as the Raspberry Pi 5.
  • It can reconstruct a speaker's own voice from as little as 10 minutes of voice recordings.

The system primarily focuses on recognizing the 26 words of the NATO phonetic alphabet.
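The paper's exact distillation objective is not reproduced here; a common formulation (Hinton-style knowledge distillation) blends a temperature-softened cross-entropy against the large teacher model with a hard-label loss for the small student. The temperature `T` and mixing weight `alpha` below are assumed hyperparameters for illustration:

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Blend soft-target cross-entropy (teacher -> student) with hard-label CE.

    The T*T factor keeps the soft-target gradient scale comparable
    across temperatures, as in standard distillation recipes.
    """
    p_teacher = softmax(teacher_logits / T)
    log_p_student = np.log(softmax(student_logits / T))
    soft = -(p_teacher * log_p_student).sum(axis=-1).mean() * T * T
    hard = -np.log(softmax(student_logits)[np.arange(len(labels)), labels]).mean()
    return alpha * soft + (1.0 - alpha) * hard

# Toy batch: 3 classes, one sample with true class 0.
student = np.array([[2.0, 0.5, -1.0]])
teacher = np.array([[3.0, 1.0, -2.0]])
loss = distillation_loss(student, teacher, np.array([0]))
```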

Validation and Performance

The system underwent rigorous validation in both laboratory settings and challenging noisy real-world environments:

  • It achieved 85.8% recognition accuracy for the 26 NATO phonetic words, with the lightweight distilled model retaining 82%.
  • With Low-Rank Adaptation (LoRA) fine-tuning, the system reached 80% accuracy using just 20 samples per class from new users, outperforming traditional fine-tuning (76%).
  • Performance remained reliable in 90-dB white noise and even during gas blowback rifle firing, which introduces irregular noise and mechanical vibrations, with words transmitted successfully in real time.
  • Accuracy was consistent across device tightness levels and vocal intensities, peaking at 100% at moderate tightness and vocal effort.
  • In one demonstration, the system functioned while a user fired a rifle, with the decoded speech wirelessly transmitted and synthesized into audio.
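LoRA adapts the model to a new user by freezing the pretrained weights and training only a pair of low-rank factors, which is why so few samples per class suffice. A minimal numpy sketch of the adapted forward pass follows; the shapes, rank, and `alpha` scaling are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16.0):
    """Forward pass through a linear layer with a LoRA adapter.

    W is the frozen pretrained weight (out x in). Only the low-rank
    factors A (r x in) and B (out x r) are trained for a new user;
    the effective weight is W + (alpha / r) * B @ A.
    """
    r = A.shape[0]
    delta = (alpha / r) * (B @ A)   # rank-r update to the frozen weight
    return x @ (W + delta).T

# Toy shapes: 3 input features, 4 outputs, rank-2 adapter.
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))
A = rng.standard_normal((2, 3))
B = np.zeros((4, 2))                # standard LoRA init: B = 0
x = rng.standard_normal((5, 3))
y = lora_forward(x, W, A, B)        # with B = 0 this equals x @ W.T
```

Initializing `B` to zero means the adapted model starts out identical to the pretrained one, so per-user fine-tuning begins from the shared model's behavior.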

Potential Applications

The CVOS-based SSI is designed to address critical communication needs in several high-stakes environments:

  • Facilitating seamless communication between workers in loud factories or construction sites.
  • Offering an invaluable alternative communication tool for patients with laryngectomy or severe voice disorders.

Future Development

Future research will focus on several key areas, including expanding the vocabulary beyond the NATO alphabet, enhancing motion artifact resistance (for example, through integrated inertial measurement units), and optimizing the device's ergonomics for comfortable long-term wear. The team also plans to validate the system with larger and more diverse user cohorts to ensure broad applicability.

Authorship and Publication

The paper, "Soft Multiaxial Strain Mapping Interface with AI-Driven Decoding for Silent Speech in Noise," was authored by Sunguk Hong, Junyoung Yoo, and Sung-Min Park and published in the journal Cyborg and Bionic Systems on March 23, 2026 (DOI: 10.34133/cbsystems.0536). The work was supported by the National Research Foundation of Korea (NRF) through programs funded by the Ministry of Education, Korea, and the Ministry of Science and ICT (MSIT), Korea.