Tackling Speaking Mode Varieties in EMG-Based Speech Recognition
by Michael Wand, Matthias Janke, Tanja Schultz
Abstract:
An electromyographic (EMG) silent speech recognizer is a system that recognizes speech by capturing the electric potentials of the human articulatory muscles, thus enabling the user to communicate silently. After having established a baseline EMG-based continuous speech recognizer, in this paper, we investigate speaking mode variations, i.e., discrepancies between audible and silent speech that deteriorate recognition accuracy. We introduce multimode systems that allow seamless switching between audible and silent speech, investigate different measures which quantify speaking mode differences, and present the spectral mapping algorithm, which improves the word error rate (WER) on silent speech by up to 14.3% relative. Our best average silent speech WER is 34.7%, and our best WER on audibly spoken speech is 16.8%.
Reference:
Tackling Speaking Mode Varieties in EMG-Based Speech Recognition (Michael Wand, Matthias Janke, Tanja Schultz), In IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING, volume 61, number 10, 2014.
Bibtex Entry:
@article{wand2014tackling,
  volume={61},
  year={2014},
  title={Tackling Speaking Mode Varieties in EMG-Based Speech Recognition},
  number={10},
  journal={IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING},
  abstract={An electromyographic (EMG) silent speech recognizer is a system that recognizes speech by capturing the electric potentials of the human articulatory muscles, thus enabling the user to communicate silently. After having established a baseline EMG-based continuous speech recognizer, in this paper, we investigate speaking mode variations, i.e., discrepancies between audible and silent speech that deteriorate recognition accuracy. We introduce multimode systems that allow seamless switching between audible and silent speech, investigate different measures which quantify speaking mode differences, and present the spectral mapping algorithm, which improves the word error rate (WER) on silent speech by up to 14.3% relative. Our best average silent speech WER is 34.7%, and our best WER on audibly spoken speech is 16.8%.},
  author={Wand, Michael and Janke, Matthias and Schultz, Tanja}
}