Improving Unit Selection based EMG-to-Speech Conversion
by Lorenz Diener
Abstract:
This master’s thesis introduces a new approach to improving the unit-selection-based conversion of facial myoelectric signals to audible speech. Surface electromyography (EMG) records the electrical signals generated by muscle activity using electrodes attached to the skin. Past work has shown, using several different approaches, that it is feasible to generate audible speech from the facial electromyographic activity produced during speech production. This work focuses on the unit-selection approach, in which the speech signal is reconstructed by concatenating segments of target audio data that are selected according to a similarity criterion computed on the parallel sequence of source electromyographic data. A novel approach is introduced and evaluated that optimizes the database from which units are selected: unit clustering is used to generate more prototypical units and thereby improve the selection process. In total, we obtain an improvement in output quality of up to 14.92% relative to a baseline unit-selection system, while reducing conversion time by up to 98%.
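To make the two ideas in the abstract concrete, here is a minimal, hypothetical sketch in NumPy of (a) unit selection as a Viterbi search that balances a target cost (distance between source EMG features and a unit's EMG features) against a concatenation cost (distance between the audio features of consecutive units), and (b) shrinking the database by clustering units into prototypes. Everything here is an assumption for illustration, not the method used in the thesis: each unit is reduced to a single feature vector, plain k-means on the joint EMG+audio features is used as the clustering step, cluster means serve as the prototypical units, and the names `kmeans`, `cluster_units`, `select_units` and the weight `w_concat` are invented.

```python
import numpy as np

def kmeans(x, k, iters=20, seed=0):
    """Plain k-means (assumed clustering method); returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        # Assign each unit to its nearest centroid (squared Euclidean distance).
        d = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        for j in range(k):
            members = x[labels == j]
            if len(members):  # keep the previous centroid if a cluster empties
                centroids[j] = members.mean(0)
    return centroids, labels

def cluster_units(emg_units, audio_units, k):
    """Hypothetical database reduction: replace all units by k cluster means."""
    joint = np.hstack([emg_units, audio_units])  # cluster on parallel EMG+audio
    centroids, _ = kmeans(joint, k)
    d_emg = emg_units.shape[1]
    return centroids[:, :d_emg], centroids[:, d_emg:]

def select_units(source_emg, db_emg, db_audio, w_concat=0.5):
    """Viterbi search minimising target cost + w_concat * concatenation cost."""
    # Target cost: distance of each source frame to each unit's EMG features, (T, N).
    target = ((source_emg[:, None, :] - db_emg[None, :, :]) ** 2).sum(-1)
    # Concatenation cost: distance between audio features of two units, (N, N).
    # Simplification: real systems compare the boundary frames of audio segments.
    concat = ((db_audio[:, None, :] - db_audio[None, :, :]) ** 2).sum(-1)
    T, N = target.shape
    cost = target[0].copy()
    back = np.zeros((T, N), dtype=int)
    for t in range(1, T):
        total = cost[:, None] + w_concat * concat  # rows: previous unit, cols: next
        back[t] = total.argmin(0)
        cost = total.min(0) + target[t]
    # Backtrack from the cheapest final unit.
    path = [int(cost.argmin())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    path.reverse()
    return db_audio[path], path

# Usage with random stand-in features (no real EMG or audio data).
rng = np.random.default_rng(1)
db_emg, db_audio = rng.normal(size=(200, 8)), rng.normal(size=(200, 4))
proto_emg, proto_audio = cluster_units(db_emg, db_audio, k=20)
audio, path = select_units(rng.normal(size=(50, 8)), proto_emg, proto_audio)
print(audio.shape)  # (50, 4)
```

Because this search costs O(T·N²) for T source frames and N units, replacing 200 units with 20 prototypes illustrates why a smaller, clustered database can speed up conversion; how closely this toy setup mirrors the thesis's reported speed-up is not implied. Note also that with this simplified concatenation cost, repeating the same unit is free, which a real system would handle differently.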
Reference:
Improving Unit Selection based EMG-to-Speech Conversion (Lorenz Diener), Master's thesis, Karlsruher Institut für Technologie, 2015. Supervisors: Matthias Janke, Tanja Schultz
BibTeX Entry:
@mastersthesis{diener2015improving,
  url={https://www.csl.uni-bremen.de/cms/images/documents/publications/diener2015improving.pdf},
  school={Karlsruher Institut für Technologie},
  title={Improving Unit Selection based EMG-to-Speech Conversion},
  year={2015},
  supervisor={Janke, Matthias and Schultz, Tanja},
  author={Diener, Lorenz},
  abstract={This master’s thesis introduces a new approach to improve the unit-selection based conversion of facial myoelectric signals to audible speech. Surface electromyography is the recording of electric signals generated by muscle activity using surface electrodes attached to the skin. Past work has shown that it is feasible to generate audible speech signals from facial electromyographic activity generated during speech production, using several different approaches. This work focuses on the unit-selection approach to conversion, where the speech signal is reconstructed by concatenating pieces of target audio data selected by a similarity criterion calculated on the parallel sequence of source electromyographic data. A novel approach, based on optimizing the database that units are selected from by using unit clustering to generate more prototypical units and improve the selection process, is introduced and evaluated. In total, we obtain a qualitative improvement of up to 14.92 percent relative over a baseline unit selection system, while improving the time taken for conversion by up to 98%.}
}