Towards Silent Paralinguistics: Deriving Speaking Mode and Speaker ID from Electromyographic Signals
by Lorenz Diener, Shahin Amiriparian, Catarina Botelho, Kevin Scheck, Dennis Küster, Isabel Trancoso, Björn W. Schuller, Tanja Schultz
Abstract:
Silent Computational Paralinguistics (SCP) - the assessment of speaker states and traits from non-audibly spoken communication - has rarely been targeted in the rich body of either Computational Paralinguistics or Silent Speech Processing. Here, we provide first steps towards this challenging but potentially highly rewarding endeavour: Paralinguistics can enrich spoken language interfaces, while Silent Speech Processing enables confidential and unobtrusive spoken communication for everybody, including mute speakers. We approach SCP by using speech-related biosignals stemming from facial muscle activities captured by surface electromyography (EMG). To demonstrate the feasibility of SCP, we select one speaker trait (speaker identity) and one speaker state (speaking mode). We introduce two promising strategies for SCP: (1) deriving paralinguistic speaker information directly from EMG of silently produced speech versus (2) first converting EMG into an audible speech signal followed by conventional computational paralinguistic methods. We compare traditional feature extraction and decision making approaches to more recent deep representation and transfer learning by convolutional and recurrent neural networks, using openly available EMG data. We find that paralinguistics can be assessed not only from acoustic speech but also from silent speech captured by EMG.
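Illustrative sketch:
As a rough illustration of the abstract's strategy (1) - deriving paralinguistic speaker information directly from EMG - the following minimal Python sketch frames a raw EMG channel, extracts simple time-domain features (mean absolute value, zero crossings, waveform length, RMS), pools them into one vector per utterance, and trains an SVM speaker-ID classifier on toy data. Everything here (frame lengths, feature set, data shapes) is an illustrative assumption, not the paper's actual pipeline.

# Hypothetical sketch of strategy (1): speaker ID directly from framed EMG.
# Frame sizes, features, and data are illustrative assumptions only.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def frame_signal(emg, frame_len=160, hop=80):
    """Slice a 1-D EMG channel into overlapping frames."""
    n_frames = 1 + (len(emg) - frame_len) // hop
    return np.stack([emg[i * hop : i * hop + frame_len] for i in range(n_frames)])

def frame_features(frames):
    """Common time-domain EMG descriptors per frame:
    mean absolute value, zero-crossing count, waveform length, RMS."""
    mav = np.mean(np.abs(frames), axis=1)
    zc = np.sum(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)
    wl = np.sum(np.abs(np.diff(frames, axis=1)), axis=1)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    return np.stack([mav, zc, wl, rms], axis=1)

def utterance_vector(emg):
    """Pool frame-level features into one fixed-length vector per utterance."""
    feats = frame_features(frame_signal(emg))
    return np.concatenate([feats.mean(axis=0), feats.std(axis=0)])

# Toy data: 20 synthetic "utterances" of raw EMG with two speaker labels.
rng = np.random.default_rng(0)
X = np.stack([utterance_vector(rng.standard_normal(4000)) for _ in range(20)])
y = rng.integers(0, 2, size=20)

clf = make_pipeline(StandardScaler(), SVC())
clf.fit(X, y)
print(clf.predict(X[:3]))

Strategy (2) would instead first pass the EMG through an EMG-to-speech converter and then apply a conventional acoustic computational paralinguistics toolchain (for example, openSMILE feature sets) to the synthesized audio; the paper compares such traditional feature-plus-classifier pipelines against deep representation and transfer learning with convolutional and recurrent networks.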
Reference:
Towards Silent Paralinguistics: Deriving Speaking Mode and Speaker ID from Electromyographic Signals (Lorenz Diener, Shahin Amiriparian, Catarina Botelho, Kevin Scheck, Dennis Küster, Isabel Trancoso, Björn W. Schuller, Tanja Schultz), In INTERSPEECH 2020 - 21st Annual Conference of the International Speech Communication Association, 2020 (to appear).
Bibtex Entry:
@inproceedings{diener2020towards,
    title={Towards Silent Paralinguistics: Deriving Speaking Mode and Speaker ID from Electromyographic Signals},
    author={Diener, Lorenz and Amiriparian, Shahin and Botelho, Catarina and Scheck, Kevin and Küster, Dennis and Trancoso, Isabel and Schuller, Björn W. and Schultz, Tanja},
    booktitle={{INTERSPEECH} 2020 - 21st Annual Conference of the International Speech Communication Association},
    year={2020},
    note={to appear},
    url={https://www.csl.uni-bremen.de/cms/images/documents/publications/Diener_IS2020_SilentCompPara.pdf},
    abstract={Silent Computational Paralinguistics (SCP) - the assessment of speaker states and traits from non-audibly spoken communication - has rarely been targeted in the rich body of either Computational Paralinguistics or Silent Speech Processing. Here, we provide first steps towards this challenging but potentially highly rewarding endeavour: Paralinguistics can enrich spoken language interfaces, while Silent Speech Processing enables confidential and unobtrusive spoken communication for everybody, including mute speakers. We approach SCP by using speech-related biosignals stemming from facial muscle activities captured by surface electromyography (EMG). To demonstrate the feasibility of SCP, we select one speaker trait (speaker identity) and one speaker state (speaking mode). We introduce two promising strategies for SCP: (1) deriving paralinguistic speaker information directly from EMG of silently produced speech versus (2) first converting EMG into an audible speech signal followed by conventional computational paralinguistic methods. We compare traditional feature extraction and decision making approaches to more recent deep representation and transfer learning by convolutional and recurrent neural networks, using openly available EMG data. We find that paralinguistics can be assessed not only from acoustic speech but also from silent speech captured by EMG.},
}