Automatic Speech Recognition for ILSE-Interviews: Longitudinal Conversational Speech Recordings covering Aging and Cognitive Decline
by Ayimunishagu Abulimiti, Jochen Weiner, Tanja Schultz
Abstract:
The Interdisciplinary Longitudinal Study on Adult Development and Aging (ILSE) was initiated with the aim to investigate satisfying and healthy aging. Over 20 years, about 4,200 hours of biographic interviews from more than 1,000 participants were recorded. Spoken language is a strong indicator of declining cognitive resources, as it is affected at an early stage. Hence, various aging-related research topics, such as dementia, could be studied using data like the ILSE interviews. The analysis of language capabilities requires transcribed speech. Since manual transcription is time-consuming and costly, we aim to transcribe the ILSE data automatically using Automatic Speech Recognition (ASR). Recognizing the ILSE interviews is very demanding because several challenges combine: 20-year-old analog, two-speaker, single-channel recordings of low signal quality; emotional and personal interviews between doctor and participant; and repeated recordings of aging, partly fragile individuals. In this study, we describe ongoing work to develop a hybrid Hidden Markov Model (HMM)-Deep Neural Network (DNN) ASR system for the ILSE corpus. So far, the best ASR system is obtained by second-pass decoding of a hybrid HMM-DNN model using recurrent neural network language models, reaching a word error rate of 50.39%.
Reference:
Automatic Speech Recognition for ILSE-Interviews: Longitudinal Conversational Speech Recordings covering Aging and Cognitive Decline (Ayimunishagu Abulimiti, Jochen Weiner, Tanja Schultz), In Proc. Interspeech 2020, 2020.
Bibtex Entry:
@inproceedings{abulimiti2020automatic,
  title={Automatic Speech Recognition for ILSE-Interviews: Longitudinal Conversational Speech Recordings covering Aging and Cognitive Decline},
  author={Abulimiti, Ayimunishagu and Weiner, Jochen and Schultz, Tanja},
  booktitle={Proc. Interspeech 2020},
  pages={3795--3799},
  year={2020},
  url = {https://www.csl.uni-bremen.de/cms/images/documents/publications/ay_interspeech2020.pdf},
  abstract = {The \textit{Interdisciplinary Longitudinal Study on Adult Development and Aging} (ILSE) was initiated with the aim to investigate satisfying and healthy aging. Over 20 years, about 4,200 hours of biographic interviews from more than 1,000 participants were recorded. Spoken language is a strong indicator of declining cognitive resources, as it is affected at an early stage. Hence, various aging-related research topics, such as dementia, could be studied using data like the ILSE interviews. The analysis of language capabilities requires transcribed speech. Since manual transcription is time-consuming and costly, we aim to transcribe the ILSE data automatically using Automatic Speech Recognition (ASR). Recognizing the ILSE interviews is very demanding because several challenges combine: 20-year-old analog, two-speaker, single-channel recordings of low signal quality; emotional and personal interviews between doctor and participant; and repeated recordings of aging, partly fragile individuals. In this study, we describe ongoing work to develop a hybrid Hidden Markov Model (HMM)-Deep Neural Network (DNN) ASR system for the ILSE corpus. So far, the best ASR system is obtained by second-pass decoding of a hybrid HMM-DNN model using recurrent neural network language models, reaching a word error rate of 50.39\%.}
}