Investigating the Effect of Audio Duration on Dementia Detection using Acoustic Features
by Jochen Weiner, Miguel Angrick, Srinivasan Umesh, Tanja Schultz
Abstract:
This paper presents recent progress toward our goal to enable area-wide pre-screening methods for the early detection of dementia based on automatically processing conversational speech of a representative group of more than 200 subjects. We focus on conversational speech since it is the natural form of communication that can be recorded unobtrusively, without adding stress to subjects, and without the need of controlled clinical settings. We describe our unsupervised process chain consisting of voice activity detection and speaker diarization followed by extraction of features and detection of early signs of dementia. The unsupervised system achieves up to 0.645 unweighted average recall (UAR) and compares favorably to a system that was carefully designed on manually annotated data. To further lower the burden for subjects, we investigate UAR over speech duration, and find that about 12 minutes of interview are sufficient to achieve the best UAR.
Reference:
Investigating the Effect of Audio Duration on Dementia Detection using Acoustic Features (Jochen Weiner, Miguel Angrick, Srinivasan Umesh, Tanja Schultz), In INTERSPEECH 2018 – 19th Annual Conference of the International Speech Communication Association, 2018.
Bibtex Entry:
@inproceedings{weiner2018investigating,
  title={{Investigating the Effect of Audio Duration on Dementia Detection using Acoustic Features}},
  author={Jochen Weiner and Miguel Angrick and Srinivasan Umesh and Tanja Schultz},
  booktitle={{INTERSPEECH} 2018 -- 19th Annual Conference of the International Speech Communication Association},
  year={2018},
  abstract={This paper presents recent progress toward our goal to enable area-wide pre-screening methods for the early detection of dementia based on automatically processing conversational speech of a representative group of more than 200 subjects. We focus on conversational speech since it is the natural form of communication that can be recorded unobtrusively, without adding stress to subjects, and without the need of controlled clinical settings. We describe our unsupervised process chain consisting of voice activity detection and speaker diarization followed by extraction of features and detection of early signs of dementia. The unsupervised system achieves up to 0.645 unweighted average recall (UAR) and compares favorably to a system that was carefully designed on manually annotated data. To further lower the burden for subjects, we investigate UAR over speech duration, and find that about 12 minutes of interview are sufficient to achieve the best UAR.},
  url={https://www.csl.uni-bremen.de/cms/images/documents/publications/Interspeech2018_WeinerEtAl.pdf},
}