EEG-based Decoding of Auditory Attention using a Deep Attention Network: Revealing neural commonalities of selective attention across individuals
Abstract:
Introduction: Focusing on specific sound sources in cluttered environments is crucial for everyday communication. However, this ability poses a great challenge for people who depend on hearing aids, as the devices have no information about which audio source the user wants to attend to. To address this problem, the field of auditory attention detection (AAD) develops cognitive models of auditory selective attention from electroencephalography (EEG). Subject independence (SI) is particularly valuable for EEG-based AAD because it removes the need for pretraining on each individual, allowing a model to be applied to new users without additional data collection and training and making it far more practical to deploy. Such models could also expand our understanding of the cognitive processes involved in selective attention, and their future integration into hearing aids could help individuals with hearing impairments regain a level of normalcy in their daily activities.

Materials, Methods and Results: This study investigates subject-independent auditory attention decoding from EEG with deep neural networks (DNNs). The EEG data set used in this work is publicly available and widely used in the AAD community [4.]. Participants were presented with two simultaneous but spatially separated speech stimuli and instructed to focus on one of the speech streams while their EEG was recorded from 64 channels. The decoding task is a binary classification of the attended speaker within a given time window. The data were preprocessed and analyzed with a Deep Attention Network [1.], a lightweight and efficient architecture that operates on raw windows of EEG signals. The network uses spatial and temporal attention modules to capture EEG-channel interactions and temporal dynamics at different frequencies. The EEG data were lightly preprocessed by common-average referencing and band-pass filtering between 1 and 32 Hz, then segmented into non-overlapping 1-second windows for each of the 16 participants. The network was trained to classify the participants' attention states in a leave-one-subject-out cross-validation. The results show an accuracy of 72% (SD: 11%) across all 16 participants, with all but one participant significantly outperforming the 50% chance baseline. Excluding the 6 subjects below a practical-performance threshold of 70%, the remaining 10 subjects average 80% accuracy (SD: 6%). Extracting the spatial maps of the network provides insight into how important each channel is to the classification model. The electrode weights averaged over participants reveal strongly localized activations over the prefrontal and temporal lobes (AF7, AFz, AF8, T7, T8), with an average between-participant standard deviation of 5% of the mean across all channels.
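As an illustration only, the following is a minimal sketch of the preprocessing and evaluation protocol described above (common-average reference, 1-32 Hz band-pass, 1-second windows, leave-one-subject-out folds). The sampling rate, filter order, placeholder data, and all variable names are assumptions for the sketch, not the authors' implementation.

```python
import numpy as np
from scipy.signal import butter, filtfilt
from sklearn.model_selection import LeaveOneGroupOut

FS = 128  # assumed sampling rate in Hz (illustrative, not from the paper)

def preprocess(raw):
    """raw: (n_channels, n_samples) -> common-average reference + 1-32 Hz band-pass."""
    car = raw - raw.mean(axis=0, keepdims=True)             # common-average referencing
    b, a = butter(4, [1.0, 32.0], btype="bandpass", fs=FS)  # assumed 4th-order Butterworth
    return filtfilt(b, a, car, axis=1)                      # zero-phase filtering

def windows(eeg, win_s=1.0):
    """Cut into non-overlapping windows -> (n_windows, n_channels, samples_per_window)."""
    w = int(win_s * FS)
    n = eeg.shape[1] // w
    return eeg[:, : n * w].reshape(eeg.shape[0], n, w).transpose(1, 0, 2)

# Placeholder data standing in for the 16 participants of the public data set [4.].
rng = np.random.default_rng(0)
X, y, groups = [], [], []
for subject in range(16):
    segs = windows(preprocess(rng.standard_normal((64, 60 * FS))))  # 1 min of fake EEG
    X.append(segs)
    y.append(rng.integers(0, 2, len(segs)))     # binary attended-speaker label per window
    groups.append(np.full(len(segs), subject))  # subject index for the group-wise split
X, y, groups = np.concatenate(X), np.concatenate(y), np.concatenate(groups)

# Leave-one-subject-out: each fold evaluates on one completely unseen participant.
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups):
    pass  # the Deep Attention Network would be trained on X[train_idx], scored on X[test_idx]
```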
Discussion: The attention network significantly outperforms previous DNN approaches for subject-independent auditory attention decoding (p < .01) [3.] by an absolute 7% when accounting for all participants, and shows lower variance between participants. While the spatial weights reflect only part of the DNN model, they suggest shared neural processing across individuals in the prefrontal and temporal lobes. These areas are known to play a crucial role in speech tracking during selective listening [2.] and are likewise used by the neural network to discriminate between the audio streams; a sketch of how such a spatial-map analysis can be read out of the trained models follows below.

Significance: The attention network reaches state-of-the-art performance for subject-independent auditory attention decoding with lower variability and fewer parameters, while allowing an intuitive visualization of parts of the model. The ability to decode auditory attention in a subject-independent manner is crucial for developing cognitive models that can be applied to a wide range of individuals, including those with hearing impairments.
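To make the spatial-map readout concrete, here is a hedged sketch of how per-electrode weights from the spatial attention modules of the 16 leave-one-subject-out models could be averaged and summarized. The `spatial_weights` array is a hypothetical placeholder (filled with random values here); in practice it would be extracted from each trained model.

```python
import numpy as np

# Hypothetical readout: one spatial-attention weight per electrode and per
# leave-one-subject-out model, shape (n_subjects=16, n_channels=64).
rng = np.random.default_rng(1)
spatial_weights = np.abs(rng.standard_normal((16, 64)))  # placeholder, not real weights

mean_map = spatial_weights.mean(axis=0)               # group-average spatial map
rel_spread = spatial_weights.std(axis=0) / mean_map   # between-subject SD relative to mean

# Indices of the five largest-weight electrodes; with the real weights the
# peaks reported above were AF7, AFz, AF8, T7 and T8.
top5 = np.argsort(mean_map)[-5:][::-1]
print("top channels:", top5, "| mean relative spread:", rel_spread.mean().round(3))
```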
Reference:
EEG-based Decoding of Auditory Attention using a Deep Attention Network: Revealing neural commonalities of selective attention across individuals. In 10th International Brain-Computer-Interfaces Meeting, 2023.
Bibtex Entry:
@INPROCEEDINGS{IvucicBCI23,
  author={{}},
  title={{EEG-based Decoding of Auditory Attention using a Deep Attention Network: Revealing neural commonalities of selective attention across individuals}},
  year=2023,
  booktitle={10th International Brain-Computer-Interfaces Meeting},
  pages={},
  url={https://www.csl.uni-bremen.de/cms/images/documents/publications/Abstract.Gabriel.Ivucic_BCI2023.pdf}
}