arxiv:2409.00391

Density Adaptive Attention-based Speech Network: Enhancing Feature Understanding for Mental Health Disorders

Published on Aug 31

Submitted by amanchadha on Sep 4


Authors:

Aman Chadha, Aaron Elkins

Abstract

Speech-based depression detection is difficult to automate because depression manifests differently across individuals and because data are scarce. Addressing these challenges, we introduce DAAMAudioCNNLSTM and DAAMAudioTransformer, two parameter-efficient and explainable models for audio feature extraction and depression detection. DAAMAudioCNNLSTM features a novel CNN-LSTM framework with a multi-head Density Adaptive Attention Mechanism (DAAM) that dynamically focuses on informative speech segments. DAAMAudioTransformer replaces the CNN-LSTM architecture with a transformer encoder and incorporates the same DAAM module for enhanced attention and interpretability. These approaches not only improve detection robustness and interpretability but also achieve state-of-the-art performance on the DAIC-WOZ dataset: an F1 macro score of 0.702 for DAAMAudioCNNLSTM and 0.72 for DAAMAudioTransformer, without relying on supplementary information such as vowel positions and speaker information during training/validation, as previous approaches did. The explainability and efficiency of both models in leveraging speech signals for depression detection represent a leap towards more reliable, clinically useful diagnostic tools, promising advancements in speech and mental health care. To foster further research in this domain, we make our code publicly available.
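The abstract describes DAAM only at a high level. Below is a minimal NumPy sketch of one plausible reading, not the authors' implementation. It assumes that each head gates its input with an unnormalized Gaussian density built from the input's own mean and variance, where the mean is shifted by a learnable offset and the variance is rescaled by a learnable factor. The function names and the `mean_offset`/`var_scale` parameters are hypothetical, the parameters are fixed constants here rather than trained, and computing statistics over the time axis is also an assumption. The authors' released code remains the authoritative reference.

```python
import numpy as np


def daam_head(x, mean_offset, var_scale, axis=0, eps=1e-8):
    """Gate one head with an unnormalized Gaussian density (hypothetical form).

    x          : array of shape (time, features)
    mean_offset: shifts the empirical mean (learnable in a real model; fixed here)
    var_scale  : positive factor rescaling the empirical variance (likewise fixed here)
    axis       : axis over which mean/variance are computed (time, by assumption)
    """
    mean = x.mean(axis=axis, keepdims=True)
    var = x.var(axis=axis, keepdims=True)
    # Standardize against the shifted mean and the scaled variance.
    z = (x - (mean + mean_offset)) / np.sqrt(var * var_scale + eps)
    # Gaussian kernel: exactly 1 at the shifted mean, decaying towards 0 away from it.
    weights = np.exp(-0.5 * z ** 2)
    return x * weights, weights


def multi_head_daam(x, mean_offsets, var_scales, axis=0):
    """Split the feature axis into equal heads, gate each one, and re-concatenate.

    np.split raises ValueError if the feature count is not divisible by the head count.
    """
    heads = np.split(x, len(mean_offsets), axis=-1)
    gated = [
        daam_head(h, m, v, axis=axis)[0]
        for h, m, v in zip(heads, mean_offsets, var_scales)
    ]
    return np.concatenate(gated, axis=-1)


# Three frames of a single feature: the middle frame sits exactly at the mean.
frames = np.array([[1.0], [2.0], [3.0]])
gated, w = daam_head(frames, mean_offset=0.0, var_scale=1.0)
print(w.ravel())
```

In this sketch every weight lies in (0, 1], so values far from the (shifted) mean are attenuated and the middle frame above keeps weight 1. The weight array is the kind of quantity one could inspect for interpretability.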


Community

amanchadha (paper author and submitter)

Sep 4

The paper introduces two novel, efficient, and explainable models, DAAMAudioCNNLSTM and DAAMAudioTransformer, that use a Density Adaptive Attention Mechanism to enhance speech-based depression detection.
Model Development: Introduces DAAMAudioCNNLSTM and DAAMAudioTransformer, which integrate a Density Adaptive Attention Mechanism to dynamically focus on informative speech segments for depression detection.
Performance: Both models achieve state-of-the-art performance on the DAIC-WOZ dataset, with F1 macro scores of 0.702 and 0.72, respectively, without relying on supplementary information such as vowel positions or speaker data.
Clinical Applicability: These models enhance detection robustness, interpretability, and clinical relevance, making them valuable tools for automated depression diagnostics in mental health care.
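Both reported scores are F1 macro scores. Macro F1 is conventionally the unweighted mean of the per-class F1 scores, so a minority class (here, depressed speakers) counts as much as the majority class. A minimal sketch of that conventional definition follows; the toy labels are made up for illustration and are not DAIC-WOZ data.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = []
    for c in labels:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


# Toy labels (made up, not from DAIC-WOZ): 0 = not depressed, 1 = depressed.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]
print(macro_f1(y_true, y_pred))  # 0.625
```

Here class 0 has F1 = 0.75 and class 1 has F1 = 0.5, so macro F1 is 0.625 even though accuracy is 4/6 ≈ 0.667. On inputs like these the result should agree with scikit-learn's `f1_score(y_true, y_pred, average="macro")`.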

nielsr

Sep 9

Hi,

Congrats on this work. Are you planning to make the artifacts available? If so, it would be great to share them on the Hub with appropriate tags, so that people can find them at https://huggingface.co/models?pipeline_tag=automatic-speech-recognition or https://huggingface.co/models?pipeline_tag=audio-classification.
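The two links above are ordinary Hub search URLs filtered by the `pipeline_tag` query parameter, and a model's README.md can declare its own `pipeline_tag` in YAML front matter. Hugging Face paper pages also link models whose README.md cites the arXiv URL. A small sketch under those assumptions is below; the helper names are hypothetical, and the `depression-detection` tag in the usage line is an illustrative placeholder, not a tag the authors have used.

```python
from urllib.parse import urlencode

HUB_MODELS_URL = "https://huggingface.co/models"


def hub_filter_url(pipeline_tag):
    """Build a Hub model-search URL filtered by pipeline tag."""
    return f"{HUB_MODELS_URL}?{urlencode({'pipeline_tag': pipeline_tag})}"


def model_card_readme(pipeline_tag, arxiv_id, tags=()):
    """Return a minimal README.md: YAML front matter plus an arXiv citation line."""
    lines = ["---", f"pipeline_tag: {pipeline_tag}"]
    if tags:
        lines.append("tags:")
        lines.extend(f"- {tag}" for tag in tags)
    lines += ["---", "", f"Paper: https://arxiv.org/abs/{arxiv_id}", ""]
    return "\n".join(lines)


print(hub_filter_url("audio-classification"))
# "depression-detection" is a hypothetical tag chosen for illustration.
print(model_card_readme("audio-classification", "2409.00391", tags=["depression-detection"]))
```

The first `print` reproduces the audio-classification link from the comment, since `urlencode` leaves hyphens unescaped.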

librarian-bot

Sep 5

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out this Space

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend

