Recognition and Analysis of Individual and Collective Human Behaviors by Combining Image Information and Speech Signal

ZINEDDINE, SARHANI KAHHOUL (2025) Recognition and Analysis of Individual and Collective Human Behaviors by Combining Image Information and Speech Signal. Doctoral thesis, Faculté des Sciences et de la technologie.

Text
PhD_Thesis_KAHHOUL Zineddine Sarhani.pdf

Download (7MB)

Abstract

The automated recognition of human emotion is a cornerstone of modern affective computing, yet progress is often hindered by the limitations of unimodal analysis, the performance gap on real-world data, and a critical lack of resources for under-resourced languages. This thesis presents a comprehensive framework to address these challenges, with a particular focus on advancing the state of the art in Automatic Speech Emotion Recognition (ASER). The research makes three primary contributions. First, an efficient and lightweight architecture, the CBAM-DenseNet121, is proposed to resolve the trade-off between accuracy and computational complexity. By integrating an attention mechanism with a dense convolutional network, this model achieves highly competitive performance on the benchmark CREMA-D dataset while using substantially fewer parameters than comparable state-of-the-art models. Second, a novel high-accuracy framework is introduced, combining a custom DeepSpec-CNN with an architecturally diverse ensemble learning strategy. This approach reframes the classification problem using the control dimension of the Geneva Wheel of Emotions (GWE) and establishes new state-of-the-art performance on CREMA-D, significantly improving upon existing methods. Finally, to address data scarcity, this thesis introduces the Open Your Heart (OYH) corpus, a new, large-scale dataset containing several hours of genuine emotional speech in the Algerian Arabic dialect. Comprehensive performance baselines were established on this challenging corpus using traditional machine learning models, providing a vital new benchmark for future research. Collectively, this thesis advances the field through the dual contribution of novel, high-performance ASER models and the creation of an essential new corpus. The findings provide a robust foundation for building more nuanced, culturally aware, and socially intelligent systems.

Item Type: Thesis (Doctoral)
Uncontrolled Keywords: Automatic Speech Emotion Recognition (ASER), Deep Learning, Convolutional Neural Networks (CNN), Attention Mechanisms, Ensemble Learning, Spectrograms, Affective Computing, Algerian Arabic, Speech Corpus.
Subjects: T Technology > TK Electrical engineering. Electronics. Nuclear engineering
Depositing User: Mr. Mourad Kebiel
Date Deposited: 24 Jan 2026 08:08
Last Modified: 24 Jan 2026 08:08
URI: http://thesis.univ-biskra.dz/id/eprint/7112
