ZINEDDINE, SARHANI KAHHOUL (2025) Recognition and Analysis of Individual and Collective Human Behaviors by Combining Image Information and Speech Signal. Doctoral thesis, Faculté des Sciences et de la technologie.
Text: PhD_Thesis_KAHHOUL Zineddine Sarhani.pdf (7MB)
Abstract
The automated recognition of human emotion is a cornerstone of modern affective computing, yet progress is often hindered by the limitations of unimodal analysis, the performance gap on real-world data, and a critical lack of resources for under-resourced languages. This thesis presents a comprehensive framework to address these challenges, with a deep focus on advancing the state-of-the-art in Automatic Speech Emotion Recognition (ASER). The research makes three primary contributions. First, an efficient and lightweight architecture, the CBAM-DenseNet121, is proposed to resolve the trade-off between accuracy and computational complexity. By integrating an attention mechanism with a dense convolutional network, this model achieves highly competitive performance on the benchmark CREMA-D dataset while utilizing substantially fewer parameters than comparable state-of-the-art models. Second, a novel high-accuracy framework is introduced, combining a custom DeepSpec-CNN with an architecturally diverse ensemble learning strategy. This approach reframes the classification problem using the control dimension of the Geneva Wheel of Emotions (GWE), establishing a new state-of-the-art performance on CREMA-D by significantly improving upon existing methods. Finally, to address data scarcity, this thesis introduces the Open Your Heart (OYH) corpus, a new, large-scale dataset containing several hours of genuine emotional speech in the Algerian Arabic dialect. Comprehensive performance baselines were established on this challenging corpus using traditional machine learning models, providing a vital new benchmark for future research. Collectively, this thesis advances the field through the dual contribution of novel, high-performance ASER models and the creation of an essential new corpus. The findings provide a robust foundation for building more nuanced, culturally aware, and socially intelligent systems.
| Item Type: | Thesis (Doctoral) |
|---|---|
| Uncontrolled Keywords: | Automatic Speech Emotion Recognition (ASER), Deep Learning, Convolutional Neural Networks (CNN), Attention Mechanisms, Ensemble Learning, Spectrograms, Affective Computing, Algerian Arabic, Speech Corpus. |
| Subjects: | T Technology > TK Electrical engineering. Electronics Nuclear engineering |
| Depositing User: | Mr. Mourad Kebiel |
| Date Deposited: | 24 Jan 2026 08:08 |
| Last Modified: | 24 Jan 2026 08:08 |
| URI: | http://thesis.univ-biskra.dz/id/eprint/7112 |
