International Journal of Innovative Research in Computer and Communication Engineering

ISSN Approved Journal | Impact factor: 8.771 | ESTD: 2013 | Follows UGC CARE Journal Norms and Guidelines



TITLE Psychological Stress Detection from Voice Using Deep Learning
ABSTRACT Psychological stress has become a widespread problem in contemporary society, affecting people's mental and emotional well-being. Prolonged stress impairs cognitive function, decision-making, and overall work performance, so identifying it at an early stage makes interventions more effective and improves mental health care. This study presents a multimodal deep learning system that detects psychological stress from both speech and written text. A CNN + LSTM neural network learns discriminative acoustic features from speech, while a transformer-based language model captures stress-related linguistic patterns in text. The speech model was evaluated on the RAVDESS dataset and the text model on the Dreaddit dataset. The text-based model achieved 86.8% accuracy, and the speech-based model achieved 96.11% accuracy with a ROC-AUC of 0.93. These results suggest that automated stress detection systems are more effective when acoustic and linguistic data are combined.
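The abstract's speech pipeline rests on learned acoustic features; the paper's references point to log-mel spectrograms extracted with librosa [21] as the typical CNN + LSTM input. As an illustration of that feature-extraction step only, the following is a minimal NumPy-only sketch of a log-mel spectrogram (Hann-windowed STFT followed by a triangular mel filterbank); the frame sizes, mel-band count, and sample rate are assumptions, not the paper's actual configuration:

```python
import numpy as np

def hz_to_mel(f):
    # Standard HTK-style mel scale conversion.
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def log_mel_spectrogram(y, sr=16000, n_fft=512, hop=256, n_mels=40):
    # Frame the signal and apply a Hann window.
    window = np.hanning(n_fft)
    frames = np.array([y[s:s + n_fft] * window
                       for s in range(0, len(y) - n_fft + 1, hop)])
    # Power spectrum of each frame: (n_frames, n_fft//2 + 1).
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Triangular mel filterbank spanning 0 Hz to Nyquist.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fb[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[m - 1, k] = (right - k) / max(right - center, 1)
    # Mel-band energies, then log compression for dynamic range.
    mel_energy = power @ fb.T                  # (n_frames, n_mels)
    return np.log(mel_energy + 1e-10)

# Example: one second of a 440 Hz tone at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
y = np.sin(2 * np.pi * 440.0 * t)
S = log_mel_spectrogram(y, sr=sr)
print(S.shape)  # (n_frames, n_mels)
```

The resulting 2-D time-frequency matrix is the kind of input a CNN front-end can convolve over, with an LSTM then modelling the frame sequence, matching the CNN + LSTM arrangement described in the abstract. In practice `librosa.feature.melspectrogram` would replace this hand-rolled version.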
AUTHOR RAMYAKRISHNA KADIYALA, Assistant Professor, Dept. of CSE, Sir C R Reddy College of Engineering, Eluru, India; B MADHAV RAO, Professor, Dept. of CSE, Sir C R Reddy College of Engineering, Eluru, India; VYUHITA GUNUPUDI and CHANIKYA DURGA VARA PRASAD G, B.Tech Students, Dept. of CSE, Sir C R Reddy College of Engineering, Eluru, India
VOLUME 182
DOI 10.15680/IJIRCCE.2026.1403059
PDF pdf/59_Psychological Stress Detection from Voice.pdf
KEYWORDS
References [1] S. Cohen, D. Janicki-Deverts, and G. E. Miller, “Psychological stress and disease,” Journal of the American Medical Association, vol. 298, no. 14, pp. 1685–1687, 2007.
[2] J. Healey and R. Picard, “Detecting stress during real-world driving tasks using physiological sensors,” IEEE Transactions on Intelligent Transportation Systems, vol. 6, no. 2, pp. 156–166, 2005.
[3] Z. Zeng, M. Pantic, G. I. Roisman, and T. Huang, “A survey of affect recognition methods: Audio, visual, and spontaneous expressions,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 1, pp. 39–58, 2009.
[4] M. El Ayadi, M. S. Kamel, and F. Karray, “Survey on speech emotion recognition: Features, classification schemes, and databases,” Pattern Recognition, vol. 44, no. 3, pp. 572–587, 2011.
[5] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[6] A. Krizhevsky, I. Sutskever, and G. Hinton, “ImageNet classification with deep convolutional neural networks,” Advances in Neural Information Processing Systems, pp. 1097–1105, 2012.
[7] H. Fayek, M. Lech, and L. Cavedon, “Evaluating deep learning architectures for speech emotion recognition,” Neural Networks, vol. 92, pp. 60–68, 2017.
[8] S. Livingstone and F. Russo, “The Ryerson audio-visual database of emotional speech and song (RAVDESS),” PLOS ONE, vol. 13, no. 5, 2018.
[9] C. Busso et al., “IEMOCAP: Interactive emotional dyadic motion capture database,” Language Resources and Evaluation, vol. 42, no. 4, pp. 335–359, 2008.
[10] M. De Choudhury, M. Gamon, S. Counts, and E. Horvitz, “Predicting depression via social media,” Proceedings of the International AAAI Conference on Web and Social Media, 2013.
[11] J. Devlin, M. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding,” Proceedings of NAACL-HLT, pp. 4171–4186, 2019.
[12] Y. Liu et al., “RoBERTa: A robustly optimized BERT pretraining approach,” arXiv preprint arXiv:1907.11692, 2019.
[13] B. S. McEwen, “Protective and damaging effects of stress mediators,” New England Journal of Medicine, vol. 338, pp. 171–179, 1998.
[14] B. Schuller, S. Steidl, and A. Batliner, “The INTERSPEECH emotion challenge,” Proceedings of Interspeech, pp. 312–315, 2009.
[15] F. Eyben, M. Wöllmer, and B. Schuller, “openSMILE: The Munich versatile and fast open-source audio feature extractor,” Proceedings of ACM Multimedia, 2010.
[16] D. O’Shaughnessy, Speech Communications: Human and Machine. IEEE Press, 2000.
[17] K. Han, D. Yu, and I. Tashev, “Speech emotion recognition using deep neural network and extreme learning machine,” Proceedings of Interspeech, 2014.
[18] A. Sarkar et al., “A review of speech emotion recognition using deep learning,” IEEE Access, vol. 8, pp. 11171–11186, 2020.
[19] E. Turcan and K. McKeown, “Dreaddit: A Reddit dataset for stress analysis in social media,” Proceedings of the EMNLP Workshop on Computational Linguistics and Clinical Psychology, 2019.
[20] A. Radford et al., “Robust speech recognition via large-scale weak supervision,” OpenAI Whisper, 2022.
[21] V. Bhandari, “Generating log-mel spectrogram using librosa,” Signal Processing StackExchange, 2021. Available: https://dsp.stackexchange.com/questions/75017/generating-log-mel-spectrogram-using-librosa
[22] R. Madhubala, K. R. Akhila, P. S. G. Aruna Sri, B. Madhav Rao, and A. Deepa, “Enhancing Cybersecurity with Intelligent AI and Machine Learning using AutoML Techniques,” International Journal of Applied Mathematics, vol. 38, no. 4s, 2025.
Copyright © IJIRCCE 2020. All rights reserved.