
Multimodal Emotion Recognition using Deep Learning


Recent research in human-computer interaction seeks to take the user's emotional state into account in order to provide a seamless human-computer interface. Such emotion-aware interfaces could be applied in a wide range of fields, including education and medicine. Human emotions can be detected through several channels, including expressions, facial images, physiological signals, and neuroimaging techniques. This paper reviews emotion recognition from multimodal signals using deep learning and compares applications reported in current studies. Multimodal affective computing systems are studied alongside unimodal solutions because they offer higher classification accuracy. Accuracy varies with the number of emotions considered, the features extracted, the classification method, and the consistency of the database. The review also covers theories on emotion detection methodology and recent work in emotion science. It is intended to help researchers better understand physiological signals, the current state of the field, and the open problems in emotion recognition.


Emotion recognition, Facial recognition, Physiological signals, Deep Learning




