Artificial intelligence understands emotions

Emotions and cognitive functions are inextricably integrated in the human brain, together forming human intelligence. Through emotions, people express their feelings and react to internal and external stimuli. Emotions therefore play a fundamental role in human communication and cognition, and they are part of what sets humans apart from other creatures.

Formally, emotions are biological states associated with the nervous system, brought on by neurophysiological changes variously associated with thoughts, feelings, behavioural responses, and a degree of pleasure or displeasure [1]. Over the last decades, research on emotions has grown significantly in fields such as psychology, neuroscience and, our own focus, computer science. The flourishing field of artificial emotional intelligence, or emotion AI, is evidence of this popularity in computer science, and it may also indicate that emotionally intelligent machines are not as far away as they seem. Emotion AI strives to develop systems and devices that can recognise, interpret, process and simulate human emotions. In other words, an AI is expected to understand that you are happy when it sees your spontaneous smile, and to react appropriately.

Why emotion AI?

The first question is: do we really need emotion AI? Take a second to think about it. How would you feel about learning online with an e-tutor? Wouldn’t it be exciting if a virtual tutor could understand your confusion and frustration, and encourage you when you want to give up?

Emotion AI has many attractive and valuable potential applications. To take a few more examples: it can help detect when patients in an ICU (intensive care unit), or patients who cannot speak well, are experiencing abnormal pain, so that their pain can be treated immediately and seriously. In business, automatically measuring consumers’ behaviour in response to products and product ads can have a profound impact on automated market research and on improving services. It could also help border control agents detect potentially dangerous individuals during routine screening interviews, and detect what a driver intends to do in the next few seconds or, in a self-driving car, what the driver expects the car to do. Finally, analysing concealed emotions can help assess mental health problems such as depression and anxiety, supporting people’s emotional wellbeing.

How does emotion AI work?

The face is one of the most commonly used sources of information for analysing people’s emotions. Typically, researchers define either a set of discrete emotion categories, such as neutral, happiness, sadness, surprise, fear, disgust and anger, or continuous emotional dimensions, such as valence and arousal, and then build computational models to predict these emotions from faces. For instance, AI technology can determine a person’s emotional expression from factors such as the position of the eyebrows and eyes and how the mouth moves.

Today, the more advanced methods are data-driven. Such emotion-recognition systems learn the links between an emotion and its external manifestation from large amounts of labelled data, which in machine learning is called supervised learning. The data may include audio or video recordings of TV shows, interviews and experiments involving people, clips of theatrical performances or movies, and dialogues acted out by professional actors. Based on the prediction error with respect to the provided labels, the model adjusts its parameters using an optimization method such as gradient descent. This process is normally repeated iteratively until the model converges to an acceptable point; this is the so-called ‘learning’ in machine learning.
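A minimal sketch of the supervised learning loop described above, assuming hand-crafted facial measurements as input features and a softmax (multinomial logistic regression) classifier trained with full-batch gradient descent. The six features, the synthetic data and the hyperparameters (`lr`, `epochs`) are hypothetical stand-ins, not the data or models used by any particular emotion-recognition system.

```python
import numpy as np

# Label set taken from the discrete emotions listed above.
EMOTIONS = ["neutral", "happiness", "sadness", "surprise", "fear", "disgust", "anger"]


def softmax(z):
    """Row-wise softmax; subtracting the row max keeps exp() numerically stable."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def train_softmax_classifier(X, y, num_classes, lr=0.3, epochs=1000):
    """Multinomial logistic regression fitted with full-batch gradient descent."""
    n, d = X.shape
    W = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    Y = np.eye(num_classes)[y]              # one-hot encoding of the labels
    for _ in range(epochs):
        P = softmax(X @ W + b)              # predicted class probabilities
        G = (P - Y) / n                     # gradient of mean cross-entropy w.r.t. logits
        W -= lr * (X.T @ G)                 # gradient step on the weights
        b -= lr * G.sum(axis=0)             # gradient step on the biases
    return W, b


def predict(X, W, b):
    return np.argmax(X @ W + b, axis=1)


# Synthetic stand-in for labelled data: each row is a hypothetical vector of six
# facial measurements (e.g. eyebrow height, eye opening, mouth-corner lift).
rng = np.random.default_rng(0)
centers = rng.normal(scale=3.0, size=(len(EMOTIONS), 6))
y = rng.integers(0, len(EMOTIONS), size=700)
X = centers[y] + rng.normal(scale=0.5, size=(700, 6))
X = (X - X.mean(axis=0)) / X.std(axis=0)    # standardise each feature

W, b = train_softmax_classifier(X, y, len(EMOTIONS))
accuracy = np.mean(predict(X, W, b) == y)
print(f"training accuracy: {accuracy:.2f}")
```

Real systems usually replace the hand-crafted features with a deep network trained end to end, but the loop is the same idea: predict, compare with the labels, and step the parameters against the gradient of the error.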

Just recognition or understanding?

We humans are very good at hiding things, especially our emotions. For instance, a person can pretend to be happy even while feeling extremely angry inside. If an AI agent cannot truly understand a person’s emotion, it cannot respond appropriately. So one of the most challenging tasks for emotion AI is understanding human emotions, whether spontaneous or acted. How can a computational model address this kind of problem? In our group, we turn to micro-expressions. Micro-expressions are rapid, involuntary facial expressions that can reveal suppressed emotional states, so they have the potential to uncover a person’s real emotion and can help computational models distinguish and understand it. However, compared with normal expressions (macro-expressions), micro-expressions are very hard to recognise, because they are very short in duration and low in intensity: they consist only of fleeting, small facial movements that are hard to notice. Our group has explored many different methods to deal with these challenges. For instance, we have proposed magnifying the intensity of micro-expressions while at the same time extending their duration [2], so that methods designed for macro-expressions can also be applied.
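To illustrate what magnifying and stretching a subtle facial movement means, here is a simplified sketch in the spirit of classic linear Eulerian video magnification: each pixel’s intensity over time is band-pass filtered, amplified and added back, and the clip is then temporally interpolated so that it lasts longer. This is not the consolidated Eulerian framework of [2]; it omits the spatial decomposition that practical methods use, and the function names, frequency band, `alpha` and the synthetic clip are assumptions made for illustration.

```python
import numpy as np


def eulerian_magnify(frames, fps, f_lo, f_hi, alpha):
    """Amplify subtle temporal intensity changes in a (T, H, W) grey-level clip.

    Each pixel's time series is band-pass filtered with an ideal FFT filter,
    multiplied by alpha and added back to the original frames.
    """
    T = frames.shape[0]
    spectrum = np.fft.rfft(frames, axis=0)
    freqs = np.fft.rfftfreq(T, d=1.0 / fps)
    keep = (freqs >= f_lo) & (freqs <= f_hi)
    spectrum[~keep] = 0                          # zero everything outside the band
    band = np.fft.irfft(spectrum, n=T, axis=0)   # filtered temporal variation
    return frames + alpha * band


def extend_duration(frames, factor):
    """Stretch a clip in time by linear interpolation between neighbouring frames."""
    T = frames.shape[0]
    new_T = (T - 1) * factor + 1
    t_new = np.linspace(0, T - 1, new_T)
    lo = np.floor(t_new).astype(int)
    hi = np.minimum(lo + 1, T - 1)
    w = (t_new - lo)[:, None, None]              # interpolation weight per new frame
    return (1 - w) * frames[lo] + w * frames[hi]


# Hypothetical 2-second clip at 30 fps: a flat 4x4 image in which one pixel
# brightens faintly for about a tenth of a second (a stand-in "micro-movement").
fps, T = 30, 60
t = np.arange(T)
clip = np.full((T, 4, 4), 0.5)
clip[:, 1, 1] += 0.01 * np.exp(-((t - 30) ** 2) / (2 * 3.0 ** 2))

mag = eulerian_magnify(clip, fps, f_lo=0.5, f_hi=5.0, alpha=10.0)
ext = extend_duration(mag, factor=3)
print(clip[30, 1, 1] - 0.5, mag[30, 1, 1] - 0.5, ext.shape)
```

Static pixels pass through unchanged because their only frequency component is DC, while the brief bump is amplified and then spread over three times as many frames.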

Only the face? No, there is multimodal learning

The face is a fascinating channel for communicating emotions, but it also has its limitations. For example, it is hard for a computational model to tell a crying heart from a smiling face (say, someone trying to hide deep sadness behind a big smile) based only on facial expressions. If we introduce more cues, such as voice, actions and heart rate, this problem becomes easier. One representative example is combining text and voice. Take one of the most commonly used Finnish words, noniin: from this single word alone it is hard to tell the meaning, since it can mean entirely different things when said in different emotional tones. Combining the text with the voice makes it much easier. Emotion agents can therefore recognise human emotions more accurately from multiple cues or signals. Most existing emotion recognition systems analyse an individual’s facial expressions and voice, as well as the words people say or write. In our research, we involve more modalities, including human actions [3], gestures [4] and remotely measured bio-information from face videos [5]. With more cues, we can build a more reliable multimodal learning system that improves performance and moves closer to truly understanding human emotions.
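One simple way to combine cues is late fusion: each modality produces its own probability distribution over the emotion classes, and the distributions are averaged, optionally with per-modality weights. The sketch below assumes this strategy, which is only one of several and not necessarily how the systems mentioned above fuse modalities; the probabilities for the word “noniin” are made-up numbers chosen only to show how the voice can disambiguate the text.

```python
import numpy as np

EMOTIONS = ["neutral", "happiness", "sadness", "surprise", "fear", "disgust", "anger"]


def late_fusion(modality_probs, weights=None):
    """Weighted average of per-modality class-probability vectors."""
    names = list(modality_probs)
    P = np.stack([np.asarray(modality_probs[m], dtype=float) for m in names])
    if weights is None:
        w = np.ones(len(names))                  # equal weight per modality
    else:
        w = np.array([weights[m] for m in names], dtype=float)
    w = w / w.sum()                              # normalise the weights
    fused = w @ P
    return fused / fused.sum()


# Hypothetical outputs for the single word "noniin": the text alone is ambiguous,
# while the tone of voice points clearly towards one emotion.
text_probs = [0.30, 0.25, 0.05, 0.05, 0.05, 0.05, 0.25]
voice_probs = [0.10, 0.60, 0.05, 0.10, 0.05, 0.05, 0.05]

fused = late_fusion({"text": text_probs, "voice": voice_probs})
print(EMOTIONS[int(np.argmax(text_probs))])  # prints "neutral"
print(EMOTIONS[int(np.argmax(fused))])       # prints "happiness"
```

Late fusion is easy to extend with further modalities such as actions, gestures or remotely measured heart rate; feature-level (early) fusion, where modalities are combined before classification, is a common alternative.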


[2] Peng, Wei, Xiaopeng Hong, Yingyue Xu, and Guoying Zhao. "A boost in revealing subtle facial expressions: A consolidated Eulerian framework." In 2019 14th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2019), pp. 1-5. IEEE, 2019.

[3] Peng, Wei, Xiaopeng Hong, Haoyu Chen, and Guoying Zhao. "Learning Graph Convolutional Network for Skeleton-Based Human Action Recognition by Neural Searching." In AAAI, pp. 2669-2676. 2020.

[4] Chen, Haoyu, Xin Liu, Jingang Shi, and Guoying Zhao. "Temporal Hierarchical Dictionary Guided Decoding for Online Gesture Segmentation and Recognition." IEEE Transactions on Image Processing (TIP), 2020.

[5] Yu, Zitong*, Wei Peng*, Xiaobai Li, Xiaopeng Hong, and Guoying Zhao. "Remote heart rate measurement from highly compressed facial videos: an end-to-end deep learning solution with video enhancement." In Proceedings of the IEEE International Conference on Computer Vision, pp. 151-160. 2019.

Researcher and Ph.D. candidate Wei Peng and Professor Guoying Zhao work at the Center for Machine Vision and Signal Analysis, University of Oulu. Their research interests include machine learning and affective computing, among others.