Deep Representation Learning for Automatic Depression Detection from Facial Expressions

Thesis event information

Date and time of the thesis defence

Place of the thesis defence

Auditorium L10, Linnanmaa

Topic of the dissertation

Deep Representation Learning for Automatic Depression Detection from Facial Expressions

Doctoral candidate

Master of Science Wheidima Carneiro de Melo

Faculty and unit

University of Oulu Graduate School, Faculty of Information Technology and Electrical Engineering, The Center for Machine Vision and Signal Analysis (CMVS)

Subject of study

Computer Science and Engineering


Docent Heikki Huttunen, Tampere University of Technology


Associate Professor Miguel Bordallo López, University of Oulu

Automatic Depression Detection from Facial Expressions

Depression is a prevalent mental disorder that severely affects an individual’s quality of life. Traditional diagnostic methods rely on either a clinician’s evaluation of symptoms reported by the individual or on self-report instruments. Because these assessments are subjective, depression can be difficult to recognize. This has motivated the development of automatic diagnostic systems that provide objective and reliable information about depressive states. Recently, interest has grown in building such systems on facial information, since there is evidence that facial expressions convey valuable information about depression.

This thesis proposes computational models to explore the correlations between facial expressions and depressive states. Such exploration is challenging because

1) the differences in facial expressions across depression levels may be small and
2) facial analysis is itself a complex task.

From this perspective, we investigate different deep learning techniques to effectively model facial expressions for automatic depression detection. Specifically, we design architectures that model the appearance and dynamics of facial videos. To that end, we analyze structures that explore either a fixed or multiple spatiotemporal ranges. Our findings suggest that a structure capable of multiscale feature extraction contributes to learning depression representations. We also demonstrate that depression distributions increase the robustness of depression estimations.
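
As a rough illustration of what estimating a depression distribution rather than a single score can look like, the sketch below turns a scalar depression level into a discrete Gaussian over levels and recovers the level as the expected value. This is a generic label-distribution sketch, not the thesis's formulation: the function names, the 64-level range, and the value of `sigma` are all assumptions made for the example.

```python
import math

def score_to_distribution(score, num_levels=64, sigma=2.0):
    """Spread a scalar score over levels 0..num_levels-1 as a normalized Gaussian.

    num_levels and sigma are illustrative assumptions, not values from the thesis.
    """
    weights = [math.exp(-0.5 * ((k - score) / sigma) ** 2) for k in range(num_levels)]
    total = sum(weights)
    return [w / total for w in weights]

def expected_score(distribution):
    """Recover a scalar estimate as the mean of the distribution."""
    return sum(k * p for k, p in enumerate(distribution))

dist = score_to_distribution(20.0)
estimate = expected_score(dist)
```

A model trained against such a target is penalized less for predicting a neighbouring level than a distant one, which is one plausible reason a distribution can make estimates more robust than a single hard label.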

Another key challenge in this application is the scarcity of labelled data, which creates a need for efficient representation learning methods. To this end, we first develop a pooling method that encodes facial dynamics into an image map, which can then be processed by less complex deep models. In addition, we design an architecture that captures different facial expression variations using a basic structure built on functions that explore features at multiple ranges without trainable parameters.
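
To make the idea of encoding facial dynamics into a single image map concrete, the sketch below collapses a short frame sequence into one map using a linearly weighted temporal sum, in which later frames receive larger weights. This is a simple stand-in chosen for illustration and is not the pooling method developed in the thesis; the function name `temporal_pool` and the weighting `2t - T - 1` are assumptions of the example.

```python
def temporal_pool(frames):
    """Collapse T frames (each an H x W nested list) into one H x W map.

    Frame t (1-based) gets weight 2t - T - 1, so the weights sum to zero:
    pixels that never change map to 0 and pixels that change over time stand out.
    This weighting is an illustrative assumption, not the thesis's method.
    """
    num_frames = len(frames)
    if num_frames == 0:
        raise ValueError("need at least one frame")
    height, width = len(frames[0]), len(frames[0][0])
    pooled = [[0.0] * width for _ in range(height)]
    for t, frame in enumerate(frames, start=1):
        weight = 2 * t - num_frames - 1
        for i in range(height):
            for j in range(width):
                pooled[i][j] += weight * frame[i][j]
    return pooled

# Three toy 2 x 2 "frames": the top-left pixel brightens steadily,
# the bottom-right pixel changes only in the last frame.
frames = [
    [[0, 0], [0, 0]],
    [[1, 0], [0, 0]],
    [[2, 0], [0, 1]],
]
motion_map = temporal_pool(frames)
```

The resulting map is an ordinary 2D array, so it can be passed to a standard image model in place of a full video, which is the sense in which such an encoding allows less complex deep models.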

Finally, we develop an architecture to explore facial expressions related to both depression and pain, since depressed individuals may experience pain. To build this architecture, we use different strategies to efficiently extract multiscale features. Our experiments indicate that the proposed methods have the potential to generate discriminative representations.
Last updated: 2.8.2022