Video representation and deep learning techniques for face presentation attack detection

Thesis event information

Date and time of the thesis defence

Place of the thesis defence

L10, Linnanmaa

Topic of the dissertation

Video representation and deep learning techniques for face presentation attack detection

Doctoral candidate

Master of Science Usman Muhammad

Faculty and unit

University of Oulu Graduate School, Faculty of Information Technology and Electrical Engineering, Center for Machine Vision and Signal Analysis

Subject of study

Computer Science


Professor Moncef Gabbouj, Tampere University


Associate professor Mourad Oussalah, University of Oulu



Facial recognition technology has rapidly gained popularity in security applications, including airport passenger screening, cell phone screening, banking, and law enforcement surveillance. Unfortunately, recent studies show that facial recognition systems can be vulnerable to spoofing, known as a presentation attack. For instance, an attacker can present a photograph, a replayed video, a silicone mask, or even a 3D mask to gain fraudulent access to a biometric system. In recent years, significant efforts have been made to develop software- and hardware-based countermeasures, but their performance degrades drastically under real-world conditions (e.g., illumination variations, user demographic characteristics, and different input cameras).

This thesis addresses recent developments in face anti-spoofing, also known as face presentation attack detection (PAD). In particular, we propose video representation and deep learning techniques that exploit spatial and temporal information to distinguish bona fide videos from attack videos. This is challenging because 1) both real and spoofed videos contain spatiotemporal information and 2) labeling the data is itself a challenge. From this perspective, we investigate feature fusion methods that compute the importance of features, since a model's accuracy depends on the quality of its features. Our results suggest that hybrid deep learning provides stronger discriminative power than the deep features of a single model. In addition, we introduce a mechanism called sample learning for feature augmentation. We show that directly integrating convolutional features into a recurrent neural network risks introducing interfering information (e.g., mutual exclusion and redundancy), which can limit PAD performance.
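
As a rough illustration of feature-level fusion in general, the sketch below concatenates per-video feature vectors from two models after normalizing each one. It is a minimal example under stated assumptions, not the fusion method developed in the thesis: the function names, the choice of L2 normalization, and the toy feature values are all hypothetical.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a feature vector to unit Euclidean length.

    A zero vector is returned unchanged to avoid division by zero.
    """
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def fuse_features(feats_a: Sequence[float], feats_b: Sequence[float]) -> List[float]:
    """Concatenate two feature vectors after normalizing each one.

    Normalizing first keeps either model from dominating the fused
    vector purely because its features have a larger scale.
    """
    return l2_normalize(feats_a) + l2_normalize(feats_b)


# Hypothetical per-video descriptors from two different networks.
cnn_features = [3.0, 4.0]        # stands in for spatial (appearance) features
rnn_features = [0.0, 5.0, 12.0]  # stands in for temporal features
fused = fuse_features(cnn_features, rnn_features)
```

The fused vector would then be passed to a classifier that separates bona fide from attack videos. Plain concatenation treats every feature as equally important; a method that computes feature importance would weight or select features instead.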

Another key challenge is to learn powerful deep features without depending on human-labeled data. This requires the research community to focus more on developing robust PAD countermeasures. To this end, we develop two countermeasures based on self-supervised learning, in which models obtain supervision from the data itself, thereby alleviating the annotation bottleneck. Finally, we consider generalization capability: the proposed methods use global motion and data augmentation to encode complex patterns from PAD videos into discriminative representations.
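
To make the idea of supervision obtained from the data itself concrete, the sketch below builds training pairs for a generic pretext task: deciding whether the frames of a clip are in their original order. This task is chosen only for illustration and is not claimed to be one of the two countermeasures developed in the thesis; the function name and parameters are hypothetical.

```python
import random
from typing import List, Tuple


def make_order_sample(num_frames: int, rng: random.Random) -> Tuple[List[int], int]:
    """Create one self-labeled sample for a frame-order pretext task.

    Returns (frame_order, label), where frame_order lists frame indices
    and label is 1 if they are in the original order, 0 if shuffled.
    No human annotation is needed: the label follows from how the
    sample was constructed.
    """
    order = list(range(num_frames))
    # Clips with fewer than two frames cannot be shuffled into a new order.
    if num_frames < 2 or rng.random() < 0.5:
        return order, 1
    shuffled = order[:]
    # Reshuffle until the order actually differs from the original.
    while shuffled == order:
        rng.shuffle(shuffled)
    return shuffled, 0


rng = random.Random(0)  # fixed seed so the example is reproducible
samples = [make_order_sample(8, rng) for _ in range(100)]
```

A network trained to predict these labels has to attend to temporal structure in the video, and the features it learns can afterwards be reused for a downstream task with few or no manual labels.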
Last updated: 8.8.2023