Adaptable and generalizable deep learning for visual recognition systems

Thesis event information

Date and time of the thesis defence

Place of the thesis defence

Online

Topic of the dissertation

Adaptable and generalizable deep learning for visual recognition systems

Doctoral candidate

Master of Engineering Wuti Xiong

Faculty and unit

University of Oulu Graduate School, Faculty of Information Technology and Electrical Engineering, Center for Machine Vision and Signal Analysis

Subject of study

Computer Science and Engineering

Opponent

Professor Joni-Kristian Kämäräinen, Tampere University

Custos

Professor Olli Silvén, University of Oulu

Building smarter vision systems that work wherever they’re used

Most existing deep learning-based visual recognition systems adapt and generalize poorly when labeled data is limited or when they encounter novel domains. This thesis addresses these challenges in two areas: object detection and deepfake detection.

For object detection, the thesis makes two major contributions. First, it introduces a semi-supervised few-shot object detection framework that leverages self-supervised learning to make detectors more robust and adaptable when labeled data is limited. Second, it establishes a comprehensive benchmark for cross-domain few-shot object detection. The benchmark provides an evaluation platform and insights into how well detectors generalize across diverse domains, which addresses the critical issue of domain shift in real-world applications.

For deepfake detection, the thesis investigates adaptability through an exemplar-free incremental learning framework, enabling models to continuously adapt to emerging deepfake techniques without retaining past exemplars. To improve generalizability, an attention-guided inconsistency learning method is proposed to enhance the detection of subtle inconsistencies in forged images. Additionally, the thesis explores the use of vision-language models to improve generalization performance, demonstrating the potential of pre-trained foundation models for deepfake detection tasks.

The proposed methods consistently achieve strong performance across benchmarks and real-world scenarios, effectively addressing challenges of adaptability and generalizability. By advancing object detection and deepfake detection, this thesis contributes meaningful insights and tools for computer vision and artificial intelligence, laying the groundwork for more robust and versatile visual recognition systems.
Created 26.3.2026 | Updated 27.3.2026