Data-Driven Software System Log Anomaly Detection

Thesis event information

Date and time of the thesis defence

Topic of the dissertation

Data-Driven Software System Log Anomaly Detection

Doctoral candidate

Master of Science Sayedshayan Hashemi Hosseinabad

Faculty and unit

University of Oulu Graduate School, Faculty of Information Technology and Electrical Engineering, Empirical Software Engineering in Software, Systems and Services

Subject of study

Information Processing Science

Opponent

Professor Michael Felderer, University of Cologne

Custos

Professor Mika Mäntylä, University of Oulu and University of Helsinki

Data-Driven Software System Log Anomaly Detection

Modern software systems generate massive volumes of log data: detailed records of the system's activity that are essential for diagnosing errors, monitoring performance, and protecting against security threats. As these logs grow in size and complexity, inspecting them by hand becomes infeasible. This thesis investigates how automated tools based on both traditional machine learning and modern deep learning can detect unusual behavior (anomalies) in logs quickly and accurately, helping to keep systems reliable and secure.

To accomplish this, the work proposes several key contributions and evaluates them on publicly available log datasets. First, it presents a fast, scalable log parser that separates the constant “template” portion of each log message from its variable details. Next, it proposes a way to measure parser accuracy at the character level, so that developers can choose the parser best suited to their needs. The thesis then introduces a Siamese-network model that remains robust as log formats evolve over time, together with built-in monitoring and visualization tools. Finally, it demonstrates an end-to-end deep-learning approach that reads raw log text directly, including its letters, numbers, and punctuation, and achieves strong anomaly-detection performance across diverse projects and on small datasets. Together, these contributions advance automated log analysis by making it faster, more adaptable, and more precise.
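The idea of splitting a log message into a constant template and its changing details, and of scoring a parser's output character by character, can be illustrated with a small Python sketch. It is not the parser or the metric developed in the thesis: the regular expressions in `VARIABLE_PATTERNS` are hypothetical masking rules, the `<*>` wildcard is a common convention assumed here, and `char_level_similarity` simply uses the standard library's `difflib.SequenceMatcher.ratio()`, which may differ from how the thesis defines character-level accuracy.

```python
import difflib
import re

# Hypothetical masking rules (illustrative only): each regex matches one kind of
# "changing detail". They are applied in order, most specific first.
VARIABLE_PATTERNS = [
    re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}(?::\d+)?\b"),  # IPv4 address, optional port
    re.compile(r"\b0x[0-9a-fA-F]+\b"),                   # hexadecimal literal
    re.compile(r"\b\d+\b"),                              # standalone integer
]

WILDCARD = "<*>"  # assumed placeholder for a masked variable


def extract_template(message: str) -> str:
    """Return the message with every variable-looking token replaced by WILDCARD."""
    template = message
    for pattern in VARIABLE_PATTERNS:
        template = pattern.sub(WILDCARD, template)
    return template


def char_level_similarity(predicted: str, truth: str) -> float:
    """Score two templates in [0, 1]: 2 * matched characters / total characters."""
    matcher = difflib.SequenceMatcher(None, predicted, truth, autojunk=False)
    return matcher.ratio()


if __name__ == "__main__":
    raw_logs = [
        "Connection from 10.0.0.5:8080 closed after 42 ms",
        "Connection from 192.168.1.20:443 closed after 7 ms",
        "Fault at address 0x7ffd2a3b in thread 12",
    ]
    # The first two lines collapse to the same template; the third gets its own.
    for line in raw_logs:
        print(extract_template(line))

    # Compare a hypothetical, partly wrong parser output with a hand-labelled template.
    score = char_level_similarity(
        "Received block blk_123 of size <*>",
        "Received block <*> of size <*>",
    )
    print(f"character-level similarity: {score:.3f}")
```

Applying the integer rule last matters: if it ran first, it would break an address such as `10.0.0.5:8080` into `<*>.<*>.<*>.<*>:<*>` before the address rule could match it. Rule-based masking like this is brittle; an identifier such as `blk_123` slips through because `\b` treats the underscore as part of a word. A character-level score gives such a nearly correct template partial credit, whereas an exact-match comparison would simply count it as wrong.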
Last updated: 26.5.2025