Data-Driven Software System Log Anomaly Detection
Thesis event information
Date and time of the thesis defence
Topic of the dissertation
Data-Driven Software System Log Anomaly Detection
Doctoral candidate
Master of Science Sayedshayan Hashemi Hosseinabad
Faculty and unit
University of Oulu Graduate School, Faculty of Information Technology and Electrical Engineering, Empirical Software Engineering in Software, Systems and Services
Subject of study
Information Processing Science
Opponent
Professor Michael Felderer, University of Cologne
Custos
Professor Mika Mäntylä, University of Oulu and University of Helsinki
Data-Driven Software Log Anomaly Detection
Modern software systems generate massive volumes of log data (detailed records of everything the system does) that are essential for spotting errors, monitoring performance, and protecting against security threats. As these logs grow in size and complexity, it becomes impossible for humans to sift through them by hand. This thesis explores how automated tools, powered by both traditional machine-learning and modern deep-learning methods, can detect unusual behaviors (anomalies) in logs quickly and accurately, keeping systems running smoothly and safely.
To accomplish this, the work evaluates several key innovations on publicly available log datasets. First, it presents a fast, scalable log parser that separates the “template” portion of each message from its changing details. Next, it offers a new way to measure parser accuracy down to the character level, so developers can choose the best parser for their needs. The thesis then introduces a Siamese-network model that remains robust even when log formats evolve over time, complete with built-in monitoring and visualization tools. Finally, it demonstrates an end-to-end deep-learning approach that directly reads raw log text, including letters, numbers, and punctuation, and achieves strong anomaly-detection performance across diverse projects and small datasets. Together, these contributions advance automated log analysis by making it faster, more adaptable, and more precise.
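To make the idea of separating a message's “template” from its changing details concrete, the following is a minimal sketch that masks variable-looking tokens with placeholders. It is not the parser presented in the thesis: the masking rules, the placeholder names, and the function name split_template are all assumptions chosen purely for illustration.

```python
import re

# Hypothetical masking rules (an assumption for illustration only);
# real log parsers use far richer heuristics than three regexes.
MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),   # IPv4-like tokens
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),           # hexadecimal values
    (re.compile(r"\b\d+\b"), "<NUM>"),                      # plain integers
]


def split_template(message):
    """Return (template, parameters) for a single log message.

    Parameters are collected rule by rule, so their order follows the
    order of MASKS rather than their position in the message.
    """
    params = []
    template = message
    for pattern, token in MASKS:
        def repl(match, token=token):
            params.append(match.group(0))
            return token
        template = pattern.sub(repl, template)
    return template, params


if __name__ == "__main__":
    for line in [
        "Connection from 10.0.0.5 closed after 42 ms",
        "Connection from 192.168.1.7 closed after 1300 ms",
    ]:
        print(split_template(line))
```

Both example messages reduce to the same template while their addresses and durations are kept as parameters, which is the property that lets later analysis steps group messages by event type.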
Last updated: 26.5.2025