Data analysis for Digital Humanities

Digital Humanities is an emerging research area seated firmly between traditional humanities scholarship and the engineering sciences. Research in the domain seeks to uncover new humanities knowledge and add insight to existing theories by utilizing advances in computer-assisted analysis methods. Because of this multidisciplinary nature, the domain offers fertile ground for developing and testing machine vision and signal analysis methods on real-world data and research questions.

Advanced signal analysis, digital media content analysis, intelligent utilization of metadata and novel information visualization techniques are at the core of our research. They have proven highly valuable in solving real-world research problems in, for example, the humanities, education and medicine.

MORE – A Multimodal Observation System for Groups

The MORE system is designed for observation and machine-aided analysis of social interaction in real-life situations, such as classroom teaching scenarios and business meetings. The system uses a multichannel approach to data collection, whereby multiple streams of data in a number of different modalities are obtained from each situation. Typically the system collects a 360-degree video feed together with audio feeds from multiple microphones set up in the space. The system includes a server backend component that performs video processing, feature extraction and archiving operations on behalf of the user. The feature extraction services form a key part of the system and rely on signal analysis techniques, such as speech processing, motion activity detection and facial expression recognition, to speed up the analysis of large data sets.
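
As a rough illustration of what a motion activity feature can look like, the sketch below scores each pair of consecutive grayscale frames by the fraction of pixels whose intensity changed by more than a threshold. This is plain frame differencing, chosen here only for illustration; it is not taken from the MORE system, and the function names, the frame representation (nested lists of 0-255 integers) and the default threshold are all assumptions.

```python
# Minimal frame-differencing sketch (illustrative; not the MORE implementation).
# A frame is assumed to be a list of rows, each row a list of 0-255 grayscale values.

def motion_activity(prev_frame, curr_frame, threshold=25):
    """Return the fraction of pixels whose absolute intensity change
    between two frames exceeds `threshold` (hypothetical default)."""
    changed = 0
    total = 0
    for prev_row, curr_row in zip(prev_frame, curr_frame):
        for p, q in zip(prev_row, curr_row):
            total += 1
            if abs(p - q) > threshold:
                changed += 1
    return changed / total if total else 0.0


def activity_track(frames, threshold=25):
    """Compute one activity score per pair of consecutive frames."""
    return [
        motion_activity(a, b, threshold)
        for a, b in zip(frames, frames[1:])
    ]


if __name__ == "__main__":
    # Three tiny 2x2 frames: one pixel brightens between the first two,
    # then nothing changes between the second and third.
    frames = [
        [[0, 0], [0, 0]],
        [[0, 0], [0, 100]],
        [[0, 0], [0, 100]],
    ]
    print(activity_track(frames))
```

A per-frame score of this kind could serve as time-aligned metadata on the video, so that a viewer can jump to the moments with the most movement.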

The provided web interface weaves the multiple streams of information together, uses the extracted features as metadata on the audio and video data, and lets the user dive into analysing the recorded events. The objective of the system is to facilitate easy navigation of multimodal data and enable the analysis of the recorded situations for the purposes of, for example, behavioural studies, teacher training and business development. A further distinguishing feature of the system is its low setup overhead and high portability: the lightest MORE setup requires only a laptop computer and the selected set of sensors on site.

The MORE system has been deployed in real classroom observation cases and has solid application potential within the fields of vocology (professional voice training), speech-language pathology (voice, speech and language assessment and therapy), medicine (physician-patient interaction), psychology (counselling and guidance), education (classroom management), forensics (interrogation and interviewing), and security research. The developed system, consisting of both a hardware and a software component, is presented in a paper published in the journal Multimedia Tools and Applications (Springer).

Linguistic Modelling based on Large-scale Survey Data

A persistent issue in linguistic research, and in dialect research in particular, has been how to generalize from survey data about where a given dialect feature might be found. We have prepared a sophisticated cellular automaton (CA) for use with authentic, large-scale data collected for the Linguistic Atlas Project that can address the problems involved in this type of data visualization. The developed CA seeds the simulation using real informant data, plots the locations where survey data were elicited, and then, through the application of rules, creates an estimate of the spatial distributions of selected features. The CA supports a variety of rules addressing both the elicited linguistic features and the social metadata available for each informant.
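
The seed-and-grow idea described above can be sketched in a few lines. The example below is a minimal illustration under stated assumptions, not the rule set used in our CA: it places informant responses on a rectangular grid, leaves all other cells unknown, and lets each unknown cell adopt the majority value of its known neighbours. The grid layout, the majority rule, the tie handling and all function names are hypothetical, and social metadata is ignored.

```python
# Minimal seed-and-grow cellular automaton sketch (illustrative only).
# Assumptions: a rectangular grid, Moore (8-cell) neighbourhoods, and a
# simple majority-vote rule; none of these are taken from the published CA.
from collections import Counter


def step(grid):
    """One synchronous update: every unknown cell (None) adopts the strict
    majority value among its known neighbours; tied cells stay unknown."""
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] is not None:
                continue  # seeds and already-assigned cells are kept
            votes = Counter()
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        if grid[rr][cc] is not None:
                            votes[grid[rr][cc]] += 1
            if votes:
                ranked = votes.most_common(2)
                if len(ranked) == 1 or ranked[0][1] > ranked[1][1]:
                    new[r][c] = ranked[0][0]
    return new


def grow_regions(seeds, rows, cols, max_steps=100):
    """Seed the grid from {(row, col): value} and iterate until stable."""
    grid = [[None] * cols for _ in range(rows)]
    for (r, c), value in seeds.items():
        grid[r][c] = value
    for _ in range(max_steps):
        nxt = step(grid)
        if nxt == grid:
            break
        grid = nxt
    return grid


if __name__ == "__main__":
    # Two competing variants seeded at opposite ends of a one-row grid.
    for row in grow_regions({(0, 0): "A", (0, 4): "B"}, rows=1, cols=5):
        print(row)
```

In this toy run the two variants grow toward each other and the middle cell stays unassigned because its neighbours are evenly split, which is one simple way of representing competition between features. Because the procedure is deterministic, the same seeds always yield the same regions.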

In a paper published in the esteemed Journal of English Linguistics in 2015 we show that our CA can create regions based on real data with simple rules following a definite, logical procedure. The CA differs significantly from the traditional process of drawing isoglosses as it provides a rigorous, repeatable process that grows regions based on initial data positions instead of drawing divisions subjectively between them. Through comparison of corresponding CA and density estimation (DE) plots we have shown that the CA can reduce the uncertainty in drawing boundaries. In addition, our CA method identifies regions for dialect features in a way that recognizes the many features that are in use at the same time, sometimes in the same locations, and incorporates competition between features.


Last updated: 21.11.2016