Demographic inference and affect estimation of microbloggers

Thesis event information

Date and time of the thesis defence

Place of the thesis defence

L6, Linnanmaa

Topic of the dissertation

Demographic inference and affect estimation of microbloggers

Doctoral candidate

Master of Technology Abhinay Pandya

Faculty and unit

University of Oulu Graduate School, Faculty of Information Technology and Electrical Engineering, Center for Ubiquitous Computing (UBICOMP)

Subject of study

Computer Science


Professor Stephan Oepen, Department of Informatics, University of Oslo


Research Professor Mourad Oussalah, Center for Machine Vision and Signal Analysis (CMVS), Faculty of ITEE, University of Oulu

Demographic inference and affect estimation of microbloggers

Owing to the peculiar nature of discourse on Twitter, developing analytical frameworks that derive useful insights from it remains challenging, as evidenced by poor performance on tasks such as demographic inference, affect estimation, and event detection. One of the focal problems lies in analyzing short texts in general, and tweets in particular. The analysis is difficult because of the vagaries of linguistic expression, and Twitter exacerbates this by enabling the use of emojis, hashtags, URLs, and embedded media.

While previous research has demonstrated, to some extent, ways of extracting useful information from individual tweet texts, the role of metadata has not yet been investigated systematically. Furthermore, most previous work has paid little or no attention to the emerging role of deep learning approaches in Twitter-based analytics. These observations motivate this thesis and inform its scientific objectives: it aims to enhance machine understanding of tweets in order to derive deeper insights from public data on Twitter.

First, this thesis empirically investigates the impact and efficacy of deep learning approaches that integrate message text and metadata by leveraging distributed semantic representations of textual entities. Second, the thesis contributes to capturing richer semantics from tweets by harnessing external, open-source knowledge graphs and other crowd-sourced lexical resources. Third, the role of user-created metadata, such as hashtags and URLs, in machine understanding of tweets is examined and quantified. At the same time, computational models are introduced that derive the conversational, topical, and temporal contexts of tweets and use them in machine learning models to improve Twitter-based analytics.

The proposed machine learning models, which integrate the diverse footprints of users' online activity and behavior, are validated by employing them in various case study applications. In addition, the datasets and tools developed during this thesis have been made publicly available to the scientific community.
Last updated: 1.3.2023