Multimodal Interaction and Motivational Intention Characterization
MIMIC
Project information
Project duration: -
Funded by: Other Finnish
Funding amount: 20 000 EUR
Project coordinator: University of Oulu
Project description
This NRI positions the recognition of motivational classroom interaction as a hybrid-intelligence problem: neither AI nor human analysis alone can deliver scalable, valid identification of motivational interaction in learning contexts. The initiative therefore combines multimodal AI development with expertise from the motivation and learning sciences. We aim to collaboratively develop and pilot a multimodal large language model (MLLM) for recognising motivational peer interaction from existing multimodal interaction data, alongside shared construct definitions, annotation practices, and human-in-the-loop validation routines that enable later upscaling. This establishes a hybrid pathway in which the AI is iteratively trained under expert guidance, while researchers deepen their understanding of how motivational interaction is expressed across modalities.
The project responds to the research question: "Can a multimodal large language model recognise motivational interactions among students in collaborative learning settings by integrating speech content, vocal prosody, and embodied cues?" We expect the multimodal model to outperform speech-content-only models, as prosodic and embodied signals provide critical context for interpreting motivational meaning.
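To illustrate this hypothesis, the sketch below contrasts a speech-content-only classifier with a simple late-fusion variant that concatenates text embeddings and prosodic descriptors. It is a minimal, hypothetical example using randomly generated stand-in features and labels; the project's actual model architecture, features, and data are not represented here.

```python
# Minimal late-fusion sketch (illustrative only, not the project's pipeline).
# Assumes each utterance is already represented by a text embedding plus a
# small vector of prosodic descriptors; both are simulated here, so the
# resulting scores are meaningless -- only the comparison structure matters.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_utterances = 200

text_emb = rng.normal(size=(n_utterances, 64))  # stand-in for LLM text embeddings
prosody = rng.normal(size=(n_utterances, 6))    # e.g., F0 mean/range, energy, speech rate
labels = rng.integers(0, 2, size=n_utterances)  # 1 = motivational utterance (toy labels)

# Speech-content-only baseline vs. multimodal fusion by feature concatenation.
baseline = cross_val_score(LogisticRegression(max_iter=1000), text_emb, labels, cv=5)
fused = cross_val_score(
    LogisticRegression(max_iter=1000), np.hstack([text_emb, prosody]), labels, cv=5
)
print(f"text-only accuracy:  {baseline.mean():.2f}")
print(f"multimodal accuracy: {fused.mean():.2f}")
```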
Project actions
MIMIC will be implemented through three interrelated work packages (WPs): two focus on the methodological and technical aspects of LLM development, while the third covers collaborative activities between the project partners and project administration. The work will be carried out in collaboration with the University of Potsdam and the Technical University Berlin (Germany).
Project results
The project produces one of the first empirically validated, multimodal approaches for analysing motivational interaction in peer collaboration by integrating semantic information from fine-tuned large language models (LLMs) with paralinguistic cues derived from high-quality speech and video data. The resulting methodological pipeline will contribute directly to current research on collaborative learning, learning analytics, and hybrid intelligence by offering a new way to detect motivational processes as they unfold in authentic educational settings. The multimodal dataset and analysis framework will provide researchers with tools to examine the interplay between verbal content, prosody, and interaction dynamics in unprecedented detail. This supports new directions in motivational research, social interaction modelling, and the development of AI systems capable of interpreting nuanced human behaviour.
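As an illustration of the paralinguistic side of such a pipeline, the hypothetical sketch below derives a few common prosodic descriptors (pitch, energy, voicing) from an audio file using librosa. The file path is a placeholder, and the project's actual feature set and tooling are not specified here.

```python
# Hypothetical prosodic feature extraction with librosa (illustrative only).
import numpy as np
import librosa

# Load one utterance; "utterance.wav" is a placeholder path.
y, sr = librosa.load("utterance.wav", sr=16000)

# Fundamental frequency (F0) via probabilistic YIN; NaN where unvoiced.
f0, voiced_flag, _ = librosa.pyin(
    y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C6"), sr=sr
)

# Summarise pitch, loudness, and voicing into utterance-level descriptors.
features = {
    "f0_mean_hz": float(np.nanmean(f0)),
    "f0_range_hz": float(np.nanmax(f0) - np.nanmin(f0)),
    "rms_energy_mean": float(librosa.feature.rms(y=y).mean()),
    "voiced_ratio": float(np.mean(voiced_flag)),
}
print(features)
```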
The project’s primary contribution lies in developing robust research methods, such as multimodal large language models, that enable precise analysis of motivational interactions in the classroom. These methodological innovations will produce evidence-based insights that guide educators in designing practices that sustain engagement and equity. Furthermore, the methods will inform the creation of intelligent tools capable of detecting and responding to motivational dynamics in real time. By advancing both scientific understanding and practical applications, the project strengthens the foundation for scalable, research-driven educational technologies, ultimately promoting motivated lifelong learning.