One dataset, many views: How AI and humans decode emotion in learning
At the heart of the University of Oulu’s LeaF research infrastructure, anticipation fills the air. Postdoctoral researcher Tiina Törmänen adjusts the camera angles while Assistant Professor Haoyu Chen fine-tunes the video analysis setup that integrates multimodal large language models with learner self-annotation. In this context, multimodality refers to the diverse data collected by observing subtle human behaviours, ranging from micro-expressions to vocal cues. This day marks the start of data collection for their studies.
Törmänen’s research explores how emotions influence collaboration and evolve during the learning process. Her goal is to identify indicators of emotions and emotion regulation that could help AI and humans co-create supportive emotional climates in real time. Meanwhile, Chen is using the same dataset to advance his work on decoding micro-gestures and embodied actions, the subtle movements that often reveal how learners feel and how they engage in group work.
LeaF was set up for a collaborative learning session. Around 25 learning sciences students, divided into six small groups, will be working on project tasks across seven sessions. Each 1.5-hour session will be recorded through multiple channels: high-resolution video, directional audio, log data, situational self- and group-reports, and self-annotations of emotional expressions analyzed by AI. These annotations also include learners’ own reflections on how the AI interpreted their emotions. Altogether, the study will produce around 63 hours of rich, multimodal data.
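As a quick sanity check on that figure, the total follows directly from the session plan, assuming each of the six groups is recorded separately in every session; a minimal sketch:

```python
# Rough estimate of the recorded data volume described above.
# Assumption: each of the six groups is recorded separately in all seven sessions.
GROUPS = 6
SESSIONS = 7
HOURS_PER_SESSION = 1.5

total_hours = GROUPS * SESSIONS * HOURS_PER_SESSION
print(f"Group-level recordings: about {total_hours:.0f} hours")  # about 63 hours
```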
The study has been developed in collaboration between the two researchers, both working in the Hybrid Intelligence research programme at the University of Oulu. Törmänen, an expert in socially shared regulation of learning and emotional dynamics, designed the study to capture the real-time interplay between motivation and emotion. Chen developed algorithms capable of detecting subtle behaviors such as a lean forward or a shared glance, gestures that can signal both engagement and underlying emotional states.
"This is hybrid intelligence in action"
In the next phase, the dataset will power a multimodal large language model that combines AI-based analysis with human interpretation. This hybrid approach aims to make the identification of emotional states and regulation processes in collaborative learning more precise, transparent, and context-aware.
“This is hybrid intelligence in action,” says Törmänen, watching the live feed.
She emphasizes that integrating the multimodal data streams enables robust data triangulation, strengthening the validity and depth of the insights drawn from complex learning interactions. Chen adds, “One day, these insights could help learners respond to challenges in real time, supporting learning together not just cognitively, but emotionally. However, such tools require careful consideration of ethical principles and regulatory frameworks to ensure responsible use of sensitive data and equitable access to these technologies.”
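One way to picture the triangulation Törmänen describes is to place AI-generated emotion labels and learners’ self-annotations on a shared timeline so they can be compared window by window. The sketch below is purely illustrative, using hypothetical data structures rather than the project’s actual pipeline:

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class EmotionObservation:
    """One observation from any data stream (hypothetical structure)."""
    timestamp_s: float   # seconds from the start of the session
    source: str          # e.g. "video_model", "audio_model", "self_report"
    label: str           # e.g. "frustration", "enthusiasm"
    confidence: float    # model confidence, or 1.0 for a self-report

def triangulate(observations, window_s=30.0):
    """Group observations into shared time windows so AI-based labels
    and learner self-annotations can be compared side by side."""
    windows = defaultdict(list)
    for obs in observations:
        windows[int(obs.timestamp_s // window_s)].append(obs)
    return dict(windows)

# Example: an AI label and a self-report falling into the same 30-second window.
stream = [
    EmotionObservation(12.0, "video_model", "frustration", 0.72),
    EmotionObservation(25.0, "self_report", "frustration", 1.0),
]
print(triangulate(stream))
```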
As Chen describes, the Hybrid Intelligence research programme approaches the use of AI in a truly human-centered and responsible way. That’s why the core of the research involves not only data experts and human scientists but also ethics studies that guide its goals and decision-making.
LeaF Research Infrastructure
LeaF at the University of Oulu is a research infrastructure that supports multimodal data collection studies. It offers acoustically optimized and flexible spaces for diverse types of data collection, including XR and eye-tracking. LeaF specializes in video, audio, and biosensor measurements for both human-to-human and human-computer interaction, as well as learning research. LeaF’s facilities are open to all University of Oulu researchers.