Deep Learning for Human-Centered Vision & Multimodal Conversational Systems

The course gives an introduction to building multimodal conversational systems, which involve both multimodal analysis (of the users) and multimodal synthesis (on virtual agents). First, we recap deep learning (DL) for scene-centered vision, namely object recognition and object detection. For human-centered vision, we introduce the problems of joint location estimation, 3D facial/body mesh estimation, and emotion recognition. We then discuss models for natural language understanding and dialog generation, both task-based and generative. Finally, we discuss machine learning (ML)-based and DL-based models for generating speaking and listening behavior.

Event information

Venue location

TS126 Linnanmaa campus

External teacher(s): Associate Professor Dr. Dinesh Babu Jayagopi
External teacher(s) organization: IIIT Bangalore
ECTS-credits: 2 ECTS
Grade: Pass or Fail

Assessment: Research report / Essay after the lectures

May 23, 10.15-12.00, Tuesday morning: Lecture 1, Introduction to multimodal conversational systems

May 23, 13.15-15.00, Tuesday afternoon: Lecture 2, Deep learning for scene-centered vision

May 24, 10.15-12.00, Wednesday morning: Lecture 3, Natural language understanding (NLU)

May 24, 13.15-15.00, Wednesday afternoon: Lecture 4, Dialog generation

May 25, 10.15-12.00, Thursday morning: Lecture 5, Gesture generation (machine learning-based)

May 25, 13.15-15.00, Thursday afternoon: Lecture 6, Gesture generation (deep learning-based)

Tentative topics:
Deep learning, multimodal conversational systems, natural language processing (NLP), computer vision

Contact teaching: approximately 12 hours of lectures

Contact person(s): Miguel Bordallo López & Praneeth Susarla

Last updated: 12.4.2023