Talking Avatars: A Deep Learning Approach

Infotech Oulu Lecture Series

Date: Wednesday, August 12, 2015
Time: 10:15 - 11:00
Room: TS127

Lecturer: Professor Lei Xie, School of Computer Science, Northwestern Polytechnical University, Xi'an, China


In this talk, I will first give a brief introduction to the audio, speech and language processing group at Northwestern Polytechnical University. After that, I will focus on our recent work towards photo-real talking avatar animation using deep learning technologies. Specifically, we propose to use a deep bidirectional LSTM (BLSTM) for audio/visual modeling in our photo-real talking head system. Long short-term memory (LSTM) is a recurrent neural network (RNN) architecture designed to model temporal sequences and their long-range dependencies more accurately than conventional RNNs. Compared with our previous HMM-based system, the newly proposed deep BLSTM-based system performs better in both objective measurements and subjective A/B tests. Finally, I will discuss our latest attempts at voice conversion, the task of transforming source speech into the desired target speech.


Lei Xie received the Ph.D. degree in computer science from Northwestern Polytechnical University, Xi’an, China, in 2004. From 2001 to 2002, he was with the Department of Electronics and Information Processing, Vrije Universiteit Brussel, Brussels, Belgium, as a Visiting Scientist. From 2004 to 2006, he was a Senior Research Associate with the Center for Media Technology, School of Creative Media, City University of Hong Kong, Hong Kong, China. From 2006 to 2007, he was a Postdoctoral Fellow with the Human-Computer Communications Laboratory, Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Hong Kong. He is currently a Professor with the School of Computer Science, Northwestern Polytechnical University, Xi’an, China. He has published more than 120 papers in major journals and conference proceedings, such as IEEE Transactions on Audio, Speech and Language Processing, IEEE Transactions on Multimedia, Pattern Recognition, ACM/Springer Multimedia Systems, Springer Multimedia Tools and Applications, ACL, ACM Multimedia, ICASSP, Interspeech, ICPR, and ICME. His current research interests include speech and language processing, multimedia, and human-computer interaction. Dr. Xie is a senior member of the IEEE and a senior member of the China Computer Federation. He serves as the Vice Director of the Speech Information Processing Technical Committee of the Chinese Information Processing Society of China. He has served as program chair, organizing chair, and program/organizing committee member for various major conferences. He was the publication chair of Interspeech 2014. He will serve as the technical program chair of ISCSLP 2016.

