Four lectures on 12 August

Date: 12.8.2014 09:00
Place: TS 107

Title 1: Speaker Recognition -- A Brief Overview and Vulnerability Under Spoofing Attacks
Presenter: Dr. Tomi Kinnunen, University Researcher

Title 2: Audio-Visual Speech Processing
Presenter: Dr. Gerard Chollet, Emeritus senior researcher, CNRS-LTCI, Telecom-ParisTech; Nokia Visiting Professor at UEF

Title 3: Quality Metrics in Calibration of Biometric Match Scores
Presenter: Dr. Rahim Saeidi, University of Eastern Finland

Title 4: Automatic Regularization of Cross-entropy Cost for Speaker Recognition Fusion
Presenter: Dr. Ville Hautamäki, University of Eastern Finland

Time: Tuesday 12 August, 9:00-12:00
Place: TS 107

More details about the presentations/presenters:

Title: Speaker Recognition -- A Brief Overview and Vulnerability Under Spoofing Attacks

Presenter: Dr. Tomi Kinnunen, University Researcher

Abstract: This talk gives a brief overview of the recent speech processing research at the Speech and Image Processing Unit (SIPU) of the University of Eastern Finland. A particular focus area of the group has been speaker recognition (voice biometrics). This talk will have two parts. The first part will introduce the basic techniques used by voice biometric systems, intended for listeners who are less familiar with speech technology. The second part then focuses on a well-known problem of biometric technology, namely, spoofing attacks. There are four main ways to spoof (circumvent) speaker verification systems: impersonation, replay, voice conversion and speech synthesis. This talk focuses mostly on voice conversion attacks.

Biography: Tomi Kinnunen received the M.Sc., Ph.Lic. and Ph.D. degrees in Computer Science from the University of Joensuu (now University of Eastern Finland, UEF), Finland, in 1999, 2004 and 2005, respectively. From 2005 to 2007, he worked as an associate scientist at the Institute for Infocomm Research (I2R), Singapore. Since 2007, he has been with UEF. From 2010 to 2012, he was funded by a post-doctoral grant from the Academy of Finland, and he currently holds the position of university researcher. He serves as an associate editor of Digital Signal Processing. He was the chair of Odyssey 2014: The Speaker and Language Recognition Workshop. His primary research interests include speaker recognition, robust speech modeling and feature extraction, pattern recognition and biometrics.

*********************************************************

Title: Audio-Visual Speech Processing

Presenter: Dr. Gerard Chollet, Emeritus senior researcher, CNRS-LTCI, Telecom-ParisTech; Nokia Visiting Professor at UEF

Abstract: Speech is not only an audio signal: speakers also move when they talk. In particular, lip movements are important for speech perception. This talk will review the combined exploitation of acoustic and visual cues for multimedia encoding, indexing, synthesis, and speech and speaker recognition.

Biography: The education of Dr. Gérard Chollet, up to the doctoral level, was centered on Mathematics (DUES-MP), Physics (Maîtrise), and Engineering and Computer Sciences (DEA). He studied Linguistics, Electrical Engineering and Computer Science at the University of California, Santa Barbara, where he was granted a PhD in Computer Science and Linguistics. He taught courses in Phonetics, Speech Processing and Psycholinguistics in the Speech and Hearing department at Memphis State University in 1976-1977. He then had a dual affiliation with the Computer Science and Speech departments at the University of Florida in 1977-1978. He joined CNRS (the French public research agency) in 1978 at the Institut de Phonétique in Aix-en-Provence. Since then, CNRS has been his main employer.

In 1981, he was asked to take charge of the speech research group of Alcatel. This was his first real experience with management. He negotiated the first research proposals with the European Commission under the ESPRIT program. The speech group of Alcatel subsequently moved to Telic in Strasbourg (where he consulted), then SEL in Germany, and finally Face and Telettra in Italy. In 1983, he joined a newly created CNRS research unit at ENST (under the leadership of Claude Gueguen). This unit has grown steadily since then, incorporating researchers from most of the departments of ENST. Dr. Gérard Chollet was head of the speech group before he left temporarily for IDIAP. The group contributed to a number of European projects such as SAM, ARS and FreeTel, as well as national projects. In 1985, he spent a sabbatical year at IPO in Eindhoven, the Netherlands, where he developed the `temporal decomposition' technique with Steven Marcus. That technique has been quite successful and is still under development at AT&T, CUED, ICP, LAFORIA, ENST, etc. In 1992, he was asked to participate in the development of IDIAP, a new research laboratory of the `Fondation Dalle Molle' in Martigny, Switzerland. He initiated a successful collaboration with the Swiss Telecom-PTT and attracted funding from the Swiss Confederation (from national and European programs). IDIAP contributed to SpeechDat, M2VTS and other European projects.

From 1996 to 2012, he was full time at ENST, managing research projects and supervising doctoral work. Funding was secured from projects such as Eureka-Majordome and MajorCall, NoE-BioSecure, Strep-SecurePhone, IP-Companion@ble, AAL-vAssist and FET-ILHAIRE. CNRS decided in July 2012 to grant him emeritus status. He accepted a visiting professor position at Boise State University for the academic year 2012-13. He teaches graduate courses every year (in Paris, Lausanne, Boise) in Speech, Signal Processing and HCI. He has supervised a number of doctoral theses (Omnes, Tassy, Vicard, Choukri, Fournier, DeLima, Bimbot, Montacié, Deleglise, Barbier, Valbret, Mokbel, Mathan, Mauc, Liu, Tadj, Homayounpour, Langlais, Cernocky, Genoud, Halber, Gravier, Verlinde, Kharroubi, Benayed, Karam, Sanchez-Soto, Lin, Bredin, Zouari, Bayeh, Perrot, Hueber, Bendris, Khemiri, Milhorat, ...). His main research interests are in phonetics, automatic audio-visual speech processing, speech dialog systems, multimedia, pattern recognition, biometrics, digital signal processing, speech pathology, speech training aids, etc.

*********************************************************

Title: Quality Metrics in Calibration of Biometric Match Scores

Presenter: Dr. Rahim Saeidi, University of Eastern Finland

Abstract: The face is one of the common biometric modalities that humans use to recognize other people. Voice, on the other hand, is sometimes the only way to recognize individuals, for example when talking over the phone or when the face is covered. In forensic applications, interpreting the match score of an automatic recognition system is only possible by presenting it as a calibrated log-likelihood ratio. In this talk, I will present state-of-the-art algorithms for authentication based on speech and facial images. I will then cover the importance of the calibration stage, elaborate on conventional approaches, and spend the last part of my presentation describing the integration of quality metrics into the calibration stage. The content of this presentation is based on our journal papers published in [1,2] and their follow-up work.

References:

[1] M. I. Mandasari, M. Gunther, R. Wallace, R. Saeidi, S. Marcel and D. A. van Leeuwen, Score Calibration in Face Recognition, IET Biometrics, accepted for publication.
[2] M. I. Mandasari, R. Saeidi, M. McLaren and D. A. van Leeuwen, Quality Measure Functions for Calibration of Speaker Recognition System in Various Duration Conditions, IEEE Transactions on Audio, Speech and Language Processing, 21(11), pp. 2425-2438, November 2013.
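
As a minimal illustration of the calibration step described in the abstract above, the sketch below maps raw match scores (together with a quality measure such as utterance duration) to calibrated log-likelihood ratios using logistic regression. It is only a hedged example under assumed data, not the method of [1,2]; all variable names and numbers are hypothetical, and scikit-learn is assumed to be available.

    # Sketch of affine score calibration with a quality measure (Python).
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Hypothetical development trials: raw system scores, quality measures
    # (e.g. speech duration in seconds) and labels (1 = target, 0 = non-target).
    scores  = np.array([2.1, -0.5, 3.0, -1.2, 0.8, -2.0])
    quality = np.array([10.0, 3.0, 12.0, 2.5, 8.0, 1.5])
    labels  = np.array([1, 0, 1, 0, 1, 0])

    X = np.column_stack([scores, quality])
    cal = LogisticRegression(C=1.0)   # affine calibration (mild penalty for this toy example)
    cal.fit(X, labels)

    # The calibrated log-likelihood ratio is the linear part of the model,
    # with the prior log-odds of the training labels subtracted.
    prior_log_odds = np.log(labels.mean() / (1.0 - labels.mean()))
    llr = X @ cal.coef_.ravel() + cal.intercept_[0] - prior_log_odds
    print(llr)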

Biography: Rahim Saeidi received the Ph.D. degree in Computer Science from the University of Eastern Finland (UEF, formerly Univ. of Joensuu) in 2011. From 2011 to 2013, he was a Marie Curie post-doctoral fellow in the EU project Bayesian biometrics for forensics, working at Radboud University Nijmegen in the Netherlands. After lecturing at the University of Eastern Finland in 2013-2014, he is currently working as a visiting research fellow in the Department of Signal Processing and Acoustics at Aalto University. Dr. Saeidi is actively serving the research community and has been a member of the organizing and scientific committees of several national and international European conferences. He has authored or co-authored over fifty peer-reviewed publications on robust speaker and speech recognition and speech enhancement. More information: http://cs.uef.fi/pages/saeidi/

*********************************************************

Title: Automatic Regularization of Cross-entropy Cost for Speaker Recognition Fusion

Presenter: Dr. Ville Hautamäki, University of Eastern Finland

Abstract: In this work we study automatic regularization techniques for the fusion of automatic speaker recognition systems. Learning the regularization parameters automatically can dramatically reduce fusion training time; in addition, there is no need to split the development set into folds for cross-validation. We utilize a majorization-minimization approach to automatic ridge regression learning and design a similar way to learn the LASSO regularization parameter automatically. Experiments show improvements from using automatic regularization.
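
For readers less familiar with cross-entropy fusion, the sketch below shows the cost being regularized: an L2-penalized (ridge) logistic regression that fuses the scores of several subsystems into a single log-likelihood ratio. It is only an assumed example with synthetic data and a penalty fixed by hand, not the majorization-minimization learning described in the talk.

    # Sketch of ridge-regularized cross-entropy (logistic regression) fusion (Python).
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n_trials, n_systems = 200, 3
    labels = rng.integers(0, 2, size=n_trials)            # 1 = target trial
    # Hypothetical subsystem scores: informative but noisy.
    scores = labels[:, None] * 1.5 + rng.normal(size=(n_trials, n_systems))

    # Cross-entropy cost with an L2 penalty; C = 1/lambda is fixed by hand here,
    # whereas the talk describes learning the regularization automatically,
    # removing the need for a cross-validation split of the development set.
    fusion = LogisticRegression(penalty="l2", C=1.0)
    fusion.fit(scores, labels)

    fused_llr = scores @ fusion.coef_.ravel() + fusion.intercept_[0]
    print("fusion weights:", fusion.coef_.ravel())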

Biography: Ville Hautamäki received the M.Sc. degree in Computer Science from the University of Joensuu, Finland, in 2005. He received the Ph.D. degree in Computer Science from the same university in 2008. He has worked as a research fellow at the Institute for Infocomm Research, A*STAR, Singapore, and as a visiting scholar at Georgia Tech, USA. Currently, he is a post-doctoral researcher at the University of Eastern Finland, funded by the Academy of Finland. His current research interests are machine learning, classifier fusion, speaker recognition and language recognition.

 

Last updated: 11.8.2014