Infotech Oulu Annual Report 2015 - Center for Machine Vision Research (CMV)


Background and Mission

The Center for Machine Vision Research (CMV) is a creative, open and internationally attractive research unit. It is renowned worldwide for its expertise in computer vision, which now spans nearly 35 years.

The Center has achieved ground-breaking research results in many areas of its activity, including texture analysis, facial image analysis, 3D computer vision, and energy-efficient architectures for embedded systems. Among the highlights of its research are the Local Binary Pattern (LBP) methodology, LBP-based face descriptors, and methods for geometric camera calibration, all of which are highly cited and widely used around the world. The areas of application for CMV's current research include affective computing, perceptual interfaces for human-computer interaction, biometrics, augmented reality, and biomedical image analysis. The CMV has a wide international collaboration network to support its research mobility.

In spring 2016, the staff of CMV consists of three Professors, one Associate Professor, two FiDiPro Professors and one FiDiPro Fellow, 15 senior or postdoctoral researchers, and 16 doctoral students or research assistants. We also have visiting scholars who come with their own funding. The unit is highly international: over 50% of our researchers (doctors, PhD students) come from abroad. CMV has an extensive international collaboration network in Europe, China, Japan, Australia, and the USA. Researcher mobility, both outgoing and incoming, to and from leading research groups abroad is intense. In 2015, CMV participated in two European COST actions.


Scientific Progress

The current main areas of research are: 1) Computer vision methods, 2) Human-centered vision systems, 3) Vision systems engineering, and 4) Biomedical image analysis.

The group has a long and highly successful research tradition in two important generic areas of computer vision: texture analysis and geometric computer vision. Vision systems engineering has been a basis for many practical machine vision applications developed in the group. In recent years, computational photography, object detection and recognition, and biomedical image analysis have also become important research topics in CMV. The aim in all these areas is to create a methodological foundation for the development of new vision-based technologies and innovations.

Highlights and Events in 2015

In 2015, CMV received funding for two FiDiPro projects from Tekes – The Finnish Funding Agency for Innovation. Two distinguished computer vision scientists, Jiri Matas (Czech Technical University in Prague) and Stefanos Zafeiriou (Imperial College London), will visit the University of Oulu on a regular basis in 2016-2019 and contribute to a joint research agenda. Both FiDiPro projects will also benefit a set of companies as project partners.

Professor Xilin Chen, whose term as FiDiPro Professor will end in 2016, was elevated to IEEE Fellow. He is being recognized for his contributions to machine vision for facial image analysis and sign language recognition.

According to a Web of Science analysis, CMV continues to excel in producing highly cited papers, a commonly used measure of breakthrough research. In 2005-15, it had five papers in the top 20 among all 18,522 Finnish papers published in the Engineering category of Web of Science during that period (as of 19.4.2016), ranking 2nd, 5th, 7th, 8th, and 20th.

In November, our research on hidden facial expressions was reported by MIT Technology Review. The article was based on our report on reading hidden emotions from spontaneous micro-expressions published in arXiv.

In May, the 2015 MVA Most Influential Paper over the Decade Award was granted to Dr. Vili Kellokumpu, Prof. Matti Pietikäinen and Prof. Janne Heikkilä. This award is given to the authors of papers that were presented at the IAPR International Conference on Machine Vision Applications (MVA) held ten years before (this time MVA 2005) and have been recognized as having the most significant influence on machine vision technologies over the subsequent decade.

CMV members have been active in co-editing special issues in prestigious journals: Dr. J. Chen, Dr. G. Zhao and Prof. M. Pietikäinen, together with Dr. Z. Lei (from Chinese Academy of Sciences) and Dr. L. Liu (from National University of Defense Technology, China) co-edited the Special Issue on robust local descriptors for computer vision in Neurocomputing journal; Dr. G. Zhao and Prof. M. Pietikäinen, together with S. Zafeiriou, I. Kotsia (from Middlesex University, UK), J. Cohn (from University of Pittsburgh/CMU, USA) and R. Chellappa (from University of Maryland), have been co-editing the Special Issue on Spontaneous Facial Behaviour Analysis (SFBA) for Computer Vision and Image Understanding journal; Prof. Heikkilä together with Prof. L. Xie and Dr. Zhang (both from Northwestern Polytechnic University, China) co-edited the Special Issue on Immersive Audio/Visual Systems in Multimedia Tools and Applications journal.

Computer Vision Methods

Texture Analysis

Texture is an important characteristic of many types of images and can play a key role in a wide variety of applications of computer vision and image analysis. The CMV has a long tradition in texture analysis research, and ranks among the world leaders in this area. The Local Binary Pattern (LBP) texture operator has been highly successful in numerous applications around the world, and has inspired plenty of new research on related methods, including the blur-insensitive Local Phase Quantization (LPQ) method, the Weber Law Descriptor (WLD), and Binarized Statistical Image Features (BSIF), also developed by researchers of CMV.

We proposed a globally rotation invariant multi-scale co-occurrence local binary pattern (MCLBP) feature for texture-relevant tasks. In MCLBP, we arrange all co-occurrence patterns into groups according to properties of the co-patterns, and design three encoding functions (Sum, Moment, and Fourier Pooling) to extract features from each group. The MCLBP can effectively capture the correlation information between different scales and is also globally rotation invariant (GRI). The MCLBP is substantially different from most existing LBP variants, including the LBP, the CLBP, and the MSJ-LBP, which achieve rotation invariance by locally rotation invariant (LRI) encoding. Extensive experiments demonstrated the effectiveness of the MCLBP compared to many state-of-the-art LBP variants including the CLBP and the LBPHF.

We also proposed a local feature, called Local Orientation Adaptive Descriptor (LOAD), to capture regional texture in an image. In LOAD, we define the point description on an Adaptive Coordinate System (ACS), adopt a binary sequence descriptor to capture relationships between one point and its neighbors, and use a multi-scale strategy to enhance the discriminative power of the descriptor. The proposed LOAD not only has the discriminative power to capture texture information, but is also strongly robust to illumination variation and image rotation. Extensive experiments on benchmark data sets of texture classification and real-world material recognition show that the LOAD yields state-of-the-art performance. By combining LOAD with Convolutional Neural Networks (CNN), we obtain significantly better performance than with either the LOAD or the CNN alone. This result confirms that the LOAD is complementary to the learning-based features.

LBPs are considered among the most computationally efficient high-performance texture features. However, the LBP method is very sensitive to image noise and is unable to capture macrostructure information. To address these disadvantages, we collaborated with Dr. Li Liu (National University of Defense Technology, China) and others to introduce a novel descriptor for texture classification, the Median Robust Extended Local Binary Pattern (MRELBP). Unlike the traditional LBP and many LBP variants, MRELBP compares regional image medians rather than raw image intensities. A multiscale LBP-type descriptor is computed by efficiently comparing image medians over a novel sampling scheme, which can capture both microstructure and macrostructure texture information. A comprehensive evaluation on benchmark datasets reveals MRELBP's high performance (robustness to gray-scale variations, rotation changes and noise) at a low computational cost. MRELBP has produced the best classification scores on many different test databases. More importantly, MRELBP is highly robust to image degradations including Gaussian noise, Gaussian blur, salt-and-pepper noise and random pixel corruption. A paper on this method was published in IEEE Transactions on Image Processing.
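The core idea of comparing regional medians instead of raw intensities can be illustrated with a minimal sketch. This is not the MRELBP descriptor itself: the 3x3 median window, the 8-neighbour layout and the function names (`median_filter3`, `lbp8`, `median_lbp`) are assumptions chosen for brevity, and the sampling scheme and multiscale extensions of the published method are omitted.

```python
import numpy as np

def median_filter3(img):
    """3x3 median filter with edge replication (illustrative window size)."""
    padded = np.pad(img, 1, mode="edge")
    h, w = img.shape
    # Stack the nine shifted views and take the per-pixel median.
    windows = [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)]
    return np.median(np.stack(windows), axis=0)

def lbp8(img):
    """Basic 8-neighbour LBP code for every interior pixel."""
    center = img[1:-1, 1:-1]
    h, w = center.shape
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    codes = np.zeros((h, w), dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = img[dy:dy + h, dx:dx + w]
        # Set the bit when the neighbour is at least as large as the center.
        codes |= (neighbour >= center).astype(np.uint8) << np.uint8(bit)
    return codes

def median_lbp(img):
    """LBP computed on regional medians rather than raw intensities."""
    return lbp8(median_filter3(img))
```

On a flat image with a single corrupted pixel, `lbp8` changes its code at that pixel while `median_lbp` does not, which is the kind of noise robustness the paragraph above describes.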

Dynamic texture and scene classification are two fundamental problems in understanding natural video content. Extracting robust and effective features is a crucial step towards solving these problems. However, existing approaches suffer from sensitivity to varying illumination, viewpoint changes, or camera motion, and/or from a lack of spatial information. Inspired by the success of deep structures in image classification, we leveraged a deep structure to extract features for dynamic texture and scene classification. To tackle the challenges of training a deep structure, we proposed transferring prior knowledge from the image domain to the video domain. More specifically, we apply a well-trained Convolutional Neural Network (ConvNet) as a feature extractor to extract mid-level features from each frame, and then form the video-level representation by concatenating the first- and second-order statistics of the mid-level features. We term this two-level feature extraction scheme the Transferred ConvNet Feature (TCoF). Moreover, we explored two different implementations of the TCoF scheme: the spatial TCoF and the temporal TCoF. In the spatial TCoF, the mean-removed frames are used as the inputs of the ConvNet, whereas in the temporal TCoF, the differences between adjacent frames are used. We systematically evaluated the proposed schemes on three benchmark data sets (DynTex, YUPENN, and Maryland), demonstrating that the proposed approach yields excellent performance.
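The two-level scheme can be sketched as follows. Several details are assumptions made for illustration only: "mean-removed" is taken to mean subtracting the clip's average frame, the second-order statistic is taken to be the covariance matrix (upper triangle), and `extract` is a hypothetical stand-in for the pre-trained ConvNet feature extractor.

```python
import numpy as np

def spatial_inputs(frames):
    # Assumption: "mean-removed" means subtracting the clip's average frame.
    return frames - frames.mean(axis=0, keepdims=True)

def temporal_inputs(frames):
    # Differences between adjacent frames.
    return frames[1:] - frames[:-1]

def pool_statistics(features):
    """Concatenate first-order (mean) and second-order (covariance) statistics."""
    x = np.asarray(features, dtype=float)            # shape: frames x dims
    mean = x.mean(axis=0)
    cov = np.atleast_2d(np.cov(x, rowvar=False))     # dims x dims
    upper = cov[np.triu_indices(x.shape[1])]         # covariance is symmetric
    return np.concatenate([mean, upper])

def tcof(frames, extract, mode="spatial"):
    """Video-level descriptor; `extract` maps one frame to a 1-D feature vector."""
    frames = np.asarray(frames, dtype=float)         # shape: T x H x W
    inputs = spatial_inputs(frames) if mode == "spatial" else temporal_inputs(frames)
    features = np.stack([extract(f) for f in inputs])
    return pool_statistics(features)
```

For D-dimensional frame features, the descriptor has D + D(D+1)/2 entries, so in practice the mid-level features would need to be of moderate dimension.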

Local descriptors are a popular way to characterize the local properties of images in various computer vision tasks. To form global descriptors for image regions, first-order feature pooling is widely used. However, as the first-order pooling technique treats each dimension of the local features separately, the pairwise correlations of local features are usually ignored. Encouraged by the success of recently developed second-order pooling techniques, we formulated a general second-order pooling framework and explored several analogues of the second-order average and max operations. We comprehensively investigated a variety of moments that are central to the second-order pooling technique. The results suggest the superiority of the second-order standardized moment average pooling (2Standmap). The 2Standmap provides a unified approach to encapsulating both low-level information from raw features and mid-level visual cues from the local descriptors. It is low-dimensional, discriminative, efficient, and learning-free. We successfully applied 2Standmap to four challenging tasks, namely texture classification, medical image analysis, pain expression recognition, and micro-expression recognition. The results illustrate the effectiveness of 2Standmap in capturing multiple cues and its generalization to both static images and spatio-temporal sequences.

Overview of the second-order pooling framework 2Standmap. Given an image region I, for each pixel x inside, a vector f(x) concatenating all local features is extracted. We then perform second-order pooling to obtain the global region descriptor g. Two key parts of second-order pooling, namely second-order average or max pooling and non-linear mapping, are comprehensively investigated in our work.
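In symbols, generic second-order pooling over a region can be written as below, using the caption's I, f(x) and g. The symbols G_avg, G_max, the mapping φ and the vectorization operator are introduced here for illustration; the specific standardized-moment form and the non-linear mapping used by 2Standmap are not given in this report and are left abstract.

```latex
G_{\mathrm{avg}} = \frac{1}{|I|} \sum_{x \in I} f(x)\, f(x)^{\top}, \qquad
G_{\mathrm{max}} = \max_{x \in I} f(x)\, f(x)^{\top} \ \ \text{(element-wise)}, \qquad
g = \operatorname{vec}\bigl(\varphi(G)\bigr)
```

Because the outer product f(x) f(x)^T contains every pairwise product of feature dimensions, G retains the correlations that first-order pooling discards.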


Background Subtraction

Foreground object segmentation from a video stream is a fundamental and critical step for many high level computer vision tasks. The accuracy of segmentation can significantly affect the overall performance of the application employing it. Background subtraction is generally regarded as an effective method for extracting the foreground. However, the background in a complex environment may include distracting motions and hence makes precise segmentation challenging.

Illustration of proposed background subtraction framework, and results of detected foreground and recovered background.


Low-rank and sparse representation based methods, which make few specific assumptions about the background, have recently attracted wide attention in background modeling. With these methods, moving objects in the scene are modeled as pixel-wise sparse outliers. However, in many practical scenarios, the distributions of these moving parts are not truly pixel-wise sparse but structurally sparse. Moreover, a robust analysis mechanism is required to handle background regions or foreground movements of varying scales. Based on these two observations, we first introduced a class of structured sparsity-inducing norms to model moving objects in videos. In our approach, we regard the observed sequence as consisting of two terms, a low-rank matrix (background) and a structured sparse outlier matrix (foreground). Next, by virtue of adaptive parameters for dynamic videos, we proposed a saliency measurement to dynamically estimate the support of the foreground. Experiments on challenging, well-known data sets demonstrate that the proposed approach outperforms the state-of-the-art methods and works effectively on a wide range of complex videos.
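Low-rank plus sparse models are usually written as D = L + S, where D stacks the vectorized frames, L is the low-rank background and S the sparse foreground. Iterative solvers for such models typically alternate two shrinkage steps, sketched below. The function names, the non-overlapping index groups and the l2 group norm are assumptions for illustration; the structured norms and the saliency-based support estimation of the proposed method are not reproduced here.

```python
import numpy as np

def singular_value_threshold(m, tau):
    """Shrink singular values by tau: the usual update for a low-rank term."""
    u, s, vt = np.linalg.svd(m, full_matrices=False)
    return (u * np.maximum(s - tau, 0.0)) @ vt

def group_soft_threshold(v, groups, tau):
    """Shrink whole groups of entries together (group-sparse update).

    `groups` is a list of index lists, assumed here to be non-overlapping.
    """
    out = np.zeros_like(v, dtype=float)
    for idx in groups:
        norm = np.linalg.norm(v[idx])
        if norm > tau:
            # The group survives as a unit, scaled towards zero.
            out[idx] = v[idx] * (1.0 - tau / norm)
    return out
```

With pixel-wise sparsity each entry is thresholded on its own; grouping spatially adjacent pixels means that a coherent moving region is kept or discarded as a whole, which matches the structured-sparsity observation above.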

Computational Photography

In computational photography, the aim is to develop techniques for computational cameras that give more flexibility to image acquisition and enable more advanced features to be employed, going beyond the capabilities of traditional photography. These techniques often involve use of special optics and digital image processing algorithms that are designed to eliminate the degradations caused by the optical system and viewing conditions.

A multi-aperture camera is an imaging device that consists of several camera units, each having dedicated optics and a color filter. The camera produces several sub-images, which are combined into a single RGB image. Such a camera is a feasible alternative to the traditional Bayer filter camera in terms of image quality, camera size and camera features.

The main challenge of the multi-aperture camera arises from the fact that each camera unit has a slightly different viewpoint. We have developed a novel image fusion algorithm that corrects the parallax error between the sub-images using a disparity map. In order to improve the disparity estimation, we combine the matching costs over multiple views. Promising test results imply that the multi-aperture camera has the potential to become a serious competitor to Bayer filter cameras in portable devices.
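Combining matching costs over several views can be sketched as a winner-takes-all search over summed per-view costs. This is a toy illustration under strong assumptions: the views are rectified and displaced horizontally, `baselines` gives each view's shift per unit of disparity, intensities are directly comparable between views (which does not hold across different color filters, where a channel-robust cost would be needed), and image borders wrap around for brevity. The function name and parameters are hypothetical.

```python
import numpy as np

def multiview_disparity(ref, views, baselines, max_disp):
    """Per-pixel disparity minimizing the summed absolute-difference cost."""
    ref = np.asarray(ref, dtype=float)
    cost = np.zeros((max_disp + 1,) + ref.shape)
    for d in range(max_disp + 1):
        for view, k in zip(views, baselines):
            # Undo the shift that disparity hypothesis d predicts for this view.
            aligned = np.roll(np.asarray(view, dtype=float), -d * k, axis=1)
            cost[d] += np.abs(ref - aligned)
    # Pick the hypothesis with the lowest combined cost at every pixel.
    return cost.argmin(axis=0)
```

A wrong hypothesis must agree with the reference in every view at once to obtain a low cost, so summing the costs reduces the ambiguity that a single view pair would leave.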

A multi-aperture camera with four sensors for different color channels.


Image registration is one of the most important and most frequently discussed topics in the image processing literature, and it is a crucial preliminary step in all algorithms in which the final result is obtained by fusing several images, e.g. multichannel image deblurring, super-resolution, and depth-of-field extension. In many cases, the images to be registered are inevitably blurred. We developed an original registration method designed particularly for registering blurred color images. In order to handle color images, we developed an approach based on an extension of the Wiener filter that operates on quaternion signals. We showed experimentally that its good performance is independent of both the amount of blur in the images and the illumination changes in the scene.

Two images acquired with different focus settings and under different illumination conditions are correctly aligned using our blur-invariant registration algorithm based on the quaternion Wiener filter.


Object Detection and Recognition

Object detection and recognition is an important research area in computer vision, as it has the potential to facilitate search in very large unannotated image databases. However, even the best current systems for artificial vision have a very limited understanding of objects and scenes. For instance, state-of-the-art object detectors model objects as distributions of simple features (e.g., HOG or SIFT), which capture blurred statistics of the two-dimensional shape of the objects. Color, material, texture, and most other object attributes are largely ignored in the process.

Automatic Video Content Annotation

The TV and movie industry has seen a paradigm shift in recent years, as the Internet has become one of the most important distribution channels. Platforms such as the BBC's iPlayer, and distributors and portals like Netflix, Amazon Instant Video, Hulu, and YouTube, have millions of users every day. Due to the amount of media, search is an essential functionality on all platforms. Search is typically based on the video title and metadata, but not on the actual visual content. Enabling such a search option would require annotating the video content, which is very laborious to do manually. In this project we aim to enable a fully automatic system that labels video based on information fusion from visual and textual sources.

Most often the video is accompanied by textual information in the form of subtitles and transcripts. Subtitles contain information on who is saying what and when, whereas transcripts describe actions in the scene at a higher level without very precise timing. In our research, we have developed a learning system that is able to utilize this kind of weak supervision to learn to detect and recognize the actors appearing in movies. Moreover, unlike previous works, we are able to detect and classify characters who are not speaking at all.

One of the very recent innovations is a system that utilizes higher-level cues, such as temporally overlapping face tracks and the overall number of detections per character, in the decision-making process. In addition, we learn novel convolutional neural network features that represent entire face tracks. These features prove to be superior to previous-generation methods based on hand-crafted descriptors or individual frame representations.

Learning Based Image Representations

Finding correspondences between image regions (patches) is a key factor in many computer vision applications. Structure-from-motion, multi-view reconstruction, image retrieval and object recognition require an accurate computation of similarity. Due to the importance of these problems, various descriptors have been proposed for patch matching with the aim of improving accuracy and robustness. Many of the most widely used approaches, like the SIFT or DAISY descriptors, are based on hand-crafted features and have limited ability to cope with adverse factors (occlusions, variation in viewpoint, etc.) that make the search for similar patches more difficult. Recently, various methods based on supervised machine learning have been successfully applied to learning patch descriptors.

In our research, we have developed a new learning-based image patch descriptor. The new descriptor is learned directly from the data using a Siamese convolutional neural network. A key feature of the resulting descriptor is that, by design, pairwise comparisons are possible using the simple Euclidean distance. This property, which does not hold for all learning-based descriptors, makes the descriptor simpler to use.
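The practical benefit of a descriptor that is comparable with the plain Euclidean distance is that matching reduces to a nearest-neighbour search, with no learned comparison network required. A minimal sketch, assuming the descriptors have already been computed by the network and are stored as rows of two arrays (the function names are hypothetical):

```python
import numpy as np

def pairwise_euclidean(a, b):
    """Euclidean distance between every row of a and every row of b."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def match_patches(desc_a, desc_b):
    """For each descriptor in desc_a, return the index and distance of its
    nearest neighbour in desc_b."""
    dist = pairwise_euclidean(desc_a, desc_b)
    nearest = dist.argmin(axis=1)
    return nearest, dist[np.arange(len(desc_a)), nearest]
```

For large collections, the same distance also allows standard approximate nearest-neighbour indexes to be used unchanged.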

Schematic illustration of the Siamese network structure that is utilized to learn efficient image patch descriptors.


3D Computer Vision

Three-dimensional scene reconstruction from multiple images has been a popular research topic in computer vision for several decades. During recent years, there has been significant progress both in large-scale structure from motion (SfM) and simultaneous localization and mapping (SLAM). One example is our DT-SLAM system published in late 2014, which is an open source implementation for real-time SLAM.

In 2016, we have focused on 3D modeling from point clouds produced with image-based 3D reconstruction techniques. In particular, we developed a method for evaluating triangle mesh models. The method takes the reconstructed mesh or point cloud and the ground truth as input and outputs two values, namely the Jaccard index and the compression ratio, which represent the accuracy, completeness and compactness of the reconstruction. Previous evaluation metrics usually measure only the accuracy and completeness of the mesh, whereas our method is able to measure the compactness-accuracy trade-off of the reconstructions. The evaluation method is presented in the figure below.
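The Jaccard index of two sets A and B is |A ∩ B| / |A ∪ B|. How the evaluation method maps a mesh and the ground truth to sets is not described in this report; the sketch below assumes, purely for illustration, that both are sampled as 3D points and compared by the voxels they occupy. The compression ratio is omitted because its definition is not given here.

```python
import math

def voxelize(points, voxel_size):
    """Set of integer voxel coordinates occupied by the given 3D points."""
    return {tuple(math.floor(c / voxel_size) for c in p) for p in points}

def jaccard_index(a, b):
    """|A ∩ B| / |A ∪ B|; defined as 1.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0
```

A reconstruction that misses parts of the scene lowers the intersection, while spurious geometry enlarges the union, so the single score penalizes both incompleteness and inaccuracy.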

A framework for evaluating triangular 3D mesh models.


Range cameras have greatly simplified many 3D vision problems that were considered extremely difficult using conventional imaging techniques. While Microsoft Kinect was the first inexpensive range sensor to gain a lot of interest in the computer vision community, range cameras are now also emerging in mobile devices. Google Tango has been the first mobile platform that embeds many sensors including a range camera. It provides both depth maps and color images (so-called RGB-D data) and also relatively good and robust odometry, which is computed by combining visual and inertial sensor information. Our aim has been to correct the errors, e.g. drift, that Tango's built-in hardware and software produce, and thereby improve the final outcome of the 3D scene reconstruction.

A 3D reconstruction of CMV’s premises obtained with the Google Tango device.


Recent improvements in scanning technologies, such as the consumer penetration of RGB-D cameras, have made obtaining and managing range image databases practical. Hence, the need for describing and indexing such data arises. In one of our studies, we focused on similarity indexing of range data among a database of range objects (range-to-range retrieval) by employing only single-view depth information. We utilized feature-based approaches on both local and global scales. However, the emphasis was on the local descriptors with their global representations. We presented a comparative study with extensive experimental results. In addition, we introduced a publicly available range object database that is large and diverse enough to be suitable for similarity retrieval applications. The simulation results indicate competitive performance between local and global methods. While a better complexity trade-off can be achieved with the global techniques, local methods perform better in distinguishing different parts of incomplete depth data. It could also be concluded that global and local descriptors can be merged to achieve higher performance in depth data similarity retrieval applications.

Human-Centered Vision Systems

In future ubiquitous environments, computing will move into the background, being omnipresent and invisible to the user. This will also lead to a paradigm shift in human-computer interaction (HCI) from traditional computer-centered to human-centered systems. We expect that computer vision will play a key role in such intelligent systems, enabling, for example, natural human-computer interaction, or identifying humans and their behavior in smart environments.

Face Recognition and Biometrics

We continued our research on unconstrained face recognition by investigating the recent binarized statistical image features (BSIF) local descriptor method. BSIF is a learning based method that can be used to learn application specific descriptors. In our first work, we combined BSIF and a novel soft-assignment method yielding state-of-the-art results with the well-known FERET database. The proposed method performed well also in unconstrained settings where we evaluated its performance on the LFW database. Motivated by the positive results using BSIF, we performed a review of unsupervised feature learning methods and found a method called Reconstruction ICA (RICA) which is quite similar to the ideas behind BSIF. Using RICA and the popular Bag-of-Features model we constructed a face representation method that was able to outperform the state-of-the-art in the unsupervised evaluation category of the original LFW protocol.

We also continued our research on face biometrics under spoofing attacks. Previous research has mainly focused on analyzing the luminance of the face images, hence discarding the chrominance information, which can be useful for discriminating fake faces from genuine ones. We therefore proposed a new face anti-spoofing method based on color texture analysis. We analyzed the joint color-texture information from the luminance and the chrominance channels using a color local binary pattern descriptor. More specifically, the feature histograms are extracted from each image band separately. Extensive experiments on two benchmark datasets showed excellent results compared to the state-of-the-art. This work was ranked among the best 10% of papers presented and published at the IEEE International Conference on Image Processing (ICIP 2015).
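Extracting histograms from each band separately can be sketched as follows. The choice of the YCbCr color space (with the standard full-range BT.601 conversion used in JPEG), the basic 8-neighbour LBP operator and the function names are assumptions for illustration; the report does not state which color space or LBP variant was used.

```python
import numpy as np

def rgb_to_ycbcr(rgb):
    """Full-range BT.601 conversion: one luminance and two chrominance bands."""
    rgb = np.asarray(rgb, dtype=float)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return np.stack([y, cb, cr], axis=-1)

def lbp_histogram(channel):
    """Normalized 256-bin histogram of basic 8-neighbour LBP codes."""
    center = channel[1:-1, 1:-1]
    h, w = center.shape
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    codes = np.zeros((h, w), dtype=np.int64)
    for bit, (dy, dx) in enumerate(offsets):
        codes += (channel[dy:dy + h, dx:dx + w] >= center) * (1 << bit)
    hist = np.bincount(codes.ravel(), minlength=256).astype(float)
    return hist / hist.sum()

def color_lbp_features(rgb):
    """Concatenate the per-band LBP histograms into one feature vector."""
    ycc = rgb_to_ycbcr(rgb)
    return np.concatenate([lbp_histogram(ycc[..., c]) for c in range(3)])
```

The resulting 768-dimensional vector would then be passed to a classifier trained to separate genuine from fake faces.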

Color based face spoofing detection.


Continuing the anti-spoofing research theme, we investigated audiovisual synchrony assessment for replay attack detection in talking face biometrics. Audiovisual speech synchrony detection is an important liveness check for talking face verification systems in order to make sure that the input biometric samples are actually acquired from the same source. In prior work, the visual speech features used have mainly described facial appearance or mouth shape in a frame-wise manner, thus ignoring the lip motion between consecutive frames. Since the visual speech dynamics are also important, we took the spatiotemporal information into account and proposed the use of space-time auto-correlation of gradients (STACOG) for measuring the audiovisual synchrony. To evaluate the effectiveness of the proposed approach, a set of challenging and realistic attack scenarios was designed. Extensive experimental analysis showed that the STACOG features outperform the state of the art in measuring the audiovisual synchrony.

Audiovisual synchrony assessment for replay attack detection in talking face biometrics.


Although still-to-video face recognition is an important function in watch list screening, state-of-the-art systems often yield limited performance due to camera inter-operability and to variations in capturing conditions. Therefore, the visual comparison of faces captured in unconstrained low-quality videos against a matching high-quality reference facial still image captured under controlled conditions is required in many surveillance applications. To improve the visual appearance of faces captured in videos, we studied a new super-resolution (SR) pipeline that is suitable for fast adjudication of face-matches produced by an automated system. In this pipeline, face quality measures are used to rank and select face captures belonging to a facial trajectory, and multi-image SR iteratively enhances the appearance of a super-resolved face image. Face selection is optimized and registered using graphical models. Experimental results showed that the proposed pipeline efficiently produces super-resolved face images by ranking best quality ROIs in a trajectory. To select the best face captures for SR, this pipeline exploits a strong correlation existing between pose and sharpness quality measurements. This work was done in a close collaboration with Ecole de Technologie Superieure, Universite du Quebec, Canada.

We co-organized a special issue on the theme of soft biometrics in Pattern Recognition Letters. Some of the articles are extended versions of selected papers presented at the "1st International Workshop on Soft Biometrics," which was held in conjunction with the European Conference on Computer Vision (ECCV 2014) in Zurich on September 7th 2014 and promoted by the European COST Action IC1106 "Integrating Biometrics and Forensics for the Digital Age." The special issue featured a review paper entitled "On Soft Biometrics," written by experts in soft biometrics research. The article provided an introduction to the topic and an overview of the history and achievements in the extraction and use of soft biometric traits. A new definition of the term was also presented: "the estimation or use of personal characteristics describable by humans that can be used to aid or effect person recognition." The other articles covered a wide range of aspects of soft biometrics research and provided an excellent summary of the state-of-the-art and recent developments in this evolving field. The articles included in this special issue are just samples of the work being carried out by different research groups, and illustrate some of the soft biometric traits, their potential applications, and many open issues and challenges. We believe the articles provide important insights into this relatively new research field. We hope that readers enjoy reading these selected articles and find novel research avenues and agendas that can push forward the field of soft biometrics. The guest editors of this special issue were Paulo Lobato Correia (Portugal), Abdenour Hadid (Finland), and Thomas B. Moeslund (Denmark).

How much information can Microsoft Kinect facial depth data reveal about identity, gender and ethnicity? To answer this research question, we explored the usefulness of the depth images provided by Microsoft Kinect sensors in different face analysis tasks, including identity, gender and ethnicity recognition. Four local feature extraction methods (LBP, LPQ, HOG and BSIF) were investigated for both face texture and shape description. Extensive experimental analysis on three publicly available Kinect face databases was conducted. The experimental analysis yielded interesting findings. Furthermore, a comprehensive review of the literature on the use of Kinect depth data in face analysis was provided, along with a description of the available databases.

Examples of 2D cropped image (left) and corresponding 3D face image (right) obtained with the Microsoft Kinect sensor after preprocessing.


We investigated kinship verification from faces using different local texture features, obtaining promising results. During the experimental analysis, we noticed that many of the existing kinship databases are limited or biased. For instance, the Kinship Face in the Wild data sets (KinFaceW I & II), published in TPAMI, are commonly used for the evaluation of kinship verification algorithms. We noted that the images in these data sets have relationship pairs that in many cases have been cropped from the same original images. This cropping, when taken into account, can significantly bias and simplify the classification problem. A classification strategy that tries to determine whether both images in a pair are cropped from the same photo will show improvements when compared to approaches focusing only on facial features. To illustrate this anomaly, we presented an extremely simple classification method that requires no training and offers results comparable to those obtained with sophisticated methods under the same experimental protocol. This calls for joint efforts by the research community to design new and more reliable databases and evaluation protocols to advance kinship verification research. For fair comparison, the new databases should document the conditions of image acquisition and discuss the potential implications of the limitations of the data sets. To ensure the reproducibility of the results, the source code of the different methods should be made publicly available.

Recognition of Facial Expressions and Emotions

The face is a key component in understanding emotions, and emotion understanding plays a significant role in many areas, from security and entertainment to psychology and education.

Micro-expressions (MEs) are rapid, involuntary facial expressions which reveal emotions that people do not intend to show. Studying MEs is valuable, as recognizing them has many important applications, particularly in forensic science and psychotherapy. However, analyzing spontaneous MEs is very challenging due to their short duration and low intensity. In particular, spontaneous MEs (in contrast to posed MEs) are highly challenging due to their large variability in both appearance and duration. Recently, there has been increasing interest in inferring micro-expressions from facial image sequences. Automated computer vision analysis of micro-expressions consists of two tasks: recognizing the emotion and detecting the micro-expression in the video.

For micro-expression recognition, feature extraction is a critical issue. In our research, we proposed two new spatiotemporal feature descriptors for analyzing micro-expressions.

In the first work, we proposed a novel framework based on a new spatiotemporal facial representation (called spatiotemporal local binary pattern with integral projection) to analyze micro-expressions with subtle facial movement. Firstly, we proposed to use an integral projection method based on difference images to obtain horizontal and vertical projections, which preserve the shape attributes of facial images and increase the discrimination between micro-expressions. Furthermore, we employed local binary pattern operators to extract appearance and motion features from the horizontal and vertical projections. Extensive experiments were conducted on three publicly available micro-expression databases to evaluate the performance of the method. The experimental results demonstrate that the new spatiotemporal descriptor achieves promising performance in micro-expression recognition.


The procedure of spatiotemporal local binary pattern with integral projection.
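The projection step described above can be sketched as follows. This is a minimal sketch under stated assumptions: the difference image is taken here as a plain absolute difference between two frames, the horizontal projection is taken to mean one sum per image row, and the function names are hypothetical. The report does not specify these details.

```python
def difference_image(frame_a, frame_b):
    """Pixel-wise absolute difference of two equally sized grey-level frames.

    Assumption: a plain frame difference stands in for the difference
    images mentioned in the text.
    """
    return [[abs(a - b) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(frame_a, frame_b)]


def integral_projections(img):
    """Return (horizontal, vertical) integral projections of a 2D image.

    horizontal: one value per row (sum along each row)
    vertical:   one value per column (sum along each column)
    """
    horizontal = [sum(row) for row in img]
    vertical = [sum(col) for col in zip(*img)]
    return horizontal, vertical


frame_a = [[1, 2, 3],
           [4, 5, 6]]
frame_b = [[1, 0, 3],
           [4, 5, 9]]
diff = difference_image(frame_a, frame_b)
horizontal, vertical = integral_projections(diff)
```

The two 1D projections computed for every frame form signals over time, on which local binary pattern operators can then be applied.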


In the second work, we observed that LBP-TOP suffers from two critical problems that decrease the performance of micro-expression analysis. Firstly, it extracts appearance and motion features only from the sign of the difference between two pixels and does not consider other useful information. Secondly, LBP-TOP commonly uses classical pattern types, which may not be optimal for the local structure in some applications. Therefore, we proposed SpatioTemporal Completed Local Quantization Patterns (STCLQP) for facial micro-expression analysis. Firstly, STCLQP extracts three kinds of information: sign, magnitude and orientation components. Secondly, efficient vector quantization and codebook selection are developed for each component in the appearance and temporal domains to learn compact and discriminative codebooks that generalize the classical pattern types. Finally, based on the discriminative codebooks, spatiotemporal features of the sign, magnitude and orientation components are extracted and fused. Experiments were conducted on three publicly available facial micro-expression databases, and some interesting findings about the neighboring patterns and the component analysis were obtained. Compared with the state of the art, the experimental results demonstrate that STCLQP achieves a substantial improvement in analyzing facial micro-expressions.

The framework for obtaining spatiotemporal features using STCLQP.

The micro-expression detection problem is defined as locating the onset frame (when the facial muscles start moving), the peak frame (when the muscles reach maximum contraction) and the offset frame (when the movement disappears). Currently, all the research work on spontaneous micro-expression detection has focused on detecting the peak frames. We were motivated to detect not only the peak frame but also the onset and offset frames. We therefore decided to explore motion features, because they capture the direction of facial movement at every frame, as illustrated in images (a)-(c). The trajectory changes at FACS Action Units (AUs), shown in image (d), can be used to find the peak, onset and offset: the motion vectors from onset to peak follow a smooth trajectory, and this trajectory is traversed in the opposite direction from peak to offset. In this approach, the optical flow vectors are added for each AU group across time and their magnitude is plotted, as shown in image (e). In this magnitude plot, the peak frame corresponds to the peak, and the appropriate onset and offset frames are then searched for. The drawback is that previous head motions, facial expressions and micro-expressions are accumulated. To tackle this problem, the optical flow vectors are added across variable small time windows. Various heuristics are used to filter false detections. Experimental results on the SMIC spontaneous micro-expression dataset demonstrate that our framework achieves higher accuracy than our previous baseline work.


 (a) Positive. (b) Negative. (c) Surprise. (d) Grouping of landmarks. (e) Images, motion vectors and optical flow magnitude plot for group 2 in a surprise micro-expression video.
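The accumulation of flow vectors over a time window and the search for the peak, onset and offset frames can be sketched as below. This is an illustrative sketch only: the fixed window length, the threshold ratio used to find the onset and offset, and both function names are assumptions, and the heuristics for filtering false detections are not reproduced.

```python
import math


def windowed_flow_magnitude(flows, window):
    """Magnitude of the flow vector summed over the last `window` frames.

    flows: list of (dx, dy) mean optical flow vectors of one AU group,
           one entry per frame.
    """
    magnitudes = []
    for t in range(len(flows)):
        start = max(0, t - window + 1)
        sum_x = sum(dx for dx, _ in flows[start:t + 1])
        sum_y = sum(dy for _, dy in flows[start:t + 1])
        magnitudes.append(math.hypot(sum_x, sum_y))
    return magnitudes


def spot_onset_peak_offset(magnitudes, ratio=0.2):
    """Return (onset, peak, offset) frame indices.

    Assumption: onset and offset are the outermost frames around the peak
    whose magnitude stays above `ratio` times the peak magnitude.
    """
    peak = max(range(len(magnitudes)), key=magnitudes.__getitem__)
    threshold = ratio * magnitudes[peak]
    onset = peak
    while onset > 0 and magnitudes[onset - 1] > threshold:
        onset -= 1
    offset = peak
    while offset < len(magnitudes) - 1 and magnitudes[offset + 1] > threshold:
        offset += 1
    return onset, peak, offset


# Toy sequence: the face moves right for two frames, then moves back.
flows = [(0, 0), (0, 0), (1, 0), (1, 0), (-1, 0), (-1, 0), (0, 0), (0, 0)]
magnitudes = windowed_flow_magnitude(flows, window=8)
onset, peak, offset = spot_onset_peak_offset(magnitudes)
```

Because the movement from peak to offset retraces the trajectory in the opposite direction, the summed vector returns towards zero, which is what makes the peak visible in the magnitude plot.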


Group Emotion Analysis

Social media has given people many opportunities to engage and interact socially with a larger population. In recent years, millions of images and videos have been uploaded to the Internet (e.g. to YouTube and Flickr), enabling us to explore images from social events such as family parties. However, until recently, relatively little research has examined group emotion in an image. To advance affective computing research, it is of interest to understand and model the affect exhibited by a group of people in images. Feature extraction and the group expression model are the two critical issues in inferring the emotion of a group. In our work, we proposed a new method to estimate the happiness intensity of a group of people in an image. Firstly, we combined the Riesz transform and the local binary pattern descriptor into the Riesz-based volume local binary pattern, which considers neighboring changes not only in the spatial domain of a face but also along the different Riesz faces. Secondly, we used continuous conditional random fields to construct a new group expression model that considers both global and local attributes. Finally, we used this model with the Riesz-based volume local binary pattern to estimate group happiness intensity. Numerous experiments were performed on three challenging facial expression databases to evaluate the new feature. Furthermore, experiments were conducted on the HAPPEI database to evaluate the group expression model with the new feature. The experimental results demonstrate that our method performs well in group happiness intensity analysis.

A framework for inferring the emotion of a group.


Multi-Modal Emotion Recognition

Automatic analysis of spontaneous human behavior has attracted increasing attention from computer vision researchers in recent years. We proposed an approach for multi-modal video-induced emotion recognition based on facial expressions and electroencephalogram (EEG) signals. Spontaneous facial expression is utilized as an external channel. A new feature, formed by the percentages of nine facial expressions, is proposed for analyzing the valence and arousal classes. Furthermore, EEG is used as an internal channel supplementing facial expressions for more reliable emotion recognition. Discriminative spectral power and spectral power difference features are exploited for EEG analysis. Finally, these two channels are fused at the feature level and at the decision level for multi-modal emotion recognition. Experiments were conducted on the MAHNOB-HCI database, including 522 spontaneous facial expression videos and EEG signals from 27 participants. Moreover, human performance in emotion recognition was tested with ten volunteers and compared with the proposed approach. The experimental results and comparisons with the average human performance show the effectiveness of the proposed multi-modal approach.

Multimodal data for recognizing emotions and estimating their intensity.
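The facial feature and the two fusion strategies mentioned above can be sketched as follows. This is a simplified sketch, not the published system: the per-frame expression labels, the equal channel weights and all function names are assumptions made for illustration.

```python
def expression_percentages(frame_labels, n_expressions=9):
    """Fraction of video frames assigned to each of the facial expressions.

    frame_labels: one predicted expression index (0..n_expressions-1)
                  per frame (assumed to come from a per-frame classifier).
    """
    counts = [0] * n_expressions
    for label in frame_labels:
        counts[label] += 1
    total = len(frame_labels)
    return [count / total for count in counts]


def feature_level_fusion(face_features, eeg_features):
    """Feature-level fusion: concatenate the two feature vectors."""
    return list(face_features) + list(eeg_features)


def decision_level_fusion(face_scores, eeg_scores, face_weight=0.5):
    """Decision-level fusion: weighted sum of per-class classifier scores.

    Returns the index of the winning class and the fused scores.
    The weight of 0.5 is an arbitrary illustrative choice.
    """
    fused = [face_weight * f + (1.0 - face_weight) * e
             for f, e in zip(face_scores, eeg_scores)]
    winner = max(range(len(fused)), key=fused.__getitem__)
    return winner, fused


face_feature = expression_percentages([0, 0, 1, 8])
joint_feature = feature_level_fusion(face_feature, [0.7, 0.1])
label, scores = decision_level_fusion([0.6, 0.4], [0.2, 0.8])
```

In feature-level fusion a single classifier is trained on the concatenated vector, whereas in decision-level fusion each channel has its own classifier and only their outputs are combined.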


Heart Rate Measuring from Videos

Remote heart rate (HR) measurement from face videos recorded by cameras is a new research topic.

Intel's newly announced, low-cost, high-precision RealSense 3D (RGBD) camera is becoming ubiquitous in laptops and mobile devices starting this year, opening the door for new applications in the mobile health arena. We demonstrated how the Intel RealSense 3D camera can be used for low-cost gaze tracking and passive pulse rate estimation. We developed a novel 3D gaze and fixation tracker based on the eye surface geometry, as well as an illumination-invariant pulse rate estimation method using near-infrared images captured with RealSense. We achieved a mean error of 1 cm at 20 × 30 cm for the gaze tracker and 2.26 bpm (beats per minute) for pulse estimation, which is adequate in many medical applications, demonstrating the great potential of novel consumer-grade RGBD technology in mobile health.

Cheek region segmentation using infrared and depth image. (a) Facial landmarks tracked on infrared image. (b) Connected components in depth image. (c) The connected component containing the most landmarks is selected as the face. (d) Face region in infrared image. (e) Cheek area is selected as the region between the eyes and mouth landmarks.

Heart rate measurement for the ROI in the infrared channel.


Analysis of Visual Speech

It is known that human speech perception is a bi-modal process which makes use of information not only from what we hear (acoustic) but also from what we see (visual). In machine vision, visual speech recognition (VSR), sometimes also referred to as automatic lip-reading, is the task of recognizing utterances by analyzing visual recordings of a speaker’s talking mouth without any acoustic input. Although visual information cannot in itself provide normal speech intelligibility, it may be sufficient within a particular context when the utterances to be recognized are limited. In such a case, VSR can be used to enhance natural human-computer interaction through speech, especially when audio is not accessible or is severely corrupted.

Visual speech constitutes a large part of our non-rigid facial motion and contains important information that allows machines to interact with human users, for instance through automatic visual speech recognition (VSR) and speaker verification. One of the major obstacles to research on non-rigid mouth motion analysis is the absence of suitable databases. Those available for public research either lack a sufficient number of speakers or utterances or contain constrained viewpoints, which limits their representativeness and usefulness. We introduced a newly collected multi-view audiovisual database for non-rigid mouth motion analysis. It includes more than 50 speakers uttering three types of utterances and, more importantly, thousands of videos simultaneously recorded by six cameras from five different views spanning the range between the frontal and profile views. Moreover, a simple VSR system was developed and tested on the database to provide baseline performance.


Head Pose and Eye Gaze Estimation

As large datasets with well-defined gaze directions are needed for research on eye gaze estimation, we collected the Oulu Multi-pose Eye Gaze Dataset. The dataset includes 200 image sequences from 50 subjects (four image sequences per subject). Each sequence consists of 225 frames captured while the subject is fixating on 10 target points on the screen. The first three sequences of each subject are captured under three fixed head poses, namely 0 (frontal) and ±30 degrees. The last sequence is captured with a free head pose. The dataset forms a strong basis for studying eye gaze estimation in a finer, continuous manner, rather than at just a few discrete angles as in previous research. Moreover, we provided baseline results on our dataset by evaluating popular approaches to eye gaze estimation. To study the influence of head pose on gaze estimation, we further designed a set of experiments that randomly pick fifty percent of the subjects in one pose as the training set and use the remaining subjects in the other poses as the testing set. The results show that performance under the same pose is much higher than under different poses. This means that the eye gazes differ distinctly when the head pose varies, and it confirms our expectation that multi-pose gaze estimation is highly challenging. Our dataset provides an opportunity for in-depth investigation of this issue.

Sample of the normalized and cropped eye gaze images in a resolution of 30 × 150 pixels from three fixed head poses respectively.
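The cross-pose evaluation protocol described above can be sketched as follows. This is a minimal sketch: the subject identifiers, the pose labels and the function name `cross_pose_split` are hypothetical, and only the splitting logic stated in the text is illustrated.

```python
import random


def cross_pose_split(subject_ids, train_pose, poses, seed=0):
    """Split (subject, pose) sequences for a cross-pose experiment.

    Half of the subjects, chosen at random, contribute their sequence in
    `train_pose` to the training set; the remaining subjects contribute
    their sequences in all other poses to the testing set.
    """
    rng = random.Random(seed)
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    train_subjects, test_subjects = shuffled[:half], shuffled[half:]
    train = [(subject, train_pose) for subject in train_subjects]
    test = [(subject, pose) for subject in test_subjects
            for pose in poses if pose != train_pose]
    return train, test


# 50 subjects and the three fixed head poses (in degrees).
train, test = cross_pose_split(range(50), train_pose=0, poses=[-30, 0, 30])
```

Keeping the training and testing subjects disjoint means that any drop in accuracy relative to a same-pose experiment reflects the change in head pose rather than familiarity with the person.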

As a promising technology, visual attention analysis contributes to various computer vision applications, such as object detection and image segmentation. Since eye movement reveals the regions of interest (ROI) of the human visual system, it is widely utilized in research on visual attention. Hence, to facilitate research in visual attention analysis, we designed and established a new task-driven eye tracking dataset of 47 subjects. Inspired by psychological findings that human visual behavior depends closely on the executed tasks, we carefully designed specific tasks in accordance with the contents of 111 images covering various semantic categories, such as text, facial expression, texture, pose, and gaze. This resulted in a dataset of 111 fixation density maps and over 5,000 scanpaths. Moreover, we provided baseline results for thirteen state-of-the-art saliency models. Furthermore, we discussed important clues about how tasks and image contents influence human visual behavior. This task-driven eye tracking dataset, with the fixation density maps and scanpaths, will be made publicly available.

(a) Examples of viewing materials and the created fixation density maps. (b) Example of a scanpath. The circles are fixation positions, and the radius of each circle correlates with the duration between fixations. The lines show the direction of fixation shifts, and the circle without a cross symbol is the first fixation point. The first 5 fixations are marked in this image.
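A fixation density map is commonly built by placing a Gaussian at every recorded fixation and normalizing the result. The sketch below follows that common recipe; the report does not state how the maps in the dataset were created, so the Gaussian kernel, its width `sigma` and the function name are assumptions.

```python
import math


def fixation_density_map(fixations, width, height, sigma=1.0):
    """Accumulate a Gaussian at each fixation and normalize to sum to 1.

    fixations: list of (x, y) pixel positions pooled over all subjects.
    Returns a height x width map as a list of rows.
    """
    density = [[0.0] * width for _ in range(height)]
    for fx, fy in fixations:
        for y in range(height):
            for x in range(width):
                squared_distance = (x - fx) ** 2 + (y - fy) ** 2
                density[y][x] += math.exp(-squared_distance / (2 * sigma ** 2))
    total = sum(sum(row) for row in density)
    if total == 0:
        return density  # no fixations: return the all-zero map
    return [[value / total for value in row] for row in density]


# Toy example: a single fixation at x=2, y=1 on a 5 x 3 image.
density = fixation_density_map([(2, 1)], width=5, height=3)
```

A saliency model can then be scored by comparing its output map with this density map.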


Affective Human-Computer Interaction

A paper describing our Minotaurus system developed for affective human-robot interaction in smart environments was published in 2014 in Cognitive Computation journal.

Vision Systems Engineering

Vision systems engineering research aims to identify attractive computing approaches, architectures, and algorithms for industrial machine vision systems. In this research, solutions ranging from low-level image processing all the way to equipment installation and operating procedures are considered simultaneously. The roots of this expertise are in our visual inspection studies, in which we met extreme computational requirements as early as the early 1980s, and we have contributed to the designs of several industrial solutions. We have also applied our expertise to applications intended for embedded mobile platforms.

In 2014, we created a demo system that showcases different face analysis methods, including face recognition, gender recognition, facial expression recognition and heart rate measurement. In 2015 and early 2016, key parts of this system were ported to run on Vuzix M100 smart glasses. The software uses only local processing and does not rely on network connectivity to offload computation to servers. Face analysis technology on wearable devices enables many different kinds of services. Familiar persons could be associated with extra information that is displayed when they are detected. In the future, the software could also automatically analyze the surroundings, e.g. for the visually impaired, and use other modalities to convey this information to the user.

Face analysis with Vuzix M100 smart glasses.


Biomedical Image Analysis

In recent years, increasing resolving power and automation of biomedical imaging systems have resulted in an exponential growth of the image data. Manual analysis of these data sets is extremely labor intensive and hampers the objectivity and reproducibility of results. Hence, there is a growing need for automatic image processing and analysis methods. In CMV, our aim has been to apply modern computer vision techniques to biomedical image analysis which is one of our emerging research areas.

Automated cell classification in Indirect Immunofluorescence (IIF) images has the potential to be an important tool in clinical practice and research. Recently, classification of Human Epithelial Type 2 (HEp-2) cell images has attracted great attention. However, the HEp-2 cell classification task is quite challenging due to large intra-class and small inter-class variations. We proposed an effective approach for automatic HEp-2 cell classification that combines multi-resolution co-occurrence texture and large regional shape information. In our approach, we: a) capture multi-resolution co-occurrence texture information with a novel Pairwise Rotation Invariant Co-occurrence of Local Gabor Binary Pattern (PRICoLGBP) descriptor, b) depict large regional shape information by using an Improved Fisher Vector (IFV) model with RootSIFT features, which are sampled from large image patches at multiple scales, and c) combine both features. We systematically evaluated the proposed approach on the IEEE International Conference on Pattern Recognition (ICPR) 2012, the IEEE International Conference on Image Processing (ICIP) 2013 and the ICPR 2014 contest data sets. The proposed method, based on the combination of the two introduced features, outperforms the winners of the ICPR 2012 contest using the same experimental protocol. Our method also greatly improves on the winner of the ICIP 2013 contest under four different experimental setups. Using the leave-one-specimen-out evaluation strategy, our method achieves performance comparable to that of the winner of the ICPR 2014 contest, which combined four features. This work is published in IEEE Journal of Biomedical and Health Informatics.
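The RootSIFT features mentioned in step b) are obtained from ordinary SIFT descriptors by L1 normalization followed by an element-wise square root. The sketch below shows only this transform; the SIFT extraction, the Fisher Vector encoding and the way the two features are combined are not covered, and the function name `root_sift` and the small constant `eps` are assumptions made for illustration.

```python
import math


def root_sift(descriptor, eps=1e-12):
    """Convert a (non-negative) SIFT descriptor to RootSIFT.

    Steps: L1-normalize the descriptor, then take the square root of
    every element. The result has (approximately) unit L2 norm.
    """
    l1_norm = sum(abs(value) for value in descriptor) + eps
    return [math.sqrt(abs(value) / l1_norm) for value in descriptor]


# Toy 4-dimensional descriptor (real SIFT descriptors have 128 dimensions).
rooted = root_sift([1.0, 3.0, 0.0, 12.0])
l2_norm = math.sqrt(sum(value * value for value in rooted))
```

Comparing RootSIFT vectors with the Euclidean distance corresponds to comparing the original descriptors with the Hellinger kernel, which reduces the dominance of a few large histogram bins.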

In another paper, published in Pattern Recognition Letters, we analyzed the importance of the pre-processing, and more specifically the role of Gaussian Scale Space (GSS) theory as a pre-processing approach for the HEp-2 cell classification task. We validated the GSS pre-processing under the Local Binary Pattern (LBP) and the Bag-of-Words (BoW) frameworks. Under the BoW framework, the introduced pre-processing approach, using only one Local Orientation Adaptive Descriptor (LOAD), achieved superior performance on the Executable Thematic on Pattern Recognition Techniques for Indirect Immunofluorescence (ET-PRT-IIF) image analysis. Our system, using only one feature, outperformed the winner of the ICPR 2014 contest that combined four types of features. Meanwhile, the proposed pre-processing method is not restricted to this work; it can be generalized to many existing works.
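The idea of Gaussian scale space pre-processing, smoothing the image with Gaussian kernels of increasing width before feature extraction, can be sketched as follows. This is a minimal sketch: the choice of scales, the kernel radius of about three sigma, the border handling by edge replication and the function names are assumptions, not the settings of the published method.

```python
import math


def gaussian_kernel(sigma):
    """Normalized 1D Gaussian kernel with a radius of about 3 sigma."""
    radius = max(1, int(3 * sigma + 0.5))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth_rows(img, kernel):
    """Convolve every row with the kernel, replicating border pixels."""
    radius = len(kernel) // 2
    smoothed = []
    for row in img:
        n = len(row)
        new_row = []
        for x in range(n):
            acc = 0.0
            for k, w in enumerate(kernel):
                xi = min(max(x + k - radius, 0), n - 1)  # clamp at borders
                acc += w * row[xi]
            new_row.append(acc)
        smoothed.append(new_row)
    return smoothed


def gaussian_blur(img, sigma):
    """Separable Gaussian blur: smooth rows, then columns."""
    kernel = gaussian_kernel(sigma)
    rows_done = smooth_rows(img, kernel)
    transposed = [list(col) for col in zip(*rows_done)]
    cols_done = smooth_rows(transposed, kernel)
    return [list(col) for col in zip(*cols_done)]


def scale_space(img, sigmas=(1.0, 2.0, 4.0)):
    """One smoothed copy of the image per scale (assumed scale values)."""
    return [gaussian_blur(img, sigma) for sigma in sigmas]


# Toy 7 x 7 image with a single bright pixel in the center.
impulse = [[0.0] * 7 for _ in range(7)]
impulse[3][3] = 1.0
levels = scale_space(impulse)
```

Descriptors such as LBP or LOAD can then be extracted from each level, so that the representation captures structure at several scales.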

In our recent work, we also presented a framework for the classification of HEp-2 cell images using convolutional neural networks (CNNs). Previous state-of-the-art methods show a classification accuracy of 75.6% on a benchmark dataset. We explored different strategies for enhancing, augmenting and processing training data in a CNN framework for image classification, and we demonstrated how the training data affects the accuracy of cell classification. We found that additional real data augmentation is incredibly helpful and that domain-specific pre-training still maintains an advantage. Our proposed strategy for preparing the training data and for pre-training and fine-tuning the CNN led to a significant increase in performance over the approaches used until now. Specifically, our method achieves 80.25% classification accuracy. The source code and models needed to reproduce the experiments in the paper are publicly available on our web pages.

Sample HEp-2 cell images from ICPR2012, ICPR2014, and SNPHEp-2 datasets.


In the last few years, advances in microscopy techniques have enabled the investigation of dynamic processes at increasing temporal and spatial resolution. This has produced large quantities of imaging data, which cannot be fully analyzed manually. As a result, automatic analysis methods have become more important, and most of them depend heavily on accurate cell segmentation and tracking.

Microscopic images can be very challenging when cell density is high due to frequent interaction of cells with each other. Often there is not enough information in a single frame to make the correct decision about segmentation and tracking. In these situations it helps to consider content of adjacent frames when making decisions, which can be computationally very expensive for long dense sequences. We have developed a greedy joint cell segmentation and tracking method which overcomes this challenge.

Our method uses multiple filter banks to detect cells and uses the watershed transform to split cell clusters and obtain cell proposals. It then creates a hierarchical forest from these cell proposals. The figure below shows two trees in the forest (a). Cells in microscopic sequences can go through a few events, the probabilities of which are represented by nodes in a super-node (b). Proposal super-nodes in adjacent frames are connected with each other to create a directed acyclic graph (c). Tracks within this graph are found by iteratively finding the shortest path, which provides the cell segmentations and tracks.

A joint cell segmentation and tracking method using cell proposals.
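The greedy track extraction described above can be sketched as repeated shortest-path searches in a directed acyclic graph. This is a simplified sketch, not the published algorithm: super-nodes and cell events are left out, edge costs are assumed to be negative when a link is likely, the search stops when the best path no longer has a negative cost, and all names are hypothetical.

```python
def shortest_path_dag(nodes_in_order, edges, source, sink):
    """Shortest path in a DAG by relaxing edges in topological order.

    nodes_in_order: all nodes in topological (frame) order.
    edges: dict mapping a node to a list of (next_node, cost) pairs.
    Returns (path, cost), or (None, inf) if the sink is unreachable.
    """
    infinity = float("inf")
    dist = {node: infinity for node in nodes_in_order}
    prev = {}
    dist[source] = 0.0
    for u in nodes_in_order:
        if dist[u] == infinity:
            continue
        for v, cost in edges.get(u, []):
            if dist[u] + cost < dist[v]:
                dist[v] = dist[u] + cost
                prev[v] = u
    if dist[sink] == infinity:
        return None, infinity
    path = [sink]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1], dist[sink]


def greedy_tracks(nodes_in_order, edges, source, sink):
    """Extract tracks one by one, removing the proposals each track uses."""
    edges = {u: list(vs) for u, vs in edges.items()}
    tracks = []
    while True:
        path, cost = shortest_path_dag(nodes_in_order, edges, source, sink)
        if path is None or cost >= 0 or len(path) <= 2:
            break
        used = set(path[1:-1])
        tracks.append(path[1:-1])
        # Drop used proposals so that no proposal belongs to two tracks.
        edges = {u: [(v, c) for v, c in vs if v not in used]
                 for u, vs in edges.items() if u not in used}
    return tracks


# Two frames with two cell proposals each; "S" and "T" are virtual
# source and sink nodes.
nodes = ["S", "a1", "b1", "a2", "b2", "T"]
edges = {
    "S": [("a1", 0.0), ("b1", 0.0)],
    "a1": [("a2", -5.0), ("b2", -1.0)],
    "b1": [("a2", -2.0), ("b2", -4.0)],
    "a2": [("T", 0.0)],
    "b2": [("T", 0.0)],
}
tracks = greedy_tracks(nodes, edges, "S", "T")
```

Because each search is linear in the size of the graph, this greedy strategy remains tractable for long, dense sequences in which a global optimization over all tracks would be very expensive.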


Exploitation of Results

Many researchers have adopted and further developed our methodologies. Our research results are used in a wide variety of applications around the world. For example, the Local Binary Pattern methodology and its variants are used in numerous image analysis tasks and applications, such as biomedical image analysis, biometrics, industrial inspection, remote sensing and video analysis. The researchers in CMV have actively published the source code of their algorithms for the research community, and this has increased the exploitation of the results.

The results have also been utilized in our own projects. For example, we have collaborated with Prof. Tapio Seppänen’s Biomedical Engineering Group in the area of multimodal emotion recognition for affective computing, combining vision with physiological signals. Together with Prof. Osmo Tervonen from Oulu University Hospital, we have carried out research on classifying thorax images using computer vision and deep learning methods.

Most of our funding for both basic and applied research comes from public sources such as the Academy of Finland and Tekes, but besides these sources, CMV also conducts research by contract which is funded by companies. In this way, our expertise is being utilized by industry for commercial purposes, and even in consumer products, like mobile devices.

The CMV has actively encouraged and supported the birth of research group spin-outs. This gives an opportunity for young researchers to start their own teams and groups. Side results are the spin-out enterprises. According to our experience, their roots are especially in the strands of “free academic research”.


Future Goals

Our results from 2015 are very positive; for example, the number of publications in major forums has clearly increased. Having two new FiDiPro projects for distinguished scientists from abroad will make exciting progress possible in the coming years as well. We will continue to sharpen our strategies to meet future demands and to secure sufficient research funding in increasingly tough competition. We plan to carry out well-focused, cutting-edge research, for example on novel image and video descriptors, multimodal face analysis and biometrics, multimodal analysis of emotions, 3D computer vision, biomedical image analysis, and energy-efficient architectures for embedded vision systems. We also plan to further deepen our collaboration with international and domestic partners. We plan to participate in new European project proposals and to continue applying for funding for breakthrough research from the Academy of Finland and the European Research Council (ERC). Close interaction between basic and applied research has always been a major strength of our research unit. The scientific output of the CMV has been increasing significantly in recent years. With this, we expect to have much new potential for producing innovations and exploiting research results in collaboration with companies and other partners.






senior research fellows
postdoctoral researchers
doctoral students
other research staff
person years for research


External Funding

Academy of Finland 1 401 000
142 000
1 543 000



Doctoral Theses

Herrera Castro D (2015) From images to point clouds: practical considerations for three-dimensional computer vision. Acta Univ Oul C 536.

Komulainen J (2015) Software-based countermeasures to 2D facial spoofing attacks. Acta Univ Oul C 537.

Pedone M (2015) Algebraic methods for constructing blur-invariant operators and their applications. Acta Univ Oul C 538.


Selected Publications

Bayramoglu N & Alatan A (2016) Comparison of 3D local and global descriptors for similarity retrieval of range data. Neurocomputing, 184:13-27.

Bordallo-Lopez M, Boutellaa E & Hadid A (2016) Comments on the "Kinship Face in the Wild" data sets. IEEE Transactions on Pattern Analysis and Machine Intelligence, in press.

Chen J, Lei Z, Liu L, Zhao G & Pietikäinen M (2016) Editorial: RoLoD - Robust local descriptors for computer vision. Neurocomputing, 184:1-2.

Flusser J, Farokhi S, Hoschl C, Suk T, Zitova B & Pedone M (2016) Recognition of images degraded by Gaussian blur. IEEE Transactions on Image Processing, 25(2):790-806.

Guo Y, Zhao G & Pietikäinen M (2016) Dynamic facial expression recognition with atlas construction and sparse representation. IEEE Transactions on Image Processing, 25(5):1977-1992.

Herrera Castro D, Kannala J & Heikkilä J (2016) Forget the checkerboard: practical self-calibration using a planar scene. Proc. IEEE Winter Conference on Applications of Computer Vision (WACV 2016), in press.

Hautala I, Boutellier J & Silvén O (2016) Programmable 28nm coprocessor for HEVC/H.265 in-loop filters. IEEE International Symposium on Circuits and Systems, accepted.

Hong X, Zhao G, Zafeiriou S, Pantic M & Pietikäinen M (2016) Capturing correlations of local features for image representation. Neurocomputing, 184:99-106.

Huang X, Kortelainen J, Zhao G, Li X, Moilanen A, Seppänen T & Pietikäinen M (2016) Multi-modal emotion analysis from facial expressions and electroencephalogram. Computer Vision and Image Understanding, in press.

Huang X, Zhao G, Hong X, Zheng W & Pietikäinen M (2016) Spontaneous facial micro-expression analysis using spatiotemporal completed local quantized patterns. Neurocomputing, 175:564-578.

Liu L, Lao S, Fieguth P, Guo Y, Wang X & Pietikäinen M (2016) Median robust extended local binary pattern for texture classification. IEEE Transactions on Image Processing, 25(3):1368-1381.

Liu Y-J, Zhang J-K, Yan W-J, Wang S-J, Zhao G & Fu X-L (2016) A main directional mean optical flow feature for spontaneous micro-expression recognition. IEEE Transactions on Affective Computing, in press.

Michalska M, Zufferey N, Boutellier J, Bezati E & Mattavelli M (2016) Efficient scheduling policies for dynamic dataflow programs executed on multi-core. International Workshop on Programmability and Architectures for Heterogeneous Multicores, in press.

Qi X, Li C-G, Zhao G, Hong X & Pietikäinen M (2016) Dynamic texture and scene classification by transferring deep image features. Neurocomputing, 171:1230-1241.

Qi X, Zhao G, Chen J & Pietikäinen M (2016) HEp-2 cell classification: The role of Gaussian scale space theory as a pre-processing approach. Pattern Recognition Letters, in press.

Qi X, Zhao G, Li C-G, Guo J & Pietikäinen M (2016) HEp-2 cell classification by combining multi-resolution co-occurrence texture and large region shape information. IEEE Journal of Biomedical and Health Informatics, in press.

Qi X, Zhao G, Shen L, Li Q & Pietikäinen M (2016) LOAD: Local orientation adaptive descriptor for texture and material classification. Neurocomputing, 184:28-35.

Wang H, Chai X, Hong X, Zhao G & Chen X (2016) Isolated sign language recognition with Grassmann covariance matrices. ACM Transactions on Accessible Computing, in press.

Xia X, Feng X, Peng J, Peng X & Zhao G (2016) Spontaneous micro-expression spotting via geometric deformation modeling. Computer Vision and Image Understanding, in press.

Zong Y, Zheng W, Huang X, Yan K, Yan J & Zhang T (2016) Emotion recognition in the wild via sparse transductive transfer linear discriminant analysis. Journal on Multimodal User Interfaces, in press.

Zong Y, Zheng W, Zhang T & Huang X (2016) Cross-corpus speech emotion recognition based on domain-adaptive least squares regression. IEEE Signal Processing Letters, in press.

Akram S U, Kannala J, Kaakinen M, Eklund L & Heikkilä J (2015) Segmentation of cells from spinning disk confocal images using a multi-stage approach. In: Computer Vision, ACCV 2014 Proceedings, Lecture Notes in Computer Science, 9005:300-314.

Alvarez Casado C, Bordallo Lopez M, Holappa J & Pietikäinen M (2015) Face detection and recognition for smart glasses. Proc. IEEE International Symposium on Consumer Electronics (ISCE), 1-2.

Amara I, Granger E & Hadid A (2015) On the effects of illumination normalization with LBP-based watchlist screening. In: Computer Vision, ECCV 2014 Workshops, Lecture Notes in Computer Science, 8926:173-188.

Anina I, Zhou Z, Zhao G & Pietikäinen M (2015) OuluVS2: A multi-view audiovisual database for non-rigid mouth motion analysis. Proc. IEEE International Conference on Automatic Face and Gesture Recognition (FG 2015), Ljubljana, Slovenia, 1-5.

Bayramoglu N, Kannala J & Heikkilä J (2015) Human epithelial type 2 cell classification with convolutional neural networks. Proc. IEEE International Conference on Bioinformatics & Bioengineering (BIBE), Belgrade, Serbia, 1-6.

Bayramoglu N, Kannala J, Akerfelt M, Kaakinen M, Eklund L, Nees M & Heikkilä J (2015) A novel feature descriptor based on microscopy image statistics. Proc. International Conference on Image Processing (ICIP 2015), Quebec City, Canada, 2695-2699.

Bhat S, Kannala J & Heikkilä J (2015) 3D Point Representation For Pose Estimation: Accelerated SIFT vs ORB. In: Image Analysis, SCIA 2015 Proceedings, Lecture Notes in Computer Science, 9127:79-91.

Bordallo Lopez M, Nieto A, Boutellier J, Silvén O & Lopez Vilariño D (2015) Reconfigurable computing for future vision-capable devices. Proc. International Conference on Embedded Computer Systems, (SAMOS XV), 34-41.

Boulkenafet Z, Boutellaa E, Bengherabi M & Hadid A (2015) Face verification based on Gabor region covariance matrices. In: Image Analysis, SCIA 2015 Proceedings, Lecture Notes in Computer Science, 9127.

Boulkenafet Z, Komulainen J & Hadid A (2015) Face anti-spoofing based on color texture analysis. Proc. International Conference on Image Processing (ICIP 2015), Quebec City, Canada, 2636-2640.

Boutellaa E, Hadid A, Bengherabi M & Ait-Aoudia S (2015) On the use of Kinect depth data for identity, gender and ethnicity classification from facial images. Pattern Recognition Letters, 68, Part 2:270-277.

Boutellaa E, Bengherabi M, Ait-Aoudia S & Hadid A (2015) How much information Kinect facial depth data can reveal about identity, gender and ethnicity? In: Computer Vision, ECCV 2014 Workshops, Lecture Notes in Computer Science, 8926:725-736.

Boutellaa E, Boulkenafet Z, Komulainen J & Hadid A (2015) Audiovisual synchrony assessment for replay attack detection in talking face biometrics. Multimedia Tools and Applications, 1-15.

Boutellaa E, Harizi F, Bengherabi M, Ait-Aoudia S & Hadid A (2015) Face verification using local binary patterns and generic model adaptation. International Journal of Biometrics 7(1):31-44.

Boutellier J & Ghazi A (2015) Multicore execution of dynamic dataflow programs on the Distributed Application Layer. Proc. IEEE Global Conference on Signal and Information Processing (GlobalSIP), 893-897.

Boutellier J & Nyländen T (2015) Programming graphics processing units in the RVC-CAL dataflow language. Proc. IEEE International Workshop on Signal Processing Systems (SiPS), 1-6.

Boutellier J, Ersfolk J, Lilius J, Mattavelli M, Roquier G & Silvén O (2015) Actor merging for dataflow process networks. IEEE Transactions on Signal Processing, 63(10):2496-2508.

Correia PL, Hadid A & Moeslund TB (2015) Editorial: Special Issue on "Soft Biometrics". Pattern Recognition Letters, 68, Part 2:217.

Ghazi A, Boutellier J, Anttila L, Juntti M & Valkama M (2015) Data-Parallel Implementation of Reconfigurable Digital Predistortion on a Mobile GPU. Proc. Asilomar Conference on Signals, Systems & Computers, 186-191.

Ghazi A, Boutellier J, Silvén O, Bhattacharyya S, Shahabuddin S, Juntti M & Anttila L (2015) Model-based design and implementation of an adaptive digital predistortion filter. Proc. IEEE International Workshop on Signal Processing Systems (SiPS), 1-6.

Hadid A, Ylioinas J, Bengherabi M, Ghahramani M & Taleb-Ahmed A (2015) Gender and texture classification: A comparative analysis using 13 variants of local binary patterns. Pattern Recognition Letters, 68, Part 2:231-238.

Hadid A, Evans N, Marcel S & Fierrez J (2015) Biometrics systems under spoofing attack: an evaluation methodology and lessons learned. IEEE Signal Processing Magazine, 32(5):20-30.

Hannuksela J, Niskanen M & Turtinen M (2015) Performance evaluation of image noise reduction computing on a mobile platform. Proc. International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS XV).

Hautala I, Boutellier J, Hannuksela J & Silvén O (2015) Programmable low-power multicore coprocessor architecture for HEVC/H.265 in-loop filtering. IEEE Transactions on Circuits and Systems for Video Technology, 25(7):1217-1230.

He Q, Hong X, Chai X, Holappa J, Zhao G, Chen X & Pietikäinen M (2015) OMEG: Oulu multi-pose eye gaze dataset. In: Image Analysis, SCIA 2015 Proceedings, Lecture Notes in Computer Science, 9127:418-427.

Huang X, Dhall A, Zhao G, Goecke R & Pietikäinen M (2015) Riesz-based volume local binary pattern and a novel group expression model for group happiness intensity analysis. Proc. British Machine Vision Conference (BMVC 2015), Swansea, UK, 13 p.

Huang X, Wang S-J, Zhao G & Pietikäinen M (2015) Facial micro-expression recognition using spatiotemporal local binary pattern with integral projection. Proc. IEEE International Conference on Computer Vision (ICCV) Workshops, 1-9.

Keskinarkaus A, Huttunen S, Siipo A, Holappa J, Laszlo M, Juuso I, Väyrynen E, Heikkilä J, Lehtihalmes M, Seppänen T & Laukka S (2015) MORE - a multimodal observation and analysis system for social interaction research. Multimedia Tools and Applications, 1-25.

Li K, Ghazi A, Boutellier J, Abdelaziz M, Anttila L, Juntti M, Valkama M & Cavallaro JR (2015) Mobile GPU Accelerated Digital Predistortion on a Software-defined Mobile Transmitter. Proc. IEEE Global Conference on Signal and Information Processing (GlobalSIP), 756-760.

Lin S, Wang L-H, Vosoughi A, Cavallaro JR, Juntti M, Boutellier J, Silvén O, Valkama M & Bhattacharyya SS (2015) Parameterized sets of dataflow modes and their application to implementation of cognitive radio systems. Journal of Signal Processing Systems, 80(1):3-18.

Linna M, Kannala J & Rahtu E (2015) Online face recognition system based on local binary patterns and facial landmark tracking. In: Advanced Concepts for Intelligent Vision Systems, ACIVS 2015 Proceedings, Lecture Notes in Computer Science, 9386:403-414.

Liu L, Fieguth P, Pietikäinen M & Lao S (2015) Median robust local binary pattern for texture classification. Proc. International Conference on Image Processing (ICIP 2015), Quebec City, Canada, 1-5.

Liu M, Li S, Shan S, Wang R & Chen X (2015) Deeply learning deformable facial action parts model for dynamic expression recognition. In: Computer Vision, ACCV 2014 Proceedings, Revised Selected Papers, Part IV, Lecture Notes in Computer Science, 9006:143-157.

Liu X, Zhao G, Yao J & Qi C (2015) Background subtraction based on low-rank and structured sparse decomposition. IEEE Transactions on Image Processing, 24(8):2502-2514.

Michalska M, Boutellier J & Mattavelli M (2015) A methodology for profiling and partitioning stream programs on many-core architectures. Procedia Computer Science, Elsevier, 51:2962-2966.

Mustaniemi J, Kannala J & Heikkilä J (2015) Disparity estimation for image fusion in a multi-aperture camera. In: Computer Analysis of Images and Patterns, CAIP 2015 Proceedings, Lecture Notes in Computer Science, 9257:158-170.

Nixon MS, Correia PL, Nasrollahi K, Moeslund TB, Hadid A & Tistarelli M (2015) On soft biometrics. Pattern Recognition Letters, 68, Part 2:218-230.

Nyländen T, Boutellier J, Nikunen K, Hannuksela J & Silvén O (2015) Low-power reconfigurable miniature sensor nodes for condition monitoring. International Journal of Parallel Programming, 43(1):3-23.

Nyländen T, Kultala H, Boutellier J, Hautala I, Hannuksela J & Silvén O (2015) Programmable data parallel accelerator for computer vision. Proc. 3rd IEEE Global Conference on Signal and Information Processing, 624-628.

Ouamane A, Bengherabi M, Hadid A & Cheriet M (2015) Side-information based exponential discriminant analysis for face verification in the wild. Proc. International Conference on Face and Gesture, Workshop on Face Biometrics in the Wild, 1-6.

Patel D, Zhao G & Pietikäinen M (2015) Spatiotemporal integration of optical flow vectors for micro-expression detection. In: Advanced Concepts for Intelligent Vision Systems, ACIVS 2015 Proceedings, Lecture Notes in Computer Science, 9386:369-380.

Pedone M, Bayro-Corrochano E, Flusser J & Heikkilä J (2015) Quaternion Wiener deconvolution for noise robust color image registration. IEEE Signal Processing Letters, 22(9):1278-1282.

Pedone M, Flusser J & Heikkilä J (2015) Registration of images with N-fold dihedral blur. IEEE Transactions on Image Processing, 24(3):1036-1045.

Pietikäinen M & Zhao G (2015) Two decades of local binary patterns: A survey. In: E Bingham, S Kaski, J Laaksonen & J Lampinen (eds) Advances in Independent Component Analysis and Learning Machines, Elsevier, 175-210.

Qi X, Shen L, Zhao G, Li Q & Pietikäinen M (2015) Globally rotation invariant multi-scale co-occurrence local binary pattern. Image and Vision Computing, 43:16-26.

Qiu Q, Chang Z, Draelos M, Chen J, Bronstein A & Sapiro G (2015) Low-cost gaze and pulse analysis using Realsense. Proc. 5th EAI International Conference on Wireless Mobile Communication and Healthcare (MobiHealth), 276-279.

Rezazadegan Tavakoli H, Atyabi A, Rantanen A, Laukka S, Nefti-Meziani S & Heikkilä J (2015) Predicting the valence of a scene from observers' eye movements. PLOS ONE, 10(9): e0138198.

Sarjanoja S, Boutellier J & Hannuksela J (2015) BM3D image denoising using heterogeneous computing platforms. Proc. Conference on Design and Architectures for Signal and Image Processing (DASIP), 1-8.

Särkkä S, Tolvanen V, Kannala J & Rahtu E (2015) Adaptive Kalman filtering and smoothing for gravitation tracking in mobile systems. Proc. International Conference on Indoor Positioning and Indoor Navigation (IPIN 2015), 1-7.

Taketomi T & Heikkilä J (2015) Focal length change compensation for monocular SLAM. Proc. International Conference on Image Processing (ICIP 2015), Quebec City, Canada, 4982-4986.

Taketomi T & Heikkilä J (2015) Zoom factor compensation for monocular SLAM. IEEE Virtual Reality (VR 2015), Arles, France, 293-294.

Tavakoli HR, Rahtu E & Heikkilä J (2015) Analysis of sampling techniques for learning binarized statistical image features using fixations and salience. In: Computer Vision, ECCV 2014 Workshops, Lecture Notes in Computer Science, 8926:124-134.

Tayanov V, Granger E, Bordallo Lopez M & Hadid A (2015) Super-resolution pipeline for fast adjudication in watchlist screening. Proc. International Conference on Image Processing Theory, Tools and Applications (IPTA), 273-278.

Thevenot J, Chen J, Finnilä M, Nieminen M, Lehenkari P, Saarakkala S & Pietikäinen M (2015) Local binary patterns to evaluate trabecular bone structure from micro-CT data: Application to studies of human osteoarthritis. In: Computer Vision, ECCV 2014 Workshops, Lecture Notes in Computer Science, 8926:63-79.

Tretter A, Boutellier J, Guthrie J, Schor L & Thiele L (2015) Executing dataflow actors as Kahn processes. Proc. International Conference on Embedded Software (EMSOFT), IEEE, 105-114.

Varjo S & Hannuksela J (2015) Image based visibility estimation during day and night. In: Computer Vision, ACCV 2014 Workshops, Revised Selected Papers, Part III, Lecture Notes in Computer Science, 9010:277-289.

Varjo S, Kaikkonen V, Hannuksela J & Mäkynen A (2015) All-in-focus image reconstruction from in-line holograms of snowflakes. Proc. IEEE International Instrumentation and Measurement Technology Conference (I2MTC), 1096-1101.

Vilmi P, Varjo S, Sliz R, Hannuksela J & Fabritius T (2015) Disposable optics for microscopy and diagnostics. Scientific Reports, 5:16957.

Wang S-J, Yan W-J, Li X, Zhao G, Zhou C-G, Fu X, Yang M & Tao J (2015) Micro-expression recognition using color spaces. IEEE Transactions on Image Processing, 24(12):6034-6047.

Wang SJ, Yan WJ, Zhao G & Fu X (2015) Micro-expression recognition using robust principal component analysis and local spatiotemporal directional features. In: Computer Vision, ECCV 2014 Workshops, Lecture Notes in Computer Science, 8925:325-338.

Xu Y, Hong X, He Q, Zhao G & Pietikäinen M (2015) A task-driven eye tracking dataset for visual attention analysis. In: Advanced Concepts for Intelligent Vision Systems, ACIVS 2015 Proceedings, Lecture Notes in Computer Science, 9386:637-648.

Yaghoobi A, Rezazadegan-Tavakoli H & Röning J (2015) Affordances in visual surveillance. In: Computer Vision, ECCV 2014 Workshops, Lecture Notes in Computer Science, 8926:403-405.

Yan WJ, Wang SJ, Chen YH, Zhao G & Fu X (2015) Quantifying micro-expressions with constraint local model and local binary pattern. In: Computer Vision, ECCV 2014 Workshops, Lecture Notes in Computer Science, 8925:296-305.

Ylimäki M, Kannala J & Heikkilä J (2015) Optimizing the accuracy and compactness of multi-view reconstructions. In: Computer Analysis of Images and Patterns, CAIP 2015 Proceedings, Lecture Notes in Computer Science, 9257:171-183.

Ylimäki M, Kannala J, Holappa J, Brandt SS & Heikkilä J (2015) Fast and accurate multi-view reconstruction by multi-stage prioritized matching. IET Computer Vision, 9(4):576-587.

Ylioinas J, Kannala J, Hadid A & Pietikäinen M (2015) Face recognition using smoothed high-dimensional representation. In: Image Analysis, SCIA 2015 Proceedings, Lecture Notes in Computer Science, 9127:516-529.

Ylioinas J, Kannala J, Hadid A & Pietikäinen M (2015) Unsupervised learning of overcomplete face descriptors. Proc. 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 75-83.

Zong Y, Zheng W, Huang X, Yan J & Zhang T (2015) Transductive transfer LDA with Riesz-based volume LBP for emotion recognition in the wild. Proc. ACM International Conference on Multimodal Interaction (ICMI'15), 491-496.

Åkerfelt M, Bayramoglu N, Robinson S, Toriseva M, Schukov H-P, Härmä V, Virtanen J, Kaakinen M, Eklund L, Kannala J, Heikkilä J & Nees M (2015) Automated tracking of tumor-stroma morphology in microtissues identifies targets within the tumor microenvironment for therapeutic intervention. Oncotarget, 6(30):30035-30056.

Last updated: 22.6.2016