Infotech Oulu Annual Report 2012 - Center for Machine Vision Research (CMV)

Professor Matti Pietikäinen, Professor Janne Heikkilä, and Professor Olli Silvén, Department of Computer Science and Engineering, University of Oulu

mkp(at), jth(at), olli(at)


Background and Mission

The Center for Machine Vision Research (CMV) is a creative, open and internationally attractive research unit. It is renowned world-wide for its expertise in computer vision.

The center has a strong record of scientific merit in both basic and applied research on computer vision, now spanning more than 31 years. It has achieved ground-breaking research results in many areas of its activity, including texture analysis, facial image analysis, geometric computer vision, and energy-efficient architectures for embedded systems. The mission of the center is to develop novel computer vision methods and technologies that create a basis for emerging innovative applications.

In February 2013 the CMV had three professors, one FiDiPro professor, 14 senior or postdoctoral researchers, and 26 doctoral students or research assistants. The unit is highly international: about 43% of our researchers (doctoral and post-doctoral) are from abroad. CMV has an extensive international collaboration network in Europe, the USA, and China. The mobility of the researchers to leading research groups abroad, and vice versa, is intense. Within the Seventh Framework Programme FP7, the CMV currently participates in the project consortium of Trusted Biometrics under Spoofing Attacks (Tabula Rasa). It also participates in two European COST actions.

The main areas of our research are computer vision methods, human-centered vision systems and vision systems engineering. The results of the research have been widely exploited in industry, and contract research forms a part of our activities.


Highlights and Events in 2012

The Center for Machine Vision Research (CMV) had its most successful year ever in high-impact journal acceptances in 2012. Altogether 16 journal articles were accepted for publication. Seven were co-authored with our international colleagues, and the remaining nine were written by our research staff. Of the total, as many as ten were published in highly prestigious journals.

CMV's paper “Description of interest regions with local binary patterns” with Marko Heikkilä as the first author, and Prof. Matti Pietikäinen and Dr. Cordelia Schmid (INRIA Grenoble) as the co-authors, was given the Best Paper Award among all papers published in 2009 in the prestigious Pattern Recognition journal. The award ceremony was held at ICPR 2012, Tsukuba, Japan.

Professor Xilin Chen from the Chinese Academy of Sciences (CAS) joined CMV as FiDiPro Professor in August 2012. He will spend several short periods of time in Oulu during the FiDiPro (the Finland Distinguished Professor Programme) funding period of August 2012 – February 2016. The “Perceptual interfaces for intelligent human-computer interaction” project will combine the expertise of the world-renowned computer vision research groups in Finland and China.

The CMV Leader, Professor Matti Pietikäinen, received the Pentti Kaitera Prize 2012 for his outstanding achievements in machine vision research as well as for his significant contribution to advancing welfare in Northern Finland.

Professor Matti Pietikäinen receiving the Pentti Kaitera Prize 2012.

As a new application area, CMV’s computer vision methods are now applied in biomedical science. The Finnish Funding Agency for Technology and Innovation (Tekes) granted two-year funding for an interdisciplinary project between Biocenter Oulu’s Tissue Imaging Center and CMV. The objective is to develop algorithm-based video image analysis tools and machine learning techniques that combine multidimensional image information with different types of biomedical data.

In summer 2012, CMV had the pleasure of hosting in Oulu a two-month visit of one of the leading experts in vision-based human-computer interaction, Professor Matthew Turk from the University of California, Santa Barbara. Prof. Turk held the prestigious Fulbright-Nokia Distinguished Chair position in Information and Communications Technologies in 2011-2012.

Fulbright-Nokia Distinguished Chair 2011-2012, Professor Matthew Turk visiting CMV in summer 2012.

As an essential part of the visit, Prof. Turk gave a course on mobile computer vision within the Infotech Oulu Doctoral Program. In addition, similar PhD level courses were given by Adjunct Professor, Dr. Kari Pulli, from NVIDIA Research, and Dr. L’ubor Ladicky, from the University of Oxford, UK.

The CMV co-organized an International Workshop on Computer Vision with Local Binary Pattern Variants (LBP 2012) in conjunction with the Asian Conference on Computer Vision (ACCV 2012), in early November in Daejeon, Korea. The workshop provided the state of the art and the most recent developments in the use of LBPs and their variants in computer vision. The CMV Leader, Professor Matti Pietikäinen, held the keynote at the venue, as well as one of the three keynotes within the International Conference on Image and Signal Processing (ICISP 2012), in Agadir, Morocco, in late June.

CMV hosted the Technical Meeting of the European FP7 project Trusted Biometrics under Spoofing Attacks (Tabula Rasa for short) in June. This was the first time the project partners had gathered in Oulu.

The developments in affective human-robot interaction research raised wide interest in the Finnish media in mid-August, when professors Matti Pietikäinen and Juha Röning (ISG) showcased the social robot Minotaurus and its capabilities at the Academy of Finland's media event. Numerous newspapers and the main tech-releases covered the Minotaurus robot understanding speech commands and responding with an avatar appearing on its display.


Scientific Progress

The current main areas of research are: 1) Computer vision methods, 2) Human-centered vision systems, and 3) Vision systems engineering. In addition to these main areas, our research activities have been recently extended to biomedical image analysis where we collaborate with Biocenter Oulu.

Computer vision methods

The group has a long and highly successful research tradition in two important generic areas of computer vision: texture analysis and geometric computer vision. In the last few years, the research in computer vision methods has been broadened to also cover two new areas: computational photography, and object detection and recognition. The aim in all these areas is to create a methodological foundation for the development of new vision-based technologies and innovations.

Texture analysis

Texture is an important characteristic of many types of images and can play a key role in a wide variety of applications of computer vision and image analysis. The CMV has long traditions in texture analysis research and ranks among the world leaders in this area. The Local Binary Pattern (LBP) texture operator has been highly successful in numerous applications around the world, and has inspired plenty of new research on related methods, including the blur-insensitive Local Phase Quantization (LPQ) method, also developed at CMV.

Effective characterization of texture images requires exploiting the spatial correlations between pixels. Of two commonly used texture descriptors, Local Binary Patterns (LBPs) reflect the co-occurrence of binary comparisons among pixels within a local area, whereas Covariance Matrices (CovMs) statistically capture the correlation among elementary features of pixels over a certain image region. Enhanced performance can be expected if these two kinds of information are combined in a compact descriptor. Unfortunately, although CovMs are capable of blending multiple informative features into compact and powerful descriptors, the discriminative LBP features cannot be used directly as elementary features for CovMs, since ordinary LBPs are not numerical variables in Euclidean spaces. Hence the local co-occurrence captured by LBP-like features and the global correlation captured by CovMs could not previously be combined to achieve enhanced discriminative power. To address this problem, we developed a powerful descriptor named COV-LBP. Firstly, we proposed a variant of LBPs in Euclidean spaces, named the LBP Difference feature (LBPD), which can be used for any statistical image region description. LBPD reflects how far one LBP lies from the LBP mean of a given image region. It is simple, descriptive, rotation invariant, and computationally efficient. Secondly, by applying LBPD to multiple commonly used elementary features mapped from the original image, we provided a bank of discriminative features that can optionally be used in CovMs. Consequently, the information of LBPs and CovMs is embedded in a unified COV-LBP descriptor. The performance of COV-LBP was evaluated on three challenging texture databases: Outex_TC_00012, KTH-TIPS, and KTH-TIPS2a. The encouraging results showed that COV-LBP provides high discrimination and good robustness.


Different features extracted from the “Lena” image: (a) the original image; (b) the ordinary LBP feature image in the intensity channel; and (c) the signed LBP Difference image. LBPDs are complementary to the original elementary features.

The framework of COV-LBP. The set of LBPDs calculated in different feature channels, together with the set of the original features, provides a bank of powerful features optional for COV-LBP.
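To make the idea of embedding an LBP-derived feature into a covariance descriptor more concrete, the following is a minimal Python sketch. It is not the published COV-LBP implementation. It assumes, for illustration only, that an LBP is represented as the vector of its eight binary comparisons and that the LBP Difference is the Euclidean distance of that vector from the region mean; the published LBPD definition may differ. The function names, the 3x3 neighbourhood and the choice of elementary features (intensity and absolute gradients) are likewise assumptions.

```python
import numpy as np

# 8-neighbour offsets (row, col), clockwise from the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_bits(img):
    """Return an (H-2, W-2, 8) array of binary comparisons for interior pixels."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    center = img[1:-1, 1:-1]
    bits = [img[1 + dr:h - 1 + dr, 1 + dc:w - 1 + dc] >= center
            for dr, dc in OFFSETS]
    return np.stack(bits, axis=-1).astype(float)


def lbp_difference(bits):
    """Illustrative LBPD (assumption): distance of each bit vector from the region mean."""
    mean_bits = bits.reshape(-1, bits.shape[-1]).mean(axis=0)
    return np.linalg.norm(bits - mean_bits, axis=-1)


def cov_descriptor(img):
    """Covariance of per-pixel features [intensity, |dI/dx|, |dI/dy|, LBPD]."""
    img = np.asarray(img, dtype=float)
    gy, gx = np.gradient(img)  # derivatives along rows and columns
    lbpd = lbp_difference(lbp_bits(img))
    feats = np.stack([img[1:-1, 1:-1],
                      np.abs(gx)[1:-1, 1:-1],
                      np.abs(gy)[1:-1, 1:-1],
                      lbpd], axis=-1)
    # Rows are pixels (observations), columns are feature channels.
    return np.cov(feats.reshape(-1, 4), rowvar=False)
```

Calling `cov_descriptor` on a grayscale region returns a 4x4 symmetric matrix; because the LBP-derived channel is now an ordinary numerical variable, it can sit alongside the other elementary features in the covariance computation.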

Texture classification, one of the major problems in computer vision, has seen significant improvements; however, the extraction of effective features for texture image representation is still considered a challenging problem. To obtain discriminative patterns, we presented a learning model formulated as a three-layered model. It estimates the optimal pattern subset of interest by simultaneously considering the robustness, discriminative power and representation capability of features. This model is general and can be integrated with existing LBP variants such as conventional LBP, rotation invariant patterns, local patterns with anisotropic structure, completed local binary pattern (CLBP) and local ternary pattern (LTP) to derive new image features for texture classification. The derived descriptors were extensively evaluated on two publicly available texture databases (Outex and CUReT) for texture classification, two medical image databases (Hela and Pap-smear) for protein cellular classification and disease classification, and a neonatal facial expression database (infant COPE database) for facial expression classification. Experimental results demonstrate that the obtained descriptors lead to state-of-the-art classification performance.

A three-layered model for learning dominant local binary patterns.

Dynamic texture (DT) is an extension of texture to the temporal domain. Segmenting DTs into disjoint regions is a very challenging problem. DTs may differ in their spatial mode (i.e., appearance) and/or temporal mode (i.e., motion field). We therefore developed a framework based on the appearance and motion modes. For the appearance mode, we use a new local spatial texture descriptor to describe the spatial mode of DT; for the motion mode, we use the optical flow and the local temporal texture descriptor to represent the temporal variations of DT. In addition, we use the Histogram of Oriented Optical Flow (HOOF) to organize the optical flow. To compute the distance between two HOOFs, we developed a simple, effective and efficient distance measure based on Weber's Law. Furthermore, we addressed the problem of threshold selection by proposing a method for determining the thresholds of the segmentation method through offline supervised statistical learning. Experimental results show that our method provides very good segmentation results compared to the state-of-the-art methods in segmenting regions that differ in their dynamics. The results were recently published in IEEE Transactions on Image Processing. Following this work, we further studied how to improve the performance and efficiency of our approach. This was achieved by computing the histogram of the spatiotemporal local texture descriptor in one volume and employing the segmentation results of the previous frame for the segmentation of the current frame. The results were published at the ICPR 2012 conference.



Automatic segmentation of dynamic textures.
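As a rough illustration of how optical flow can be organized into a histogram of orientations, the Python sketch below bins each flow vector by its angle and weights it by its magnitude. It is a simplified variant and not the code used in the work described above: the number of bins and the plain full-circle binning are assumptions, and the original HOOF formulation may bin angles differently. The Weber's-Law-based distance is not specified in this report, so a standard chi-square distance is used as a stand-in.

```python
import numpy as np


def hoof(flow, n_bins=8):
    """Magnitude-weighted histogram of flow orientations, normalised to sum to 1.

    flow: array of shape (H, W, 2) holding horizontal (u) and vertical (v) flow.
    """
    flow = np.asarray(flow, dtype=float)
    u, v = flow[..., 0].ravel(), flow[..., 1].ravel()
    magnitude = np.hypot(u, v)
    angle = np.arctan2(v, u)  # in [-pi, pi]
    # Map angles to bin indices; the modulo folds angle == pi into bin 0.
    bins = np.floor((angle + np.pi) / (2 * np.pi) * n_bins).astype(int) % n_bins
    hist = np.bincount(bins, weights=magnitude, minlength=n_bins)
    total = hist.sum()
    return hist / total if total > 0 else hist


def chi_square_distance(h1, h2, eps=1e-12):
    """Stand-in histogram distance (not the Weber's-Law-based measure)."""
    h1, h2 = np.asarray(h1, dtype=float), np.asarray(h2, dtype=float)
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))
```

Two regions with similar motion yield similar histograms and therefore a small distance, which is the property a segmentation method can threshold on.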

We also continued our research on dynamic texture synthesis. A new approach was proposed which is centered on synthesis in the spatial domain, unlike previous work that is mostly focused on the temporal domain. The method explores a 3D patch based synthesis, where the patch selection is accomplished by using spatiotemporal LBP-TOP features, instead of just using the intensity of pixels. The experiments showed that our approach provides a good description of the structure and motion of dynamic textures without generating visible discontinuities or artifacts.

In texture recognition, classes of textures are commonly obtained by creating databases of samples captured under controlled conditions. However, when a given image has to be classified, there may be no guarantee of its visual quality and no knowledge of the acquisition process; such an image might have undergone several types of degradation, for example geometric or radiometric distortions, noise, and blur. We studied the problem of color texture classification under blur and illumination changes. We extended our local phase quantization (LPQ) descriptors to generate blur-robust features of color images. The extension is derived from a representation based on Clifford algebra that treats color as a quaternionic quantity. The main advantage of this representation is that it easily allows generating descriptors that are invariant to illumination changes and yet still preserve the blur-robustness of the grayscale LPQ descriptors. The proposed color texture descriptor achieves superior accuracy over its grayscale counterpart and other color texture descriptors like LBP. Furthermore, we observed remarkable performance in challenging scenarios of varying illumination, without the need to preprocess textures with color-constancy algorithms.

Computational photography

In computational photography, the aim is to develop techniques for computational cameras that give more flexibility to image acquisition and enable more advanced features to be employed, going beyond the capabilities of traditional photography. In 2012, we studied three topics related to computational photography: microlens array imaging, blur invariant registration, and image priors.

In microlens array imaging, we introduced a new imaging setup in which the microlenses directly capture the microscopic targets without additional refractive optics. The setup utilizes a single aperture between the sample and the illumination source so that no micro apertures are needed for individual microlenses. We have shown that this type of setup reaches 8.8 µm resolution with hot-embossed microlenses and 5 µm with glass lenslets, and this resolution is already enough for detecting Schistosoma parasite eggs in urine samples.

Light field rendering methods are being studied further, with more emphasis on obtaining accurate calibration of the individual lenses. Good calibration should enable better rendering quality, and various super-resolution approaches are also under investigation.

The concept of the new imaging setup using a microlens array, a single aperture and a light source.

Comparison of a light microscope with 5.2× magnification (left) to the proposed setup (right) for detection of Schistosoma haematobium parasite eggs.

Blur is a common type of image degradation. It may result from camera shake, scene motion, wrong focus, atmospheric turbulence, sensor imperfection, low resolution and other factors. Recovering a sharp image from a blurred one is a notably difficult problem. A class of approaches that has proved particularly successful is based on image fusion, where a set of differently blurred images is used to estimate the deconvolved image (multi-channel blind deconvolution, MBD). However, in order to perform MBD the blurred images must be registered, which is also a challenging problem. When the images are not correctly aligned, MBD fails or produces visible artifacts in the output image. We developed a robust registration technique that is invariant to blur with point spread functions (PSFs) having an arbitrary degree of rotational symmetry. Exploiting the higher symmetry of the PSF allows us to increase the robustness of registration and outperform current methods.


(Left and middle) Two differently blurred images with a real out-of-focus blur. (Right) Result of MBD after blur invariant registration.

Prior models are useful in many image processing applications (e.g. denoising, inpainting, stereo, and optical flow). They encapsulate our knowledge and assumptions about the structure of images. Image priors can be applied to any modality (e.g. color, infrared, depth). Each modality, however, has its own statistics and thus follows a different model. In the context of 3D reconstruction, we often have a color image with an aligned semi-dense depth map. We developed a joint prior for both intensity and depth, thus taking advantage of the implicit relations between the two channels. This prior can be used in many applications. For instance, we have applied it to the problem of depth map inpainting and upsampling.

Object detection and recognition

Humans can effortlessly recognize thousands of object classes, which is crucial for successful interpretation of visual content. Recent advances in computer vision have made automatic object detection more practical, and nowadays it is possible to automatically retrieve images which contain a particular object instance or objects from a certain object class. Our research on this subject in 2012 was concentrated on two important subtopics, namely fine grained object classification and salient object localization.

Conventional object classification considers classes that are clearly distinct from each other, e.g. cars, bicycles, cats, persons. Another, almost equally important task is to distinguish between fine-grained sub-classes of the same parent class. We call the resulting problem fine-grained object classification. A good example is flower classification, where the task is to tell the exact species of the flower in the image. Another useful example is bird classification, where we try to identify the exact bird species. Both of these problems are difficult, since there exists a vast number of different flower and bird species in the world, and often there are only subtle differences between classes.

Compared to conventional object classification, the fine-grained version has some important characteristics. Usually the appearance difference that distinguishes one class from another is small compared to the variations caused by other factors, such as natural variation and deformation within the class, background, lighting conditions, viewpoint, and scale. Because of this, it is important for a successful algorithm to be able to eliminate (in one way or another) as many of these variations as possible.

One of the most successful strategies in fine-grained classification has proven to be the following: 1) learn an accurate high-level segmentation algorithm for the parent class, and 2) then learn a highly discriminative classifier that operates only on the segmented objects. Although it might sound straightforward, it turns out to be extremely difficult to develop accurate segmentation methods for the parent class. Moreover, the required training data (e.g. segmented flower images) is usually costly to obtain.

To this end, we have developed, together with the Visual Geometry Group at the University of Oxford, an automatic co-segmentation approach that is able to learn the necessary parameters using only weakly labeled images; namely images for which we know only the class label (no segmentation boundaries or bounding boxes etc.). This kind of data is rather cheap to obtain using for example a Google image search. Moreover, our approach promotes areas which are important in distinguishing between the fine-grained classes. In our work, we were able to show a considerable improvement with several fine-grained image classification datasets.

Example results of the segmentation method. The top row shows the original images, the second row the results of the competing state-of-the-art method, and the bottom row the results of our approach.

During the last year, we exploited ideas from neural adaptation in the human perception system in order to measure motion saliency in videos. Decorrelation techniques let us discard redundant data in a small temporal window. This yields the areas of the image that are prone to motion. The system was prototyped in Matlab and tested on the Background Model Challenge (BMC) data set. The method ranked 3rd in the BMC competition held in conjunction with the 11th Asian Conference on Computer Vision (ACCV 2012).

We also exploited saliency for object tracking. To this end, we defined a local similarity number operator which measures the saliency of a pixel in terms of the number of similar pixels in the surrounding area. A target patch is then modeled using a joint saliency-color distribution and tracked. The proposed method outperforms similar techniques in which the target patch is represented by texture-color and color distributions.

Examples of different target representation methods (from top to bottom: color patch, texture-color, and saliency color). As depicted, saliency-color patches are more uniform and focus more on the details of the specific target.
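The local similarity number idea can be sketched in a few lines of Python. This is an illustrative reading of the description above rather than the published operator: the 3x3 neighbourhood, the intensity threshold, the use of gray levels in place of color, and the function names are all assumptions.

```python
import numpy as np


def local_similarity_number(img, threshold=10.0):
    """Count, for every pixel, the 8-neighbours whose intensity is within `threshold`."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    # Edge replication means border pixels compare against copies of nearby pixels.
    padded = np.pad(img, 1, mode="edge")
    count = np.zeros((h, w), dtype=int)
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            neighbour = padded[1 + dr:1 + dr + h, 1 + dc:1 + dc + w]
            count += np.abs(neighbour - img) <= threshold
    return count


def joint_saliency_color_histogram(gray_patch, threshold=10.0, n_color_bins=8):
    """Normalised joint histogram over (similarity number, quantised gray level)."""
    gray_patch = np.asarray(gray_patch, dtype=float)
    lsn = local_similarity_number(gray_patch, threshold)
    color_bin = np.clip((gray_patch / 256.0 * n_color_bins).astype(int),
                        0, n_color_bins - 1)
    hist = np.zeros((9, n_color_bins))  # similarity numbers range from 0 to 8
    np.add.at(hist, (lsn.ravel(), color_bin.ravel()), 1)
    return hist / hist.sum()
```

Under this reading, a pixel with few similar neighbours stands out from its surroundings, and the joint histogram gives a patch model that a tracker could compare between frames.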

Geometric computer vision

Creating intelligent machines which are capable of complex information processing tasks is one of the great challenges of computer science. Thanks to the development of science, such machines are gradually coming closer to reality. However, in order to become reality, smart machines need better methods for observing their environment and interacting with it, and developing such methods requires knowledge of computer vision and geometry. Our research in geometric computer vision supports the goal of smart machines, and allows improvement in the ways that computers are able to interact with humans and the environment. Hence, geometric computer vision can be seen as a complementary field to our research in human-centered vision systems.

Our group has an extensive research background in various problems of geometric computer vision and we have continued our research efforts on several fronts. For example, we have recently developed a new multi-view stereo reconstruction method which acquires three-dimensional point cloud reconstructions of scenes from multiple photographs. This method was published at ICPR 2012, and it utilizes a prioritized matching strategy in which the most promising point correspondences between the different views are established first and then iteratively expanded in best-first order. To the best of our knowledge, our approach is the first one that uses prioritized matching and is able to directly utilize all the available input views. Comparison to the state of the art shows that our method produces point cloud reconstructions of comparable quality, but substantially faster. Hence, it is a useful tool for creating accurate image-based 3D models of scenes.

An example of a point cloud generated from three input views using our method. Prioritized matching expands the correct matches and produces an accurate point cloud (bottom). The lines represent a sparse set of correct (green) and incorrect (red) correspondences.
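The best-first expansion behind prioritized matching can be illustrated with a priority queue. The Python sketch below is a toy two-view version and not the published multi-view method: the `similarity` callable, the `min_score` threshold, the 4-neighbourhood and the rule that neighbours inherit the displacement of the accepted match are all assumptions made for illustration.

```python
import heapq
import numpy as np


def in_bounds(p, shape):
    """True if pixel coordinate p = (row, col) lies inside an image of this shape."""
    return 0 <= p[0] < shape[0] and 0 <= p[1] < shape[1]


def best_first_matching(similarity, seeds, shape, min_score=0.5):
    """Grow pixel matches between two views in best-first order.

    similarity(p, q) scores a candidate match between pixel p in view 1 and
    pixel q in view 2; seeds is a list of initial (p, q) candidates.
    """
    # heapq is a min-heap, so scores are negated to pop the best candidate first.
    heap = [(-similarity(p, q), p, q) for p, q in seeds]
    heapq.heapify(heap)
    matches, used_q = {}, set()
    while heap:
        _, p, q = heapq.heappop(heap)
        if p in matches or q in used_q:
            continue  # one of the two pixels was already claimed by a better match
        matches[p] = q
        used_q.add(q)
        # Propose the 4-neighbours with the same displacement as new candidates.
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            p2, q2 = (p[0] + dr, p[1] + dc), (q[0] + dr, q[1] + dc)
            if not (in_bounds(p2, shape) and in_bounds(q2, shape)):
                continue
            if p2 in matches or q2 in used_q:
                continue
            score = similarity(p2, q2)
            if score >= min_score:
                heapq.heappush(heap, (-score, p2, q2))
    return matches
```

Because the highest-scoring candidates are always expanded first, a correct seed tends to claim its neighbourhood before a weaker, incorrect seed is even considered, which is the behaviour the figure above depicts.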

In addition to multi-view stereo, which refers to passive reconstruction methods solely based on conventional photographs, we have studied active depth sensing techniques by using a Kinect device that utilizes structured infrared light. Such active depth sensing techniques are particularly important in indoor environments where large texture-less surfaces typically cause problems for purely image-based reconstruction methods. As a first step, we have developed an approach to accurate geometric calibration of Kinect devices. Our approach includes a depth-dependent distortion model, and it allows more accurate modeling of the device than competing approaches. The proposed calibration method was recently published in the IEEE TPAMI journal, and the related software is available as an open-source Matlab toolbox. This toolbox has already created widespread interest in the research community, and its different versions have been downloaded more than 5,000 times.

When using active depth cameras, like the Kinect, for acquiring three-dimensional models of indoor environments, it is often necessary to combine long sequences of overlapping depth maps. A simple merging of the points results in much redundant data, which slows down further processing and requires more resources. We have recently worked on a method for creating a non-redundant point cloud with varying levels of detail. The method does not limit the captured volume or require any parameters from the user. Furthermore, overlapping measurements are used to refine the point estimates so that the accuracy of the resulting non-redundant point cloud is better than that of the individual input depth maps. The developed method helps us to produce accurate and compact models of indoor environments more efficiently.

We have also studied techniques for image-based camera localization in indoor environments. In our recent study, we focused on the indoor image matching problem in which the scene information is gathered from multi-modal (2D/3D) sensors. Today, extracting such information from real environments is rather effortless with the help of new-generation depth cameras and range scanners such as Kinect. In the literature, indoor scene matching is considered to be more challenging than outdoor scene matching, because outdoor scenes contain more discriminative and unique features, which makes recognition comparatively easy. In addition, indoor scenes comprise many similar structures such as doors, windows, and chairs. We therefore proposed combining local 2D and 3D descriptors and explored different ways of combining them. The local properties of 3D data are less distinguishing than their texture-based counterparts. To overcome this limitation we employed glocal (Global-Local) 3D descriptors. The proposed method achieved better performance than the state-of-the-art methods employing only 2D data.

A flowchart of the proposed image-based indoor localization method.

In the future, we plan to continue our efforts in image-based modeling, as our recent results and the current state of the field provide a good basis for impressive future applications. For example, we aim to combine conventional multi-view stereo and modern active depth sensing techniques in order to provide better and more convenient tools for building image-based 3D models and interacting with them.

Human-centered vision systems

In future ubiquitous environments, computing will move into the background, being omnipresent and invisible to the user. This will also lead to a paradigm shift in human-computer interaction (HCI) from traditional computer-centered to human-centered systems. We expect that computer vision will play a key role in such intelligent systems, enabling, for example, natural human-computer interaction, or identifying humans and their behavior in smart environments.

Face recognition and biometrics

CMV has continued to play a key role in the FP7 EU project Tabula Rasa (started in 2010 and ending in 2014), looking at the vulnerabilities of existing biometric recognition systems to spoofing attacks on a wide range of biometrics, including face, voice, gait, fingerprints, retina, iris, vein, electro-physiological signals (EEG and ECG) etc. CMV was the leader of the work package (WP) on the evaluation of biometric systems under spoofing attacks, and also contributed to the development of countermeasures to face and gait spoofing attacks. A joint paper with Prof. Mark Nixon’s group from the University of Southampton, discussing whether gait biometrics can be spoofed or not, was accepted for oral presentation at the International Conference on Pattern Recognition (ICPR 2012).

Without anti-spoofing measures, most of the state-of-the-art facial biometric systems are indeed vulnerable to attacks, since they try to maximize the discrimination between identities, instead of determining whether the presented trait originates from a real live client. Even a mere photograph of the enrolled person’s face, displayed as a hard-copy or on a screen, will fool the system. We have been approaching the problem of spoofing attacks from a texture analysis point of view, since fake faces usually contain recapturing defects, e.g. blur and spoofing medium artifacts that can be detected using texture features. As an initial countermeasure, we proposed using fusion of LBP and gray-level co-occurrence matrices (GLCM) based features, because the combined description provided an effective representation of the overall facial texture quality. Furthermore, we extended our micro-texture analysis based spoofing detection into the spatiotemporal domain and introduced a dynamic texture based face liveness description consisting of both facial appearance and dynamics. More specifically, local binary patterns from three orthogonal planes (LBP-TOP) were utilized for describing specific dynamic events, e.g. facial motion patterns and sudden characteristic reflections of planar spoofing media, and scenic cues which might differentiate real faces from fake ones. Since motion is an important visual cue in spoofing detection, a significant performance enhancement was obtained when the facial dynamics information was exploited in addition to facial appearance.
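For readers unfamiliar with LBP-TOP, the following Python sketch shows the basic idea of collecting LBP histograms from the three orthogonal planes (XY, XT and YT) of a video volume and concatenating them. It is a simplified illustration, not the implementation used in the work described above: it uses the basic 3x3 LBP on every slice of the volume instead of circular sampling with configurable radii, and the function names are assumptions.

```python
import numpy as np

# 8-neighbour offsets (row, col), clockwise from the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_histogram(plane):
    """256-bin histogram of basic 8-neighbour LBP codes of a 2D array."""
    plane = np.asarray(plane, dtype=float)
    h, w = plane.shape
    center = plane[1:-1, 1:-1]
    codes = np.zeros(center.shape, dtype=int)
    for bit, (dr, dc) in enumerate(OFFSETS):
        neighbour = plane[1 + dr:h - 1 + dr, 1 + dc:w - 1 + dc]
        codes |= (neighbour >= center).astype(int) << bit
    return np.bincount(codes.ravel(), minlength=256)


def lbp_top(volume):
    """Concatenate normalised LBP histograms from the XY, XT and YT planes.

    volume: array of shape (T, H, W) with T, H, W >= 3.
    """
    volume = np.asarray(volume, dtype=float)
    t, h, w = volume.shape
    hist_xy = sum(lbp_histogram(volume[k]) for k in range(t))        # appearance
    hist_xt = sum(lbp_histogram(volume[:, y, :]) for y in range(h))  # horizontal motion
    hist_yt = sum(lbp_histogram(volume[:, :, x]) for x in range(w))  # vertical motion
    return np.concatenate([hh / hh.sum() for hh in (hist_xy, hist_xt, hist_yt)])
```

The resulting 768-dimensional vector combines appearance (XY) with temporal variation (XT and YT), and could be passed to any standard classifier to separate real from fake face sequences.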

In 2012, we also continued our research on demographic classification with emphasis on novel local binary pattern variants, especially for gender and age classification. Among the significant achievements is a simple yet efficient extension to LBP which gives a significant improvement compared to the conventional LBP method. Our extensive experiments showed very promising results, not only in gender and age classification, but also in other problems such as texture classification and face recognition. The proposed extension is based on denser image sampling with respect to the LBP neighborhood’s center reference. The method turned out to perform well with other LBP variants, for example, with CLBP and LTP which are among the most powerful LBP variants.

CMV co-organized a very successful International Workshop on Computer Vision with Local Binary Pattern Variants (LBP 2012), which was held in Daejeon, South Korea, on November 5th, 2012, in conjunction with the Asian Conference on Computer Vision (ACCV 2012). This workshop provided a clear summary of the state of the art, and discussed the most recent developments in the use of Local Binary Patterns and their variants in different computer vision applications. The workshop received a record number of 45 submissions. Based on thorough reviews by the program committee, 13 papers were finally selected. Besides the 13 interesting oral presentations, the workshop also included a keynote speech from a pioneer of LBP (Prof. Matti Pietikäinen from CMV) and a best paper award sponsored by KeyLemon, a leading face recognition software company.

Co-organized by CMV, a very successful international workshop on computer vision with local binary pattern variants was held in Daejeon, South Korea, on November 5th, 2012.

Recognition of facial expressions and emotions

Facial expression recognition is used to determine the emotional state of the face, regardless of its identity. Feature representation is an important research topic in facial expression recognition from video sequences. We proposed to use spatiotemporal monogenic binary patterns to describe the appearance and motion information of the dynamic sequences. Firstly, we use monogenic signal analysis to extract the magnitude, the real picture and the imaginary picture of the orientation of each frame, since the magnitude can provide much appearance information, and the orientation can provide complementary information. Secondly, the phase-quadrant encoding method and the local bit exclusive operator are utilized to encode the real and imaginary pictures of the orientation in three orthogonal planes, and the local binary pattern operator is used to capture the texture and motion information from the magnitude through three orthogonal planes. Finally, both the concatenation method and the multiple kernel learning method are exploited to handle the feature fusion. The experimental results on the Extended Cohn-Kanade and Oulu-CASIA facial expression databases demonstrate that the proposed methods perform better than the state-of-the-art methods, and are robust to illumination variations.

Visualization of the original image, the local monogenic magnitude binary pattern, the local monogenic real image binary pattern, the local monogenic imaginary image binary pattern.

In addition, we proposed a new scheme that formulates dynamic facial expression recognition as a longitudinal atlas construction and deformable groupwise image registration problem. Longitudinal atlases of each facial expression are constructed by performing groupwise registration among all the expression image sequences of different subjects. The constructed atlases can reflect overall facial feature changes of each expression among the population, and can suppress the bias due to inter-personal variations. This method was extensively evaluated on the Cohn-Kanade, MMI, and Oulu-CASIA VIS dynamic facial expression databases. Experimental results demonstrate that our method consistently achieves the highest recognition accuracy of all the methods compared, on all the databases.

Formulating facial expression recognition as a longitudinal atlases construction and deformable groupwise image registration problem.

Facial occlusion is a challenging research topic in facial expression recognition (FER). Extending FER to uncontrolled environments requires both effective facial representations and occlusion detection methods. Most previous work has addressed these two issues separately, and on static images. We were thus motivated to propose a complete system consisting of facial representations, occlusion detection, and multiple feature fusion in video sequences. To obtain a robust facial representation that reflects the contributions of facial components to expressions, we proposed an approach deriving six feature vectors from the eye, nose and mouth components. These features with temporal cues are generated by dynamic texture and structural shape feature descriptors. Occlusion detection, on the other hand, has traditionally been realized by classifiers or model comparison. Recently, sparse representation has been proposed as an efficient method of combatting occlusion, but in FER it is correlated with facial identity unless an appropriate facial representation is used. Thus, we presented an evaluation that demonstrates that the proposed facial representation is independent of facial identity. We then exploited sparse representation and residual statistics for occlusion detection in the image sequences. As concatenating six feature vectors into one causes the curse of dimensionality, we proposed multiple feature fusion, consisting of a fusion module and weight learning. Experimental results on the Extended Cohn-Kanade database and a simulated database demonstrate that our framework outperforms the state-of-the-art methods for FER in normal videos, and especially in partially occluded videos.

The proposed method of dynamic expression recognition against facial occlusion. (a) The procedure of the component-based facial expression representation. (b) An example of occlusion detection in the region of the eyes.

We also continued our research on micro-expression analysis. Micro-expressions are short, involuntary facial expressions which reveal hidden emotions. They are important for understanding humans’ deceitful behavior, and they are currently attracting growing attention both in academia and in the media. However, while general facial expression recognition (FER) has been intensively studied for years in computer vision, little research has been done on automatically analyzing facial micro-expressions. The biggest obstacle to date has been the lack of a suitable database. We built a novel Spontaneous Micro-expression Database, SMIC, which includes 164 micro-expression video clips elicited from 16 participants. Micro-expression detection and recognition tests were carried out using LBP-TOP as the feature descriptor and an SVM as the classifier, and the test performance is provided as a baseline. SMIC provides sufficient source material for comprehensive testing of automatic systems for analyzing micro-expressions; this has not been possible with any previously published database. The SMIC database is now available from the CMV webpage.

We also proposed a new method for encoding local binary patterns using a re-parametrization (RP) of the second local order Gaussian Jet. The information provided by RP generates robust and reliable histograms, and is thus suitable for different facial analysis tasks. The proposed method has two main processes: the RP process, which is used to compute needed parameters in a video sequence, and the encoding process, which combines the textural information provided by the LBP and the robustness of the re-parametrization. We showed that this approach can be used for recognizing facial micro-expressions from videos, obtaining competitive performance on the Spontaneous Micro-expression Corpus (SMIC) and the YORK Deception Detection Test.

A block diagram summarizing the different steps for computing the encoding using re-parametrization.

A project dealing with multimodal emotion recognition for affective computing, funded by the National Agency for Technology and Innovation (Tekes), was in progress. This is a joint effort with Prof. Tapio Seppänen’s Biosignals team, aiming at fusing facial expression information with physiological signals. Some promising preliminary results with realistic data were presented in the EMBC 2012 conference.

Visual speech animation

Video-realistic speech animation plays an important role in the area of affective human-computer/robot interactions. The goal of such animation technology is to synthesize a visually realistic face that can talk just as we do. In this way, it can provide a natural platform for a human user and a robot to communicate with each other. The techniques also have other potential applications, such as generating synchronized visual cues for audio in order to help hearing-impaired people better capture information, or creating human characters in movies.

For this research, we first recorded a video corpus within which a human character is asked to make different utterances. His/her mouth is then cropped from the original speech videos and used to learn generative models for synthesizing novel mouth images. A generative model considers the whole utterance contained in a video as a continuous process and represents it using a set of trigonometric functions embedded within a path graph. The transformation that projects the values of the functions to the image space is found through graph embedding. Such a model allows us to synthesize mouth images at arbitrary positions in the utterance. To synthesize a video for a novel utterance, the utterance is first compared with the existing ones from which we find the phoneme combinations that best approximate the utterance. When selecting video segments for synthesis, we loosen the traditional requirement of using triphone as the unit to allow segments to contain longer natural talking motion. Dense videos are sampled from the segments, concatenated and downsampled to train a video model which enables efficient time-alignment and motion smoothing for the final video synthesis. Different viseme definitions are used to investigate the impact of visemes on the video realism of the animated talking face.
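One standard fact that fits the description above is that the eigenvectors of a path graph's Laplacian are sampled cosine functions, which can also be evaluated between the graph nodes. The sketch below only checks this property numerically; it is not the speech animation model itself, and the function names and the closed form with half-sample offsets are assumptions made for illustration.

```python
import math


def path_laplacian(n):
    """Laplacian matrix of a path graph with n nodes (one node per frame)."""
    lap = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i > 0:
            lap[i][i - 1] = -1.0
            lap[i][i] += 1.0
        if i < n - 1:
            lap[i][i + 1] = -1.0
            lap[i][i] += 1.0
    return lap


def cosine_mode_at(n, k, position):
    """k-th cosine mode evaluated at a (possibly fractional) frame position."""
    return math.cos(math.pi * k * (position + 0.5) / n)


def eigen_residual(n, k):
    """Largest entry of |L v - lambda v| for the k-th sampled cosine mode."""
    lap = path_laplacian(n)
    v = [cosine_mode_at(n, k, i) for i in range(n)]
    lam = 2.0 - 2.0 * math.cos(math.pi * k / n)
    lv = [sum(lap[i][j] * v[j] for j in range(n)) for i in range(n)]
    return max(abs(lv[i] - lam * v[i]) for i in range(n))
```

Because cosine_mode_at accepts fractional positions, a representation built on such functions can be sampled at arbitrary points within the utterance rather than only at the recorded frames.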

Facial expression is one of the most cogent, naturally pre-eminent means for humans to communicate emotions, to clarify and stress what is said, to signal comprehension, disagreement, and intentions. Human-machine interaction can benefit significantly from utilizing an “emotional” information channel in the form of facial expressions in addition to speech. In order to extend visual speech synthesis to emotional speech, a new corpus of video data was recorded where a human subject makes different utterances in a certain emotional state. Then different areas of the face from the recorded video can be taken into account in the animation stage to generate images for the novel video, preserving emotions of the face that are realistic and dynamic. The mouth area still remains the most important as it contains visual information both from expressions and speech.

Human tracking and action recognition

Tracking objects in a camera network requires a large set of testing data and methods for algorithm validation. In 2012, we published a dataset called CMV100, and a few AdaBoost-based baseline methods for object re-identification in camera networks. The dataset contains 100 tracked objects and more than 400 videos. It consists of the original surveillance videos, foreground masks and an extensive amount of tracking data. Various image descriptors (color, texture, shape, etc.) are also provided for each object. The dataset is publicly available at our website, along with software tools for processing the tracking data.

Sample images from the CMV100 dataset.

As a part of the Future School Research Second Wave project, we have been developing a mobile multimodal recording system, called MORE. The system consists of several microphones and cameras that can be used to record events in different learning environments. The acquired material is synchronized and can be easily browsed and analyzed afterwards to support pedagogic and didactic purposes.

The multimodal recording system (MORE).

Affective human-robot interaction

Research on affective human-robot interaction (HRI) has been carried out with the support of the Ubiquitous computing and diversity of communication (MOTIVE) program of the Academy of Finland (2009-2012) and the European Regional Development Fund (2010-2013), in collaboration with the Intelligent Systems Group. An experimental HRI platform, called Minotaurus, working in a smart environment has been developed, including a Segway Robotic Mobility Platform (RMP 200), equipped with laptop computers, Kinect depth sensors and video cameras, microphones, magnetic field sensors, an avatar display, and a ubiquitous multi-camera environment.

The development of the platform was further continued in 2012 and a robotic arm was integrated into the platform. Computer vision methods for different tasks were developed and integrated, including methods for localization, obstacle detection, facial image analysis, audio-visual speech synthesis, and human-robot interaction. One of the highlights in 2012 was the wide press coverage in national print and online media, following the “science breakfast” organized by the Academy of Finland for media representatives.

In 2009, we started developing a machine vision algorithm library on top of the popular OpenCV library. Currently it contains a wide variety of different algorithms, and has been successfully used as a part of the robot’s vision system. We have also developed a generic configurable processing node for processing image sequences from different image sources, including networked security cameras and Microsoft’s Kinect sensors attached to the moving robot. The algorithm set, image sources and all the relevant tuning parameters are dynamically configurable over the network. This enables us to adapt to changing environment and processing needs. After processing images, a node provides information about the found objects and their properties to the other parts of the system. This information can then be used, for example, to navigate the robot towards humans, avoid obstacles and visualize the environment for users. Our camera network itself has been extended with new cameras and all the cameras are now calibrated. This enables us to estimate the 3D coordinates of all detected objects moving in the environment.
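To illustrate why calibration makes 3D estimation possible: once each camera's position and viewing direction towards a detected object are known, the object's 3D position can be estimated from two views. The sketch below uses the midpoint method, one common choice and not necessarily the one used in our system; the function name and the ray-based input format are assumptions.

```python
def dot(a, b):
    """Dot product of two 3D vectors given as tuples."""
    return sum(x * y for x, y in zip(a, b))


def triangulate_midpoint(p1, d1, p2, d2):
    """Estimate a 3D point from two viewing rays.

    p1, p2 are camera centers and d1, d2 are ray directions towards the
    detected object (all 3-tuples in a common world frame). Returns the
    midpoint of the shortest segment connecting the two rays.
    """
    w0 = tuple(a - b for a, b in zip(p1, p2))
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w0), dot(d2, w0)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel; the point is not determined")
    s = (b * e - c * d) / denom  # parameter along the first ray
    t = (a * e - b * d) / denom  # parameter along the second ray
    q1 = tuple(p + s * v for p, v in zip(p1, d1))
    q2 = tuple(p + t * v for p, v in zip(p2, d2))
    return tuple((u + v) / 2.0 for u, v in zip(q1, q2))
```

With noisy detections the two rays rarely intersect exactly, which is why the midpoint of their closest approach is used as the estimate.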

The Minotaurus platform for affective human-robot interaction.

The first demonstrations on the whole Minotaurus system, operating in a smart environment, were presented in December 2012.

Camera-based interfaces for mobile devices

Improving usability and user experience with handheld mobile devices is a challenging problem, given the limited amount of interaction hardware of the device. However, multiple built-in cameras and their small size are under-exploited assets for creating novel solutions that are ideal for pocket size devices, but may not make much sense with desktop computers. Studies into alternatives to mobile user interaction have, therefore, become a very active research area in recent years. A key advantage of using cameras as an input modality is that it enables recognition of the 3D context in real-time, and at the same time provides for single-handed operations in which the users’ actions are interpreted without touching the screen or keypad. For example, the user’s position and gaze can be measured, in order to display true 3D objects even on a typical 2D screen.

In the research area of interactive mobile applications, we have continued the research on multimodal gesture controlled user interaction. The user interface works with the already existing hardware in recent mobile devices. The gestures are recognized from the front camera and the touch screen. With the user interface, the user can move the mouse cursor, click on objects and scroll documents. The functions provided to the user depend on the distance between the hand and the device. For this purpose, we have developed a new finger detection and tracking system based on color features.

We have also studied implementation of motion-based segmentation. The approach is based on estimating the displacement of a set of feature points. The algorithm developed exploits the similarity of consecutive video frames and uses estimates of feature displacements to propagate segmentation information from frame to frame. Performance of the segmentation is improved by exploiting information about the uncertainty of displacement estimates. The method was utilized in our hand gesture controlled user interface.

Sparse segmentation of background and foreground motions.

On mobile platforms, we have investigated the combination of interactive imaging and energy-efficient high performance computing to enable new user interactions. Using cameras as an input modality provides single-handed operations in which the users’ actions are recognized without interactions with the screen or keypad. The solution analyses and compares the means to reach interactivity and performance with sensor fusion and asymmetric parallel processing, taking advantage of the multiple computing resources present on the current mobile platforms such as Graphics Processing Units and Digital Signal Processors. We have constructed an application prototype where the determination of the user’s position and gaze is analyzed in real time, a technique that enables the display of true 3D objects even on a typical 2D screen. In the developed interface, we have integrated a series of interaction methods where the user motion and camera input realistically control the viewpoint on a 3D scene. The head movement and gaze can be used to interact with hidden objects in a natural manner just by looking at them.

An example of interaction without using the touch screen. The user tilts the display for a second to show the bookmarks on a mobile browser.

Vision systems engineering

Vision systems engineering research provides guidelines for identifying attractive computing approaches, architectures, and algorithms for industrial systems. In practice, solutions from low-level image processing to even equipment installation and operating procedures are considered simultaneously. The roots of this expertise are in our visual inspection studies in which we met extreme computational requirements already in the early 1980’s, and we have contributed to the designs of several industrial solutions. We have also applied our expertise to applications intended for smart environments and embedded platforms.

The framework for a lumber tracing system, on which research started in 2011, has been developed further. A notable improvement in tracing accuracy with a minimal increase in computational complexity was achieved using 1D projection signal based image alignment. Projection signals are generated from the statistical properties of the image. Using 1D signals instead of typical interest point based image registration helps to keep the computational complexity of the system low. The actual matching process after alignment is carried out using local descriptor matrices. The improved system was evaluated using several thousand images from an actual sawmill process and produced near perfect matching results. The research has aroused interest among industrial partners.
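The alignment step can be illustrated with a minimal sketch. It assumes that the projection signal is simply the mean grey level of each image column and that alignment is a pure shift along the board; the actual statistical properties and matching criterion used in the system may differ, and the function names are hypothetical.

```python
def column_projection(image):
    """1D signal: mean grey level of each column of a 2D list image."""
    rows = len(image)
    return [sum(row[x] for row in image) / rows for x in range(len(image[0]))]


def zero_mean(signal):
    """Remove the mean so that a global brightness change has no effect."""
    mean = sum(signal) / len(signal)
    return [value - mean for value in signal]


def best_shift(reference, signal, max_shift):
    """Shift s that minimizes the mean squared difference between
    reference[i] and signal[i + s] over the overlapping samples."""
    best, best_cost = 0, float("inf")
    for s in range(-max_shift, max_shift + 1):
        pairs = [(reference[i], signal[i + s])
                 for i in range(len(reference)) if 0 <= i + s < len(signal)]
        if not pairs:
            continue
        cost = sum((a - b) ** 2 for a, b in pairs) / len(pairs)
        if cost < best_cost:
            best, best_cost = s, cost
    return best


def align(image_a, image_b, max_shift):
    """Estimate the horizontal offset of image_b relative to image_a."""
    sig_a = zero_mean(column_projection(image_a))
    sig_b = zero_mean(column_projection(image_b))
    return best_shift(sig_a, sig_b, max_shift)
```

The search runs over a single 1D signal per image, so its cost grows with the image width and the shift range rather than with the number of interest points, which is the reason for the low computational complexity mentioned above.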

Image of a board in the beginning (top) and in the end of the manufacturing process (bottom). Despite changes in color, the shape of the 1D signal (red line) formed from suitable properties stays relatively unaltered.

Camera based strength grading research was also continued. Earlier we developed a solution that employs real-time feature extraction, classification, and the Finite Element Method (FEM) combined into an adaptive learning scheme. Now, methods for grain edge detection in low quality images under challenging lighting conditions have been explored. New ways to use grain based information in visual strength grading were also tested. The ultimate goal is to determine whether camera based strength grading can achieve the same accuracy as conventional mechanical methods such as bending machines. Good results were achieved for Finnish pine, in which knots are common and are in many cases one of the most important reasons for reduced strength.

Original image (left) and the output of grain detector (right).

The work on the energy efficient architectures and signal processing topic has been carried out, strongly supported by the DORADO project funded by the Academy of Finland. The project creates tools for generating efficient embedded software/hardware solutions. Platform independent high-level specifications are used to describe parallelism at data, instruction, task and memory levels. The target is many-core systems that are becoming the key approach in improving computing throughput. The automation of the hardware design process is emphasized, ultimately for the generation of efficient many-core application-specific processors. The expected results are high impact techniques for designing and programming heterogeneous systems: automated, platform-independent development tool chains that exhibit “performance portability” across different computing platforms and platform variations.

Energy-efficient, and yet programmable, solutions for various applications have been developed. The H.265 HEVC video coding methodology that is going to be standardized in January 2013 contains a computationally very challenging part called the Adaptive Loop Filter (ALF). In our group, ALF was implemented with a programmable processor that is capable of real-time processing at HDTV resolutions. Similarly, a programmable, yet efficient solution for a ZigBee baseband radio was finished. The programmable ZigBee radio enables placing the communication infrastructure on the same programmable chip together with other applications; this opens up new possibilities for creating wireless sensor nodes for ubiquitous computing. Finally, a programmable, but efficient solution for real-time extraction of Local Binary Patterns was finalized in our group.

Our research on energy efficient signal processing for wireless sensor nodes has also continued. The research has been concentrating on node designs, but the same principles can be applied also in general purpose designs. A real-life demonstration together with VTT was carried out by deploying our signal processing module on a complete Flash FPGA based sensor node solution by VTT. The I/O interfaces for our transport triggered architecture (TTA) based signal processing module were implemented, and optimized designs for most common signal processing tasks such as FFT were developed and implemented using TTA.

Multiprocessing design automation was continued, together with the French INSA research institute in Rennes. INSA has adopted the work started in our group and has been continuing the development of multiprocessors for video processing, both independently and together with our group. In our group, this work has resulted in an automated tool chain that maps a program to several customized signal processors that can be placed on an FPGA board. Starting from 2013, the research topic of energy efficient architectures and signal processing will be strengthened through the US-Finnish cooperative project CREAM that combines energy-efficient computations with dataflow-based design automation.

Biomedical image analysis

Analysis of medical and biological images is an important application area for computer vision. We have recently started collaboration with Biocenter Oulu where the aim is to apply state-of-the-art computer vision algorithms to research problems in cell biology. Modern bioimaging results in an enormous amount of data, and efficient extraction of available information using existing computational image analysis tools has emerged as a significant bottleneck. In a joint project funded by Biocenter Finland and the University of Oulu, we have started to set up and develop a novel customizable image analysis service that could be provided to researchers working with biological images.

In late 2012, we also started a new project called “Algorithm-based combination and analysis of multidimensional video and open data” (ABCdata), funded by Tekes (strategic research opening). In this project, the objective is to analyze 3D microscopic image sequences, and develop tools for cell segmentation and tracking, as well as for detection of cellular events such as mitosis and apoptosis in conditions that mimic human tissues, which makes this research unique from the scientific point of view.

Cell migration analysis is an essential tool when making comparisons between the effects of different drugs for the treatment of diseases. Traditionally, the tracking task has been carried out by manually annotating the cells in time-lapse microscopy images. It is easy to understand that this kind of approach is very laborious and error prone when handling large numbers of cells over long time periods. Phase-contrast microscopy is the most commonly used imaging technique to visualize living cells, due to the simple configuration of microscopy instruments, low costs, and cell visualization without the use of fluorescent labels and phototoxicity. The aforementioned imaging solution, combined with computer based automatic analysis approaches, is a powerful method of characterizing cell migration in different culturing conditions. Given these motivations, we have developed an automatic cell segmentation and tracking method targeted at low magnification phase-contrast microscopy images. The system is able to segment and track a large number of cells in confluent cultures.

Cell tracking from phase-contrast images. Cell trajectories are superimposed over the image with different colors.

CMV is also a member of the Oulu BioImaging Network (OBI), which is a forum for promoting collaboration between the research groups and experts working in the bioimaging area at the University of Oulu and Oulu University Hospital. It is an associated partner to Euro-BioImaging, which aims at creating a coordinated and harmonized plan for the deployment of biomedical imaging infrastructure in Europe. In 2012, the first OBI Workshop was held at the Linnanmaa campus, where also the research activities of CMV were widely presented.

Exploitation of Results

Many researchers have adopted and further developed our methodologies. Our research results are used in a wide variety of different applications around the world. For example, the Local Binary Pattern (LBP) methodology and its variants are used in numerous image analysis tasks and applications, such as biomedical image analysis, biometrics, industrial inspection, remote sensing and video analysis. The researchers in CMV have actively published the source codes of their algorithms for the research community, and this has increased the exploitation of the results. For example, in 2011 we released a Matlab toolbox for geometric calibration of Kinect with an external camera; this has received much interest from other researchers worldwide. By the end of 2012, it had been downloaded over 5,000 times.

The results have also been utilized in our own projects. For example, we collaborate with Prof. Tapio Seppänen’s Biomedical Engineering Group in the area of multimodal emotion recognition for affective computing, combining vision with physiological biosignals. Together with Prof. Seppänen, Dr. Seppo Laukka (Department of Educational Sciences and Teacher Education) and Prof. Matti Lehtihalmes (Faculty of Humanities), we are also participating in the FSR Second Wave project, where we have developed a Mobile Multimodal Recording System (MORE) that will be used in classroom research in various schools.

Most of our funding for both basic and applied research comes from public sources such as the Academy of Finland and Tekes, but besides these sources, CMV also conducts research by contract which is funded by companies. In this way, our expertise is being utilized by industry for commercial purposes and even in consumer products, like mobile devices.

The CMV has actively encouraged and supported the birth of research group spin-outs. This gives young researchers an opportunity to start their own teams and groups. Spin-out enterprises are a side result. In our experience, their roots are especially in the strands of “free academic research”. There are currently altogether five research based spin-outs founded directly on the machine vision area. The number of spin-outs rises to as many as sixteen when the influence of the CMV's thirty-year history and the spin-out companies from the spin-out research groups in the area of computer science and engineering are taken into account.

Future Goals

In recent months we have put substantial effort into preparing our research and operation plan for the coming years. This was needed, for example, for the Finnish Centre of Excellence in Machine Vision Research proposal that we submitted in response to the Academy of Finland’s Call for Centre of Excellence Programme 2014-2019. In very tough competition, our proposal was selected for the 2nd stage, to be evaluated in 2013. If accepted, our resources for carrying out well focused cutting-edge research, for example, on perceptual interfaces for face to face interaction, multimodal analysis of emotions, 3D computer vision, and energy-efficient architectures for embedded vision systems, would be significantly strengthened. We also plan to further deepen our collaboration with international and domestic partners. For this purpose, we are participating in new European project proposals. Close interaction between basic and applied research has always been a major strength of our research unit. The scientific output of the CMV has been increasing significantly in recent years. We therefore expect to have much new potential for producing novel innovations and exploiting research results in collaboration with companies and other partners.



professors, doctors
doctoral students
person years


External Funding



Academy of Finland: 752 000
Ministry of Education and Culture: 121 000
(source not given): 411 000
domestic private: 175 000
(source not given): 201 000
total: 1 660 000


Doctoral Theses

Sangi P (2013) Object motion estimation using block matching with uncertainty analysis. Acta Universitatis Ouluensis C 443.


Selected Publications

Bayramoglu N, Heikkilä J & Pietikäinen M (2012) Combining textural and geometrical descriptors for scene recognition. In: ECCV Workshops (CDC4CV), Lecture Notes in Computer Science 7584: 32-41.

Bordallo López M, Hannuksela J, Silvén O & Fan L (2012) Head-tracking virtual 3-D display for mobile devices. Proc. Computer Vision and Pattern Recognition Workshops (CVPRW), 27-34.

Bordallo López M, Hannuksela J, Silvén O & Vehviläinen M (2012) Interactive multi-frame reconstruction for mobile devices. Multimedia Tools and Applications, in press (online first).

Bordallo López M, Niemelä K & Silvén O (2012) GPGPU-based surface inspection from structured white light. Proc. SPIE 8295, Image Processing: Algorithms and Systems X; and Parallel Processing for Imaging Applications II, 829510.

Boutellier J, Lundbom I, Janhunen J, Ylimäinen J & Hannuksela J (2012) Application-specific instruction processor for extracting local binary patterns. Proc. Conference on Design and Architectures for Signal and Image Processing (DASIP), Karlsruhe, Germany, 1-8.

Boutellier J, Raulet M & Silvén O (2012) Automatic hierarchical discovery of quasi-static schedules of RVC-CAL dataflow programs. Journal of Signal Processing Systems, 6 p.

Chai Y, Rahtu E, Lempitsky V, Van Gool L & Zisserman A (2012) TriCoS: A tri-level class-discriminative co-segmentation method for image classification. In: Computer Vision, ECCV 2012 Proceedings, Lecture Notes in Computer Science 7572: 794-807.

Chan CH, Tahir M, Kittler J & Pietikäinen M (2013) Multiscale local phase quantisation for robust component-based face recognition using kernel fusion of multiple descriptors. IEEE Transactions on Pattern Analysis and Machine Intelligence, in press.

Chen J, Zhao G & Pietikäinen M (2012) Unsupervised dynamic texture segmentation using local descriptors in volumes. Proc. 21st International Conference on Pattern Recognition (ICPR 2012), Tsukuba, Japan, 3622-3625.

Chen J, Zhao G, Salo M, Rahtu E & Pietikäinen M (2013) Automatic dynamic texture segmentation using local descriptors and optical flow. IEEE Transactions on Image Processing 22(1): 326-339.

Guo Y, Zhao G & Pietikäinen M (2012) Discriminative features for texture description. Pattern Recognition 45(10): 3834-3843.

Guo Y, Zhao G & Pietikäinen M (2012) Dynamic facial expression recognition using longitudinal facial expression atlases. In: Computer Vision, ECCV 2012 Proceedings, Lecture Notes in Computer Science 7573: 631-644.

Hadid A, Ghahramani M, Kellokumpu V, Pietikäinen M, Bustard J & Nixon M (2012) Can gait biometrics be spoofed? Proc. 21st International Conference on Pattern Recognition (ICPR 2012), Tsukuba, Japan, 3280-3283.

Hadid A & Pietikäinen M (2013) Demographic classification from face videos using manifold learning. Neurocomputing 100: 197-205.

Herrera Castro D, Kannala J & Heikkilä J (2012) Joint depth and color camera calibration with distortion correction. IEEE Transactions on Pattern Analysis and Machine Intelligence 34(10): 2058-2064.

Hong X, Zhao G, Pietikäinen M & Chen X (2012) Combining local and global correlation for texture description. Proc. 21st International Conference on Pattern Recognition (ICPR 2012), Tsukuba, Japan, 2756-2759.

Huang X, Zhao G, Pietikäinen M & Zheng W (2012) Spatiotemporal local monogenic binary patterns for facial expression recognition. IEEE Signal Processing Letters 19(5): 243-246.

Huang X, Zhao G, Zheng W & Pietikäinen M (2012) Towards a dynamic expression recognition system under facial occlusion. Pattern Recognition Letters 33(16): 2181-2191.

Kannala J & Rahtu E (2012) BSIF: binarized statistical image features. Proc. 21st International Conference on Pattern Recognition (ICPR 2012), Tsukuba, Japan, 1363-1366.

Komulainen J, Hadid A & Pietikäinen M (2013) Face spoofing detection using dynamic texture. In: ACCV 2012 Workshops, Part I (LBP 2012), Lecture Notes in Computer Science 7728: 146-157.

Kortelainen J, Huang X, Li X, Laukka S, Pietikäinen M & Seppänen T (2012) Multimodal emotion recognition by combining physiological signals and facial expressions: a preliminary study. Proc. 34th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC’12), San Diego, CA, 5238-5241.

Liang J, Ye Q, Chen J & Jiao J (2012) Evaluation of local feature descriptors and their combination for pedestrian representation. Proc. 21st International Conference on Pattern Recognition (ICPR 2012), Tsukuba, Japan, 2496-2499.

Linder N, Konsti J, Turkki R, Rahtu E, Lundin M, Nordling S, Ahonen T, Pietikäinen M & Lundin J (2012) Identification of tumor epithelium and stroma in tissue microarrays using texture analysis. Diagnostic Pathology 7: 22.

Lizarraga-Morales R, Guo Y, Zhao G & Pietikäinen M (2013) Dynamic texture synthesis in space with a spatio-temporal descriptor. In: ACCV 2012 Workshops, Part I (LBP 2012), Lecture Notes in Computer Science 7728: 38-49.

McCool C, Marcel S, Hadid A, Pietikäinen M, Matejka P, Cernocky J, Poh N, Kittler J, Larcher A, Levy C, Matrouf D, Bonastre J-F, Tresadern P & Cootes T (2012) Bi-modal person recognition on a mobile phone: using mobile phone data. Proc. 2012 International Conference on Multimedia and Expo Workshops, 635-640.

Määttä J, Hadid A & Pietikäinen M (2012) Face spoofing detection from single images using texture and local shape analysis. IET Biometrics 1(1): 3-10.

Nyländen T, Boutellier J, Nikunen K, Hannuksela J & Silvén O (2012) Reconfigurable miniature sensor nodes for condition monitoring. Proc. International Conference on Embedded Computer Systems (SAMOS), Samos, Greece, 113-119.

Pedone M & Heikkilä J (2012) Local phase quantization descriptors for blur robust and illumination invariant recognition of color textures. Proc. 21st International Conference on Pattern Recognition (ICPR 2012), Tsukuba, Japan, 2476-2479.

Pfister T & Pietikäinen M (2012) Automatic identification of facial clues to lies. SPIE Newsroom, 4 January.

Rahtu E, Heikkilä J, Ojansivu V & Ahonen T (2012) Local phase quantization for blur-insensitive image analysis. Image and Vision Computing 30(8): 501-512.

Rezazadegan Tavakoli H, Rahtu E & Heikkilä J (2013) Temporal saliency for fast background subtraction. In: ACCV 2012 Workshops, Part I (BMC 2012), Lecture Notes in Computer Science 7728: 321-326.

Ruiz-Hernandez JA, Crowley JL, Combe C, Lux A & Pietikäinen M (2012) Robust and computationally efficient face detection using Gaussian derivative features of higher orders. In: ECCV Workshops (BeFIT), Lecture Notes in Computer Science 7585: 567-577.

Takala V & Pietikäinen M (2012) CMV100: A dataset for people tracking and re-identification in sparse camera networks. Proc. 21st International Conference on Pattern Recognition (ICPR 2012), Tsukuba, Japan, 1387-1390.

Tresadern P, Cootes TF, Poh N, Matejka P, Hadid A, Levy C, McCool C & Marcel S (2013) Mobile biometrics: Combined face and voice verification for a mobile platform. IEEE Pervasive Computing 12(1): 79-87.

Varjo S, Hannuksela J & Silvén O (2012) Direct imaging with printed microlens arrays. Proc. 21st International Conference on Pattern Recognition (ICPR 2012), Tsukuba, Japan, 1335-1358.

Ylimäki M, Kannala J, Holappa J, Heikkilä J & Brandt S (2012) Robust and accurate multi-view reconstruction by prioritized matching. Proc. 21st International Conference on Pattern Recognition (ICPR 2012), Tsukuba, Japan, 2673-2676.

Ylioinas J, Hadid A, Guo Y & Pietikäinen M (2013) Efficient image appearance description using dense sampling based local binary patterns. In: ACCV 2012 Proceedings, Part III, Lecture Notes in Computer Science 7726: 375-388.

Ylioinas J, Hadid A & Pietikäinen M (2012) Age classification in unconstrained conditions using LBP variants. Proc. 21st International Conference on Pattern Recognition (ICPR 2012), Tsukuba, Japan, 1257-1260.

Zhao G, Li X, Pietikäinen M, Huang X & Pfister T (2012) Computer vision research for expression and micro-expression recognition. International Journal of Psychology, Special issue, Supplement (ISSN: 0020-7594) 47: 145.

Zhao G, Ahonen T, Matas J & Pietikäinen M (2012) Rotation-invariant image and video description with local binary pattern features. IEEE Transactions on Image Processing 21(4): 1465-1467.

Zhou Z, Zhao G, Guo Y & Pietikäinen M (2012) An image-based visual speech animation system. IEEE Transactions on Circuits and Systems for Video Technology 22(10): 1420-1432.

Yviquel H, Boutellier J, Raulet M & Casseau E (2013) Automated design of networks of transport-triggered architecture processors using dynamic dataflow programs. Signal Processing: Image Communication (Special Issue on Reconfigurable Media Coding), Elsevier, to appear.

Last updated: 15.4.2014