Infotech Oulu Annual Report 2013 - Center for Machine Vision Research (CMV)

Professor Matti Pietikäinen, Professor Janne Heikkilä, and Professor Olli Silvén, Department of Computer Science and Engineering, University of Oulu

mkp(at), jth(at), olli(at)

Background and Mission

The Center for Machine Vision Research (CMV) is a creative, open and internationally attractive research unit. It is renowned world-wide for its expertise in computer vision.

The center has a strong record of scientific merit in both basic and applied research on computer vision, now spanning more than 32 years. It has achieved ground-breaking research results in many areas of its activity, including texture analysis, facial image analysis, geometric computer vision, and energy-efficient architectures for embedded systems. The mission of the center is to develop novel computer vision methods and technologies that create the basis for emerging innovative applications.

In February 2014, the CMV had three professors, one Associate Professor and one FiDiPro Professor, 16 senior or postdoctoral researchers, and 25 doctoral students or research assistants. The unit is highly international: 50% of our researchers (doctors, PhD students) are from abroad. CMV has an extensive international collaboration network in Europe, the USA, and China. The mobility of the researchers between leading research groups abroad, and vice versa, is intense. Within the Seventh Framework Programme FP7, the CMV currently participates in the project consortium of Trusted Biometrics under Spoofing Attacks (TABULA RASA). It also participates in two European COST actions.

The main areas of our research are computer vision methods, human-centered vision systems and vision systems engineering. The results of the research have been widely exploited in industry, and contract research forms a part of our activities.

Highlights and Events in 2013

In September, the Center for Machine Vision Research (CMV) was re-selected to Infotech Oulu for the period 2014-2017, attaining the highest score among all applicant groups. In January 2014, the results of the Research Assessment Exercise (RAE 2013) of the University were released. An international panel, aided by a bibliometric analysis made by Leiden University, ranked CMV with the highest score 6 (outstanding), representing the international cutting edge in its field.

CMV played a significant role in the ninth Conferment Ceremony of the University of Oulu in May. Professor Matti Pietikäinen acted as the Conferrer of Degrees for the Faculty of Technology. Dr. Juho Kannala was chosen, on the basis of his excellent merits, as Primus Young Doctor, the first young doctor whose degree was conferred in the ceremony. The University honored distinguished scientists and influential members of society with the award of an Honorary Doctorate. CMV core partner, Prof. Stan Z. Li (Chinese Academy of Sciences), received this award for his contributions to image pattern recognition and biometrics.

Honorary Doctor, Professor Stan Li at the Conferment Ceremony.


Two of the CMV senior researchers strengthened their role as independent team leaders. Dr. Guoying Zhao received a tenure track position as Associate Professor at the University of Oulu from January 2014 until the end of 2018. Dr. Abdenour Hadid received five-year Academy of Finland Research Fellow funding from September 2013 onwards. Both positions were highly competitive.

The IET Biometrics Premium Award 2013 was given to CMV. The Institution of Engineering and Technology (IET) presents one Premium Award for each of its journals to recognize the best research papers published during the previous year. CMV's winning paper “Face spoofing detection from single images using texture and local shape analysis” is authored by Jukka Komulainen (né Määttä), Abdenour Hadid and Matti Pietikäinen. It was published in 2012 in the IET Biometrics journal, volume 1, issue 1. The winners were given a certificate at the 2013 IET Achievement Awards in London in November.

CMV has had a significant role in the TABULA RASA project, appraised as a success story by the European Commission and followed by a large media campaign. The consortium comprises 12 different organizations across seven countries that have worked together over a period of three years in improving the security of biometric systems. CMV has provided expertise in both face and gait recognition using Local Binary Patterns in developing ways to detect the spoofing attacks. The same LBP methodology has also been utilized by other TABULA RASA partners. In addition, CMV has led the work package in evaluating the vulnerabilities in current biometric systems.

CMV contributed significantly to the 18th Scandinavian Conference on Image Analysis (SCIA 2013), organized in June in Espoo, Finland. The CMV Leader, Prof. Matti Pietikäinen, co-chaired the SCIA conference with Prof. Erkki Oja. The CMV Vice Leader, Prof. Janne Heikkilä, acted as one of the area chairs. Prof. Pietikäinen and Dr. Guoying Zhao lectured in a tutorial “Image and video analysis with local binary patterns”. CMV researchers presented a total of ten papers.

Scientific Progress

The current main areas of research are: 1) Computer vision methods, 2) Human-centered vision systems, and 3) Vision systems engineering. In addition to these main areas, our research activities include biomedical image analysis where we collaborate with Biocenter Oulu.

Computer vision methods

The group has a long and highly successful research tradition in two important generic areas of computer vision: texture analysis and geometric computer vision. In the last few years, the research in computer vision methods has been broadened to cover also two other areas: computational photography, and object detection and recognition. The aim in all these areas is to create a methodological foundation for the development of new vision-based technologies and innovations.

Texture analysis

Texture is an important characteristic of many types of images and can play a key role in a wide variety of applications of computer vision and image analysis. The CMV has long traditions in texture analysis research, and ranks among the world leaders in this area. The Local Binary Pattern (LBP) texture operator has been highly successful in numerous applications around the world, and has inspired plenty of new research on related methods, including the blur-insensitive Local Phase Quantization (LPQ) method, also developed at CMV.
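As a reminder of how the basic LBP operator works, the following minimal Python sketch computes the classic 8-neighbor code in a 3x3 window and collects the codes into a histogram. The neighbor ordering, the function names and the toy image are illustrative assumptions; the multi-scale, rotation-invariant and uniform-pattern variants used in practice are not shown.

```python
def lbp_code(image, r, c):
    """8-bit LBP code of the pixel at (r, c) using its 3x3 neighborhood."""
    center = image[r][c]
    # Neighbors visited clockwise, starting from the top-left pixel
    # (the ordering is an arbitrary but fixed convention).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        # Step function: the bit is set when the neighbor is >= the center.
        if image[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            hist[lbp_code(image, r, c)] += 1
    return hist


# Toy 3x3 image (hypothetical values) with a single interior pixel.
image = [
    [6, 5, 2],
    [7, 6, 1],
    [9, 8, 7],
]
print(lbp_code(image, 1, 1))  # 241
```

Because only the sign of each neighbor-center difference is used, the code is unchanged by any monotonically increasing gray-scale transformation of the image.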

We proposed a simple and robust local descriptor, called the robust local binary pattern (RLBP). The basic LBP works successfully in many domains, such as texture classification, human detection and face recognition. However, LBP is not very robust to image noise. We improved the robustness of LBP by changing the coding bit of LBP. Experimental results on the Brodatz and UIUC texture databases show that RLBP outperforms other widely used descriptors (e.g., SIFT, Gabor, MR8 and LBP) and other variants of LBP (i.e., completed LBP), especially when noise is added to the images. In addition, experimental results on human face recognition also show a promising performance, comparable to the best known results on the Face Recognition Grand Challenge (FRGC) dataset.

Robust local binary pattern.


Recently, the local quantized patterns (LQP) descriptor was proposed, which uses vector quantization to code complicated patterns with a large number of neighbors and several quantization levels. It uses a lookup table technique to map patterns into the corresponding indices. Since LQP only considers the sign-based difference, it misses some discriminative information. We proposed completed local quantized patterns (CLQP) for texture classification. The magnitude and orientation-based differences are utilized to complement the sign-based difference of LQP. In addition, vector quantization is exploited to learn three respective codebooks for local sign, magnitude and orientation patterns. To reduce the computation time spent on initialization, we used preselected dominant patterns as the initialization in vector quantization. Our experimental results show that CLQP outperforms well-established features, including LBP, LTP, CLBP and LQP, on a range of challenging texture classification problems and an infant pain detection problem.

Overview of completed local quantized patterns.



To improve the accuracy of LBP-based operators by incorporating texture image intensity characteristics, we proposed a shifted step function that reduces the quantization error of the step function and yields more discriminative operators. Features obtained from the shifted step function are simply fused together to form the final histogram. The model is general and can be integrated with other existing LBP variants to reduce the quantization error of the step function in texture classification. The proposed method was integrated with multiple LBP-based feature descriptors and evaluated on publicly available texture databases (Outex_TC_00012 and KTH-TIPS2b) for texture classification. Experimental results demonstrate that it not only improves the performance of the operators with which it is integrated, but also compares favorably to the state of the art in texture classification.
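One plausible reading of the idea above can be sketched as follows: the threshold of the LBP step function is moved away from zero, a histogram is computed for each shift, and the histograms are concatenated. The exact form of the shifted step function, the shift values and the fusion by concatenation are assumptions made for illustration only and are not taken from the published method.

```python
NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
             (1, 1), (1, 0), (1, -1), (0, -1)]


def shifted_lbp_code(image, r, c, shift):
    """LBP code using the shifted step function s(x) = 1 if x >= shift else 0.

    With shift = 0 this reduces to the basic LBP code.
    """
    center = image[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(NEIGHBORS):
        if image[r + dr][c + dc] - center >= shift:
            code |= 1 << bit
    return code


def fused_histogram(image, shifts):
    """Concatenate one 256-bin histogram per shift value (assumed fusion)."""
    fused = []
    for shift in shifts:
        hist = [0] * 256
        for r in range(1, len(image) - 1):
            for c in range(1, len(image[0]) - 1):
                hist[shifted_lbp_code(image, r, c, shift)] += 1
        fused.extend(hist)
    return fused


# Toy image and shift values are hypothetical.
image = [
    [6, 5, 2],
    [7, 6, 1],
    [9, 8, 7],
]
fused = fused_histogram(image, shifts=(0, 2))
print(len(fused))  # 512
```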

Dynamic texture (DT) is an extension of texture to the temporal domain. Segmenting DTs is a challenging problem, and we addressed the task of segmenting a DT into disjoint regions. DTs may differ in their spatial mode (i.e., appearance) and/or temporal mode (i.e., motion field). To this end, we developed a framework based on the appearance and motion modes. For the appearance mode, we use a new local spatial texture descriptor to describe the spatial mode of the DT; for the motion mode, we use the optical flow and a local temporal texture descriptor to represent the temporal variations of the DT. In addition, we use the Histogram of Oriented Optical Flow (HOOF) to organize the optical flow vectors. To compute the distance between two HOOFs, we developed a simple, effective and efficient distance measure based on Weber's law. Furthermore, we addressed the problem of threshold selection by proposing a method that determines the thresholds for the segmentation method by offline supervised statistical learning. Experimental results show that our method provides very good segmentation results compared to the state-of-the-art methods in segmenting regions that differ in their dynamics.

Illustration of DTs: (a) DTs differ in their spatial mode (i.e., appearance) but show a similar temporal mode (i.e., motion), (b) DTs differ in their temporal mode but show a similar spatial mode, and (c) DTs with similar temporal/spatial modes are cluttered.
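To make the motion representation concrete, the sketch below builds a simplified, magnitude-weighted histogram of optical flow orientations in the spirit of HOOF. The number of bins, the function names and the sample flow vectors are illustrative assumptions, refinements of the published HOOF descriptor are omitted, and the L1 distance is only a stand-in for the Weber's-law-based measure mentioned in the text, whose definition is not given here.

```python
import math


def hoof(flow, n_bins=8):
    """Magnitude-weighted histogram of flow orientations, normalized to sum to 1.

    `flow` is a list of (dx, dy) optical flow vectors.
    """
    hist = [0.0] * n_bins
    for dx, dy in flow:
        magnitude = math.hypot(dx, dy)
        if magnitude == 0.0:
            continue  # zero vectors carry no orientation
        angle = math.atan2(dy, dx) % (2.0 * math.pi)
        # min() guards against the rare case where the modulo returns 2*pi.
        bin_index = min(int(angle / (2.0 * math.pi) * n_bins), n_bins - 1)
        hist[bin_index] += magnitude
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist


def l1_distance(h1, h2):
    """Stand-in distance; NOT the Weber's-law-based measure from the text."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


# Hypothetical flow fields: one moving mostly right, one mostly up.
rightward = [(1.0, 0.0), (2.0, 0.1), (1.5, 0.05)]
upward = [(0.0, 1.0), (0.1, 2.0), (0.05, 1.5)]

print(l1_distance(hoof(rightward), hoof(upward)))
```

Normalizing the histogram makes the descriptor independent of the overall flow magnitude, so two regions with the same direction of motion but different speeds produce the same histogram.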


Video texture synthesis is the process of providing a continuous and infinitely varying stream of frames, which plays an important role in computer vision and graphics. However, generating high-quality synthesis results remains a challenging problem. Considering the two key factors that affect the synthesis performance (frame representation and blending artifacts), we improved the synthesis performance from two perspectives. First, an effective frame representation is designed to capture both the image appearance information in the spatial domain and the longitudinal information in the temporal domain. Second, artifacts that degrade the synthesis quality are significantly suppressed based on a diffeomorphic growth model. The proposed video texture synthesis approach has two major stages: the video stitching stage and the transition smoothing stage. In the first stage, a video texture synthesis model is proposed to generate an infinite video flow. To find similar frames for stitching video clips, we presented a new spatial-temporal descriptor to provide an effective representation for different types of dynamic textures. In the second stage, a smoothing method is proposed to improve synthesis quality, especially from the point of view of temporal continuity. It aims to establish a diffeomorphic growth model to emulate local dynamics around stitched frames. The proposed approach was thoroughly tested on public databases and videos from the Internet, and was evaluated both qualitatively and quantitatively.

Overview diagram of the whole DT synthesis method, which consists of two main steps: video stitching using a Multiframe LBP-TOP signature (left), and transition smoothing using deformable image registration and growth model estimation (right).


Computational photography

In computational photography, the aim is to develop techniques for computational cameras that give more flexibility to image acquisition and enable more advanced features to be employed, going beyond the capabilities of traditional photography. These techniques often involve use of special optics and digital image processing algorithms that are designed to eliminate the degradations caused by the optical system and viewing conditions. In our recent work, new imaging solutions such as microlens array based sensing technology, and novel algorithms for facilitating image acquisition, coding and reconstruction have been the key areas of interest.

In our research related to light field imaging, we have investigated techniques for compression of light field images. In particular, our work has addressed an asymmetric scenario where the encoder should have low computational complexity, allowing its implementation on resource-limited devices. In the first phase, we have surveyed potential techniques, such as sampling approaches, image transforms, distributed source coding and compressed sensing techniques for the purpose.

In addition to the further development of microlens array based detection of parasites on a mobile platform, automatic parasite detection was studied in collaboration with the Institute for Molecular Medicine Finland and the Karolinska Institute. On-chip imaging was successfully used to detect Schistosoma haematobium eggs. Future studies will focus on improving the quality of the resulting images for data captured with the microlens array cameras and data captured with in-line holography.

Image registration is one of the most important and most frequently discussed image processing topics in the literature, and it is a crucial preprocessing step in all image analysis tasks in which the final information is obtained from a combination of various data sources, such as image fusion, change detection, multichannel image restoration and superresolution. In many cases, the images to be registered are inevitably blurred. The blur may originate from camera shake, scene motion, inaccurate focus, atmospheric turbulence, sensor imperfection, low sampling density and other factors. We developed an original registration method designed particularly for registering blurred images. Our method works for unknown blurs, assuming that the PSFs exhibit N-fold rotational symmetry. Experiments demonstrated its good performance, which does not depend on the amount of blur.

Two images blurred by two different symmetric PSFs (left and middle). The result of multichannel blind deconvolution after registering the two images with our method (right).


Object detection and recognition

Just by glancing at an object, for example an apple or a building, a human is immediately aware of many of the object's qualities. For instance, the apple may be red or green, and the building may be made of concrete or wood. These properties can be used to describe the objects and further qualify them. Currently, even the best systems for artificial vision have a very limited understanding of objects and scenes. For instance, state-of-the-art object detectors model objects as distributions of simple features (e.g., HOG or SIFT), which capture only blurred statistics of the two-dimensional shape of the objects. Color, material, texture, and most of the other object attributes are likely ignored in the process. Fine-grained object classification and attributes have recently gained a lot of attention in computer vision, but the field is still in its infancy. For instance, currently there are only a few small databases.

Our research objective is to develop novel methods to reliably extract a diverse set of attributes from images, and to use them to improve the accuracy, informativeness, and interpretability of the object models. The goal is to combine advances in discrete-continuous optimization, machine learning, and computer vision, to significantly advance our understanding of visual attributes and produce new state-of-the-art methods for their extraction. We do this in three ways: (1) by developing learning approaches that utilize mid-level image segments to automatically find the combination of object parts that corresponds to possibly small differences between two object classes (e.g., two bicycle models); (2) by utilizing dependencies for learning complex attribute combinations using structured output models; and (3) by using crowdsourcing tools to discover a comprehensive vocabulary that humans use to describe objects when performing a particular task (e.g., browsing a bicycle catalogue).

In 2012, CMV researchers participated in a workshop led by Prof. A. Vedaldi at Johns Hopkins University. As a part of the workshop, we began to collect a new extensive dataset that is intended to serve as a benchmark for detailed object attribute and part recognition. Part of this data was published in the Fine-Grained Visual Categorization workshop, organized in conjunction with the Conference on Computer Vision and Pattern Recognition (CVPR) 2013. This data also became a part of the ImageNet FGVC challenge, which was held in conjunction with the International Conference on Computer Vision (ICCV) 2013.

Efficiency is one of the key issues for real-time object detection. Although nonlinear classifiers are more powerful than linear ones, few existing methods integrate them as weak classifiers into the commonly used boosting framework. The main reason is that conventional nonlinear classifiers usually have high computational costs. To address this problem, we proposed an efficient nonlinear weak classifier, named the Partition Vector weak Classifier (PVC). PVC is a weighted combination of a series of additive kernel functions of the (input) feature vector with respect to a set of pre-defined vectors, namely the Partition Vectors (PVs). The learning of a PVC includes three key steps: encoding, hyper-plane learning, and decoding. The obtained classifiers can be further accelerated using piecewise constant functions, which ensures that the computational cost during evaluation is proportional to the dimension of the features, as it is for conventional linear classifiers.

PVC learning. (a) Encoding maps samples in the original space to the implicit space. (b) Learning the hyper-plane using the encoded samples in the implicit space. (c) Decoding transforms the learnt hyper-plane to the original space.
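The acceleration by piecewise constant functions can be illustrated with a generic additive-kernel classifier. Because an additive kernel decomposes over feature dimensions, the score f(x) = b + sum_j w_j K(x, v_j) can be rewritten as b + sum_d h_d(x_d), and each one-dimensional function h_d can be tabulated in advance. The sketch below assumes the histogram intersection kernel, features scaled to [0, 1], and hypothetical vectors and weights; it does not reproduce the PVC learning procedure itself.

```python
def intersection(a, b):
    """One-dimensional histogram intersection kernel (assumed kernel choice)."""
    return min(a, b)


def exact_score(x, vectors, weights, bias):
    """f(x) = bias + sum_j w_j * sum_d k(x_d, v_jd); cost grows with len(vectors)."""
    return bias + sum(
        w * sum(intersection(xd, vd) for xd, vd in zip(x, v))
        for w, v in zip(weights, vectors)
    )


def build_tables(vectors, weights, n_bins):
    """Tabulate h_d(t) = sum_j w_j * k(t, v_jd) at the bin centers of [0, 1]."""
    dim = len(vectors[0])
    centers = [(i + 0.5) / n_bins for i in range(n_bins)]
    return [
        [sum(w * intersection(t, v[d]) for w, v in zip(weights, vectors))
         for t in centers]
        for d in range(dim)
    ]


def table_score(x, tables, bias):
    """Piecewise-constant evaluation: one table lookup per feature dimension."""
    n_bins = len(tables[0])
    score = bias
    for d, xd in enumerate(x):
        index = min(int(xd * n_bins), n_bins - 1)  # assumes 0 <= xd <= 1
        score += tables[d][index]
    return score


# Hypothetical vectors, weights and bias, for illustration only.
vectors = [[0.2, 0.8], [0.6, 0.4], [0.9, 0.1]]
weights = [1.0, -0.5, 0.75]
bias = -0.3
tables = build_tables(vectors, weights, n_bins=100)

x = [0.33, 0.71]
print(exact_score(x, vectors, weights, bias), table_score(x, tables, bias))
```

The table-based evaluation costs one lookup per feature dimension regardless of how many vectors the classifier combines, at the price of a small approximation error controlled by the number of bins.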


We demonstrated our algorithm in detection tasks for multiple classes of objects, including pedestrians, cars, bicycles, and cows, as illustrated below. Experimental results show that the boosted PVC significantly improves both the learning and evaluation efficiency of nonlinear SVMs to the level of boosted linear classifiers, without losing any of the high discriminative power.

Illustration of some detection examples. The first three columns, the fourth and the fifth columns, the first two rows of the last column, and the last row of the last column show the detection results of pedestrian detection on the INRIA dataset, and the results of car, bike, dog, and cow detection on the PASCAL VOC 2007 dataset, respectively.


Geometric computer vision

Images are 2D projections of the 3D world, which makes inferring 3D information an ill-posed problem from a single viewpoint, and a challenging problem from multiple views. Geometric computer vision provides the tools for establishing the relationship between the image and the 3D scene. While the fundamental theory of geometric computer vision was developed in the previous century, problems such as the automatic construction of 3D scene models from multiple photographs remain relevant and are subject to active research. Furthermore, new depth cameras, such as the Kinect sensor, have boosted rapid progress in scene modeling. Intelligent machines that require 3D information from the environment are a natural application area for geometric computer vision. Another application area that has gained much attention in the last few years is mixed reality, where real and virtual objects co-exist in the same environment. Wearable computers such as Google Glass have created a strong demand for such technology. Mixed reality has also been the key driver in our recent work on geometric computer vision.

During the last year, we have been developing a method for creating reconstructions from multiple photographs. Our previous method published in ICPR 2012, which basically takes a set of images as input, and outputs a point cloud in three-dimensional space, was extended with a couple of improvements. The improvements make the point clouds both denser and more accurate, without notable loss in computational efficiency. Hence, compared with the state of the art, our method produces reconstructions of similar or better quality, and is significantly faster. During the year, we also studied methods for generating triangular meshes from point clouds. The goal in this part is to turn a point cloud into a compact and watertight mesh of connected triangles so that it could be used, for example, to create virtual reality models. The pipeline from a set of images to a triangular mesh is illustrated in the figure below.

Reconstruction pipeline from a set of images to a triangular mesh.


Augmented reality is an exciting area of research that promises a number of applications such as games, driver assistance systems, map overlay, tourist guidance, and many more. A key component of any augmented reality system is the Simultaneous Localization and Mapping (SLAM) module. The SLAM module reconstructs the 3D structure of the scene while simultaneously estimating the camera position. We are researching a new SLAM framework that is able to handle both triangulated and non-triangulated features simultaneously. It allows the user to move the camera without restrictions and thus provides more freedom than the current state of the art.

The SLAM module creates a 3D point cloud of the scene from 2D images.


Establishing point-to-point correspondence between images is a fundamental problem in many applications such as augmented reality. In the past decade, many local image detection/description techniques have been developed to detect locations in images that are suitable for matching and to describe the visual properties of those points using the local image region around them. Robustness and computational efficiency are the two main criteria for choosing a particular local descriptor for an application. The need for real-time speed on video stream data has led to the emergence of many fast descriptors (Random Ferns, ORB, etc.) that use simple pixel-level comparisons and can be computed and matched efficiently, but they are not robust against camera pose variation. Robust features like SIFT/SURF have been successfully used under many challenging conditions, but they are computationally expensive. Our recent study explores different ways of accelerating point-to-point matching with robust descriptors. Our initial experiments with SIFT vs. ORB suggest that fast point matching with SIFT descriptors may be achievable while maintaining superior accuracy over ORB descriptors. We are in the process of expanding the scope of the experiments to larger datasets involving widely varying scenarios.
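The cost difference between the two descriptor families comes largely from the distance computation: binary descriptors are compared with a Hamming distance (an XOR followed by a bit count), while real-valued descriptors such as SIFT require a floating-point distance over many dimensions. The sketch below shows generic brute-force matching with a nearest/second-nearest ratio test for both cases. It is not the acceleration scheme under study; the descriptors are toy values and the ratio threshold of 0.8 is an assumed, commonly used setting.

```python
def hamming(a, b):
    """Hamming distance between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")


def euclidean(a, b):
    """Euclidean distance between two real-valued descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def match(query, train, distance, ratio=0.8):
    """Brute-force matching with a nearest/second-nearest ratio test.

    Returns (query_index, train_index) pairs whose best distance is clearly
    smaller than the second-best distance.
    """
    matches = []
    for qi, q in enumerate(query):
        ranked = sorted((distance(q, t), ti) for ti, t in enumerate(train))
        if len(ranked) < 2:
            continue  # the ratio test needs two candidates
        (best, best_index), (second, _) = ranked[0], ranked[1]
        if best < ratio * second:
            matches.append((qi, best_index))
    return matches


# Toy 8-bit binary descriptors (real ORB descriptors have 256 bits).
binary_query = [0b10110010, 0b00001111]
binary_train = [0b10110011, 0b01001101, 0b11110000]
print(match(binary_query, binary_train, hamming))  # [(0, 0), (1, 1)]

# Toy 2-D real-valued descriptors (real SIFT descriptors have 128 dimensions).
float_query = [[0.0, 1.0], [5.0, 5.0]]
float_train = [[0.1, 0.9], [5.0, 5.2], [9.0, 9.0]]
print(match(float_query, float_train, euclidean))  # [(0, 0), (1, 1)]
```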

Human-centered vision systems

In future ubiquitous environments, computing will move into the background, being omnipresent and invisible to the user. This will also lead to a paradigm shift in human-computer interaction (HCI) from traditional computer-centered to human-centered systems. We expect that computer vision will play a key role in such intelligent systems, enabling, for example, natural human-computer interaction, or identifying humans and their behavior in smart environments.

Face recognition and biometrics

CMV continued playing a key role in the FP7 EU project TABULA RASA, which has recently been selected as a success story by the European Commission. TABULA RASA aims at researching, developing and evaluating countermeasures for spoofing attacks against biometric systems. In this context, we have proposed and evaluated advanced countermeasures for face and gait biometric modalities. We also co-organized, jointly with UNICA from the University of Cagliari (Italy), a spoofing challenge at the International Conference on Biometrics (ICB 2013) which was held in Madrid in June 2013. The aim of the challenge was to demonstrate the effects of spoofing and anti-spoofing in fingerprint biometrics and to raise awareness of the spoofing threats to biometric systems.

An example of a mask that can be used to attack (spoof) a biometric system.


We continued exploring promising directions for face spoofing detection within the context of the TABULA RASA project, in addition to the analysis of facial texture and motion patterns that has been shown to be effective in our previous studies. As we humans rely mainly on scene and context information when performing spoofing detection, we have been investigating approaches for exploiting contextual information in face anti-spoofing. In our initial studies, histogram of oriented gradients (HOG) descriptors were used for describing distinctive discontinuities around the detected face and determining whether a natural upper-body profile or the boundaries of the spoofing medium are detected in the scene. The proposed countermeasure improved the state of the art and showed promising generalization capabilities also in cross-database evaluation. Moreover, it is reasonable to assume that no single superior technique is able to detect all known, let alone unseen, spoofing attacks. Therefore, we have been studying how different countermeasures could be combined in order to construct a flexible network of attack-specific spoofing detectors in which new techniques can be easily integrated to patch existing vulnerabilities quickly when new countermeasures appear. Together with the IDIAP Research Institute in Switzerland, we developed and published an open-source face anti-spoofing framework that includes several countermeasures and strategies for combining them. The same software framework was also successfully utilized in the 2nd Competition on Counter Measures to 2D Face Spoofing Attacks organized within the context of ICB 2013.

We also continued our research on recognizing human demographics (e.g. age and gender) from facial images with emphasis on local binary patterns (LBP). The most significant achievement in this domain is a method called the LBP kernel density estimate. Our extensive experiments showed very promising results especially in human age estimation, but also in texture classification and face recognition. The proposed method can be seen as an alternative to the widely used histogram representation, and it has potential in situations where the number of all possible local binary patterns producible by any given LBP operator exceeds the number of pixels in the image. The method provides an efficient way for preventing sparsity, which is a common problem with LBP histograms. The method also turned out to perform well with other LBP variants, for example, with CLBP, which is among the most powerful LBP variants.

Examples of automatically estimated age categories (ground truth in parentheses).


Recognition of facial expressions and emotions

The face is the key component in understanding emotions, and emotion understanding plays a significant role in many areas, from security and entertainment to psychology and education.

We proposed a method to detect facial action units in 3D face data by combining novel geometric properties and a new descriptor based on the Local Binary Pattern (LBP) methodology. The proposed method enables person- and gender-independent facial action unit detection. Decision-level fusion with Random Forests classifiers is used to combine geometric and LBP-based features. Unlike previous methods, which suffer from the diversity among different persons and normalize features utilizing neutral faces, our method extracts features from a single 3D face sample. In addition, we show that an orientation-based 3D LBP descriptor can be implemented efficiently in terms of size and time without degrading the performance. We tested our method on the Bosphorus database, and presented comparative results with the existing methods. Our approach outperformed existing methods, achieving a mean receiver operating characteristic area under curve (ROC AuC) of 97.7%.

a) Sample 2D intensity data from the Bosphorus database. Facial landmarks provided by the database (0-21) and points marked in this study (22-24) are shown with red and blue circles. b) Raw 3D data. c) Filtered 3D data and d) a CS-3DLBP mapped image.


Facial expression recognition (FER) has been predominantly utilized to analyze the emotional status of human beings. In practice, nearly frontal-view facial images may not be available. Therefore, a desirable property of an FER system is that it allows the user to have any head pose. Some methods for non-frontal-view facial images were recently proposed to recognize the facial expression by building a discriminative subspace in specific views. These approaches ignore (1) the discrimination of inter-class samples with the same view label and (2) the closeness of intra-class samples with all view labels. We proposed a new method to recognize arbitrary-view facial expressions by using discriminative neighborhood preserving embedding and multi-view concepts. It first captures the discriminative property of inter-class samples. In addition, it explores the closeness of intra-class samples with an arbitrary view in a low-dimensional subspace. Experimental results on the BU-3DFE and Multi-PIE databases showed that our approach achieves promising results for recognizing facial expressions with arbitrary views.

Illustration of multi-view discriminative neighborhood preserving embedding for arbitrary-view FER.


It is commonly agreed that emotions are a multimodal process. Combining complementary information from different modalities may increase the accuracy of emotion recognition. In the AFFECT project, funded by TEKES, we have been investigating the fusion of different modalities, e.g., spontaneous facial expressions as an external channel and electroencephalogram (EEG) as an internal channel, supplementing facial expressions for more reliable emotion detection in long continuous videos.

Analysis of visual speech

Human speech perception is a bi-modal process which makes use of information not only from what we hear (acoustic) but also from what we see (visual). In machine vision, visual speech recognition (VSR) is the task of recognizing utterances by analyzing visual recordings of a speaker’s talking mouth without any acoustic input. Although visual information cannot in itself provide normal speech intelligibility, it may be sufficient within a particular context when the utterances to be recognized are limited. In such a case, VSR can be used to enhance natural human-computer interactions through speech, especially when audio is not accessible or is severely corrupted.

Our research is focused on the extraction of a set of compact and informative visual features for VSR. To do that, a generative latent variable model is adopted to model the inter-speaker variations of visual appearance and the variations caused by uttering. Moreover, we propose to use a path graph to capture the temporal relationships of video frames. The low-dimensional continuous curve embedded within the graph is used as prior knowledge when constructing prior distributions of latent variables. Our method has been compared with state-of-the-art visual features and has achieved superior results.

Graphical representation of the generative latent variable model.


Illustration of using the embedded curve as the prior knowledge to construct prior distributions of latent variables.
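
The path-graph idea mentioned above can be illustrated with a small sketch. It is not the implementation used in the reported work; it only shows, under the assumption that each video frame is a node linked to its temporal neighbors, that the eigenvectors of the path-graph Laplacian are sampled cosine curves, so the frames of an utterance map onto a smooth low-dimensional curve. The function names and the sequence length are hypothetical.

```python
import numpy as np

def path_graph_laplacian(n):
    """Laplacian L = D - A of a path graph with n nodes (one node per frame)."""
    A = np.zeros((n, n))
    idx = np.arange(n - 1)
    A[idx, idx + 1] = 1.0  # frame t is linked to frame t + 1
    A[idx + 1, idx] = 1.0  # the graph is undirected
    return np.diag(A.sum(axis=1)) - A

def cosine_curve(n, k):
    """Closed-form k-th Laplacian eigenvector of the path graph, unit norm."""
    t = np.arange(n)
    v = np.cos(np.pi * k * (t + 0.5) / n)
    return v / np.linalg.norm(v)

n_frames = 20  # assumed sequence length, for illustration only
L = path_graph_laplacian(n_frames)
eigvals, eigvecs = np.linalg.eigh(L)  # eigenvalues in ascending order

# A 2-D embedding of the frames: rows are frames, columns are the first two
# non-constant eigenvectors. Consecutive frames lie next to each other on
# a smooth curve.
embedding = eigvecs[:, 1:3]
```

Each column of `embedding` matches `cosine_curve(n_frames, k)` up to sign, which is why the embedded frames trace a continuous curve rather than a scattered point set.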


Visual speech can also be used for determining the identity of a person. A novel local spatiotemporal directional descriptor was proposed for speaker identification by analyzing mouth movements. For this new descriptor, the directional local binary pattern features in three orthogonal planes are coded. In addition to the sign features, magnitude information is used as a weight for the histogram bins with the same sign value to improve the discriminative ability. Moreover, decorrelation is exploited to remove the redundancy of features. Experimental results on the challenging XM2VTS database show the effectiveness of the proposed representation for this problem.

Illustration of directional coding of sign information.


Human tracking and action analysis

Even though much work has been done on action recognition, little effort has been dedicated to understanding emotion by analyzing actions, e.g. people’s walking. We have collected an affective gait database and designed descriptors that are robust against the rotation and scale variations that occur when gait data is recorded in the real world while individuals are genuinely affected emotionally.

In order to improve the user experience with a large touchscreen, we introduced gesture interaction based on a Kinect sensor in a wall-sized touchscreen. We created two interaction modes, selected according to the distance between the user and the display: ‘Near-Mode’ for touch interaction and ‘Far-Mode’ for gesture interaction. With this solution, the interaction is more user-friendly, and young users or users in wheelchairs are also able to interact with the large touchscreen applications.


Two interaction modes based on distance and gesture interaction in far-mode.
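
A distance-based switch between the two modes can be sketched as follows. This is only an illustration: the threshold values, the hysteresis band (added here so the mode does not flicker when the user stands near the boundary) and all names are assumptions, not details of the deployed system.

```python
from dataclasses import dataclass

@dataclass
class ModeSelector:
    """Chooses between touch ('near') and gesture ('far') interaction."""
    near_limit_m: float = 1.0  # hypothetical: switch back to near below this
    far_limit_m: float = 1.3   # hypothetical: switch to far above this
    mode: str = "near"

    def update(self, distance_m: float) -> str:
        """Update the mode from the latest user-to-display distance in metres."""
        if self.mode == "near" and distance_m > self.far_limit_m:
            self.mode = "far"
        elif self.mode == "far" and distance_m < self.near_limit_m:
            self.mode = "near"
        return self.mode

selector = ModeSelector()
# Distances as they might be reported by a depth sensor, one per frame.
modes = [selector.update(d) for d in (0.5, 1.2, 1.4, 1.2, 0.9)]
```

With the assumed limits, a user hovering between 1.0 m and 1.3 m keeps whichever mode was active last.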


As a part of the Future School Research Second Wave project, we have been developing a mobile multimodal recording system, called MORE. The MORE system is designed for observation and analysis of social interactions in real life situations, for example, to support pedagogic purposes. The MORE system provides a unique way of recording and analyzing information that consists of a 360 degree panoramic video and multiple audio channels. Furthermore, the software developed allows previewing, editing, and exporting of interesting events from the collected data. The system also provides a server backend and web interface for collaborative work. It gives the possibility to annotate both video and audio, and to store comments to be viewed by other experts. From the analysis point of view, the solution is designed to combine advanced signal analysis techniques, such as speech processing, motion activity detection, as well as face detection, tracking and recognition. The aim of these approaches is to speed up the exploration and analysis of a large material base with (semi-)automatic methods. Finally, one of the main advantages of the MORE is the ease of setup, as well as the mobile configuration when used as a carry-along device.

The mobile multimodal recording system (MORE) is designed for observation and analysis of social interactions.


Affective human-robot interaction

Development of our experimental HRI platform, Minotaurus, was continued with the support of the European Regional Development Fund (2010-2014) in collaboration with the Intelligent Systems Group. Minotaurus consists of a Segway Robotic Mobility Platform (RMP 200) and a set of laptops, Kinect sensors, video cameras, laser scanners, microphones, magnetic field sensors, a robot arm, an avatar display and a ubiquitous multi-camera environment.

During the last year of the project, the work continued to integrate the various components so that they function together. We also developed methods for controlling the robotic arm using observations from a Kinect sensor. We have successfully demonstrated that Minotaurus can recognize various objects on a table surface, has the capability to plan how to pick up a detected object, and can execute the plan by controlling the robotic arm.

Minotaurus is also capable of detecting people from a distance and can understand some of their gestures. From a closer distance, it can detect and recognize familiar faces, and analyze faces to detect facial expressions and the gender of the person. It can also understand both Finnish and English voice commands, and reply using spoken sentences in the same language. While the robot speaks, the mouth movements of the avatar are synthesized to match the generated speech. Minotaurus can also understand the environment and navigate to its target while avoiding obstacles by using a combined environment model generated from all the sensors.

Minotaurus and its capabilities have been successfully demonstrated at various private and public events, for example, during the University Science Day and a robot-themed event at the science center Tietomaa. The demonstrations have thus been arranged not only in our robotics lab but also in real environments, and the reactions of the audiences have been entirely positive and enthusiastic.

An overview of the perceptual system of Minotaurus and an illustration of the developed interaction modalities.


Vision systems engineering

Vision systems engineering research aims to identify attractive computing approaches, architectures, and algorithms for industrial machine vision systems. In this research, solutions ranging from low-level image processing to equipment installation and operating procedures are considered simultaneously. The roots of this expertise are in our visual inspection studies, in which we already met extreme computational requirements in the early 1980s, and we have contributed to the designs of several industrial solutions. We have also applied our expertise to applications intended for embedded mobile platforms.

The framework for a lumber tracing system was developed using 1D projection signals together with local descriptors. The product identification theme was continued, and the next logical step in the wood refinement chain was to find the link between boards and log ends. This is a very challenging problem, where a small image patch within the log end has to be matched to the board end image. The log end can randomly rotate 360 degrees and the image patch can be located almost anywhere within the log end. As with the board side images, properly selected local descriptors have shown great potential for correct matching. An example log end – board end matching result is shown below.

In the research area of interactive mobile applications, we have studied multimodal gesture controlled user interaction. The methods developed utilize the multiple computing resources present on current mobile platforms such as GPUs. The gestures are recognized from the front camera and the touch screen. With the user interface, the user can move the mouse cursor, click on objects and scroll documents. The functions provided to the user depend on the distance between the hand and the device. For this purpose, we have developed a new finger detection and tracking system, based on color and motion features.

Geometrically consistent matches between the rotated log end image and the board end image.


Our work on motion-based object detection and tracking has also continued. The aim in this work is to integrate feature based sparse motion segmentation with a sampling based motion detection and tracking framework, which would lead to efficient solutions applicable in online dynamic scene analysis tasks. The method is designed for mobile platforms and can be utilized, for example, in gesture controlled user interfaces.

In the field of energy-efficient embedded computer vision, we have implemented several variants of the LBP operator in multiple mobile and custom processors. The embedded platforms used range from multicore-ARM and mobile GPUs to TTA processors and a hybrid SIMD/MIMD image co-processor. We have compared the different implementations in terms of computational performance and energy efficiency, while analyzing the different optimizations that can be made on each platform and its different available computing resources. In addition, we have released a software package providing a valuable tool for other researchers and developers.
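
For readers unfamiliar with the operator being benchmarked, the basic 8-neighbor LBP can be sketched in a few lines of NumPy. This is a plain reference formulation, not one of the optimized platform-specific variants and not the released software package; the function names and the neighbor ordering are choices made for this example.

```python
import numpy as np

def lbp_8_1(image):
    """Basic LBP codes (8 neighbors, radius 1) for the interior pixels."""
    img = np.asarray(image, dtype=np.int32)
    h, w = img.shape
    center = img[1:-1, 1:-1]
    # Neighbor offsets (dy, dx), clockwise from the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(center)
    for bit, (dy, dx) in enumerate(offsets):
        neighbor = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # Set the bit when the neighbor is at least as bright as the center.
        codes |= (neighbor >= center).astype(np.int32) << bit
    return codes

def lbp_histogram(image):
    """Normalized 256-bin histogram of LBP codes, usable as a texture feature."""
    codes = lbp_8_1(image)
    hist = np.bincount(codes.ravel(), minlength=256).astype(np.float64)
    return hist / hist.sum()
```

The per-pixel work is a handful of comparisons and bit operations with no multiplications, which is one reason the operator lends itself to data-parallel implementations on GPUs and SIMD hardware.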

Two computationally intensive multimedia applications, face detection and depth estimation, were implemented and optimized for parallel processing using the Portable Computing Language (PoCL) implementation of the Open Computing Language (OpenCL). So far, the benchmarks have been implemented on a desktop CPU and GPU. An initial design of an energy-efficient multicore transport triggered architecture (TTA) processor that could achieve the same performance with significantly lower energy consumption has also been implemented, but not yet benchmarked.

The Energy Efficient Architectures and Signal Processing team of CMV has been working on design automation and energy efficient computing for signal processing applications. A remarkable new opening was the initiation of a joint US-Finnish research project CREAM, together with the Centre for Wireless Communications. During the first project year, the research focus has been on dataflow modeling and energy-efficient implementation of a digital pre-distortion filter for wireless mobile transmitters. One of our doctoral students, Amanullah Ghazi, also conducted a 2-month research visit to the University of Maryland on the basis of this project with Infotech financial support. Another 2-month research visit was made by Dr. Jani Boutellier to EPFL, Switzerland focusing on the topic of dataflow programming.

In the context of video processing, a programmable, energy-efficient processor for HEVC/H.265 adaptive loop filtering was developed. This work will also be extended to further parts of the latest H.265 video compression standard. The research has been heavily supported by the Academy of Finland. In general, the project creates tools for generating efficient embedded software/hardware solutions. Platform-independent high-level specifications are used to describe parallelism at the data, instruction, task and memory levels. The target is many-core systems, which are becoming the key approach to improving computing throughput.

Biomedical image analysis

In recent years, the increasing resolving power and automation of microscopic image acquisition systems have resulted in an exponential increase in microscopic data set sizes. Manual analysis of these data sets is extremely labor intensive and hampers the objectivity and reproducibility of results. There is, therefore, a growing need for automatic image processing and analysis methods. Biomedical image analysis is an emerging application area in which we have collaborated with Biocenter Oulu for a few years.

We have recently started a new project called “Algorithm-based combination and analysis of multidimensional video and open data” (ABCdata), funded by Tekes. In this project, the objective is to analyze 3D microscopic image sequences, and develop tools for cell segmentation and tracking, as well as for detection of cellular events such as mitosis and apoptosis in conditions that mimic human tissues, which makes this research unique from the scientific point of view.

One of the topics we have been investigating in the project during the last year is the analysis of cancer progression. The ability of a cell to sense its environment and to adapt its morphological appearance to it is a crucial element in tumor progression. Therefore, the analysis of these morphology changes and of dynamic cell behavior in long-term live cell imaging is a critical line of investigation in cell biology and drug development research. To monitor and quantify cell dynamics in cancer biology with live cell microscopy, which produces long image sequences, automated image analysis solutions are needed. We have therefore been developing computer vision and machine learning methods to address these needs. In this project, we employ phase contrast and fluorescent images taken from 3D models in which tumors reside and interact dynamically with the surrounding matrix and fibroblast cells. Recently, we proposed an automated method, based on a learning framework, for detecting tumor cells. The proposed method, which can be employed in different applications in biomedical image analysis, is able to distinguish different cell types in cell co-cultures and does not require parameter tuning.

Sample images from our database. (Upper Left) Phase contrast image of a 3D culture containing tumor (roundish) and fibroblast cells (elongated). (Upper Right) Fluorescent image of the same culture. Green Fluorescent Protein (GFP) is used to label fibroblast cells. (Lower Left) Phase contrast and fluorescent images are superimposed and interpretation of the phase contrast image is easier. (Lower Right) Our learning based probabilistic tumor detection result. The colormap indicates confidence level.


Accurate cell segmentation is a prerequisite of any detailed analysis of microscopic images. Good segmentation results can greatly simplify many analysis tasks. During the previous year we have been working with GFP labelled squamous carcinoma cells (HSC-3) embedded in the 3D collagen matrix, and image stacks have been captured with a spinning disk confocal microscope. One of the major challenges with our data is separating cells that touch. Most approaches make some assumptions about the shape of the cells (usually cells are assumed to be round and their size to be within a narrow range). These approaches do not work well when cells have very flexible shapes and can have intensity variation within their body.

We have attempted to separate touching cells using a cascade of segmentation methods. Our method works better than basic segmentation methods alone, but it is still far from solving the difficult case of a dense cell sample with non-uniform intensity within the cell and very flexible cell shapes.

Two 3D input stacks (Above) and their segmentation results (Below).


Phase-contrast illumination is simple and is the most commonly used microscopic method to observe non-stained living cells. Together with automatic cell segmentation and motion analysis tools, even single cell motility in large cell populations can be analyzed. To develop better automatic tools for analysis of low magnification phase-contrast images in time-lapse cell migration sequences, we have developed a segmentation method that relies on the intrinsic properties of maximally stable extremal regions (MSERs). In order to analyze cell migration characteristics in time-lapse movies, MSER-based automatic cell detection was combined with our own Kalman filter based multi-object tracker that efficiently tracked individual cells even in confluent cell populations. The research was conducted in cooperation with Biocenter Oulu and the University of Jyväskylä. The results have been reported in a joint article recently published in the Journal of Microscopy.
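
The tracking half of such a pipeline can be sketched with a constant-velocity Kalman filter that follows one cell centroid from frame to frame. This is a generic textbook formulation under assumed noise parameters, not the tracker from the cited article; the detection step (e.g. MSER centroids) and the data association needed for multiple cells are omitted, and the class name is hypothetical.

```python
import numpy as np

class ConstantVelocityKalman:
    """Tracks a 2-D centroid with state [x, y, vx, vy]."""

    def __init__(self, x, y, dt=1.0, process_var=1e-2, meas_var=1.0):
        # All noise parameters are illustrative assumptions.
        self.state = np.array([x, y, 0.0, 0.0])
        self.P = np.eye(4) * 10.0  # initial state uncertainty
        self.F = np.array([[1, 0, dt, 0],
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)  # motion model
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], dtype=float)  # only position is observed
        self.Q = np.eye(4) * process_var
        self.R = np.eye(2) * meas_var

    def predict(self):
        """Propagate the state one frame ahead; returns the predicted position."""
        self.state = self.F @ self.state
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.state[:2]

    def update(self, z):
        """Correct the prediction with a detected centroid z = (x, y)."""
        z = np.asarray(z, dtype=float)
        innovation = z - self.H @ self.state
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)  # Kalman gain
        self.state = self.state + K @ innovation
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.state[:2]
```

In a multi-object setting, one such filter would be kept per cell, and the predicted positions would be used to decide which detection in the next frame belongs to which track.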

Exploitation of Results

Many researchers have adopted and further developed our methodologies. Our research results are used in a wide variety of different applications around the world. For example, the Local Binary Pattern methodology and its variants are used in numerous image analysis tasks and applications, such as biomedical image analysis, biometrics, industrial inspection, remote sensing and video analysis. The researchers in CMV have actively published the source codes of their algorithms for the research community, and this has increased the exploitation of the results.

The results have also been utilized in our own projects. For example, we have collaborated with Prof. Tapio Seppänen’s Biomedical Engineering Group in the area of multimodal emotion recognition for affective computing, combining vision with physiological biosignals. Together with Prof. Seppänen and Dr. Seppo Laukka (Department of Educational Sciences and Teacher Education) and Prof. Matti Lehtihalmes (Faculty of Humanities) we have participated in the FSR Second Wave project where we have developed a Mobile Multimodal Recording System (MORE) that is now actively used in classroom research in various schools.

Most of our funding for both basic and applied research comes from public sources such as the Academy of Finland and Tekes, but besides these sources, CMV also conducts research by contract which is funded by companies. In this way, our expertise is being utilized by industry for commercial purposes, and even in consumer products, like mobile devices.

The CMV has actively encouraged and supported the birth of research group spin-outs. This gives young researchers an opportunity to start their own teams and groups. The spin-out enterprises are a side result; in our experience, their roots lie especially in the strands of “free academic research”. There are currently five research-based spin-outs founded directly in the machine vision area. The number rises to sixteen when the influence of the CMV's thirty-year history is taken into account, including the spin-out companies from the spin-out research groups in the area of computer science and engineering.

Future Goals

The very positive results obtained from the RAE 2013 and Infotech Oulu evaluations show that we are on the right track. We plan to carry out well focused cutting-edge research, for example, on novel image and video descriptors, perceptual interfaces for face to face interaction, multimodal analysis of emotions, 3D computer vision, and energy-efficient architectures for embedded vision systems. We also have plans to further deepen our collaboration with international and domestic partners. For this purpose, we are participating in new European project proposals. Close interaction between basic and applied research has always been a major strength of our research unit. The scientific output of the CMV has been increasing significantly in recent years. With this we expect to have much new potential for producing novel innovations and exploitation of research results in collaboration with companies and other partners.


professors, doctors


doctoral students






person years


External Funding

Academy of Finland                     834 000
Ministry of Education and Culture      198 000
                                       732 000
domestic private                        80 000
                                       102 000
total                                1 946 000


Doctoral Theses

Guo Y (2013) Image and video analysis by local descriptors and deformable image registration. Acta Univ Oul C 451.

Remes J (2013) Method evaluations in spatial exploratory analyses of resting-state functional magnetic resonance imaging data. Acta Univ Oul C 468.

Sangi P (2013) Object motion estimation using block matching with uncertainty analysis. Acta Univ Oul C 443.

Selected Publications

Bustard JD, Carter JN, Nixon MS & Hadid A (2014) Measuring and mitigating targeted biometric impersonation. IET Biometrics, accepted.

Kaakinen M, Huttunen S, Paavolainen L, Marjomäki V, Heikkilä J & Eklund L (2014) Automatic detection and analysis of cell motility in phase-contrast time-lapse images using a combination of maximally stable extremal regions and Kalman filter approaches. Journal of Microscopy, 253(1):65-78.

Lei Z, Pietikäinen M & Li SZ (2014) Learning discriminant face descriptor. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(2):289-302.

Nyländen T, Boutellier J, Nikunen K, Hannuksela J & Silvén O (2014) Low-power reconfigurable miniature sensor nodes for condition monitoring. International Journal of Parallel Programming, accepted.

Pereira TdF, Komulainen J, Anjos A, De Martino JM, Hadid A, Pietikäinen M & Marcel S (2014) Face liveness detection using dynamic texture. EURASIP Journal on Image and Video Processing, 2014:2.

Yan WJ, Li X, Wang SJ, Zhao G, Liu YJ, Chen YH & Fu X (2014) CASME II: An improved spontaneous micro-expression database and the baseline evaluation. PLoS ONE, 9(1):e86041.

Zhou Z, Hong X, Zhao G & Pietikäinen M (2014) A compact representation of visual speech data using latent variables. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(1):181-187.

Abdelaziz M, Ghazi A, Anttila L, Boutellier J, Lähteensuo T, Lu X, Cavallaro JR, Bhattacharyya SS, Juntti M & Valkama M (2013) Mobile transmitter digital predistortion: feasibility analysis, algorithms and design exploration. Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, CA, USA, in press.

Akhtar Z, Rattani A, Hadid A & Tistarelli M (2013) Face recognition under ageing effect: a comparative analysis. In: Image Analysis and Processing, ICIAP 2013 Proceedings, Lecture Notes in Computer Science, 8157:309-318.

Bayramoglu N, Zhao G & Pietikäinen M (2013) CS-3DLBP and geometry based person independent 3D facial action unit detection. Proc. IAPR International Conference on Biometrics (ICB 2013), Madrid, Spain, 6 p.

Blaschko M, Kannala J & Rahtu E (2013) Non maximal suppression in cascaded ranking models. In: Image Analysis, SCIA 2013 Proceedings, Lecture Notes in Computer Science, 7944:408-419.

Borji A, Rezazadegan Tavakoli H, Sihite DN & Itti L (2013) Analysis of scores, datasets, and models in visual saliency modeling. Proc. International Conference on Computer Vision (ICCV 2013), Sydney, Australia, 921-928.

Boutellaa E, Bengherabi M, Boulkenafet Z, Harizi F & Hadid A (2013) Face verification using local binary patterns generic histogram adaptation and chi-square based decision. Proc. 4th European Workshop on Visual Information Processing (EUVIP 2013), Paris, France, 142-147.

Boutellaa E, Harizi F, Bengherabi M, Ait-Aoudia S & Hadid A (2013) Face verification using local binary patterns and maximum a posteriori vector quantization model. In: Visual Computing, ISVC 2013 Proceedings, Lecture Notes in Computer Science, 8033:539-549.

Boutellier J & Silvén O (2013) Towards generic embedded multiprocessing for RVC-CAL dataflow programs. Journal of Signal Processing Systems, 73(2):137-142.

Boutellier J, Ersfolk J, Ghazi A & Silvén O (2013) High-performance programs by source-level merging of RVC-CAL dataflow actors. IEEE Workshop on Signal Processing Systems, Taipei, Taiwan, 360-365.

Boutellier J, Raulet M & Silvén O (2013) Automatic hierarchical discovery of quasi-static schedules of RVC-CAL dataflow programs. Journal of Signal Processing Systems, 71(1):35-40.

Chan CH, Tahir M, Kittler J & Pietikäinen M (2013) Multiscale local phase quantisation for robust component-based face recognition using kernel fusion of multiple descriptors. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(5):1164-1177.

Chen J, Zhao G, Salo M, Rahtu E & Pietikäinen M (2013) Automatic dynamic texture segmentation using local descriptors and optical flow. IEEE Transactions on Image Processing, 22(1):326-339.

Chen J, Kellokumpu V, Zhao G & Pietikäinen M (2013) RLBP: Robust local binary pattern. Proc. the British Machine Vision Conference (BMVC 2013), Bristol, UK, 10 p.

Chingovska I, Yang J, Lei Z, Yi D, Li SZ, Kähm O, Glaser C, Damer N, Kuijper A, Nouak A, Komulainen J, Pereira T et al. (2013) The 2nd Competition on Counter Measures to 2D Face Spoofing Attacks. Proc. IAPR International Conference on Biometrics (ICB 2013), Madrid, Spain, 6 p.

Feng X, Lai Y, Peng J, Mao X, Peng J, Jiang X & Hadid A (2013) Extracting local binary patterns from image key points: application to automatic facial expression recognition. In: Image Analysis, SCIA 2013 Proceedings, Lecture Notes in Computer Science, 7944:339-348.

Ghahramani M, Zhao G & Pietikäinen M (2013) Incorporating texture intensity information into LBP-based operators. In: Image Analysis, SCIA 2013 Proceedings, Lecture Notes in Computer Science, 7944:66-75.

Ghazi A, Boutellier J, Hannuksela J, Silvén O & Janhunen J (2013) Low-complexity SDR implementation of IEEE 802.15.4 (ZigBee) baseband transceiver on application specific processor. Proc. SDR’13 WInnComm, Washington, DC, USA.

Ghazi A, Boutellier J, Hannuksela J, Silvén O & Shahabuddin S (2013) Programmable implementation of zero-crossing demodulator on an application specific processor. IEEE Workshop on Signal Processing Systems, Taipei, Taiwan, 231-236.

Ghiani L, Hadid A, Marcialis G & Roli F (2013) Fingerprint liveness detection using binarized statistical image features. Proc. IEEE Sixth International Conference on Biometrics: Theory, Applications and Systems (BTAS 2013), Arlington, VA, USA, 1-6.

Guo Y, Zhao G & Pietikäinen M (2013) Local configuration features and discriminative learnt features for texture description. In: Local Binary Patterns: New Variants and Applications (Eds. S Brahnam, LC Jain, L Nanni & A Lumini), Springer, 113-129.

Guo Y, Zhao G, Zhou Z & Pietikäinen M (2013) Video texture synthesis with multi-frame LBP-TOP and diffeomorphic growth model. IEEE Transactions on Image Processing, 22(10):3879-3891.

Hadid A & Pietikäinen M (2013) Demographic classification from face videos using manifold learning. Neurocomputing, 100:197-205.

Hadid A, Ghahramani M & Nixon M (2013) Improving gait biometrics under spoofing attacks. In: Image Analysis and Processing, ICIAP 2013 Proceedings, Lecture Notes in Computer Science, 8157:1-10.

Hadid A, Ylioinas J, Ghahramani M, Taleb-Ahmed A & Bengherabi M (2013) Review of recent local binary pattern variants with application to gender classification from still images. Proc. International Conference on Signal, Image, Vision and their Applications (SIVA 2013), Guelma, Algeria, 6 p.

Hautala I, Boutellier J & Hannuksela J (2013) Programmable low power implementation of the HEVC adaptive loop filter. Proc. The 38th IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP), Vancouver, Canada, 2664-2668.

Heikkilä J, Rahtu E & Ojansivu V (2013) Local phase quantization for blur insensitive texture description. In: Local Binary Patterns: New Variants and Applications (Eds. S Brahnam, LC Jain, L Nanni & A Lumini), Springer, 49-84.

Herrera Castro D, Kannala J, Ladicky L & Heikkilä J (2013) Depth map inpainting under a second-order smoothness prior. In: Image Analysis, SCIA 2013 Proceedings, Lecture Notes in Computer Science, 7944:555-566.

Herrera Castro D, Kannala J, Sturm P & Heikkilä J (2013) A learned joint depth and intensity prior using Markov random fields. Third Joint 3DIM/3DPVT Conference (3DV 2013), 17-24.

Heulot J, Boutellier J, Pelcat M, Nezan J-F & Aridhi S (2013) Applying the adaptive hybrid flow-shop scheduling method to schedule a 3GPP LTE physical layer algorithm onto many-core digital signal processors. Proc. NASA/ESA Conference on Adaptive Hardware and Systems (AHS 2013), Turin, Italy, 123-129.

Hietaniemi R, Varjo S & Hannuksela J (2013) A machine vision based lumber tracing system. Proc. 8th International Conference on Computer Vision Theory and Applications (VISAPP), Barcelona, Spain, 2:98-103.

Holappa J, Heikkinen T & Roininen E (2013) Martians from Outer Space experimenting with location-aware cooperative multiplayer gaming on public displays. Proc. 12th International Conference on Mobile and Ubiquitous Multimedia (MUM 2013), Luleå, Sweden, 10 p.

Hong X, Zhao G, Ren H & Chen X (2013) Efficient boosted weak classifiers for object detection. In: Image Analysis, SCIA 2013 Proceedings, Lecture Notes in Computer Science, 7944:205-214.

Huang X, Zhao G & Pietikäinen M (2013) Emotion recognition from facial images with arbitrary views. Proc. the British Machine Vision Conference (BMVC 2013), Bristol, UK, 11 p.

Huang X, Zhao G, Hong X, Pietikäinen M & Zheng W (2013) Texture description with completed local quantized patterns. In: Image Analysis, SCIA 2013 Proceedings, Lecture Notes in Computer Science, 7944:1-10.

Komulainen J, Anjos A, Hadid A, Marcel S & Pietikäinen M (2013) Complementary countermeasures for detecting scenic face spoofing attacks. Proc. IAPR International Conference on Biometrics (ICB 2013), Madrid, Spain, 7 p.

Komulainen J, Hadid A & Pietikäinen M (2013) Face spoofing detection using dynamic texture. In: ACCV 2012 Workshops, Part I (LBP 2012), Lecture Notes in Computer Science, 7728:146-157.

Komulainen J, Hadid A & Pietikäinen M (2013) Context based face anti-spoofing. Proc. the IEEE Sixth International Conference on Biometrics: Theory, Applications and Systems (BTAS 2013), Washington, DC, 8 p.

Kyöstilä T, Herrera Castro D, Kannala J & Heikkilä J (2013) Merging overlapping depth maps into a nonredundant point cloud. SCIA 2013 Proceedings, Lecture Notes in Computer Science, 7944:567-578.

Li X, Pfister T, Huang X, Zhao G & Pietikäinen M (2013) A spontaneous micro-expression database: Inducement, collection and baseline. Proc. IEEE International Conference on Face and Gesture Recognition (FG 2013), 6 p.

Linder E, Grote A, Varjo S, Linder N, Lebbad M, Lundin M, Diwan V, Hannuksela J & Lundin J (2013) On-chip imaging of schistosoma haematobium eggs in urine for diagnosis by computer vision. PLoS Neglected Tropical Diseases, 7(12):e2547.

Liu H, Wang Z, Wang X, Zhao G & Qian Y (2013) Adaptive scene segmentation and obstacle detection for the blind. Journal of Computer-Aided Design & Computer Graphics (in Chinese), 25(12):1818-1825.

Lizarraga-Morales R, Guo Y, Zhao G & Pietikäinen M (2013) Dynamic texture synthesis in space with a spatio-temporal descriptor. In: ACCV 2012 Workshops, Part I (LBP 2012), Lecture Notes in Computer Science, 7728:38-49.

Matilainen M, Hannuksela J & Fan L (2013) Finger tracking for gestural interaction in mobile devices. In: Image Analysis, SCIA 2013 Proceedings, Lecture Notes in Computer Science, 7944:329-338.

Min R, Hadid A & Dugelay JL (2013) Efficient detection of occlusion prior to robust face recognition. The Scientific World Journal, Article ID 519158, 10 p.

Ojansivu V, Linder N, Rahtu E, Pietikäinen M, Lundin M, Joensuu H & Lundin J (2013) Automated classification of breast cancer morphology in histopathological images. Diagnostic Pathology 2013, 8 (Suppl. 1):S29.

Pedone M, Flusser J & Heikkilä J (2013) Blur invariant translational image registration for N-fold symmetric blurs. IEEE Transactions on Image Processing, 22(9):3676-3689.

Pietikäinen M (2013) Texture recognition. In: Encyclopedia of Computer Vision (ed. K. Ikeuchi), Springer, in press, DOI 10.1007/978-0-387-31439-6.

Pietikäinen M, Turk M, Wang L, Zhao G & Cheng L (2013) Editorial: Special Section on Machine Learning in Motion Analysis: New Advances. Image and Vision Computing, 31(6-7):419-420.

Rezazadegan Tavakoli H, Rahtu E & Heikkilä J (2013) Temporal saliency for fast background subtraction. In: ACCV 2012 Workshops, Part I (BMC 2012), Lecture Notes in Computer Science, 7728:321-326.

Rezazadegan Tavakoli H, Rahtu E & Heikkilä J (2013) Saliency detection using joint temporal and spatial decorrelation. In: Image Analysis, SCIA 2013 Proceedings, Lecture Notes in Computer Science, 7944:707-717.

Rezazadegan Tavakoli H, Rahtu E & Heikkilä J (2013) Stochastic bottom-up fixation prediction and saccade generation. Image and Vision Computing, 31(9):686-693.

Rezazadegan Tavakoli H, Rahtu E & Heikkilä J (2013) Spherical center-surround for video saliency detection using sparse sampling. In: Advanced Concepts for Intelligent Vision Systems, ACIVS 2013 Proceedings, Lecture Notes in Computer Science, 8192:695-704.

Rezazadegan Tavakoli H, Shahram Moin M & Heikkilä J (2013) Local similarity number and its application to object tracking. International Journal of Advanced Robotic Systems, 10(184), 7 p.

Rintaluoma T & Silvén O (2013) Lightweight resource estimation model to extend battery life in video playback. Proc. International Conference on Embedded Computer Systems (SAMOS 2013), 96-103.

Ruiz Hernandez JA, Crowley JL, Lux A & Pietikäinen M (2013) Histogram-tensorial Gaussian representations and its applications to facial analysis. In: Local Binary Patterns: New Variants and Applications (Eds. S Brahnam, LC Jain, L Nanni & A Lumini), Springer, 245-268.

Ruiz-Hernandez J & Pietikäinen M (2013) Encoding local binary patterns using the re-parametrization of the second order Gaussian jet. Proc. IEEE International Conference on Face and Gesture Recognition (FG 2013), 6 p.

Sangi P, Hannuksela J, Heikkilä J & Silvén O (2013) Sparse motion segmentation using propagation of feature labels. Proc. 8th International Conference on Computer Vision Theory and Applications (VISAPP), 2:396-401.

Shahabuddin S, Janhunen J, Bayramoglu MF, Juntti MJ, Ghazi A & Silvén O (2013) Design of a unified transport triggered processor for LDPC/turbo decoder. Proc. International Conference on Embedded Computer Systems (SAMOS 2013), 288-295.

Tresadern P, Cootes TF, Poh N, Matejka P, Hadid A, Levy C, McCool C & Marcel S (2013) Mobile biometrics: Combined face and voice verification for a mobile platform. IEEE Pervasive Computing, 12(1):79-87.

Varjo S & Hannuksela J (2013) A mobile imaging system for medical diagnostics. In: Advanced Concepts for Intelligent Vision Systems, ACIVS 2013 Proceedings, Lecture Notes in Computer Science, 8192:215-226.

Wang L-H, Vosoughi A, Bhattacharyya SS, Cavallaro JR, Juntti M, Valkama M, Silvén O & Boutellier J (2013) Dataflow modeling and design for cognitive radio networks. Proc. International Conference on Cognitive Radio Oriented Wireless Networks and Communications, Washington DC, United States.

Ylioinas J, Hadid A, Guo Y & Pietikäinen M (2013) Efficient image appearance description using dense sampling based local binary patterns. In: ACCV 2012 Proceedings, Part III, Lecture Notes in Computer Science, 7726:375-388.

Ylioinas J, Hadid A, Hong X & Pietikäinen M (2013) Age estimation using local binary pattern kernel density estimate. In: Image Analysis and Processing, ICIAP 2013 Proceedings, Lecture Notes in Computer Science, 8156:141-150.

Ylioinas J, Hong X & Pietikäinen M (2013) Constructing local binary pattern statistics by soft voting. In: Image Analysis, SCIA 2013 Proceedings, Lecture Notes in Computer Science, 7944:119-130.

Yviquel H, Boutellier J, Raulet M & Casseau E (2013) Automated design of networks of Transport-Triggered Architecture processors using dynamic dataflow programs. Signal Processing: Image Communication (Special Issue on Reconfigurable Media Coding), 28(10):1295-1302.

Zhai Y, Zhao G, Alatalo T, Heikkilä J, Ojala T & Huang X (2013) Gesture interaction for wall-sized touchscreen display. Proc. 2013 ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp 2013), 175-178.

Zhao G & Pietikäinen M (2013) Visual speaker identification with spatiotemporal directional features. In: Image Analysis and Recognition, ICIAR 2013 Proceedings, Lecture Notes in Computer Science, 7950:1-10.

Last updated: 26.2.2015