Infotech Oulu Annual Report 2016 - Center for Machine Vision Research (CMV)

Professor Matti Pietikäinen, Professor Janne Heikkilä, Professor Olli Silvén, Associate Professor Guoying Zhao, and Adjunct Professor Abdenour Hadid
Faculty of Information Technology and Electrical Engineering, University of Oulu
mkp(at)ee.oulu.fi, jth(at)ee.oulu.fi, olli(at)ee.oulu.fi, gyzhao(at)ee.oulu.fi, hadid(at)ee.oulu.fi
http://www.oulu.fi/cmvs

Background and Mission

The Machine Vision Group (MVG) is renowned worldwide for its scientific breakthroughs in machine vision. Many of its results, including the Local Binary Pattern, face analysis and geometric camera calibration methodologies, are highly cited and have been adopted for different types of problems and applications around the world. The unit is internationally attractive, with one visiting FiDiPro Professor and one FiDiPro Fellow, several visiting scholars and an extensive international collaboration network, enabling a large number of joint publications in leading forums.

The main research interests of MVG are in computer vision and machine learning, affective computing, multimodal image and signal analysis, low-energy computing, and applications in biomedical image analysis and intelligent human-computer interaction. Its main interdisciplinary and cross-disciplinary scientific collaborators are experts in medicine, biosciences, cognitive sciences, psychology and learning sciences.

In spring 2017, the staff of MVG consists of three Professors, one Associate Professor, one FiDiPro Professor and one FiDiPro Fellow, 16 senior or postdoctoral researchers (including two Academy Research Fellows), and 18 doctoral students or research assistants.

Scientific Progress

The current research of MVG can be divided into the following core areas: Image and video descriptors, Multimodal face biometrics, 3D vision, Multimodal emotion analysis, Low-energy computing, and Biomedical image analysis.

Highlights and Events in 2016

In 2016, two FiDiPro projects funded by Tekes – the Finnish Funding Agency for Innovation – were started. Two distinguished computer vision scientists, Jiri Matas (Czech Technical University in Prague) and Stefanos Zafeiriou (Imperial College London), will visit the University of Oulu on a regular basis in 2016-2019 and contribute to a joint research agenda. Both FiDiPro projects will also benefit a group of companies participating as project partners.

Meanwhile, the four-year term of FiDiPro Professor Xilin Chen ended in June 2016.

MVG celebrated its 35th anniversary with a seminar on Friday 16.12.2016 in Saalasti Hall at the University of Oulu (see http://www.oulu.fi/cmvs/node/44106). The seminar consisted of invited talks given by experts who earned their PhDs in the research group. In addition, talks were given by representatives of MVG’s central international and domestic collaborators: Professor Rama Chellappa (University of Maryland, USA), Professor Erkki Oja (Aalto University) and Professor Jussi Parkkinen (University of Eastern Finland).

Professor Rama Chellappa (University of Maryland, USA) was one of the keynote speakers at the IPTA conference.

The sixth International Conference on Image Processing Theory, Tools and Applications (IPTA) was arranged by MVG in Oulu on December 12-15. Matti Pietikäinen was the general chair and Abdenour Hadid a program co-chair. The keynote talks were given by Prof. Rama Chellappa (University of Maryland), Prof. Maja Pantic (Imperial College London) and Prof. Karen Egiazarian (Tampere University of Technology). The tutorials were presented by Prof. Jiri Matas (Czech Technical University) and Dr. Stefanos Zafeiriou (Imperial College London). The conference was a great success, gathering more than 130 participants from more than 31 countries.

In 2016, MVG was awarded the Five Year Highest Impact Award at the BTAS 2016 conference held in Niagara Falls, USA. The awarded paper, “Face Spoofing Detection From Single Images Using Micro-Texture Analysis”, was published in 2011 and is co-authored by Jukka Komulainen, Abdenour Hadid and Matti Pietikäinen. The journal extension of the paper had earlier received an IET 2013 Premium Award.

MVG members have been active in co-editing special issues in prestigious journals and co-organizing summer schools and international workshops. Dr. G. Zhao and Dr. S. Zafeiriou, together with Dr. I. Kotsia (Middlesex University, UK), Dr. M. Nicolaou (Goldsmiths, University of London, UK) and Prof. J. Cohn (University of Pittsburgh/CMU, USA), have been co-editing the Special Issue on Human Behavior Analysis “in-the-wild” for IEEE Transactions on Affective Computing. Dr. G. Zhao and Prof. J. Heikkilä co-organized the Finnish DENIS summer school on Affective Computing. Dr. Z. Zhou and Dr. G. Zhao, together with Prof. R. Bowden (University of Surrey, UK) and Prof. T. Saitoh (Kyushu Institute of Technology, Japan), co-organized the “Multi-view Lipreading Challenge” workshop at ACCV 2016. Dr. X. Hong, Dr. G. Zhao, Dr. S. Zafeiriou and Prof. M. Pietikäinen, together with Prof. M. Pantic (Imperial College London, UK), co-organized the 2nd workshop on “Spontaneous Facial Behavior Analysis” at ACCV 2016. Dr. J. Chen, Dr. G. Zhao and Prof. M. Pietikäinen, together with Prof. Z. Lei (Institute of Automation, Chinese Academy of Sciences) and Dr. L. Liu (National University of Defense Technology, China), co-organized the workshop on “Robust Features for Computer Vision” at CVPR 2016. Dr. G. Zhao and Dr. S. Zafeiriou, together with researchers from several other universities, co-organized the workshop on Context-Based Affect Recognition and Affective Faces “in-the-wild” at CVPR 2016.

Prof. Heikkilä was appointed as a Senior Editor of the Journal of Electronic Imaging in 2016. He also received a four-year research grant from the Academy of Finland for research on 3D vision based computer-mediated reality.

Image and Video Descriptors

Local Binary Patterns (LBP) have emerged as one of the most prominent and widely studied local texture descriptors. A very large number of LBP variants have been proposed, to the point that it has become difficult to grasp their respective strengths and weaknesses, and there is a need for a comprehensive study of the prominent LBP-related strategies. New types of descriptors based on multistage convolutional networks and deep learning have also emerged. In individual papers, the proposed methods are typically compared to earlier approaches on a few well-known texture datasets, with differing classifiers and testing protocols, and often without using the best parameter settings and multiple scales for the competing methods. Important aspects such as computational complexity and the effects of poor image quality are often neglected.

In collaboration with Dr. Li Liu and her colleagues, we provided a systematic review of current LBP variants and proposed a taxonomy that more clearly groups the prominent alternatives. The merits and demerits of the various LBP features and their underlying connections were also analyzed. We performed a large-scale performance evaluation for texture classification, empirically assessing forty texture features, including thirty-two recent and promising LBP variants and eight non-LBP descriptors based on deep convolutional networks, on thirteen widely used texture datasets. The experiments were designed to measure robustness against different classification challenges, including changes in rotation, scale, illumination, viewpoint, number of classes and different types of image degradation, and also to assess computational complexity. The best overall performance was obtained with the Median Robust Extended Local Binary Pattern (MRELBP) feature. For textures with very large appearance variations, Fisher vector pooling of deep convolutional neural networks is clearly the best, but at the cost of very high computational complexity. Sensitivity to image degradations and computational complexity remain key problems for most of the methods considered.
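
To make the descriptor family concrete, the sketch below shows the basic 8-neighbour LBP operator and the resulting histogram feature. It is a minimal Python illustration only, not the MRELBP variant or any of the optimized descriptors evaluated in the study.

```python
import numpy as np

def basic_lbp(image):
    """Minimal 3x3 local binary pattern: each pixel is encoded by
    thresholding its 8 neighbours against the centre value."""
    img = image.astype(np.float32)
    h, w = img.shape
    centre = img[1:-1, 1:-1]
    codes = np.zeros((h - 2, w - 2), dtype=np.uint8)
    # The 8 neighbours, enumerated clockwise from the top-left corner.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (neighbour >= centre).astype(np.uint8) << bit
    return codes

def lbp_histogram(image):
    """Normalized 256-bin histogram of the LBP codes, the usual texture feature."""
    hist, _ = np.histogram(basic_lbp(image), bins=256, range=(0, 256))
    return hist / max(hist.sum(), 1)
```

A texture image is then represented by the histogram of its codes; the surveyed variants differ mainly in how the neighbourhood is sampled, thresholded, encoded and pooled.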

Image registration is one of the most important and most frequently discussed topics in the image processing literature, and it is a crucial preliminary step in all algorithms in which the final result is obtained by fusing several images, e.g. multichannel image deblurring, super-resolution and depth-of-field extension. In many cases, the images to be registered are inevitably blurred. We developed a novel method for matching templates between images that is specifically insensitive to Gaussian blur. Degradations caused by Gaussian blur are frequently observed in real scenarios, for instance as a result of atmospheric turbulence. We successfully used Gaussian blur-invariant template matching to perform non-rigid registration between consecutive frames of astronomical video sequences, and we were able to stabilize the typical trembling motion caused by atmospheric turbulence.


Two consecutive frames of an astronomical video sequence. Atmospheric turbulence causes both Gaussian blur and slight non-rigid deformations. Gaussian blur-invariant template matching can be used to register the images and stabilize the trembling motion in the video sequence.

The Local Binary Pattern histograms from Three Orthogonal Planes (LBP-TOP) descriptor has shown promising performance in extracting spatio-temporal information for many video analysis tasks, such as facial expression recognition and human activity analysis, as it extracts features from dynamic textures. In the original formulation, computing LBP-TOP requires traversing all pixels and applying the unit operation (the 2D LBP operator) along the XY, YT and XT planes separately; the nested loops this requires in the implementation sharply increase the computational cost. Recently, we improved the computational efficiency of LBP-TOP by reinterpreting the descriptor through a third-order tensor: a video clip is regarded as a third-order tensor, and tensor unfolding from three-dimensional space to two-dimensional space is used to obtain three concatenated two-dimensional matrices, on which the basic 2D LBP is then computed. The need for nested loops is greatly reduced, and thus the computational cost drops substantially. Further savings are possible because the computations on the two-dimensional matrices can be parallelized. We compared the computational times of the original LBP-TOP implementation and our fast implementation on both synthetic and real data; the results show that the fast implementation is over 20 times faster on average. The implementation code of the proposed fast LBP-TOP can be downloaded at http://www.ee.oulu.fi/research/imag/cmvs/files/code/Fast LBPTOP Code.zip

Visualization of the unfolding of a third-order tensor.
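
The unfolding idea itself can be sketched as below: the video volume is rearranged into three two-dimensional matrices so that a single 2D LBP pass over each replaces the per-pixel three-plane loops. The plane ordering and border handling here are simplifying assumptions; the released code should be consulted for the exact implementation.

```python
import numpy as np

def unfold_video(video):
    """Unfold a T x H x W video volume into three concatenated 2-D matrices
    (XY, XT and YT views) so that an ordinary 2-D LBP pass over each replaces
    the per-pixel three-plane loops of the original LBP-TOP. Illustrative
    only; the released code may use a different plane ordering."""
    t, h, w = video.shape
    # XY view: the T frames placed side by side -> H x (T*W).
    xy = np.concatenate([video[i] for i in range(t)], axis=1)
    # XT view: one T x W slice per image row, placed side by side -> T x (H*W).
    xt = np.concatenate([video[:, y, :] for y in range(h)], axis=1)
    # YT view: one T x H slice per image column, placed side by side -> T x (W*H).
    yt = np.concatenate([video[:, :, x] for x in range(w)], axis=1)
    return xy, xt, yt

# The LBP histograms of the three views (e.g. with the lbp_histogram sketch
# above) are concatenated to form the LBP-TOP style descriptor:
# descriptor = np.concatenate([lbp_histogram(m) for m in unfold_video(clip)])
```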


Multimodal Face Biometrics

We have continued our intensive research on kinship verification from faces. Kinship verification consists of automatically predicting whether two persons have a biological kin relation by examining their facial attributes. While most of the existing works extract shallow handcrafted features from still face images, we approached this problem from a spatio-temporal point of view and explored the use of both shallow texture features and deep features for characterizing faces. Promising results, especially with the deep features, were obtained on the benchmark UvA-NEMO Smile database. Our extensive experiments also showed the superiority of using videos over still images, pointing out the important role of facial dynamics in kinship verification. Furthermore, the fusion of the two types of features (i.e. shallow spatio-temporal texture features and deep features) showed significant performance improvements compared to state-of-the-art methods. In another work, we investigated for the first time the usefulness of color information in the verification of kinship relationships from facial images. For this purpose, we extracted joint color-texture features to encode both the luminance and the chrominance information in the color images. The kinship verification performance using joint color-texture analysis was then compared against counterpart approaches using only gray-scale information. Extensive experiments using different color spaces and texture features were conducted on two benchmark databases. The results indicated that classifying color images consistently shows superior performance in three different color spaces. Finally, we noted and demonstrated that some recent kinship datasets are biased and should not be used as benchmarks for the evaluation of kinship verification algorithms.

The proposed approach exploring the use of both shallow texture features and deep features for characterizing faces.

Audiovisual speech synchrony detection is an important liveness check for talking face verification systems, making sure that the (pre-defined) content and timing of the given audible and visual speech samples match. State-of-the-art real-time voice conversion techniques are capable of fooling both humans and automatic systems. Furthermore, recent advances in facial reenactment have enabled real-time re-rendering of the facial expressions and visual speech of a source actor on top of a video stream of the targeted person in a photo-realistic manner, such that the result blends seamlessly even with the real-world illumination. As a consequence, there are nowadays virtually no technical limitations preventing the combination of transferable facial animation and voice conversion (or synthesis) to create an interactive audiovisual artifact that is able to spoof even advanced random challenge-response based liveness detection.

Sample pairs of original and corresponding synthesized visual speech video frames generated with an underarticulated generative model (top) and concatenative visual speech animation (bottom).

We investigated the performance of state-of-the-art text-independent lip-sync detection techniques under presentation attacks consisting of original audio recordings of the targeted person and corresponding animated visual speech. Our experimental analysis with three different photo-realistic visual speech animation techniques revealed that generic synchrony models can be fooled even with underarticulated but synchronized lip movements. Thus, measuring audio-video synchrony or content alone is not enough for securing audiovisual biometric systems. Intuitively, the source actor or model directing the facial reenactment or voice conversion (or synthesis) process is unlikely to be able to mimic the speaking style of the targeted person. Our preliminary findings suggest that client-specific audiovisual speech synchrony models are indeed robust to high-effort attacks like animated visual speech. Since time-consuming data collection during the enrollment phase is an undesirable property for biometric systems, a generic lip-sync model could instead be gradually tuned with data captured during successful verification attempts of the user.

Face biometric systems are vulnerable to spoofing attacks. Such attacks can be performed in many ways, including presenting a falsified image, video or 3D mask of a valid user. A widely used approach for differentiating genuine faces from fake ones has been to capture their inherent differences in (2D or 3D) texture using local descriptors. One limitation of these methods is that they may fail if an unseen attack type, e.g. a highly realistic 3D mask which resembles real skin texture, is used in spoofing. The following figure shows an example of a low quality mask (left) compared with a highly realistic 3D mask (right).

Comparison of a low quality mask (left) with 3D printing artifacts (shown in the enlarged region), and a high quality mask (right) with skin-like texture (shown in the enlarged region).

We proposed a robust anti-spoofing method that detects the pulse from face videos. Based on the fact that a pulse signal exists in a real living face but not in any mask or print material, the method can serve as a generalized solution for face liveness detection. The proposed method was first evaluated on the 3D mask spoofing database 3DMAD to demonstrate its effectiveness in detecting 3D mask attacks. More importantly, our cross-database experiment with high quality REAL-F masks showed that the pulse based method is able to detect even previously unseen mask types, whereas texture based methods fail to generalize beyond the development data. Finally, we proposed a robust cascade system combining two complementary attack-specific spoof detectors, i.e. pulse detection against print attacks and color texture analysis against video attacks.

The framework of the proposed anti-spoofing method based on pulse detection from face videos.
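
The underlying liveness cue can be illustrated with the simplified sketch below: the mean green-channel intensity of the face region is collected over the frames, and a live face is expected to show a dominant spectral peak in the normal heart-rate band. Face tracking, temporal filtering and the decision threshold used in the published method are omitted and would need to be added.

```python
import numpy as np

def pulse_liveness_score(face_frames, fps=30.0):
    """Toy rPPG-style liveness cue: a live face should show a clear periodic
    component in the 0.7-4.0 Hz band (42-240 bpm) of the mean green-channel
    signal of the face region, while a mask or print should not.
    face_frames: sequence of cropped face images (H x W x 3); channel 1 is
    assumed to be green (RGB order; adapt for BGR). Returns the ratio of the
    power at the strongest in-band frequency to the total in-band power;
    the decision threshold is data-dependent."""
    g = np.array([f[..., 1].mean() for f in face_frames], dtype=np.float64)
    g -= g.mean()
    power = np.abs(np.fft.rfft(g)) ** 2
    freqs = np.fft.rfftfreq(len(g), d=1.0 / fps)
    band = (freqs >= 0.7) & (freqs <= 4.0)
    if not band.any() or power[band].sum() == 0:
        return 0.0
    return power[band].max() / power[band].sum()
```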

We also continued our research on face spoofing detection using color texture analysis. This work is motivated by the fact that research on non-intrusive software-based face spoofing detection has mainly focused on analyzing the luminance information of face images, hence discarding the chroma component, which can be very useful for discriminating fake faces from genuine ones. We therefore introduced a novel and appealing approach for detecting face spoofing using color texture analysis. We exploited the joint color-texture information from the luminance and the chrominance channels by extracting complementary low-level feature descriptions from different color spaces. More specifically, the feature histograms were computed over each image band separately. Extensive experiments on the three most challenging benchmark data sets showed excellent results compared with the state of the art. More importantly, unlike most of the methods proposed in the literature, our approach was able to achieve stable performance across all three benchmark data sets. The promising results of our cross-database evaluation suggest that the facial color texture representation is more stable in unknown conditions than its gray-scale counterparts. Furthermore, we addressed for the first time the key problem of variation in input image quality and resolution in face anti-spoofing. In contrast to most existing works, which extract multiscale descriptors from the original face images, we derived a new multiscale space in which to represent the face images before texture feature extraction. The new multiscale space representation was obtained through multiscale filtering; three filtering methods were considered: Gaussian scale space, Difference of Gaussians scale space and Multiscale Retinex. Extensive experiments on three challenging and publicly available face anti-spoofing databases demonstrated the effectiveness of the proposed multiscale space representation in improving the performance of face spoofing detection based on gray-scale and color texture descriptors.
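
As a rough illustration of the joint color-texture idea, the sketch below converts a face crop to the YCbCr space and concatenates per-channel LBP histograms. The descriptor settings and the additional color spaces used in the published work are not reproduced; this is a minimal example with assumed parameter values.

```python
import numpy as np
import cv2  # assumed available for the colour conversion
from skimage.feature import local_binary_pattern

def color_texture_features(face_bgr, P=8, R=1):
    """Concatenated per-channel uniform-LBP histograms in the YCbCr colour
    space, illustrating the joint luminance/chrominance texture description;
    the published system also uses further colour spaces (e.g. HSV) and
    richer descriptor settings."""
    ycrcb = cv2.cvtColor(face_bgr, cv2.COLOR_BGR2YCrCb)
    feats = []
    for c in range(3):  # Y, Cr and Cb channels
        codes = local_binary_pattern(ycrcb[..., c], P, R, method="nri_uniform")
        hist, _ = np.histogram(codes, bins=59, range=(0, 59), density=True)
        feats.append(hist)
    return np.concatenate(feats)
```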

3D Vision

Structure from motion algorithms have an inherent limitation: the reconstruction can only be determined up to an unknown scale factor. Modern mobile devices are equipped with an inertial measurement unit (IMU), which can be used for estimating the scale of the reconstruction. We proposed a method that recovers the metric scale given inertial measurements and camera poses. In the process, we also perform a temporal and spatial calibration of the camera and the IMU. Therefore, our solution can easily be combined with any existing visual reconstruction software. The method can also cope with noisy camera pose estimates, typically caused by motion blur or rolling shutter artifacts. In the experiments, we show that the algorithm outperforms the state of the art in both accuracy and convergence speed of the scale estimate. The accuracy of the scale is typically within about 1% of the ground truth.

The metric scale of the reconstruction can be recovered using the algorithm.
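
The core of the scale recovery can be reduced to a simple least-squares fit, sketched below under strong simplifying assumptions: after differentiating the visually estimated trajectory twice, its accelerations should equal the IMU accelerations up to the unknown scale. The temporal and spatial calibration, gravity compensation and bias estimation performed by the actual method are omitted.

```python
import numpy as np

def estimate_metric_scale(visual_positions, imu_world_accel, dt):
    """Closed-form least-squares estimate of the scale s minimising
    || s * a_visual - a_imu ||^2, where a_visual is the second derivative of
    the (unscaled) visual trajectory.
    visual_positions: N x 3 camera positions from the reconstruction.
    imu_world_accel:  N x 3 accelerations rotated to the same frame, with
                      gravity already removed (an assumption of this sketch).
    dt: sampling interval in seconds (a common, synchronized timeline assumed)."""
    # Second-order finite differences of the visual trajectory.
    a_visual = (visual_positions[2:] - 2 * visual_positions[1:-1]
                + visual_positions[:-2]) / dt ** 2
    a_imu = imu_world_accel[1:-1]
    return float(np.sum(a_visual * a_imu) / np.sum(a_visual * a_visual))
```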

We have developed a batch-based approach for the robust reconstruction of scene structure and camera motion. A key part of the method is robust loop closure disambiguation, which uses evidence from visual landmarks and initial raw odometry to perform drift correction. The most essential component is the energy function for global optimization, which can faithfully reject wrong visual correspondences (arising from repetitive structures such as pictures and doors commonly found in indoor environments) and simultaneously perform globally consistent loop closures.

We have also been working on depth map fusion that can be used for merging depth maps produced by the second generation Kinect (Kinect V2). We have improved our previous work with three extensions. Pre- and post-filtering steps remove the majority of the outliers in the depth maps and some incorrect or misplaced measurements remaining in the final point cloud. Re-alignment of the covariances, which are used to measure the uncertainty of points, leads to better refinement of the point locations inside the fusion algorithm. With these extensions, both the accuracy and the robustness of the fusion process can be significantly improved.

Multimodal Emotion Analysis

We have worked on a new dynamic facial expression recognition method for emotion analysis, in which dynamic facial expression recognition is formulated as a longitudinal groupwise registration problem. The main contributions of this method are the following: (1) subject-specific facial feature movements of different expressions are described by a diffeomorphic growth model; (2) a salient longitudinal facial expression atlas is built for each expression by a sparse groupwise image registration method, which can describe the overall facial feature changes across the whole population and suppress the bias due to large inter-subject facial variations; (3) both the image appearance information in the spatial domain and the topological evolution information in the temporal domain are used to guide recognition by a sparse representation method. The proposed framework has been extensively evaluated on five databases for different applications: the extended Cohn-Kanade, MMI, FERA and AFEW databases for dynamic facial expression recognition, and the UNBC-McMaster database for spontaneous pain expression monitoring. The framework was also compared with several state-of-the-art dynamic facial expression recognition methods, and the recognition rates of the new method are consistently higher than those of the other methods under comparison.

Illustration of two main steps of atlas construction: (a) Growth model estimation for each facial expression image sequence; (b) facial expression atlas construction from image sequences of the whole population based on longitudinal (i.e., temporal) atlas construction and sparse representation.

Micro-expression recognition (MER) is very challenging, not only because of the suppressed facial appearance changes and the extremely short duration of micro-expressions, but also because it is difficult to obtain enough data and reliable annotations. This raises a serious problem: training a CNN model from scratch on micro-expression data is not feasible due to the lack of data, and there is not even enough data to fine-tune a pre-trained network towards our datasets. We were the first to explore the possible use of deep learning for the micro-expression recognition task. To address the shortage of samples, we applied feature selection, an approach that has been widely used in the computer vision community for decades, to remove features irrelevant for MER, and we extended evolutionary algorithms to search for an optimal set of deep features. We evaluated our method on two subsets of the SMIC dataset (the samples captured by the high-speed (HS) and normal (VIS) cameras) and on the CASME-II dataset. The experimental results show that the proposed method improves the accuracy of the baseline method substantially. Nevertheless, there is still much room for improvement compared with the state-of-the-art approaches using hand-crafted features.

Framework of the proposed method.
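
A toy version of the evolutionary feature selection step is sketched below: binary masks over the deep feature dimensions are evolved with selection, crossover and mutation, using the cross-validated accuracy of a linear SVM as the fitness. The extended evolutionary algorithm, fitness function and classifier of the published method differ from this assumption-based sketch.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

def evolve_feature_mask(X, y, generations=30, pop_size=20, seed=0):
    """Toy genetic search over binary feature masks: each individual selects a
    subset of the deep feature dimensions and its fitness is the 3-fold
    cross-validated accuracy of a linear SVM on the selected features."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    pop = rng.random((pop_size, n)) < 0.5

    def fitness(mask):
        if not mask.any():
            return 0.0
        return cross_val_score(LinearSVC(max_iter=5000), X[:, mask], y, cv=3).mean()

    scores = np.array([fitness(m) for m in pop])
    for _ in range(generations):
        # Keep the best half, then fill up with uniform crossover + mutation.
        parents = pop[np.argsort(scores)[-pop_size // 2:]]
        children = []
        for _ in range(pop_size - len(parents)):
            a, b = parents[rng.integers(len(parents), size=2)]
            child = np.where(rng.random(n) < 0.5, a, b)
            child ^= rng.random(n) < 0.02  # bit-flip mutation
            children.append(child)
        pop = np.vstack([parents, children])
        scores = np.array([fitness(m) for m in pop])
    return pop[scores.argmax()]  # boolean mask over feature dimensions
```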

Automatically recognizing pain from spontaneous facial expressions has attracted increasing attention, since it can provide a direct and relatively objective indication of the pain experience. Until now, most of the existing works have focused on analyzing pain from individual images or video frames, hence discarding the spatio-temporal information that can be useful in continuous pain assessment. In this context, we investigated and quantified the role of spatio-temporal information in pain assessment by comparing the performance of several baseline local descriptors used in their traditional spatial form against their spatio-temporal counterparts that take the video dynamics into account. For this purpose, we performed extensive experiments on two benchmark datasets. Our results indicated that using spatio-temporal information to classify video sequences consistently yields superior performance compared to using only static information.

The considered approach for studying the role of the spatio-temporal information in pain assessment.

Automatic pain intensity estimation from videos plays a significant role in healthcare and the medical field. Traditional static methods extract features from each frame separately, which can result in unstable estimates with spurious changes and peaks between adjacent frames. To overcome this problem, we proposed a real-time regression framework based on a recurrent convolutional neural network (RCNN) for automatic frame-level pain intensity estimation. Given vector sequences of AAM-warped facial images, we use a sliding-window strategy to obtain fixed-length input samples for the recurrent network. We then carefully design the architecture of the recurrent network and modify the last layer of a standard RCNN to output continuous-valued pain intensity. The proposed end-to-end pain intensity regression framework can predict the pain intensity of each frame by considering a sufficiently large number of historical frames while limiting the number of parameters in the model. Our method achieves promising results in terms of both accuracy and running speed on the published UNBC-McMaster Shoulder Pain Expression Archive Database.

The framework of the proposed pain intensity estimation approach.
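
The sliding-window regression setup can be illustrated with the short PyTorch sketch below, where a recurrent layer summarizes a window of per-frame feature vectors and outputs one continuous intensity value. The published model is a recurrent convolutional network operating on AAM-warped images; the precomputed frame features, the GRU and the layer sizes here are placeholders.

```python
import torch
import torch.nn as nn

class PainIntensityRegressor(nn.Module):
    """Toy recurrent regressor: a fixed-length window of per-frame feature
    vectors is summarised by a GRU and mapped to one continuous pain
    intensity value for the last frame of the window."""
    def __init__(self, feat_dim=128, hidden=64):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, windows):                 # windows: (batch, win_len, feat_dim)
        _, h = self.rnn(windows)                # h: (1, batch, hidden)
        return self.head(h[-1]).squeeze(-1)     # (batch,) continuous intensities

# Sliding-window use over a sequence of frame features (seq_len, feat_dim):
# windows = features.unfold(0, 16, 1).transpose(1, 2)   # (seq_len-15, 16, feat_dim)
# intensities = PainIntensityRegressor()(windows)
```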

Human emotion recognition is one of the most active areas of computer vision research. Most research efforts so far have focused on facial expression recognition, which limits the applicability of emotion recognition techniques because suitable frontal face data of sufficient quality cannot be captured when subjects are observed from a distance. However, human body movement also conveys information, such as cues about the feelings or mood of a person, and can be observed from a distance. Earlier attempts at automated emotion recognition from human body movement have been limited to analyzing acted data. In contrast, our research has focused on real, non-acted data. A database of 96 subjects affected by positive or negative feedback was collected from TV broadcast data, and two baseline methods were used to recognize the affective state of a person. The baseline results are promising and encourage further study in this domain.

Emotion from gait.

Recent studies have validated the feasibility of estimating heart rate from human faces in RGB videos. However, test subjects are often recorded under controlled conditions, as illumination variations significantly affect RGB-based heart rate estimation accuracy. Our approach published at CVPR 2014 was a significant step towards real-world applications. Recently, Intel announced the low-cost RealSense 3D (RGBD) camera, which is becoming ubiquitous in laptops and mobile devices, opening the door for new computer vision applications. RealSense cameras produce RGB images with extra depth information inferred from a latent near-infrared (NIR) channel. We experimentally demonstrated, for the first time, that heart rate can be reliably estimated from RealSense near-infrared images. This enables illumination invariant heart rate estimation, extending heart rate estimation from video to low-light applications such as night driving. With the coming ubiquitous presence of RealSense devices, the proposed method not only utilizes the near-infrared channel, originally designed to be hidden from consumers, but also exploits the associated depth information for improved robustness to head pose.

Videos captured under different illuminations.

Biomedical Image Analysis

Biological research depends heavily on imaging to answer important questions about cell behaviour and tissue/organism growth. Cell behaviour is influenced by many genes and proteins, and to extract meaningful insights from captured images it is often necessary to analyze a large number of cells or samples. In recent years, advances in imaging techniques have enabled the capture of large quantities of data, which cannot be fully analyzed manually. One of the key challenges in automated analysis is the difficulty of separating individual cells when they come into contact with each other, due to low contrast, weak boundaries and deformable shapes. This results in many ambiguous regions in which it can be very challenging even for a biologist to accurately delineate cells. Cell proposals provide an efficient way of utilizing both spatial and temporal context to resolve most of these ambiguities.

We have developed a convolutional neural network (CNN) based method which provides individual cell segmentation proposals that can be used for cell detection, segmentation and tracking. Our method (shown in the figure below) consists of two stages. The first stage uses a CNN to propose cell candidate bounding boxes and their associated scores. The second stage uses another CNN and the boxes from the first stage to propose a cell segmentation mask for each candidate. Cell segmentation or tracking can then be performed by selecting the optimal non-conflicting subset of proposals using integer linear programming. We have tested our method on histology, fluorescence and phase contrast microscopy data and achieved state-of-the-art cell segmentation performance.

Cell Segmentation Proposal Network: The top half shows the first network, which proposes N bounding boxes and their scores. The bottom half shows the second network, which generates segmentation masks for the N proposals. Convolution (filter size is shown in the box), max-pooling, and ROI-pooling + concatenation layers, with the number of feature maps on top of each layer, are shown. Proposed bounding boxes and segmentation masks after non-maximum suppression (NMS) are shown for a selected area from the Fluo-N2DL-HeLa dataset.
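
The final selection step can be approximated greedily, as in the sketch below: proposals are taken in order of decreasing score, and any proposal that conflicts (overlaps too much) with an already accepted one is rejected. This is only a simple stand-in for the integer linear programming formulation used in the actual method, and the overlap measure and threshold are assumptions.

```python
import numpy as np

def select_proposals(masks, scores, max_overlap=0.2):
    """Greedy stand-in for the ILP-based selection of a non-conflicting subset
    of cell proposals: take proposals in decreasing score order and skip any
    whose mask overlaps an accepted one by more than max_overlap
    (intersection over the smaller mask). masks: list of boolean arrays."""
    order = np.argsort(scores)[::-1]
    accepted = []
    for i in order:
        conflict = False
        for j in accepted:
            inter = np.logical_and(masks[i], masks[j]).sum()
            smaller = min(masks[i].sum(), masks[j].sum())
            if smaller > 0 and inter / smaller > max_overlap:
                conflict = True
                break
        if not conflict:
            accepted.append(i)
    return accepted  # indices of the selected, non-conflicting proposals
```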

Microscopic analysis of tumor tissue is necessary for a definitive diagnosis of cancer. Pathology examination requires time-consuming scanning through tissue images at different magnification levels to find clinical assessment clues and produce correct diagnoses. Advances in digital imaging techniques make it possible to assess pathology images using computer vision and machine learning methods, which could automate some of the tasks in the diagnostic pathology workflow. Such automation could be beneficial for obtaining fast and precise quantification, reducing observer variability and increasing objectivity.

In our recent work, we have proposed a general framework based on CNNs for learning breast cancer histopathology image features. The proposed framework is independent of the microscopy magnification and faster than previous methods, as it requires only a single training. The speed and magnification independence are achieved without sacrificing state-of-the-art performance. Magnification independent models are scalable: new training images from any magnification level can be utilized, and trained models can easily be fine-tuned by introducing new samples. In this work, we also proposed a multi-task CNN architecture that predicts both the image magnification level and its benign/malignant property simultaneously. The proposed model allows combining image data from many more resolution levels than four discrete magnification levels.

A malignant breast tumor acquired from a single slide, seen at different magnification factors (top). Schematic presentation of our proposed approach for classifying breast histology images independently of the image magnification factor (bottom).
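
A multi-task architecture of this kind can be sketched as a shared backbone with two classification heads, as in the PyTorch snippet below. The backbone, layer sizes and losses are placeholders rather than the published network; the sketch only illustrates how the magnification and malignancy predictions share features.

```python
import torch
import torch.nn as nn
from torchvision import models  # torchvision >= 0.13 assumed for `weights=`

class MultiTaskHistoNet(nn.Module):
    """Sketch of a multi-task CNN: a shared backbone feeds two heads, one for
    benign/malignant classification and one for the magnification level."""
    def __init__(self, num_magnifications=4):
        super().__init__()
        backbone = models.resnet18(weights=None)   # placeholder backbone
        feat_dim = backbone.fc.in_features
        backbone.fc = nn.Identity()                # keep the pooled features
        self.backbone = backbone
        self.malignancy_head = nn.Linear(feat_dim, 2)
        self.magnification_head = nn.Linear(feat_dim, num_magnifications)

    def forward(self, x):
        f = self.backbone(x)
        return self.malignancy_head(f), self.magnification_head(f)

# Joint training would sum the two cross-entropy losses:
# loss = ce(malignancy_logits, y_malignancy) + ce(magnification_logits, y_magnification)
```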

Laser tweezers (LT) are a powerful method in which one or several tightly focused laser beams are used to trap and manipulate microparticles and live cells. LT have proven to be an effective tool for studying live cell interactions in vitro. Recently, LT have been applied to studying the reversible aggregation of red blood cells (RBCs), the process of cell clumping and dissociation that strongly determines blood microcirculation. However, to fully assess the interaction force data obtained with LT, it is crucial to be able to evaluate the interaction time of the cells and their overlapping surface area during the measurement procedure. Therefore, we have developed an image-based analysis algorithm that can be used to profile the dependence of the cells' overlapping surface area on the interaction time.

Recently, we also studied exudate segmentation in colour retinal fundus images. As a window into the body, the retinal fundus contains rich anatomical structures, such as the optic disc, vessels and macula. Exudates on the retinal fundus are an important manifestation of diabetic retinopathy; they appear as white/yellow soft structures of variable size. Automating exudate segmentation in colour retinal fundus images is therefore an important task in computer aided diagnosis and screening systems for diabetic retinopathy. In our work, we proposed a location-to-segmentation strategy for automatic exudate segmentation in colour retinal fundus images, which includes three stages: anatomic structure removal, exudate location and exudate segmentation. In the anatomic structure removal stage, a matched-filter based main vessel segmentation method and a saliency based optic disc segmentation method are proposed; the main vessels and the optic disc are then removed to eliminate the adverse effects they would otherwise cause in the second stage. In the location stage, we learn a random forest classifier to classify patches into two classes, exudate patches and exudate-free patches, using histograms of completed local binary patterns to describe the texture structures of the patches. Finally, the local variance, a size prior on the exudate regions and a local contrast prior are used to segment the exudate regions out of the patches classified as exudate patches in the location stage. The experimental results on the e-ophtha EX dataset for exudate-level validation and on DiaRetDB1 for image-level evaluation show the effectiveness of the proposed exudate segmentation method.

Exudate segmentation in colour retinal fundus images. (a) The input colour retinal fundus image, where exudate regions are marked with green boxes. (b) The field-of-view part in which the optic disc and the vessels have been removed. (c) The detected exudate patches obtained by the proposed exudate location method. (d) The final segmentation result, in which the white regions are the exudate regions. (e) Zooming into the exudate regions in the original image. (f) Zooming into the regions detected as exudate regions.

Computer aided diagnosis (CAD) can significantly improve the efficiency of doctors. We proposed a deep convolutional neural network (CNN) based method for thorax disease diagnosis. The images are first aligned by matching interest points between them, and the dataset is then enlarged using Gaussian scale space theory. After that, the enlarged dataset is used to train a deep CNN model, which is applied to the diagnosis of new test data. Our experiments show that the method achieves very promising results.

Chest x-ray (chest radiographic images). (a) and (b) are normal chest radiographs. (c) is Q fever pneumonia (Q fever is a disease caused by infection with Coxiella burnetii).
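
The scale-space enlargement of the training set can be illustrated with the small sketch below, which simply adds Gaussian-smoothed copies of each image at a few scales. The sigma values are assumptions, and the interest-point based alignment step is not shown.

```python
from scipy.ndimage import gaussian_filter

def scale_space_augment(image, sigmas=(0.5, 1.0, 2.0, 4.0)):
    """Enlarge the training set by adding Gaussian scale-space copies of a
    (grayscale) image: the original plus one smoothed version per sigma.
    The sigma values are illustrative only."""
    return [image] + [gaussian_filter(image, sigma=s) for s in sigmas]
```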

We proposed two vesselness maps and a simple-to-difficult learning framework for vessel segmentation that requires no ground truth. The first vesselness map is the multiscale centreline-boundary contrast map, which is inspired by the appearance of vessels. The second is the difference-of-diffusion map, which measures the difference between the diffused image and the original one. In addition, two existing vesselness maps are generated, giving four vesselness maps in total. In each vesselness map, pixels with large vesselness values are regarded as positive samples, and pixels around the positive samples with small vesselness values are regarded as negative samples. For each vesselness map, a strong classifier is then learned for the retinal image based on the other three maps to determine the labels of the pixels with intermediate values in that map. Finally, pixels supported by at least two classifiers are labelled as vessel pixels. The experimental results on DRIVE and STARE show that our method outperforms state-of-the-art unsupervised methods and achieves performance competitive with supervised methods.

(a) A 5x7 centreline-boundary contrast filter with line structure. (b) The input image. (c) The centreline-boundary contrast vesselness map.

Indirect immunofluorescence imaging of Human Epithelial Type 2 (HEp-2) cells is an effective way to identify the presence of Anti-Nuclear Antibodies (ANA). Strong illumination variation is a key challenge in the HEp-2 cell classification task. Aiming to improve the robustness of HEp-2 classification systems to illumination variation, we explored discriminative and illumination robust descriptors. Specifically, we proposed a novel Spatial Shape Index Descriptor (SSID) to capture the spatial layout information of second-order structures, and applied the Local Orientation Adaptive Descriptor (LOAD), originally designed for texture classification, to the HEp-2 cell classification task. Both SSID and LOAD show strong discrimination ability and complement each other well. Four different sets of experiments were carried out to evaluate SSID, LOAD and their combination. Our two submissions achieved superior performance on the new Executable Thematic on Pattern Recognition Techniques for Indirect Immunofluorescence image analysis. Compared to the top-ranked method in the ICPR 2014 HEp-2 cell classification contest, both of our submissions achieved better performance when using only the provided training data. Our approaches also demonstrated superior performance on a newly compiled large-scale HEp-2 data set with 63,445 cell images.

An illustration of the strong illumination variations present in HEp-2 cell images. The images in the red box (the first row belongs to the "Positive" type and the second row to the "Intermediate" type) show that there are huge appearance variations between the "Positive" and "Intermediate" types within the same category. The images in the blue box (each row comes from one category) show that even within the "Positive" type alone, the illumination varies considerably among the images.

In addition, most existing works on HEp-2 cell classification focus mainly on feature extraction, feature encoding and classifier design, and very few efforts have been devoted to studying the importance of pre-processing techniques. We analyzed the importance of pre-processing and investigated the role of Gaussian Scale Space (GSS) theory as a pre-processing approach for the HEp-2 cell classification task. We validated GSS pre-processing under both the Local Binary Pattern (LBP) and the Bag-of-Words (BoW) frameworks. Under the BoW framework, the introduced pre-processing approach, using only one Local Orientation Adaptive Descriptor (LOAD), achieved superior performance on the Executable Thematic on Pattern Recognition Techniques for Indirect Immunofluorescence (ET-PRT-IIF) image analysis. Our system, using only one feature, outperformed the winner of the ICPR 2014 contest, which combined four types of features. Moreover, the proposed pre-processing method is not restricted to this work; it can be generalized to many existing approaches.

Vision Systems Engineering

Vision systems engineering research aims to identify attractive computing approaches, architectures and algorithms for industrial machine vision systems. Future information infrastructures are anticipated to consist of large numbers of wireless sensors and actuators that do much more than simple parameter sensing and low-rate messaging over wireless and wired links. They will be built on Internet-of-Things networks designed to support advanced sensing modalities, including imaging, and will rely on machine learning in their functionalities.

We have shown with experimental implementations in 2016 that fully programmable but application-adapted GPU accelerators are extremely attractive, achieving more than 50-fold energy efficiency in comparison to common off-the-shelf mobile GPUs on typical machine vision applications. They are viable parallel processing alternatives to dedicated hardwired solutions, with the potential to satisfy the throughput and energy efficiency requirements of advanced embedded perceptual interfaces and machine learning algorithms.

Part of an application adapted GPU accelerator.

The sensors may include the ultra-low cost printed optics based lenslet cameras that we have so far designed and investigated for microscopy use. These need significant computing resources for image reconstruction algorithms, which the dedicated GPU designs can provide.

Printed optics microscope prototype attached to a tablet computer and details: USB camera sensor, lenslet array, sample holder and light source.

Exploitation of Results

Many researchers have adopted and further developed our methodologies. Our research results are used in a wide variety of applications around the world. For example, the Local Binary Pattern methodology and its variants are used in numerous image analysis tasks and applications, such as biomedical image analysis, biometrics, industrial inspection, remote sensing and video analysis. The researchers of MVG have actively published the source codes of their algorithms for the research community, and this has increased the exploitation of the results.

The results have also been utilized in our own projects. For example, we have collaborated with Prof. Tapio Seppänen’s Biomedical Engineering Group in the area of multimodal emotion recognition and heart rate measurement for medical applications, combining vision with physiological biosignals. Together with Prof. Sanna Järvelä, we have started collaboration on applying affective computing to technology-enhanced learning. We have also continued our collaboration with Biocenter Oulu in a project funded by the European Regional Development Fund, in which the aim has been to explore the opportunities provided by hyperspectral microscopy imaging. With Dr. Jerome Thevenot (Faculty of Medicine) we have been investigating the use of facial image analysis in medical diagnostics. With Prof. Osmo Tervonen (Faculty of Medicine) we have investigated deep learning based methodology for the diagnosis of thorax diseases.

Most of our funding for both basic and applied research comes from public sources such as the Academy of Finland and Tekes, but besides these sources, MVG also conducts contract research funded by companies. In this way, our expertise is being utilized by industry for commercial purposes, and even in consumer products such as mobile devices.

MVG has actively encouraged and supported the creation of research group spin-outs, which gives young researchers an opportunity to start their own teams and groups. Spin-out enterprises are a side result; in our experience, their roots lie especially in the strands of free academic research. Over the years, MVG has contributed to the birth of several such spin-outs.

Future Goals

Our results from 2016 are positive; for example, the number of publications in major forums has clearly increased. The two new FiDiPro projects with distinguished scientists from abroad will make exciting progress possible in the coming years as well. We will continue to sharpen our strategies to meet future demands and to ensure sufficient research funding in increasingly tough competition. We plan to carry out well-focused cutting-edge research, for example on novel image and video descriptors, multimodal face analysis and biometrics, multimodal analysis of emotions, 3D computer vision, biomedical image analysis, and energy-efficient architectures for embedded vision systems. Machine learning, especially deep learning, plays a key role in today’s computer vision research, and we will further strengthen our expertise in this area. We also plan to further deepen our collaboration with international and domestic partners, to participate in new European project proposals, and to continue applying for funding for breakthrough research from the Academy of Finland and the European Research Council (ERC). Close interaction between basic and applied research has always been a major strength of our research unit. The scientific output of MVG has increased significantly in recent years, and with this we expect to have much new potential for producing novel innovations and exploiting research results in collaboration with companies and other partners.

Doctoral Theses

Varjo S (2016) A direct microlens array imaging system for microscopy. Dissertation, Acta Univ Oul C 588.

Ylioinas J (2016) Towards optimal local binary patterns in texture and face description. Dissertation, Acta Univ Oul C 597.

Selected Publications

Boulkenafet Z, Komulainen J & Hadid A (2017) Face anti-spoofing using speeded-up robust features and Fisher vector encoding. Signal Processing Letters, accepted.

Boulkenafet Z, Komulainen J, Li Lei, Feng X & Hadid A (2017) OULU-NPU: A mobile face presentation attack database with real-world variations. Proc. IEEE International Conference on Automatic Face and Gesture Recognition, accepted.

Chen J, Patel V, Kellokumpu V, Zhao G, Pietikäinen M & Chellappa R (2017) Robust local features for remote face recognition. Image and Vision Computing, accepted.

Chrysos G, Antonakos E, Snape P, Asthana A & Zafeiriou S (2017) A comprehensive performance evaluation of deformable face tracking "in-the-wild". International Journal of Computer Vision, accepted.

Fabris A, Nicolaou M, Kotsia I & Zafeiriou S (2017) Dynamic probabilistic linear discriminant analysis for video classification. IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP), accepted.

Ke W, Chen J, Jiao J, Zhao G & Ye Q (2017) SRN: Side-output residual network for object symmetry detection in the wild. Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), accepted, 9 p.

Li X, Hong X, Moilanen A, Huang X, Pfister T, Zhao G & Pietikäinen M (2017) Towards reading hidden emotions: A comparative study of spontaneous micro-expression spotting and recognition methods. IEEE Transactions on Affective Computing, accepted.

Liu L, Fieguth P, Guo Y, Wang X & Pietikäinen M (2017) Local binary features for texture classification: Taxonomy and experimental study. Pattern Recognition, 62:135-160.

Liu Q, Hong X, Zou B, Chen J, Chen Z & Zhao G (2017) Hierarchical contour closure based holistic salient object detection. IEEE Transactions on Image Processing, accepted.

Liu Q, Zou B, Chen J, Ke W, Yue K, Chen Z, Zhao G (2017) A location-to-segmentation strategy for automatic exudate segmentation in colour retinal fundus images. Computerized Medical Imaging and Graphics, 55:78-86.

Liu X, Yao J, Hong X, Huang X, Zhou Z, Qi C & Zhao G (2017) Background subtraction using spatio-temporal group sparsity recovery. IEEE Transactions on Circuits and Systems for Video Technology, accepted.

Otani M, Nakashima Y, Rahtu E, Heikkilä J & Yokoya N (2017) Video summarization using deep semantic features. In: Computer Vision - ACCV 2016, Lecture Notes in Computer Science, in press.

Qi X, Zhao G, Li C-G, Guo J & Pietikäinen M (2017) HEp-2 cell classification via combining multi-resolution co-occurrence texture and large regional shape information. IEEE Journal of Biomedical and Health Informatics, 21(2):429-440.

Saarela U, Akram S, Desgrange A, Rak-Raszewska A, Shan J, Cereghini S, Ronkainen VP, Heikkilä J, Skovorodkin I & Vainio S (2017) Novel fixed z-direction (FiZD) kidney primordia and an organoid culture system for time-lapse confocal imaging. Development 144: 1113-1117.

Ye Q, Zhang T, Ke W, Qiu Q, Chen J, Sapiro G & Zhang B (2017) Self-learning scene-specific pedestrian detectors using a progressive latent model. Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), accepted, 9 p.

Akram S U, Kannala J, Eklund L & Heikkilä J (2016) Cell segmentation proposal network for microscopy image analysis. In: LABELS 2016, DLMIA 2016, Lecture Notes in Computer Science, 10008:21-29.

Akram S U, Kannala J, Eklund L & Heikkilä J (2016) Joint cell segmentation and tracking using cell proposals. Proc. IEEE International Symposium on Biomedical Imaging (ISBI), 920-924.

Akram S U, Kannala J, Eklund L & Heikkilä J (2016) Cell proposal network for microscopy image analysis. Proc. International Conference on Image Processing (ICIP 2016), 3199-3203.

Antonakos E, Snape P, Trigeorgis G & Zafeiriou S (2016) Adaptive cascaded regression. Proc. International Conference on Image Processing (ICIP 2016), 1649-1653.

Bayramoglu N & Alatan A (2016) Comparison of 3D local and global descriptors for similarity retrieval of range data. Neurocomputing, 183:13-27.

Bayramoglu N & Heikkilä J (2016) Transfer learning for cell nuclei classification in histopathology images. In: Computer Vision - ECCV 2016 Workshops, Lecture Notes in Computer Science, 9915:532-539.

Bayramoglu N, Kannala J & Heikkilä J (2016) Deep learning for magnification independent breast cancer histopathology image classification. Proc. International Conference on Pattern Recognition (ICPR 2016), 2440-2445.

Bhat K K S, Musti U & Heikkilä J (2016) Geometry based exhaustive line correspondence determination. Proc. IEEE International Conference on Robotics and Automation (ICRA 2016), 4341-4348.

Bordallo López M (2016) Mobile platform challenges in interactive computer vision applications. Multi-Core Computer Vision and Image Processing for Intelligent Applications, in press, IGI-Global.

Bordallo Lopez M, Boutellaa E & Hadid A (2016) Comments on the "Kinship Face in the Wild" data sets. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(11):2342-2344.

Boulkenafet Z, Komulainen J & Hadid A (2016) Face spoofing detection using colour texture analysis. IEEE Transactions on Information Forensics & Security, 11(8):1818-1830.

Boulkenafet Z, Komulainen J & Hadid A (2016) Scale space texture analysis for face anti-spoofing. Proc. IAPR International Conference on Biometrics (ICB 2016), 1-6.

Boutellaa E, Bordallo Lopez M, Ait-Aoudia S, Feng X & Hadid A (2016) Kinship verification from videos using texture spatio-temporal features and deep learning. Proc. IAPR International Conference on Biometrics (ICB 2016), 1-7.

Boutellaa E, Boulkenafet Z, Komulainen J & Hadid A (2016) Audiovisual synchrony assessment for replay attack detection in talking face biometrics. Multimedia Tools and Applications, 75(9):5329-5343.

Boutellier J & Hautala I (2016) Executing dynamic data rate actor networks on OpenCL platforms. Proc. IEEE International Workshop on Signal Processing Systems (SiPS 2016), 98-103.

Bykov A, Huttunen S, Mäkinen M & Meglinski I (2016) Imaging of biotissues with circularly polarized light for cancer detection. Proceedings of the 50th annual conference of the Finnish Physical Society.

Chen J, Lei Z, Liu L, Zhao G & Pietikäinen M (2016) Editorial: RoLoD - Robust local descriptors for computer vision. Neurocomputing, 184:1-2.

Chen J, Li X, Pietikäinen M, Chang Z, Qiu Q, Sapiro G & Bronstein A (2016) RealSense = Real heart rate: Illumination invariant heart rate estimation from videos. Proc. International Conference on Image Processing Theory, Tools and Applications (IPTA), 1-6.

Chen J, Qi X, Tervonen O, Silven O, Zhao G & Pietikäinen M (2016) Thorax disease diagnosis using deep convolutional neural network. Proc. 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC 2016), 2287-2290.

Flusser J, Farokhi S, Hoschl C, Suk T, Zitova B & Pedone M (2016) Recognition of images degraded by Gaussian blur. IEEE Transactions on Image Processing, 25(2):790-806.

Guo Y, Zhao G & Pietikäinen M (2016) Dynamic facial expression recognition with atlas construction and sparse representation. IEEE Transactions on Image Processing, 25(5):1977-1992.

Hautala I, Boutellier J & Silvén O (2016) Programmable 28nm coprocessor for HEVC/H.265 in-loop filters. Proc. IEEE International Symposium on Circuits and Systems (ISCAS 2016), 1570-1573.

Herrera Castro D, Kannala J & Heikkilä J (2016) Forget the checkerboard: practical self-calibration using a planar scene. Proc. IEEE Winter Conference on Applications of Computer Vision (WACV 2016), 9 p.

Hong X, Xu Y & Zhao G (2016) LBP-TOP: A tensor unfolding revisit. Proc. ACCV Workshops, 1:513-527.

Hong X, Zhao G, Zafeiriou S, Pantic M & Pietikäinen M (2016) Capturing correlations of local features for image representation. Neurocomputing, 184:99-106.

Huang X, Kortelainen J, Zhao G, Li X, Moilanen A, Seppänen T & Pietikäinen M (2016) Multi-modal emotion analysis from facial expressions and electroencephalogram. Computer Vision and Image Understanding, 147:114-124.

Huang X, Zhao G, Hong X, Zheng W & Pietikäinen M (2016) Spontaneous facial micro-expression analysis using spatiotemporal completed local quantized patterns. Neurocomputing, 175:564-578.

Jiang X, Lian J, Xia Z, Feng X & Hadid A (2016) Fast Chinese character detection from complex scenes. Proc. International Conference on Image Processing Theory, Tools and Applications (IPTA), 1-4.

Kellokumpu V, Särkiniemi M & Zhao G (2016) Affective gait recognition and baseline evaluation from real world samples. Proc. ACCV Workshops, 1:567-575.

Komulainen J, Anina I, Holappa J, Boutellaa E & Hadid A (2016) On the robustness of audiovisual liveness detection to visual speech animation. Proc. IEEE Eighth International Conference on Biometrics: Theory, Applications and Systems (BTAS 2016), 1-8.

Kukka H, Marjakangas P, Kellokumpu V & Ojala T (2016) Spontaneous device association using inaudible audio signatures, 91-96.

Laskar Z, Huttunen S, Herrera Castro D, Rahtu E & Kannala J (2016) Robust loop closures for scene reconstruction by combining odometry and visual correspondences. Proc. International Conference on Image Processing (ICIP 2016), 2603-2607.

Li L, Feng X, Boulkenafet Z, Xia Z & Hadid A (2016) A robust face anti-spoofing approach using partial convolutional neural network. Proc. International Conference on Image Processing Theory, Tools and Applications (IPTA), accepted.

Li L, Ghazi A, Boutellier J, Anttila L, Valkama M & Bhattacharyya S (2016) Evolutionary multiobjective optimization for digital predistortion architectures. Proc. International Conference on Cognitive Radio Oriented Wireless Networks (CrownCom), 498-510.

Li L, Ghazi A, Boutellier J, Anttila L, Valkama M & Bhattacharyya S (2016) Design space exploration and constrained multiobjective optimization for digital predistortion systems. Proc. IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP), 182-185.

Li X, Komulainen J, Zhao G, Yuen PC & Pietikäinen M (2016) Generalized face anti-spoofing by detecting pulse from face videos. Proc. International Conference on Pattern Recognition (ICPR 2016), 4244-4249.

Linder E, Varjo S & Thors C (2016) Mobile diagnostics based on motion? A close look at motility patterns in the schistosome life cycle. Diagnostics 6(2), 24, 1-22.

Liu L, Fieguth P, Wang X, Pietikäinen M & Hu D (2016) Evaluation of LBP and deep texture descriptors with a new robustness benchmark. In: Computer Vision, ECCV 2016 Proceedings, Lecture Notes in Computer Science, 9907:69-86.

Liu L, Fieguth P, Zhao G, Pietikäinen M & Hu D (2016) Extended local binary patterns for face recognition. Information Sciences, 358-359:56-72.

Liu L, Lao S, Fieguth P, Guo Y, Wang X & Pietikäinen M (2016) Median robust extended local binary pattern for texture classification. IEEE Transactions on Image Processing, 25(3):1368-1381.

Liu S, Yang B, Yuen P & Zhao G (2016) A 3D mask face anti-spoofing database with real world variations. Proc. CVPR Workshops, 1535-1543.

Liu S, Yuen PC, Zhang S & Zhao G (2016) 3D mask face anti-spoofing with remote photoplethysmography. In: Computer Vision, ECCV 2016 Proceedings, Lecture Notes in Computer Science, 9911:85-100.

Liu Y-J, Zhang J-K, Yan W-J, Wang S-J, Zhao G & Fu X-L (2016) A main directional mean optical flow feature for spontaneous micro-expression recognition. IEEE Transactions on Affective Computing, in press (available online).

Matilainen M, Sangi P, Holappa J & Silvén O (2016) OUHANDS database for hand detection and pose recognition. Proc. International Conference on Image Processing Theory, Tools and Applications (IPTA).

Melekhov I, Kannala J & Rahtu E (2016) Siamese network features for image matching. Proc. International Conference on Pattern Recognition (ICPR 2016), 378-383.

Michalska M, Zufferey N, Boutellier J, Bezati E & Mattavelli M (2016) Efficient scheduling policies for dynamic dataflow programs executed on multi-core. International Workshop on Programmability and Architectures for Heterogeneous Multicores.

Mustaniemi J, Kannala J & Heikkilä J (2016) Parallax correction via disparity estimation in a multi-aperture camera. Machine Vision and Applications 27(8): 1313-1323.

Otani M, Nakashima Y, Rahtu E, Heikkilä J & Yokoya N (2016) Learning joint representations of videos and sentences with web image search. In: Computer Vision - ECCV 2016 Workshops, Lecture Notes in Computer Science, 9913:651-667.

Patel D, Hong X & Zhao G (2016) Selective deep features for micro-expression. Proc. International Conference on Pattern Recognition (ICPR 2016), 2258-2263.

Qi X, Li C-G, Zhao G, Hong X & Pietikäinen M (2016) Dynamic texture and scene classification by transferring deep image features. Neurocomputing, 171:1230-1241.

Qi X, Zhao G, Chen J & Pietikäinen M (2016) HEp-2 cell classification: The role of Gaussian scale space theory as a pre-processing approach. Pattern Recognition Letters, in press (available online).

Qi X, Zhao G, Shen L, Li Q & Pietikäinen M (2016) LOAD: Local orientation adaptive descriptor for texture and material classification. Neurocomputing, 184:28-35.

Qi X, Zhao H, Chen J & Pietikäinen M (2016) Exploring illumination robust descriptors for human epithelial type 2 cell classification. Pattern Recognition, 60:420-429.

Saitoh T, Zhou Z, Zhao G & Pietikainen M (2016) Concatenated frame image based CNN for visual speech recognition. Proc. ACCV Workshops, 2:277-289.

Sangi P, Matilainen M, Silvén O (2016) Rotation tolerant hand pose recognition using aggregation of gradient orientations. 13th International Conference, ICIAR 2016 Proceedings, Lecture Notes in Computer Science, 9730:257-267.

Trigeorgis G, Nicolaou M, Zafeiriou S & Schuller B (2016) Deep canonical time warping. Proceedings of IEEE International Conference on Computer Vision & Pattern Recognition (CVPR 2016), 5110-5118.

Trigeorgis G, Snape P, Nicolaou M, Antonakos E & Zafeiriou S (2016) Mnemonic descent method: A recurrent process applied for end-to-end face alignment. Proceedings of IEEE International Conference on Computer Vision & Pattern Recognition (CVPR 2016), 4177-4187.

Wang H, Chai X, Hong X, Zhao G & Chen X (2016) Isolated sign language recognition with Grassmann covariance matrices. ACM Transactions on Accessible Computing, 8(4):1-21.

Wang SJ, Yan WJ, Sun T, Zhao G, Fu X (2016) Sparse tensor canonical correlation analysis for micro-expression recognition. Neurocomputing, 184:99-106.

Xia X, Feng X, Peng J, Peng X & Zhao G (2016) Spontaneous micro-expression spotting via geometric deformation modeling. Computer Vision and Image Understanding, 147:87-94.

Xia Z, Feng X, Peng J & Hadid A (2016) Unsupervised deep hashing for large-scale visual search. Proc. International Conference on Image Processing Theory, Tools and Applications (IPTA), 1-5.

Xia Z, Zhang W, Tan F, Feng X & Hadid A (2016) An accurate eye localization approach for smart embedded system. Proc. International Conference on Image Processing Theory, Tools and Applications (IPTA), 1-5.

Yang R, Bordallo López M, Boutella E, Tong S, Peng J, Feng X & Hadid A (2016) On pain assessment from facial videos using spatio-temporal local descriptors. Proc. International Conference on Image Processing Theory, Tools and Applications (IPTA), 1-6.

Ylioinas J, Poh N, Holappa J & Pietikäinen M (2016) Data-driven techniques for smoothing histograms of local binary patterns. Pattern Recognition, 60:734-747.

Zafeiriou L, Antonakos E, Zafeiriou S & Pantic M (2016) Joint unsupervised deformable spatio-temporal alignment of sequences. Proceedings of IEEE International Conference on Computer Vision & Pattern Recognition (CVPR 2016), 3382-3390.

Zafeiriou S, Papaioannou A, Kotsia I, Nicolaou M & Zhao G (2016) Facial affect "in-the-wild": A survey and a new database. Proc. CVPR Workshops, 1487-1498.

Zafeiriou S, Zhao G, Pietikäinen M, Chellappa R, Kotsia I & Cohn J (2016) Editorial of special issue on spontaneous facial behaviour analysis. Computer Vision and Image Understanding, 147:50-51.

Zhou J, Hong X, Su F & Zhao G (2016) Recurrent convolutional neural network regression for continuous pain intensity estimation in video. Proc. CVPR Workshops, 1535-1543.

Zhou Y, Alabort-i-Medina J, Antonakos E, Roussos A & Zafeiriou S (2016) Estimating correspondences of deformable objects in-the-wild. Proceedings of IEEE International Conference on Computer Vision & Pattern Recognition (CVPR 2016), 5791-5801.

Zong Y, Zheng W, Huang X, Yan K, Yan J & Zhang T (2016) Emotion recognition in the wild via sparse transductive transfer linear discriminant analysis. Journal on Multimodal User Interfaces, 10(2):163-172.

Zong Y, Zheng W, Zhang T & Huang X (2016) Cross-corpus speech emotion recognition based on domain-adaptive least squares regression. IEEE Signal Processing Letters, 23(5):585-589.
