Genomic and Statistical Analysis of Genotype-Phenotype Relationships

Project Leaders
Prof. Outi Savolainen, Ph.D.
Prof. Mikko J. Sillanpää, Ph.D.
Tanja Pyhäjärvi, Ph.D.

Department of Biology, and Department of Mathematical Sciences, Biocenter Oulu and Faculty of Science, University of Oulu

Background and Significance

Understanding the genetic basis of phenotypic variation in relation to environmental variation is crucial in many areas of the life sciences. Evolutionary geneticists search for the loci governing quantitative genetic variation. We ask: what is the distribution of effects of individual variants (small, large, deleterious, beneficial)? What are their mutation rates? Are the variant alleles common or rare? Are they regulatory or structural variants, or do they lie in non-coding regions? Does adaptation arise from existing variation or from new mutations? Answering these questions requires identifying the loci responsible for variation. We can then examine the phenotypic effects of individual alleles and use patterns of sequence variation to answer questions about natural selection. Prediction of phenotypes from DNA-level information is also important. Plant and animal breeders aim to predict the genomic breeding value underlying phenotypes on the basis of single nucleotide polymorphism (SNP) markers. One of the goals of medical genetics is to predict disease phenotypes. A shared framework of population genetics theory underlies these efforts in different fields.

The possibilities of achieving these goals have increased rapidly. This has been driven partly by advances in sequencing and genotyping technologies. A major thrust has come from developments in population genetics theory, bioinformatics, and statistical genetics. Within the broad area of biometric and genetic analysis of quantitative variation, the Savolainen group studies the genetics of local adaptation of plants. The Pyhäjärvi group studies the interplay of gene expression patterns, adaptive variation, and quantitative traits. The Sillanpää group develops statistical tools for analyzing the genetic basis of quantitative genetic variation, using Bayesian approaches in particular.

Recent Progress

We have developed conceptually new methods for covariance matrix (or precision matrix / graph topology) inference (Kuismin and Sillanpää 2016; Kuismin et al. 2017), for estimating heritability and genetic parameters from pedigree data using a multiple-trait animal model and integrated nested Laplace approximation (INLA) (Mathew et al. 2016), and for estimating heritability in connection with dynamic traits in twin data (He et al. 2016). For estimation in these pieces of work, we have used iterative methods to find the maximum of the target function.
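As a rough illustration of what heritability estimation from twin data involves, the sketch below applies Falconer's classical formula, h² = 2(r_MZ − r_DZ), to made-up twin-pair measurements. It is a textbook approximation that rests on strong assumptions (additive effects, equally shared environments), not the variance-component model of He et al. (2016) or the INLA approach of Mathew et al. (2016). The function names and data are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def falconer_h2(mz_pairs, dz_pairs):
    """Falconer's estimate: twice the difference of MZ and DZ twin correlations."""
    r_mz = pearson([a for a, _ in mz_pairs], [b for _, b in mz_pairs])
    r_dz = pearson([a for a, _ in dz_pairs], [b for _, b in dz_pairs])
    return 2.0 * (r_mz - r_dz)


# Hypothetical trait values for (twin 1, twin 2) pairs; not real data.
mz = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8)]
dz = [(1.0, 2.0), (2.0, 1.5), (3.0, 3.5), (4.0, 2.8)]
h2 = falconer_h2(mz, dz)
```

The model-based approaches cited above replace this moment estimator with likelihood or Bayesian inference, which is where iterative maximization of a target function comes in.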

In addition to the above, more applied work has also been published (Bari et al. 2016a,b; Lausser et al. 2017). We have also considered how to collect covariate data when planning future studies in a meta-analytic framework (Karvanen and Sillanpää 2017).

The plant genetics and statistics groups finalized a joint association analysis of genotype-phenotype variation in a set of European populations of Scots pine. The manuscript applies a new association method for genetically differentiated populations displaying local adaptation (Kujala et al. 2017).

The central project on associating pine sequence variation with phenotypic variation has been developing further genomic resources for pines (with exome capture). The patterns of genetic variation have been analysed in some 20 000 SNPs across two clines in Europe based on haploid data (Tyrmi et al. in prep.). Analysis of diploid data is ongoing. Based on haploid data, the genetic structure in P. sylvestris is very weak and the allele frequency distribution is strongly biased towards rare alleles, indicating past population size changes and possibly an effect of purifying selection. We have estimated the strength of stabilizing selection influencing the timing of budset in different populations, and have examined the role of the potentially associated sequence variants (Kujala et al. in prep.).
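The bias towards rare alleles mentioned above is a statement about the site frequency spectrum. A minimal sketch of how a folded spectrum could be tallied from haploid genotypes follows. The 0/1 allele encoding, the function name, and the toy data are assumptions made for illustration, not the pipeline used in the pine analyses.

```python
from collections import Counter


def folded_sfs(genotypes):
    """Count SNPs by minor allele count.

    genotypes: list of SNPs, each a list of 0/1 alleles, one per haploid sample.
    Returns {minor allele count: number of SNPs}; monomorphic sites are skipped.
    """
    sfs = Counter()
    for snp in genotypes:
        ones = sum(snp)
        minor = min(ones, len(snp) - ones)  # fold: ignore which allele is ancestral
        if minor > 0:
            sfs[minor] += 1
    return dict(sfs)


# Hypothetical data: 5 sites scored in 6 haploid samples.
toy = [
    [1, 0, 0, 0, 0, 0],  # singleton
    [0, 0, 0, 1, 0, 0],  # singleton
    [1, 1, 1, 1, 1, 0],  # singleton after folding
    [1, 1, 0, 0, 1, 0],  # minor allele count 3
    [0, 0, 0, 0, 0, 0],  # monomorphic, ignored
]
spectrum = folded_sfs(toy)
```

A spectrum dominated by the lowest count classes, relative to the neutral expectation, is the kind of pattern the text attributes to population size changes and purifying selection.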

In a new EU project (GENTREE) that started in March 2016, work has begun with sampling populations of multiple tree species across Europe and developing a set of targeted genes for exome capture across these samples. We conducted the field collections in collaboration with LUKE (Natural Resources Institute Finland) for P. sylvestris, Picea abies and Betula pendula.

For population genetics of Arabidopsis lyrata, a large-scale sequencing analysis of range-wide flowering time sequence diversity was published (Mattila et al. 2016). An analysis of the demographic history of Arabidopsis lyrata has been completed based on whole genome sequences. This is a prerequisite for quantifying the strength of selection. We also identified some loci that have likely been targets of climatic selection in individual populations (Mattila et al. submitted). We have been developing genetic mapping tools based on a variant of RAD sequencing to obtain dense maps for examining the patterns of segregation distortion in crosses of diverged populations (Hämälä et al. in prep.).

For Pinus sylvestris gene expression and de novo transcriptome assembly, RNAseq data from multiple genotypes and tissues have been collected. A first de novo transcriptome assembly and a first round of differential expression analysis have been conducted.

Future Goals

In the future, we will concentrate on developmental work on covariance matrix estimation and on robust variable selection tools that are less sensitive to distributional assumptions concerning outlying observations or missing data. We will also continue developmental work on our Bayesian multiple locus method and Gibbs sampling algorithms for joint analysis of association and linkage using pedigree data. This should allow us to extract more information from the same amount of data than association or linkage analysis alone. We will also continue our work on Bayesian variable selection methods for semi-parametric and Gaussian process models.

In the context of the recently started EU project (GENTREE), the University of Oulu will further analyze patterns of exome variation for adaptation. An important aspect will be the analysis of patterns of linkage disequilibrium variation. A recently funded project (Genowood, Academy of Finland) will allow us to examine the conditions for genomic selection in Scots pine. A consortium of the University of Helsinki, the University of Oulu, and the Natural Resources Institute Finland (LUKE) will first generate further genomic resources, then compare natural and breeding populations, with the aim of predicting phenotypes given the genotypes.
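Linkage disequilibrium between two loci is commonly summarized by the squared correlation r². The sketch below computes it from phased haplotypes under a 0/1 allele encoding. The function name and example haplotypes are hypothetical, and the project's actual analyses may use different estimators, for example ones that work on unphased diploid genotypes.

```python
def ld_r2(locus_a, locus_b):
    """Squared correlation r^2 between two biallelic loci.

    locus_a, locus_b: equal-length lists of 0/1 alleles, one entry per haplotype.
    """
    n = len(locus_a)
    p_a = sum(locus_a) / n
    p_b = sum(locus_b) / n
    p_ab = sum(a * b for a, b in zip(locus_a, locus_b)) / n
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium, D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d * d / denom


# Hypothetical haplotypes; not real data.
perfect = ld_r2([0, 0, 1, 1], [0, 0, 1, 1])      # alleles always co-occur
independent = ld_r2([0, 0, 1, 1], [0, 1, 0, 1])  # no association
```

How quickly r² decays with physical distance bears on how many markers are needed for association analysis and genomic selection, which is why the LD patterns matter for the projects described here.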

With A. lyrata, we have started examining patterns of local adaptation on a smaller scale than before by comparing two altitudinal clines in Norway at the phenotype, whole genome sequence, gene expression, and methylation levels, aiming to resolve the roles of genetic adaptation and plasticity.

Publications 2016-

Bari A, Chaubey YP, Sillanpää MJ, Stoddard FL, Damania AB, Alaoui SB, Mackay M. "Applied mathematics in genetic resources: Toward a synergistic approach combining innovations with theoretical aspects." in Applied Mathematics and Omics to Assess Crop Genetic Resources for Climate Change Adaptive Traits. Bari A, Damania A B, Mackay M, Dayanandan S. (editors). Oxford: CRC Press, Taylor & Francis Group, 2016.

Bari A, Khazaei H, Stoddard FL, Street K, Sillanpää MJ, Chaubey YP, Dayanandan S, Endresen DTF, De Pauw E, Damania AB. In silico evaluation of plant genetic resources to search for traits for adaptation to climate change. Climatic Change 134: 667-680, 2016.

He L, Sillanpää MJ, Silventoinen K, Kaprio J, and Pitkäniemi J. Estimating modifying effect of age on genetic and environmental variance components in twin models. Genetics 202(4): 1313-28, 2016.

Kuismin M, Sillanpää MJ. Use of Wishart prior and simple extensions for sparse precision matrix estimation. PLoS ONE 11: e0148171, 2016.

Lascoux M, Glémin S, Savolainen O. Local adaptation in plants. eLS DOI: 10.1002/9780470015902.a0025270, 2016.

Mathew B, Holand AM, Koistinen P, Leon J, Sillanpää MJ. Reparametrization-based estimation of genetic parameters in multi-trait animal model using Integrated Nested Laplace Approximation. Theor Appl Genet 129: 215-225, 2016.

Mattila TM, Aalto EA, Toivainen T, Niittyvuopio A, Piltonen S, Kuittinen H, Savolainen O. Selection for population specific adaptation has shaped patterns of variation in the photoperiod pathway genes in Arabidopsis lyrata during post glacial colonization. Molecular Ecology 25: 581-597, 2016.

Savolainen O, Lascoux M. Genomics: Geography matters for Arabidopsis. Nature 537:314-315, 2016.

Karvanen J, and Sillanpää MJ. Prioritizing covariates in the planning of future studies in the meta-analytic framework. Biom J 59: 110-125, 2017.

Zhou Y, Duvaux L, Ren G, Zhang L, Savolainen O, Liu J. Incomplete lineage sorting versus introgression: origin of shared genetic variation between two closely related pines with overlapping distributions. Heredity 118:211-220, 2017.

Kuismin MO, Kemppainen JT, Sillanpää MJ. Precision matrix estimation with ROPE. Journal of Computational and Graphical Statistics (accepted), 2017.

Kujala S, Knürr T, Kärkkäinen K, Neale DB, Sillanpää MJ, and Savolainen O. Genetic heterogeneity underlying variation in a locally adaptive clinal trait in Pinus sylvestris revealed by a Bayesian multipopulation analysis. Heredity (accepted), 2017.

Lausser L, Schmid F, Platzer M, Sillanpää MJ, Kestler HA. Semantic multi-classifier systems for the analysis of gene expression profiles. Archives of Data Science Series A (accepted), 2017.

Research Group Members


Project Leaders:
Outi Savolainen, Ph.D., Professor (University of Oulu)
Mikko J. Sillanpää, Ph.D., Professor (University of Oulu)
Tanja Pyhäjärvi, Ph.D., Academy Research Fellow (University of Oulu)

Senior and Post-doctoral Investigators:
Helmi Kuittinen, Ph.D. (University of Oulu)
Sonja Kujala, Ph.D. (University of Oulu, Biocenter Oulu)
Nader Aryamanesh, Ph.D. (University of Oulu)
Päivi Leinonen, Ph.D. (until Aug. 31st, 2016, Duke University, Academy of Finland)

Ph.D. Students:
Timo Knürr, M.Sc. (MTT Agrifood Research Finland)
Pinja Pikkuhookana, M.Sc. (University of Oulu)
Markku Kuismin, M.Sc. (University of Oulu)
Tiina Mattila, M.Sc. (University of Oulu, Emil Aaltonen Foundation)
Jaakko Tyrmi, M.Sc. (EU-project ProCoGen, Academy of Finland)
Tuomas Hämälä, M.Sc. (Biocenter Oulu)

Laboratory Technicians:
Soile Alatalo (University of Oulu)

Main source of salary in brackets.

Foreign Scientists, 1

National and International Activities

Group Members Who Spent More Than Two Weeks in Foreign Laboratories During 2016

Päivi Leinonen, Duke University (8 mo)
Outi Savolainen, University of California at Davis (1.5 mo)
Tiina Mattila, Stockholm University (2 wks)

EU Projects (present and progress)

GenTree, Partner, Task Leader (TP), 2016-2020

Last updated: 10/5/2017