Genomic and Statistical Analysis of Genotype-Phenotype Relationships

Project Leaders
Prof. Outi Savolainen, Ph.D.
Prof. Mikko J. Sillanpää, Ph.D.
Tanja Pyhäjärvi, Ph.D.

Department of Biology, and Department of Mathematical Sciences, Biocenter Oulu and Faculty of Science, University of Oulu

Background and Significance

Understanding the genetic basis of phenotypic variation in relation to environmental variation is crucial in many areas of the life sciences. Evolutionary geneticists search for the loci governing quantitative genetic variation. We ask: what is the distribution of effects of individual variants (small or large, deleterious or beneficial)? What are their mutation rates? Are the variant alleles common or rare? Are they regulatory or structural variants, or are they variants in non-coding regions? Does adaptation arise from existing variation or from new mutations? Answering these questions requires identifying the loci responsible for the variation. We can then examine the phenotypic effects of individual alleles and use the patterns of sequence variation to answer questions about natural selection. Prediction of phenotypes from DNA-level information is also important. Plant and animal breeders aim to predict the genomic breeding value underlying phenotypes on the basis of SNP (single nucleotide polymorphism) markers. One of the goals of medical genetics is to predict disease phenotypes. A shared framework of population genetics theory underlies these efforts in the different fields.

The prospects of achieving these goals have improved rapidly. This has been driven partly by advances in sequencing and genotyping technologies. Another major thrust has come from developments in population genetics theory, bioinformatics, and statistical genetics. Within the broad area of biometric and genetic analysis of quantitative variation, the Savolainen group studies the genetics of local adaptation in plants. The Pyhäjärvi group studies the interplay of gene expression patterns, adaptive variation and quantitative traits. The Sillanpää group develops statistical tools for analyzing the genetic basis of quantitative genetic variation, using Bayesian approaches in particular.

Recent Progress

The statistical group developed conceptually new methods for covariance matrix (or precision matrix / graph topology) inference (Kuismin and Sillanpää 2017; Kuismin et al. 2017a,b). Such methods can be used to construct gene networks from gene expression data or to estimate population structure from SNP data. For estimation of SNP heritability and for genomic prediction, we have introduced a new genomic relationship matrix that accounts for LD (dependence between SNPs) (Mathew et al. 2017a). We have presented an R package for estimating dynamic heritability of dynamic traits in twin data (He et al. 2017). Additionally, we have searched for higher-order epistatic interactions affecting flowering time in a barley MAGIC population using a Bayesian estimation method (Mathew et al. 2017b). Moreover, we have considered how to prioritize the collection of covariate data when planning future studies in a meta-analytic framework (Karvanen and Sillanpää 2017).
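For readers unfamiliar with genomic relationship matrices, the following is a minimal sketch of the standard, uncorrected form (the widely used VanRaden-type estimator). It is not the LD-corrected matrix of Mathew et al. (2017a); it only illustrates the kind of object that method refines. The function name `vanraden_grm` and the simulated genotypes are assumptions made for illustration.

```python
import numpy as np


def vanraden_grm(genotypes: np.ndarray) -> np.ndarray:
    """Standard (not LD-corrected) genomic relationship matrix.

    genotypes: (individuals x SNPs) array of 0/1/2 allele counts.
    The function name is hypothetical; it is not from the cited papers.
    """
    genotypes = np.asarray(genotypes, dtype=float)
    p = genotypes.mean(axis=0) / 2.0       # allele frequency of each SNP
    z = genotypes - 2.0 * p                # centre each SNP column
    scale = 2.0 * np.sum(p * (1.0 - p))    # sum of expected SNP variances
    if scale == 0:
        raise ValueError("all SNPs are monomorphic")
    return z @ z.T / scale


# Toy data: 5 individuals, 200 SNPs, simulated purely for illustration.
rng = np.random.default_rng(0)
toy = rng.integers(0, 3, size=(5, 200))
G = vanraden_grm(toy)
print(G.shape)
```

In this uncorrected form every SNP contributes equally, so blocks of correlated SNPs are effectively counted several times; accounting for that dependence is the motivation stated above for the LD-corrected matrix.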

The plant genetics and statistics groups finalized a joint association analysis of genotype-phenotype variation in a set of European populations of Scots pine. The manuscript applies a new association method for genetically differentiated populations displaying local adaptation (Kujala et al. 2017).

The patterns of genetic variation have been analysed in some 20 000 SNPs across two clines in Europe based on haploid data (Tyrmi et al. in prep.). Multiple methods have been used to identify a set of genes under clinal natural selection. We have estimated the strength of stabilizing selection influencing the timing of budset in different populations, and have examined the role of the potentially associated sequence variants (Kujala et al. in prep.). In a collaborative study with groups at the research institutes INIA (leader of this collaboration, Spain) and INRA (France), we have also compared the strength of selection in two pines, Pinus sylvestris (our species) and Pinus pinaster (a Mediterranean species) (Grivet et al. 2017). Further, in collaboration with a group at Lanzhou and Sichuan Universities, we have examined patterns and causes of shared genetic variation in two pines, P. hwangshanensis and P. massoniana.

In the EU project GENTREE, DNA sample collection and extraction for populations of multiple tree species across Europe have been completed. A set of targeted genes for exome capture has been developed and the DNA sequencing is ongoing. Our group was responsible for target selection in Betula pendula.

Within a new consortium (GENOWOOD), funded by the Academy of Finland and led by the University of Helsinki, we collaborate with LUKE (Natural Resources Institute Finland) to advance tools for applying genomic selection in forest trees. The project has started by developing genomic resources and by obtaining phenotypic data from a long-term pine experiment to examine the effects of early phenotypic traits on later survival and growth, and the genetic basis of these effects.

For the population genetics of Arabidopsis lyrata, an analysis of the demographic history of the species has been completed based on whole genome sequences. This is a prerequisite for quantifying the strength of selection. We also identified some loci that have likely been targets of climatic selection in individual populations (Mattila et al. 2017). We have been developing genetic mapping tools based on a variant of RAD sequencing to obtain dense maps for examining the patterns of segregation distortion in crosses of diverged populations (Hämälä et al. 2017). We have also analysed the genomics of short-range local adaptation in populations of Arabidopsis lyrata undergoing gene flow (Hämälä et al. in prep.).

RNAseq data from multiple genotypes and tissues of Pinus sylvestris have been used to produce a high-quality de novo transcriptome assembly and super transcripts. Differential expression analysis has been used to identify patterns of tissue-specific gene expression. SNP data from haploid and diploid material have been used to design an exome capture probe set for P. sylvestris resequencing.

Future Goals

In the future, we will concentrate on developmental work on covariance matrix estimation and on robust variable selection tools that are less sensitive to distributional assumptions, outlying observations and missing data. We will also continue developmental work on our Bayesian multiple locus method and Gibbs sampling algorithms for the joint analysis of association and linkage using pedigree data. This should allow us to extract more information from the same amount of data than association or linkage analysis alone. We will also continue our work on Bayesian variable selection methods for semi-parametric and Gaussian process models.

In the context of the EU project GENTREE, the University of Oulu (UOULU) will further analyze patterns of exome variation for adaptation. An important aspect will be the analysis of patterns of variation in linkage disequilibrium. We will further develop genomic resources and examine the conditions for genomic selection in Scots pine. A new H2020 project, B4EST, is undergoing grant agreement negotiations. Our role in the project is to develop a new genotyping chip for P. sylvestris and other forest tree species and, for example, to identify essential information for P. sylvestris breeding via genomic analysis in multi-environment and multi-trait settings.

With A. lyrata, we are examining patterns of local adaptation on a smaller scale than before by comparing two altitudinal clines in Norway at the phenotype, whole genome sequence, gene expression, and methylation levels, aiming to resolve the roles of genetic adaptation and plasticity.

Publications 2017-

Grivet D, Avia K, Vaattovaara A, Eckert A, Neale DB, Savolainen O, González-Martínez SC. High rate of adaptive evolution in two widespread European pines. Mol Ecol 26(24):6857-6870, 2017.

He L, Pitkäniemi J, Silventoinen K, Sillanpää MJ. ACEt: An R-package for estimating dynamic heritability and comparing twin models. Behav Genet 47(6): 620-641, 2017.

Hämälä T, Mattila T, Kuittinen H, Savolainen O. Seed germination and its role in adaptation and reproductive isolation in Arabidopsis lyrata. Mol Ecol 26(13):3484-3496, 2017.

Karvanen J, Sillanpää MJ. Prioritizing covariates in the planning of future studies in the meta-analytic framework. Biometrical Journal 59: 110-125, 2017.

Kuismin MO, Ahlinder J, Sillanpää MJ. CONE: Community oriented network estimation is a versatile framework for inferring population structure in large scale sequencing data. G3: Genes, Genomes, Genetics 7: 3359-3377, 2017.

Kuismin MO, Kemppainen JT, Sillanpää MJ. Precision matrix estimation with ROPE. J Comp Graph Statistics 26: 682-694, 2017.

Kuismin MO, Sillanpää MJ. Estimation of covariance and precision matrix, network structure and a view towards systems biology. WIRES Computational Statistics 9: e1415, 2017.

Kujala ST, Knürr T, Kärkkäinen K, Neale DB, Sillanpää MJ, Savolainen O. Genetic heterogeneity underlying a locally adaptive clinal trait in Pinus sylvestris revealed by a Bayesian multipopulation analysis. Heredity 118(5):413-423, 2017.

Mattila TM, Tyrmi J, Pyhäjärvi T, Savolainen O. Genome-wide analysis of colonization history and concomitant selection in Arabidopsis lyrata. Mol Biol Evol 34(10):2665-2677, 2017.

Zhou Y-F, Duvaux L, Ren G, Zhang L, Savolainen O, Liu J. Importance of incomplete lineage sorting and introgression in the origin of shared genetic variation between two closely related pines with overlapping distributions. Heredity 118(3): 211-220, 2017.

Mathew B, Leon J, Sillanpää MJ. A novel linkage-disequilibrium corrected genomic relationship matrix for SNP-heritability estimation and genomic prediction. Heredity doi:10.1038/s41437-017-0023-4. Epub ahead of print, 2017.

Mathew B, Leon J, Sannemann W, Sillanpää MJ. Detection of epistasis for flowering time using Bayesian multilocus estimation in a barley MAGIC population. Genetics 208(2):525-536, 2018.

Doctoral Theses 2017

Tiina Mattila: Post-glacial colonization, demographic history, and selection in Arabidopsis lyrata: genome-wide and candidate gene based approach.

Research Group Members

Project Leaders:
Outi Savolainen, Ph.D., Professor (University of Oulu)
Mikko J. Sillanpää, Ph.D., Professor (University of Oulu)
Tanja Pyhäjärvi, Ph.D., Academy Research Fellow (University of Oulu)

Senior and Post-doctoral Investigators:
Helmi Kuittinen, Ph.D. (University of Oulu)
Sonja Kujala, Ph.D. (University of Oulu, Biocenter Oulu, until Aug 31st, 2017)
Nader Aryamanesh, Ph.D. (University of Oulu)
Isidro D. Ojeda Alayon (University of Oulu, Academy of Finland)

Ph.D. Students:
Timo Knürr, M.Sc. (MTT Agrifood Research Finland)
Pinja Pikkuhookana, M.Sc. (University of Oulu)
Markku Kuismin, M.Sc. (University of Oulu)
Tiina Mattila, M.Sc. (University of Oulu, Emil Aaltonen Foundation)
Jaakko Tyrmi, M.Sc. (EU-project ProCoGen, Academy of Finland)
Tuomas Hämälä, M.Sc. (Biocenter Oulu)

Laboratory Technicians:
Soile Alatalo (University of Oulu)

Main source of salary in brackets.

Foreign Scientists, 2

National and International Activities

Group Members Who Spent More Than Two Weeks in Foreign Laboratories During 2017

Tiina Mattila (University of Stockholm)

EU Projects (present and progress)

GenTree, Partner, Task Leader (TP), 2016-2020

Last updated: 6.7.2018