Regulatory Genomics and Functional Cancer Genetics

Project Leader
Gonghong Wei, Ph.D., Docent, Academy Research Fellow

Biocenter Oulu and Faculty of Biochemistry and Molecular Medicine, University of Oulu

Background and Significance

The transcription factors (TFs) in a given genome can be classified into distinct families by their structurally conserved DNA-binding domains (DBDs), and members of a family often share similar DNA recognition properties. However, members of the same family often display distinct functions and activities in biological processes such as normal development and cancer. These differences are likely to arise at the hubs of gene transcriptional regulatory networks, which are often composed of intertwined regulatory relationships between TF protein complexes and chromatinized gene regulatory elements, including promoters, insulators and enhancers (Wei et al. Biochem J 381:1-12, 2004 and Cell Res 15:292-300, 2005). Therefore, it is of crucial importance to identify the genome-wide chromatin locations of biologically significant TFs. We also hypothesized that some cancer risk-associated single nucleotide polymorphisms (SNPs) identified by genome-wide association studies (GWASs) can disrupt TF–DNA binding at key enhancers and initiate aberrant expression of SNP-linked susceptibility genes. This, in turn, may perturb TF regulatory networks and lead to cancer initiation and progression.

To address these questions, we carried out genome-wide chromatin location analyses of several driver TFs, including TMPRSS2-ERG, HOXB13, FOXA1 and the androgen receptor (AR) (Figure 1), which are often overexpressed and constitutively activated in clinical prostate cancer tissues. In combination with global ChIP-seq studies of enhancer chromatin marks, we have identified thousands of prostate cancer cell-type-specific enhancers. We use integrative and complementary genome-wide approaches, including ChIP-seq, RNA-seq, FAIRE-seq and 3C-derived methods, in combination with novel and classic molecular and biochemical assays and up-to-date computational and statistical methods. Recently, we found that key prostate cancer-associated TFs, including HOXB13, FOXA1 and ERG, extensively cooperate with AR signaling to regulate target genes implicated in prostate cancer cell growth and tumor progression (data not shown). In addition, our integrative genomic data proved useful for the functional interpretation and mechanistic understanding of GWAS-discovered genetic variants in prostate cancer. For example, the SNP rs339331 enhances HOXB13 chromatin binding to drive up-regulation of the transcription factor gene RFX6, which confers a risk of prostate cancer (Huang et al. Nat Genet 46:126-35, 2014).

Figure 1. Genome-wide mapping of TF binding sites and enhancers in prostate cancer cells. Human chromosomes are shown around the outer ring. Other tracks contain ChIP-seq data of the TFs and enhancer marks as indicated.

Recent Progress

To map the prostate cancer gene regulatory networks driven by key TFs, we profiled, for the first time, the genome-wide chromatin locations of HOXB13, which is known to be important for prostate development and tumor progression. We observed that prostate cancer GWAS SNPs are significantly enriched at HOXB13 binding sites (Huang et al. 2014). Notably, the common genetic variant rs339331 at the prostate cancer susceptibility locus 6q22 lies within a functional HOXB13 binding site, precisely at a HOXB13 ChIP-seq peak summit. In a Japanese GWAS, rs339331 in RFX6 was reported as the SNP most strongly associated with prostate cancer risk (P = 1.6 × 10⁻¹², with T as the risk allele). The significant association between rs339331 and susceptibility to prostate cancer was further observed in African Americans, men of European ancestry and the Chinese population, suggesting that rs339331 is a potential genetic marker for evaluating prostate cancer risk across different ethnic groups. We provided several lines of evidence showing that HOXB13 and AR preferentially bind the risk T allele at rs339331 in vivo. Linkage disequilibrium (LD) analysis based on the 1000 Genomes Project indicates that GPRC6A and RFX6 are associated with rs339331 within a strong LD block (Huang et al. 2014). We provided further evidence that RFX6 is a plausible causative gene linked to rs339331 and conferring a risk of prostate cancer. Consistently, eQTL analysis revealed that the rs339331 T risk allele was significantly correlated with higher RFX6 mRNA levels in a Swedish cohort of prostate cancer samples. Together with our recent finding that rs339331 is a DNA-binding motif disruptor for the AR/HOXB13 heterodimer, we proposed an extended model in which enhanced chromatin binding of HOXB13 and AR to the T risk allele at rs339331 results in increased RFX6 expression, conferring predisposition to prostate cancer (unpublished).
We are currently working on the genes and pathways regulated by the TF RFX6, and we are also investigating the functional roles of RFX6 in other types of cancer (data not shown).
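
The enrichment of GWAS SNPs at HOXB13 binding sites mentioned above can be illustrated with a one-sided hypergeometric test. The following is only a minimal sketch and not the analysis used in Huang et al. 2014: the function names, the half-open peak intervals and all counts are hypothetical and chosen purely for illustration.

```python
from math import comb


def count_snps_in_peaks(positions, peaks):
    """Count SNP positions that fall inside any (start, end) peak interval.

    Intervals are treated as half-open [start, end); this is an assumption
    of the sketch, not a property of any particular peak caller.
    """
    return sum(
        any(start <= pos < end for start, end in peaks) for pos in positions
    )


def hypergeom_enrichment_p(total_snps, snps_in_peaks, risk_snps, risk_in_peaks):
    """One-sided P(X >= risk_in_peaks) under sampling without replacement.

    total_snps    : size of the background SNP set
    snps_in_peaks : background SNPs that overlap TF binding sites
    risk_snps     : number of risk-associated SNPs tested
    risk_in_peaks : risk-associated SNPs that overlap TF binding sites
    """
    upper = min(risk_snps, snps_in_peaks)
    denom = comb(total_snps, risk_snps)
    tail = sum(
        comb(snps_in_peaks, k) * comb(total_snps - snps_in_peaks, risk_snps - k)
        for k in range(risk_in_peaks, upper + 1)
    )
    return tail / denom


# Hypothetical counts: 10,000 background SNPs, 500 of them in peaks,
# 100 risk SNPs, 15 of which overlap peaks (5 would be expected by chance).
p_value = hypergeom_enrichment_p(10_000, 500, 100, 15)
print(f"enrichment p-value: {p_value:.3g}")
```

A published analysis would normally also match the background SNPs for properties such as allele frequency and LD structure, which this sketch deliberately leaves out.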

We have also been working towards a systems-level annotation of all prostate cancer risk-associated loci using integrated data sets from functional genomics, bioinformatics, statistics and high-throughput eQTL analysis of prostate cancer tissue samples (Whitington et al., 2016). In this study, we presented the first deep, high-throughput characterization of the gene regulatory mechanisms underlying prostate cancer risk loci. Our methodology integrates regulatory genomic data from over 300 prostate cancer ChIP-seq experiments (TFs and histones; Figure 2) with genotype and gene expression data from over 600 prostate cancer tissue samples. The analysis reveals new gene regulatory mechanisms affected by risk locus SNPs, including widespread disruption of ternary AR/FOXA1, AR/HOXB13 and FOXA1/HOXB13 complexes and competitive binding mechanisms. We identify 57 expression quantitative trait loci (eQTLs) at 35 risk loci and validate the eQTL analysis through analysis of allele-specific expression (Figure 2). We further validate predicted regulatory SNPs and target genes in prostate cancer cell line models. Finally, we made our integrated analysis accessible through an interactive visualization tool. This knowledge resource reveals how genome sequence variation affects disease predisposition via gene regulatory mechanisms, and identifies relevant genes for downstream biomarker and drug development.
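
The core idea of an eQTL test, relating genotype at a SNP to expression of a nearby gene, can be sketched as a simple regression of expression on risk-allele dosage. This is an illustrative assumption about the general technique and not the pipeline of Whitington et al., 2016, which is not described here; the genotypes, expression values and function names below are invented for the example.

```python
from math import sqrt


def risk_allele_dosage(genotype, risk_allele):
    """Number of copies (0, 1 or 2) of the risk allele in a genotype such as 'CT'."""
    return genotype.count(risk_allele)


def eqtl_fit(dosages, expression):
    """Least-squares fit of expression ~ dosage.

    Returns (slope, intercept, pearson_r). The slope is the estimated change
    in expression per additional copy of the risk allele.
    """
    n = len(dosages)
    if n != len(expression) or n < 3:
        raise ValueError("need matched dosage/expression vectors of length >= 3")
    mean_x = sum(dosages) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    syy = sum((y - mean_y) ** 2 for y in expression)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))
    if sxx == 0 or syy == 0:
        raise ValueError("dosage and expression must both vary across samples")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept, sxy / sqrt(sxx * syy)


# Hypothetical samples: genotype at one SNP and normalized expression of one gene.
genotypes = ["CC", "CC", "CT", "TC", "TT", "TT"]
expression = [1.0, 1.2, 2.0, 2.2, 3.0, 3.2]

dosages = [risk_allele_dosage(g, "T") for g in genotypes]
slope, intercept, r = eqtl_fit(dosages, expression)
print(f"slope per risk allele = {slope:.2f}, r = {r:.3f}")
```

Real eQTL analyses typically add covariates (for example ancestry and batch), assess significance formally and correct for multiple testing; the sketch shows only the per-SNP association step.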

Meanwhile, my lab is also actively involved in several fruitful collaborations with other scientists at the University of Oulu. These include the identification of a novel functional genetic variant that predisposes people to primary hip and knee osteoarthritis (Taipale et al., 2016) and a differential gene expression analysis that defined integrin-mediated signaling affecting papillary renal cell carcinoma subtypes (Zhang et al., 2016). In addition, we have been working closely with several international institutions: the Medical College of Wisconsin, USA, on a focused functional study of up to 10 prostate cancer risk loci (Du et al., 2016); the Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China, on how the NAD+-dependent class III protein deacetylase SIRT1 deacetylates the cardiac TF Nkx2.5 and thereby affects its transcriptional activity (Tang et al., 2016); and the Institute of Biophysics, Chinese Academy of Sciences, Beijing, China, on a structural and functional study of the TF STAT6, which is constitutively activated in multiple types of cancer (Li et al., 2016). We contributed to interpreting the disease-associated STAT6 mutations with respect to the STAT6 protein-DNA interaction interface observed in our crystal structures. Using cancer genome sequencing data, we found that 52 of 165 unique mutations can be mapped onto the STAT6 structures. Interestingly, the majority of these mappable mutations are within the STAT6 DBD, suggesting that some mutations may alter STAT6 target gene recognition and be involved in cancer progression (Li et al., 2016). We plan to set up further collaborative projects to explore the mechanisms underlying the roles of STAT6 mutations in tumorigenesis and to test whether some STAT6 mutants reprogram their gene regulatory networks in vivo in different types of cancer.

Figure 2. Identification of regulatory SNPs and eQTL genes at prostate cancer risk loci. (a) Cis-regulatory module (CRM) annotation, eQTL analysis and combined visualization. Our analysis investigates eQTL associations and CRM annotations at the LD proxy SNPs corresponding to prostate cancer risk GWAS SNPs, which are obtained using 1000 Genomes data. We aggregate functional genomics data to generate a CRM annotation resource, with a focus on prostate cancer ChIP-seq and TF PWM data. We use this resource to annotate GWAS lead and LD proxy SNPs. Our eQTL analysis makes use of prostate tumor gene expression and corresponding normal tissue genotype data, from TCGA and in-house cohorts, and focuses on imputed genotypes at LD proxies, determined using 1000 Genomes data. Our interactive analysis tool takes the SNP annotation and eQTL association results as input together with the CRM annotation resource, enabling a combined prioritization of functional variants for integrated experimental validation. (b) Venn diagram indicating the number of lead SNPs accounted for by DNase, eQTL, or paired ChIP-seq and PWM evidence. Red values in brackets indicate the number of loci in the given subset corresponding to predicted AR/FOXA1 or AR/HOXB13 heterodimers. (c) An example in which rs11351679 at 11q13.3 disrupts an occupied AR/FOXA1 heterodimer binding site. Feature visualization tracks are shown (upper panel) for a 1 kb region surrounding the SNP, including PhastCons (dark green), LNCaP DNase (light blue), and ChIP-seq signal intensity tracks for selected TF (pink) and histone modification (purple) experiments. ChIP-seq experiments matching disrupted PWMs are indicated by green boxes. DNase and ChIP-seq signal tracks are scaled from 0 to 50 and are truncated at higher values. PhastCons is scaled from 0 to 1. The reference and alternative allele genomic sequences for the SNP are shown (lower panel), with the matching PWM. The AR/FOXA1 heterodimer PWM exhibiting the sequence match comprises AR and FOXA1 monomer binding sites separated by 5 bp. (d) Allele-specific luciferase validation of the putative regulatory SNP rs11351679.
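
The comparison of reference and alternative alleles against a PWM described in panel (c) can be sketched as a log-odds scan over the sequence around the SNP. This is a minimal illustration under stated assumptions: the 4-position matrix is a made-up toy and not the AR/FOXA1 heterodimer PWM, a uniform background of 0.25 per base is assumed, and the flanking sequences and function names are hypothetical.

```python
import math

BACKGROUND = 0.25  # assumed uniform base frequency

# Toy position weight matrix (one dict of base probabilities per position).
# It prefers the sequence TGTT and is NOT a real TF motif.
TOY_PWM = [
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]


def log_odds_score(site, pwm):
    """Sum of log2(probability / background) over a site as wide as the PWM."""
    if len(site) != len(pwm):
        raise ValueError("site length must equal PWM width")
    return sum(math.log2(col[base] / BACKGROUND) for base, col in zip(site, pwm))


def best_window_score(seq, pwm):
    """Highest log-odds score over all PWM-width windows in seq."""
    width = len(pwm)
    if len(seq) < width:
        raise ValueError("sequence shorter than PWM")
    return max(
        log_odds_score(seq[i:i + width], pwm) for i in range(len(seq) - width + 1)
    )


def allele_effect(flank5, ref, alt, flank3, pwm):
    """Return (ref_score, alt_score, alt_score - ref_score) for a SNP in context."""
    ref_score = best_window_score(flank5 + ref + flank3, pwm)
    alt_score = best_window_score(flank5 + alt + flank3, pwm)
    return ref_score, alt_score, alt_score - ref_score


# Hypothetical SNP: reference T, alternative C, inside a TGTT match.
ref_score, alt_score, delta = allele_effect("AC", "T", "C", "GTTCA", TOY_PWM)
print(f"ref = {ref_score:.2f}, alt = {alt_score:.2f}, delta = {delta:.2f}")
```

A negative delta indicates that the alternative allele weakens the best motif match, which is the sense in which a SNP "disrupts" a PWM; scanning the reverse strand is omitted here for brevity.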

Future Goals

The mechanisms by which the aberrant expression of TFs contributes to cancer development are generally not understood, even for well-studied transcriptional regulators such as AR, several ETS factors and HOX family members. GWASs have identified thousands of SNPs associated with predisposition to various diseases, including cancer. However, the molecular mechanisms underlying the causal actions and biological effects of these SNPs remain poorly understood. We will continue to address these questions and aim to determine how these SNPs affect TF binding to key enhancers, which in turn alters the expression of SNP-associated genes that confer cancer susceptibility and progression. We will continue the systematic analysis of gene regulatory networks downstream of the TFs using classical molecular and biochemical methods, as well as state-of-the-art functional genomics and systems biology approaches that combine the strengths of genomics, genetics and bioinformatics.

Selected Publications

  1. Tang X, Ma H, Han L, Zheng W, Lu YB, Chen XF, Liang ST, Wei GH, Zhang ZQ, Chen HZ, Liu DP. SIRT1 deacetylates the cardiac transcription factor Nkx2.5 and inhibits its transcriptional activity. Sci Rep. 6:36576, 2016.
  2. Li J, Rodriguez JP, Niu F, Pu M, Wang J, Hung LW, Shao Q, Zhu Y, Ding W, Liu Y, Da Y, Yao Z, Yang J, Zhao Y, Wei GH, Cheng G, Liu ZJ, Ouyang S. Structural basis for DNA recognition by STAT6. Proc Natl Acad Sci U S A. 113(46):13015-13020, 2016.
  3. Zhang K, Lee HM, Wei GH, Manninen A. Meta-analysis of gene expression and integrin-associated signaling pathways in papillary renal cell carcinoma subtypes. Oncotarget. 7(51):84178-84189, 2016.
  4. Whitington T, Gao P, Song W, Ross-Adams H, Lamb AD, Yang Y, Svezia I, Klevebring D, Mills IG, Karlsson R, Halim S, Dunning MJ, Egevad L, Warren AY, Neal DE, Grönberg H, Lindberg J, Wei GH, Wiklund F. Gene regulatory mechanisms underpinning prostate cancer susceptibility. Nature Genetics. 48:387-397, 2016
    Highlighted in: Prostate Cancer Risk Loci Are Associated with Gene Regulatory Mechanisms. Cancer Discovery 6:OF13, 2016
  5. Du M, Tillmans L, Gao J, Gao P, Yuan T, Dittmar RL, Song W, Yang Y, Sahr N, Wang T, Wei GH, Thibodeau SN, Wang L. Chromatin interactions and candidate genes at ten prostate cancer risk loci. Scientific Reports. 6:23202, 2016
  6. Taipale M, Jakkula E, Kämäräinen OP, Gao P, Skarp S, Barral S, Kiviranta I, Kröger H, Ott J, Wei GH, Ala-Kokko L, Männikkö M. Targeted re-sequencing of linkage region on 2q21 identifies a novel functional variant for hip and knee osteoarthritis. Osteoarthritis Cartilage. pii: S1063-4584(15)01390-4, 2015
  7. Heinonen H, Lepikhova T, Sahu B, Pehkonen H, Pihlajamaa P, Louhimo R, Gao P, Wei GH, Hautaniemi S, Jänne OA, Monni O. Identification of several potential chromatin binding sites of HOXB7 and its downstream target genes in breast cancer. Int J Cancer. 137:2374-2383, 2015
  8. Chen H, Yu H, Wang J, Zhang Z, Gao Z, Chen Z, Lu Y, Liu W, Jiang D, Zheng SL, Wei GH, Isaacs WB, Feng J, Xu J. Systematic enrichment analysis of potentially functional regions for 103 prostate cancer risk-associated loci. Prostate. 75:1264-1276, 2015
  9. Munne PM, Gu Y, Tumiati M, Gao P, Koopal S, Uusivirta S, Sawicki J, Wei GH, Kuznetsov SG. TP53 supports basal-like differentiation of mammary epithelial cells by preventing translocation of deltaNp63 into nucleoli. Sci Rep. 4:4663, 2014
  10. Huang Q, Whitington T, Gao P, Lindberg JF, Yang Y, Sun J, Väisänen MR, Szulkin R, Annala M, Yan J, Egevad LA, Zhang K, Lin R, Jolma A, Nykter M, Manninen A, Wiklund F, Vaarala MH, Visakorpi T, Xu J, Taipale J, Wei GH. A prostate cancer susceptibility allele at 6q22 increases RFX6 expression by modulating HOXB13 chromatin binding. Nature Genetics. 46:126-135, 2014

Highlighted in:
** HOXB13, RFX6 and prostate cancer risk. Nature Genetics. 46:94-95, 2014
** Prostate cancer: HOXB13 and a SNP collaborate to increase risk. Nature Reviews Urology. 11:64, 2014
** A Prostate Cancer–Associated SNP Increases HOXB13 Binding. Cancer Discovery 4:268, 2014
** Recommended by Faculty of 1000: ★★ Very Good, good for teaching, new finding. In F1000Prime, 27 Jan 2014; DOI: 10.3410/f.718228195.793490008
** New evidence highlights the mechanism by which a single nucleotide polymorphism enhances prostate cancer progression. Oncology Central Jan 17 2014
** New insight into Prostate Cancer susceptibility. CancerIndex Feb 1 2014
** Uusi geneettinen säätelymekanismi eturauhassyöpäriskin taustalla. Duodecim Feb 4 2014

In the news:
** Mechanism affecting risk of prostate cancer found. Science Daily Jan 10 2014
** Eturauhasyövälle altistava geenimuutos tunnistettu. KALEVA Jan 10 2014
** Eturauhassyöpään johtava geenimuutos löytyi Oulussa. HELSINGIN SANOMAT Jan 10 2014
** Eturauhassyövän riskitekijä löytyi. HELSINGIN SANOMAT Jan 11 2014

  1. Jolma A, Yan J, Whitington T, Toivonen J, Nitta KR, Rastas P, Morgunova E, Enge M, Taipale M, Wei G, Palin K, Vaquerizas JM, Vincentelli R, Luscombe NM, Hughes TR, Lemaire P, Ukkonen E, Kivioja T, and Taipale J. DNA Binding specificities of human transcription factors. Cell. 152:327-339, 2013
  2. Wei GH, Badis G, Berger MF, Kivioja T, Palin K, Enge M, Bonke M, Jolma A, Varjosalo M, Gehrke AR, Yan J, Talukder S, Turunen M, Taipale M, Stunnenberg HG, Ukkonen E, Hughes TR, Bulyk ML, Taipale J. Genome-Wide Analysis of ETS Family DNA-Binding in vitro and in vivo. EMBO J. 29:2147-2160, 2010
  3. Jolma A, Kivioja T, Toivonen J, Cheng L, Wei G, Enge M, Taipale M, Vaquerizas JM, Yan J, Sillanpää MJ, Bonke M, Palin K, Talukder S, Hughes TR, Luscombe NM, Ukkonen E, Taipale J. Multiplexed massively parallel SELEX for characterization of human transcription factor binding specificities. Genome Research. 20:861-873, 2010
  4. Tuupanen S, Turunen M, Lehtonen R, Hallikas O, Vanharanta S, Kivioja T, Björklund M, Wei G, Yan J, Niittymäki I, Mecklin JP, Järvinen H, Ristimäki A, Di-Bernardo M, East P, Carvajal-Carmona L, Houlston RS, Tomlinson I, Palin K, Ukkonen E, Karhu A, Taipale J, Aaltonen LA. The common colorectal cancer predisposition SNP rs6983267 at chromosome 8q24 confers potential to enhanced Wnt signaling. Nature Genetics. 41:885-890, 2009
  5. Wei GH, Zhao GW, Song W, Hao DL, Lv X, Liu DP, Liang CC. Mechanisms of human gamma-globin transcriptional induction by apicidin involves p38 signaling to chromatin. Biochemical and Biophysical Research Communications. 363:889-894, 2007
  6. Wei GH, Liu DP, Liang CC. Chromatin domain boundaries: insulators and beyond. Cell Research. 15:292-300, 2005
  7. Wei GH, Liu DP, Liang CC. Charting gene regulatory networks: strategies, challenges and perspectives. Biochemical Journal. 381:1-12, 2004

Doctoral Theses 2016

Ping Gao: Systems and mechanistic understanding of genetic predisposition to prostate cancer. FBMM, 2016 (graded as Pass with distinction, received the Best Doctoral Thesis 2016 Award)

Research Group Members

Project Leader:
Gonghong Wei, Ph.D. (Academy of Finland)

Senior and Post-doctoral Investigators:
Hang-Mao Lee, Ph.D. (Foundation and University of Oulu)
Ping Gao, Ph.D. (University of Oulu)

Ph.D. Students:
Qin Zhang, M.Sc. (Academy of Finland and Foundation)
Nikolaos Giannareas (University of Oulu)
Jihan Xia (University of Oulu)
Sufyan Suleman, B.Sc. (Academy of Finland and Foundation)

Project Researcher:
Yuehong Yang, M.Sc. (Academy of Finland)

Main source of salary in brackets.

Foreign Scientists: 8

Last updated: 24/4/2017