Deciphering functional effects of genetic variants on cancer predisposition and progression using an informatics approach

Thesis event information

Date and time of the thesis defence

Place of the thesis defence

F101 (Aapistie 7A)

Topic of the dissertation

Deciphering functional effects of genetic variants on cancer predisposition and progression using an informatics approach

Doctoral candidate

MSc Qin Zhang

Faculty and unit

University of Oulu Graduate School, Faculty of Biochemistry and Molecular Medicine, Disease Networks

Subject of study

Biochemistry and Molecular Medicine


Professor Matti Nykter, Tampere University


Professor Gonghong Wei, University of Oulu

Deciphering functional effects of genetic variants on cancer predisposition and progression using an informatics approach

Cancer is a genetic disease characterized by abnormal, uncontrolled cell division, with cells that eventually invade adjacent tissues and metastasize to other tissues and organs. Genome-wide association studies (GWASs) have discovered 900 risk-associated single nucleotide polymorphism (SNP) loci across various cancer types in over 600 studies published since 2005, providing compelling genetic clues to cancer predisposition. Growing evidence indicates that most of these GWAS loci harbor genetic variants within gene cis-regulatory elements (CREs). These sequence variations often influence target gene expression, thereby conferring cancer susceptibility and contributing to tumor progression. Among all cancer types, prostate cancer (PCa) is the second most frequently diagnosed malignancy in men and the fifth leading cause of cancer death in men worldwide. Genetic factors play a fundamental role in PCa susceptibility, with the heritability of familial PCa risk estimated at 57%. To date, GWASs have identified more than 300 low-penetrance loci harboring >1000 SNPs associated with PCa risk and/or aggressiveness. However, we have only a limited understanding of the molecular and biological mechanisms underlying these GWAS-reported SNP loci and of how they contribute to cancer predisposition and progression. Owing to the complexity of the human genome, the vast majority of candidate risk SNPs reside in non-protein-coding genomic regions that often function as gene regulatory elements, which makes it challenging to dissect the underlying biological mechanisms. Given that sequence-specific DNA-binding proteins, namely transcription factors (TFs), usually recognize a typical 6- to 10-nucleotide consensus site, we hypothesized that SNPs within gene regulatory elements can alter the chromatin binding of key TFs, which in turn leads to aberrant expression of cancer-associated genes and thus confers cancer susceptibility.
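
To make the hypothesis above concrete, the following is a minimal sketch of how a single-nucleotide change inside a short TF consensus site could be scored computationally. It is not the method used in the thesis. The position frequency matrix `PFM`, the flanking sequence, the alleles, and the function names are all hypothetical and chosen only for illustration. The sketch scores forward-strand windows only and assumes a uniform background base frequency.

```python
import math

# Hypothetical position frequency matrix for a 6-bp motif.
# The counts are invented for illustration and do not describe a real TF.
PFM = {
    "A": [2, 18, 1, 1, 17, 2],
    "C": [1, 0, 1, 16, 1, 3],
    "G": [15, 1, 1, 2, 1, 13],
    "T": [2, 1, 17, 1, 1, 2],
}


def log_odds(site, pfm, pseudocount=0.5, background=0.25):
    """Sum of per-position log2 odds of the motif model versus a uniform background."""
    score = 0.0
    for i, base in enumerate(site):
        column_total = sum(pfm[b][i] for b in "ACGT")
        freq = (pfm[base][i] + pseudocount) / (column_total + 4 * pseudocount)
        score += math.log2(freq / background)
    return score


def best_allele_scores(left_flank, alleles, right_flank, pfm):
    """Best motif score over all forward-strand windows that cover the SNP, per allele.

    Assumes single-nucleotide alleles and flanks long enough to fit one window.
    """
    width = len(pfm["A"])
    snp_pos = len(left_flank)
    result = {}
    for allele in alleles:
        seq = left_flank + allele + right_flank
        first = max(0, snp_pos - width + 1)
        last = min(snp_pos, len(seq) - width)
        result[allele] = max(
            log_odds(seq[s:s + width], pfm) for s in range(first, last + 1)
        )
    return result


# Hypothetical SNP: reference allele C, alternative allele T, invented flanks.
scores = best_allele_scores("TTGAT", ("C", "T"), "AGTT", PFM)
delta = scores["T"] - scores["C"]  # negative: the T allele weakens the best site
print(scores, delta)
```

In this toy case the C allele completes the matrix's preferred site and the T allele disrupts it, so the score difference is negative. A real analysis would use curated motif models, scan both strands, and compare the difference against an empirical null.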

The aim of this study was to prioritize potential functional variants conferring cancer risk from the abundant cancer GWAS associations, thereby facilitating functional studies and laying the groundwork for validating and investigating the regulatory mechanisms of the prioritized risk SNPs, ultimately providing new insights for clinical translation and application. We therefore developed an in-depth, high-throughput bioinformatic pipeline to screen potential functional variants and characterize the gene regulatory mechanisms underpinning cancer susceptibility. Our workflow integrates abundant cistrome and epigenome data profiled by chromatin immunoprecipitation sequencing (ChIP-Seq) assays, genotype and gene expression data from various human organs and tissues, and gene expression profiles of normal tissues and tumor specimens across various cancer types. At the gene level, this analysis revealed a set of pan-cancer expression quantitative trait loci (eQTL) genes significantly enriched in protein-protein interaction (PPI) networks and certain immune-relevant pathways. At the SNP level, the characterized pan-cancer SNPs are highly enriched in active enhancer or super-enhancer (SE) regions. The pipeline also prioritized dozens of potential causal SNPs that may modulate TF chromatin binding and hence affect target gene expression. With additional bioinformatic inputs, we functionally validated five predicted regulatory loci and target genes associated with PCa risk: rs3217869/CCND2 at 12p13, rs2853669/TERT at 5p15, multiple causal SNPs at the 17q12/HNF1B locus, additional functional SNPs at the 17p13.3/VPS53/FAM57A/GEMIN4 locus, and rs339331/RFX6 at 6q22. We found that eQTLs, including rs3217869, rs2853669, SNPs at the 17q12/HNF1B locus, and rs339331/6q22, confer PCa susceptibility by influencing target gene expression in an allele-specific manner.
The risk SNP rs3217869 within CCND2, a gene in the RTK/ERK pathway, was found to be significantly associated with PCa aggressiveness. The risk allele A of rs3217869 is associated with decreased expression of CCND2, and CCND2 downregulation correlates with PCa progression to an advanced stage and metastasis. We also interpreted the mechanisms underlying rs2853669, a SNP residing within the TERT promoter region. Variations at rs2853669 confer PCa risk in different ethnic groups, as evidenced by large multi-cohort association studies. Mechanistically, rs2853669 displayed allele-specific chromatin binding of E2F1 and MYC under different androgen levels, thereby promoting PCa progression. TERT was activated via enhanced E2F1 binding to the C allele at rs2853669 under hormone deprivation, while androgen stimulation promoted MYC binding to the T allele at rs2853669, thereby upregulating TERT expression and possibly contributing to PCa severity.
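
As an illustration of the kind of integration the pipeline performs, the sketch below ranks candidate SNPs by how many ChIP-Seq peak tracks they fall into. It is a simplified stand-in written under stated assumptions, not the thesis pipeline itself. The function names, track names, SNP identifiers, and coordinates are hypothetical. Peaks are assumed to be BED-style half-open intervals and SNP positions are assumed to be 0-based.

```python
from collections import defaultdict


def overlapping_tracks(snps, peaks):
    """Map each SNP to the sorted names of the peak tracks that contain it.

    snps:  list of (snp_id, chrom, pos), pos assumed 0-based
    peaks: list of (track, chrom, start, end), assumed BED-style half-open
    """
    peaks_by_chrom = defaultdict(list)
    for track, chrom, start, end in peaks:
        peaks_by_chrom[chrom].append((start, end, track))
    hits = {}
    for snp_id, chrom, pos in snps:
        hits[snp_id] = sorted(
            {track for start, end, track in peaks_by_chrom[chrom] if start <= pos < end}
        )
    return hits


def prioritize(snps, peaks):
    """Rank SNPs by the number of distinct overlapping tracks, ties broken by ID."""
    hits = overlapping_tracks(snps, peaks)
    return sorted(hits.items(), key=lambda item: (-len(item[1]), item[0]))


# Invented example data: two TF tracks and one histone-mark track.
peaks = [
    ("TF_A", "chr1", 100, 200),
    ("H3K27ac", "chr1", 150, 400),
    ("TF_B", "chr2", 50, 80),
]
snps = [("snp1", "chr1", 160), ("snp2", "chr1", 300), ("snp3", "chr2", 90)]

ranked = prioritize(snps, peaks)
for snp_id, tracks in ranked:
    print(snp_id, tracks)
```

The linear scan over peaks keeps the sketch short. For genome-scale data, an interval tree or a dedicated tool for interval intersection would be the more practical choice, and overlap counts would normally be combined with eQTL and motif evidence rather than used alone.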

The 17q12/HNF1B cancer risk locus has been reproducibly reported by GWASs in multiple cancer types. Mechanistically, we unraveled an extensive germline-somatic interplay at this locus, which may contribute to PCa progression through the HNF1B co-option of TMPRSS2-ERG (T2E). Comprehensive bioinformatic analyses and functional experiments confirmed that multiple 17q12 causal SNPs residing in close proximity to HNF1B alter HNF1B expression in a T2E fusion-dependent manner. Moreover, we found that another PCa risk-associated locus, 17p13.3/VPS53/FAM57A/GEMIN4, confers PCa susceptibility through a cooperative gene regulatory mechanism controlled by HNF1B and ERG. Finally, our functional SNP prioritization pipeline revealed an additional layer of gene regulation at the rs339331/6q22 locus, mediated by the pioneer TF GATA2. Our systematic bioinformatic analysis indicated that GATA2 copy number amplification and upregulation are associated with PCa progression and poor prognosis. PPI and experimental analyses identified SMAD4, a key mediator of the TGFβ signaling pathway, as a bona fide interaction partner of GATA2; both proteins are essential for PCa cell growth and survival. We demonstrated that the T risk allele of rs339331 enhances chromatin binding of GATA2 and SMAD4, leading to increased expression of RFX6, which is associated with PCa cell growth and tumor progression.

In conclusion, we built a computational pipeline that systematically prioritizes potential functional SNPs by incorporating high-throughput sequencing data and pan-cancer GWAS associations. This integrated analysis helps elucidate how genomic variation influences cancer predisposition via different gene regulatory mechanisms and identifies relevant target genes, providing useful clues for further functional validation. We comprehensively interpreted the molecular mechanisms and biological impacts underlying five PCa risk-associated loci implicated in PCa predisposition and progression. These findings may shed light on the identification of biomarkers and therapeutic targets and may benefit clinical care by improving personalized PCa diagnosis and treatment.
Last updated: 1.3.2023