CROSS-STUDY of Infant Gut Microbiome

Project information

Project duration


Project coordinator

University of Oulu

Project description

The objective of the COMBO project is to develop and validate novel ways to analyze bacterial 16S data from next-generation sequencing of pediatric and neonatal fecal samples. By using machine learning, we aim to find patterns that can be used to predict the impact of delivery mode in a cross-study setting, as well as the subsequent health of children in several cohort studies. Previous studies have shown that machine learning models can generalize to samples from other studies. Based on these results, we expect to be able to predict the delivery mode from 16S data, although with reduced performance when the models are applied to cross-study samples. Analyzing and recognizing these patterns may reveal novel bacteria not previously linked to the conditions or phenotypes.
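As an illustration of what a cross-study setting can mean in practice, the following minimal sketch builds leave-one-study-out splits: a model would be trained on samples from all studies but one and evaluated on the held-out study. This is not the project's actual pipeline; the function name `leave_one_study_out` and the study labels are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def leave_one_study_out(study_labels):
    """Yield (held_out_study, train_indices, test_indices) for each study.

    study_labels holds one study identifier per sample, in sample order.
    """
    # Group sample indices by the study they come from.
    by_study = defaultdict(list)
    for idx, study in enumerate(study_labels):
        by_study[study].append(idx)

    # Hold out each study in turn; train on all remaining studies.
    for held_out in sorted(by_study):
        test_idx = by_study[held_out]
        train_idx = sorted(
            i
            for study, indices in by_study.items()
            if study != held_out
            for i in indices
        )
        yield held_out, train_idx, test_idx


# Hypothetical study labels, one per fecal sample (illustrative only).
studies = ["A", "A", "B", "C", "B", "C"]
splits = list(leave_one_study_out(studies))

for held_out, train_idx, test_idx in splits:
    print(held_out, train_idx, test_idx)
```

Comparing performance on the held-out study with performance inside the training studies would give one estimate of the expected drop in generalization to cross-study samples.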

Microbiome studies are based on bacterial 16S and metagenomic data, which are high-dimensional, hard to interpret, and often not directly comparable across studies with conventional methods. In recent years, machine learning (ML) methods have gained prominence in microbiology and medicine. Machine learning can be used to find patterns in high-dimensional data and to predict target variables, such as the disease state of a patient, for unseen samples. In pediatrics, predicting the subsequent health of a newborn is of high interest, because understanding the pathogenesis of a disease helps in developing effective prevention strategies. The optimal machine learning methods for analyzing microbiome data are still unclear, and such methods are seldom used overall. Finding effective methods for applying machine learning to microbiome problems is of key interest to the field.

Societal effects and impact

Our ultimate goal is to help develop preventive methods for clinically important diseases by applying machine learning to microbiome data. For example, obesity and asthma have severe economic implications for society. Building accurate models that predict these conditions can reveal previously unknown mechanisms and connections related to them. The impact of delivery mode on the microbiome is well studied, but direct validation on samples from outside studies is lacking in the field.

The COMBO project is the largest study of its kind and covers a diverse set of populations. Our study presents a method to validate microbiome findings on global samples. The project could reveal novel ways to utilize the increasing amount of microbiome data, applicable not only to various clinical problems but to any problem that uses microbiome data.

Project key personnel

Petri Vänni, MSc, is a recent graduate of the University of Oulu. He earned his degree in genetics with a minor in information sciences. He was awarded the highest grade for his master’s thesis, "Predictive modeling of K12 probiotics impact on children’s oral microbiome: a machine learning approach". He is a co-founder of Genobiomics Ltd, a bioinformatics company specialized in microbiome research.

Docent Terhi Tapiainen, supervisor of the thesis and principal investigator (PI), is an Associate Professor of Pediatrics with more than 20 years of experience in clinical and translational research in pediatrics and more than 70 original peer-reviewed publications. Docent Tapiainen has received Academy of Finland Clinical researcher funding for the microbiome project for 2019-2022 and a grant of 400 000 € from the Pediatric Research Foundation for 2019-2021.

Docent Tejesvi Mysore, co-supervisor of the thesis and co-PI, has more than 15 years of experience in the field of molecular microbiology. Docent Tejesvi has published several articles in pediatric microbiome research.